[Glass] Should I be worried about symbols lookup performance?

Mariano Martinez Peck via Glass glass at lists.gemtalksystems.com
Tue Mar 31 10:57:49 PDT 2015


On Tue, Mar 31, 2015 at 2:20 PM, Dale Henrichs via Glass <
glass at lists.gemtalksystems.com> wrote:

>  Mariano,
>
> I think that you are headed in the right direction. Canonicalizing the
> unique id is the right approach and we do a pretty good job with Symbols.
> Talking with engineers here, you can expect pretty good performance from
> Symbols up to around 10 million Symbols ... then you might want to take a
> different approach.
>

Excellent! Yes, indeed, I got the same impression that the symbol lookup
was really fast.  Thanks for checking with the engineers too.



>
> That different approach would involve using a StringKeyValueDictionary and
> bumping up the number of collision buckets beyond that done for Symbols
> (which has a fixed number of collision buckets). The downside of managing a
> StringKeyValueDictionary is that you'd have to worry about conflicts when
> multiple gems encounter the same id...
>

Ok, good to know. So...I think I will go with the symbols approach and
every once in a while check how many symbol instances I have.
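
To make the trade-off concrete, here is a minimal sketch of what
canonicalization buys. It is plain Python, not GemStone code: the
`intern_code` helper, the `canonical` table, and the sample codes are all
made up for the illustration, and the table only plays the role that the
symbol table plays for Symbols.

```python
# Minimal sketch of string canonicalization (interning), not GemStone code.
# The dict holds one canonical object per distinct code.

canonical = {}  # code string -> the single canonical instance


def intern_code(code):
    """Return the canonical instance for `code`, registering it on first sight."""
    return canonical.setdefault(code, code)


# Hypothetical data: 1,000 rows where each code repeats 20 times.
rows = [intern_code("CODE-%d" % (i % 50)) for i in range(1000)]

# Each row holds a direct reference to the canonical object, so the table
# lookup is paid once per intern_code call, not on each later access.
distinct_objects = len({id(r) for r in rows})
print(len(rows), "rows share", distinct_objects, "canonical strings")
```

The same ratio applied to my data gives the 1MM strings versus 50000
symbols mentioned in my original message.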


>
> You don't see the Symbol table, because the AllSymbols dictionary is
> managed by a separate gem to provide conflict free canonicalization of
> Symbols ... The AllSymbols dictionary is in the SPC and there are
> optimizations that let us add new Symbols very efficiently without worry
> about conflicts.
>

Ok, thanks for the explanation.


>
> In 3.3 we will be providing a canonicalization framework that would
> provide support for doing your own conflict canonicalization at which time
> you would switch to the StringKeyValueDictionary approach...
>

That would be nice because I would like to take a similar approach with
dates.... I have TONS of equal dates (spread across many different
collections) that I would not mind managing by identity (#==) ...

BTW...I thought a regular Date would fit as an immediate object, but after
re-watching James's video about immediate objects, that doesn't seem to be
the case. Wouldn't a SmallDate (if necessary), immutable and immediate, be
interesting? Might this be useful? Most financial apps have tons of
dates...
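
As a back-of-the-envelope check on the SmallDate idea, here is a small
Python sketch (again, not GemStone code). It assumes a date is encoded as a
plain day count and that the immediate payload is the "61 bits or so" I
mentioned in my original message; whether GemStone could or would lay out a
SmallDate this way is purely my assumption.

```python
from datetime import date

# Assumption: payload size taken from the "61 bits or so" figure in the thread.
PAYLOAD_BITS = 61


def encode(d):
    """Encode a date as a day count (proleptic Gregorian ordinal)."""
    return d.toordinal()


def decode(n):
    """Rebuild the date from its day count."""
    return date.fromordinal(n)


# Largest date Python's datetime can represent (year 9999).
largest = encode(date.max)
print("bits needed for the largest supported date:", largest.bit_length())
print("fits in payload:", largest.bit_length() <= PAYLOAD_BITS)
```

With an encoding like this, two equal dates are the same integer, so
equality and identity would coincide without any lookup table.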


>
> Dale
>
>
> On 03/31/2015 08:32 AM, Mariano Martinez Peck via Glass wrote:
>
> Hi guys,
>
>  I am storing a lot of data internally in GemStone that I get from a
> third party lib. Right now I have 1MM objects but likely soon there will
> be some more MM for sure. These objects are kind of "rows" from huge files I
> receive from this lib as if it were a "database". Anyway, these objects
> have a string code. These string codes WON'T fit in 61 bits or so, so they
> will NOT be immediate objects.  However...each code is repeated on average
> 20 times. So..in 1MM rows, I could be storing 1MM strings, or... 50000
> symbols....
>
>  I expect this imported data to keep increasing every day. In fact I
> am not even 100% sure the best solution is to store this inside GemStone,
> but that's a story for another day.
>
>  My question is...if I use symbols I will be saving (I guess) a lot of
> space, greatly reducing the number of objects, and likely the number of
> objects needed in memory (hence I hope I will need less memory/spc).   My
> only worry is about the symbol lookup performance. I don't know how the
> "Symbol table" is implemented in GemStone.   From what I understand, when I
> CREATE a symbol I pay the lookup in the table but then my object reference
> that points to the new symbol will directly point to the symbol and not to
> the entry in the symbol table...so I don't have a new indirection each time
> I try to access my symbol instance, right?
>
>  However...if I get a very large table of symbols, I am afraid that every
> single #asSymbol I do in my app (from any other use case..fully decoupled
> from this one) would be slowed down.
> I cannot find the AllSymbols dictionary, and the deepest I could get in
> trying to understand it was #_existingWithAll:
>
>  I tried to do some benchmarks and it seems that #asSymbol is still fast
> even with a much larger SymbolTable (or whatever the GemStone equivalent is).
>
>  This may be a tradeoff, but my gut feeling tells me that storing these
> as symbols is worth it.
>
>  BTW... I guess I must also set this to true: STN_SYMBOL_GC_ENABLED
>
>  Do you have any suggestion or recommendation?
>
>  Thanks in advance,
>
>
>  --
> Mariano
> http://marianopeck.wordpress.com
>
>
> _______________________________________________
> Glass mailing list
> Glass at lists.gemtalksystems.com
> http://lists.gemtalksystems.com/mailman/listinfo/glass
>
>
>


-- 
Mariano
http://marianopeck.wordpress.com