[Glass] Should I be worried about symbols lookup performance?

Dale Henrichs via Glass glass at lists.gemtalksystems.com
Tue Mar 31 10:20:46 PDT 2015


Mariano,

I think that you are headed in the right direction. Canonicalizing the 
unique id is the right approach and we do a pretty good job with 
Symbols. Talking with engineers here, you can expect pretty good 
performance from Symbols up to around 10 million Symbols ... then you 
might want to take a different approach.

That different approach would involve using a StringKeyValueDictionary 
and increasing the number of collision buckets beyond what is used for 
Symbols (the Symbol table has a fixed number of collision buckets). The 
downside of managing your own StringKeyValueDictionary is that you'd 
have to deal with conflicts when multiple gems encounter the same id...
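
To make the dictionary-based idea concrete, here is a minimal sketch of canonicalization through a lookup table. It is written in Python purely for illustration and is not GemStone code: the Canonicalizer class, the build_rows helper and the "CODE-" ids are all hypothetical, and a plain dict stands in for a StringKeyValueDictionary. The numbers mirror the ones in the question (1MM rows, each code repeated 20 times).

```python
class Canonicalizer:
    """Hypothetical intern table: maps equal strings to one shared instance."""

    def __init__(self):
        # A plain dict stands in for a StringKeyValueDictionary here.
        self._table = {}

    def canonical(self, code):
        # Return the instance stored for an equal string if there is one;
        # otherwise store this instance and return it.
        return self._table.setdefault(code, code)

    def __len__(self):
        return len(self._table)


def build_rows(n_rows, repeats):
    # Build every code string afresh, so equal codes start out as
    # distinct objects, as they would when read from an import file.
    return ["CODE-" + str(i // repeats) for i in range(n_rows)]


canon = Canonicalizer()
rows = [canon.canonical(code) for code in build_rows(1_000_000, 20)]

# 1MM rows with each code repeated 20 times leave 50000 unique strings.
unique_count = len(canon)
```

Each row ends up holding a direct reference to the shared string, so the lookup cost is paid only at canonicalization time. The sketch runs in a single process and so says nothing about the conflict problem above, where multiple gems add the same id at the same time.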

You don't see the Symbol table because the AllSymbols dictionary is 
managed by a separate gem to provide conflict-free canonicalization of 
Symbols ... The AllSymbols dictionary is in the SPC and there are 
optimizations that let us add new Symbols very efficiently without 
worrying about conflicts.

In 3.3 we will be providing a canonicalization framework that will 
support doing your own conflict-free canonicalization, at which 
time you would switch to the StringKeyValueDictionary approach...

Dale

On 03/31/2015 08:32 AM, Mariano Martinez Peck via Glass wrote:
> Hi guys,
>
> I am storing a lot of data internally in GemStone that I get from a 
> third party lib. Right now I have 1MM objects, but soon there will 
> likely be several MM more. These objects are kind of "rows" from huge 
> files I receive from this lib as if it were a "database". Anyway, 
> each of these objects has a string code. These string codes WON'T fit 
> in 61 bits or so, so they will NOT be immediate objects. However, each 
> code is repeated on average 20 times. So in 1MM rows, I could be 
> storing 1MM strings, or 50000 symbols.
>
> I expect this imported data to keep growing every day. In fact 
> I am not even 100% sure the best solution is to store this inside 
> GemStone, but that's a story for another day.
>
> My question is: if I use symbols I will (I guess) save a lot of 
> space, greatly reduce the number of objects, and likely the number of 
> objects needed in memory (hence I hope I will need less memory/SPC). 
> My only worry is about the symbol lookup performance. I don't know 
> how the "Symbol table" is implemented in GemStone. From what I 
> understand, when I CREATE a symbol I pay for the lookup in the table, 
> but then my object reference will point directly to the symbol and 
> not to the entry in the symbol table, so I don't have an extra 
> indirection each time I access my symbol instance, right?
>
> However, if I get a very large table of symbols, I am afraid that 
> every single #asSymbol I do in my app (from any other use case, fully 
> decoupled from this one) would be slowed down.
> I cannot find the AllSymbols dictionary, and the deepest I could get 
> in understanding it was #_existingWithAll:
>
> I ran some benchmarks and #asSymbol still seems fast even with a 
> much larger SymbolTable (or whatever the GemStone equivalent is).
>
> This may be a tradeoff, but my gut feeling tells me that storing these 
> as symbols is worth it.
>
> BTW, I guess I must also set STN_SYMBOL_GC_ENABLED to true.
>
> Do you have any suggestion or recommendation?
>
> Thanks in advance,
>
>
> -- 
> Mariano
> http://marianopeck.wordpress.com