[Glass] Large collection and common practice

Smalltalk via Glass glass at lists.gemtalksystems.com
Wed Jan 4 07:48:57 PST 2017


Dale,

Thanks for the answer.

Well, i will do more research on this scenario and then i will know what 
to do :)

thanks again

regards

bruno


El 04/01/2017 a las 12:38, Dale Henrichs escribió:
> Bruno,
>
> Again, I don't think there is a simple answer:)
>
> For an RcKeyValueDictionary the dictionary is rebuilt when any one 
> collision bucket is larger than 1/4 of the total size of the 
> dictionary ... so the rebuild frequency is function of the 
> distribution of the hash values of the keys as well as the number of 
> entries in the dictionary.
>
> As the size of the dictionary grows, the frequency of rebuilds will go 
> down, but the potential for having some very large collision bucket 
> goes up (depending upon the randomness of the hash function for the 
> key class). The RcKeyValueDictionary does a linear search through a 
> collision bucket, so large collision buckets can lead to slow query 
> times ...
>
> Another factor is of course the initial size of the dictionary ... if 
> you start with a dictionary sized to the expected number of elements 
> (and your hash function gives a good distribution) then you will not 
> encounter the "rebuild problem"
>
> I think the best approach is to run some scaling experiments yourself 
> with what you consider to be large collections ... the dictionary 
> rebuild time is roughly equivalent to the time it takes to build the 
> dictionary from scratch sizing the initial dictionary. Take a look at 
> the KeyValueDictionary>>statistics method to see if you are getting a 
> good distribution of entries and run some queries for the elements in 
> the largest collision buckets ...
>
> If you are not seeing any unreasonable performance then you are good 
> to go ...
>
> Dale
>
>
> On 1/4/17 6:23 AM, Smalltalk wrote:
>> Dale,
>>
>> Is there any measure of from what size a Dictionary could have this 
>> problem ? Maybe it also depends in the length of the key...
>>
>> (if the problem arise from 1.000.000 entries in a dictionary --> is 
>> not a problem for me right now)
>>
>> The keys of my Dictionary is an UUID like: 
>> 93b1205cec32b42993b9382a9b0d89046fd937c8
>>
>> At this stage i think i will use 2 Rc collections (one dictionary + 
>> one bag) in the future maybe this approach has to be changed...
>>
>> For now i will keep it simple !
>>
>> regards
>>
>> bruno
>>
>> El 03/01/2017 a las 17:46, Dale Henrichs escribió:
>>>
>>> As the size of the collections gets very large, you have to keep in 
>>> mind that Dictionary-based structures have to be rebuilt 
>>> periodically to keep the collision bucket size manageable and some 
>>> of the dictionaries like RcKeyValueDictionary will rebuild 
>>> automatically on insertion and depending upon the size of the 
>>> dictionary that could lead to long and unpredictable delays for end 
>>> users ... the btree structures used in GemStone indexes do splits 
>>> and merges on individual leaf nodes limiting the cost of insertions ... 
>>
>>
>> ---
>> El software de antivirus Avast ha analizado este correo electrónico 
>> en busca de virus.
>> https://www.avast.com/antivirus
>>
>
>


---
El software de antivirus Avast ha analizado este correo electrónico en busca de virus.
https://www.avast.com/antivirus



More information about the Glass mailing list