[Glass] Large collection and common practice

Dale Henrichs via Glass glass at lists.gemtalksystems.com
Wed Jan 4 07:38:21 PST 2017


Bruno,

Again, I don't think there is a simple answer:)

For an RcKeyValueDictionary the dictionary is rebuilt when any one 
collision bucket is larger than 1/4 of the total size of the dictionary 
... so the rebuild frequency is function of the distribution of the hash 
values of the keys as well as the number of entries in the dictionary.

As the size of the dictionary grows, the frequency of rebuilds will go 
down, but the potential for having some very large collision bucket goes 
up (depending upon the randomness of the hash function for the key 
class). The RcKeyValueDictionary does a linear search through a 
collision bucket, so large collision buckets can lead to slow query 
times ...

Another factor is of course the initial size of the dictionary ... if 
you start with a dictionary sized to the expected number of elements 
(and your hash function gives a good distribution) then you will not 
encounter the "rebuild problem"

I think the best approach is to run some scaling experiments yourself 
with what you consider to be large collections ... the dictionary 
rebuild time is roughly equivalent to the time it takes to build the 
dictionary from scratch sizing the initial dictionary. Take a look at 
the KeyValueDictionary>>statistics method to see if you are getting a 
good distribution of entries and run some queries for the elements in 
the largest collision buckets ...

If you are not seeing any unreasonable performance then you are good to 
go ...

Dale


On 1/4/17 6:23 AM, Smalltalk wrote:
> Dale,
>
> Is there any measure of from what size a Dictionary could have this 
> problem ? Maybe it also depends in the length of the key...
>
> (if the problem arise from 1.000.000 entries in a dictionary --> is 
> not a problem for me right now)
>
> The keys of my Dictionary is an UUID like: 
> 93b1205cec32b42993b9382a9b0d89046fd937c8
>
> At this stage i think i will use 2 Rc collections (one dictionary + 
> one bag) in the future maybe this approach has to be changed...
>
> For now i will keep it simple !
>
> regards
>
> bruno
>
> El 03/01/2017 a las 17:46, Dale Henrichs escribió:
>>
>> As the size of the collections gets very large, you have to keep in 
>> mind that Dictionary-based structures have to be rebuilt periodically 
>> to keep the collision bucket size manageable and some of the 
>> dictionaries like RcKeyValueDictionary will rebuild automatically on 
>> insertion and depending upon the size of the dictionary that could 
>> lead to long and unpredictable delays for end users ... the btree 
>> structures used in GemStone indexes do splits and merges on 
>> individual leaf nodes limiting the cost of insertions ... 
>
>
> ---
> El software de antivirus Avast ha analizado este correo electrónico en 
> busca de virus.
> https://www.avast.com/antivirus
>



More information about the Glass mailing list