[Glass] Large collection and common practice
Dale Henrichs via Glass
glass at lists.gemtalksystems.com
Wed Jan 4 07:38:21 PST 2017
Bruno,
Again, I don't think there is a simple answer:)
For an RcKeyValueDictionary the dictionary is rebuilt when any one
collision bucket is larger than 1/4 of the total size of the dictionary
... so the rebuild frequency is function of the distribution of the hash
values of the keys as well as the number of entries in the dictionary.
As the size of the dictionary grows, the frequency of rebuilds will go
down, but the potential for having some very large collision bucket goes
up (depending upon the randomness of the hash function for the key
class). The RcKeyValueDictionary does a linear search through a
collision bucket, so large collision buckets can lead to slow query
times ...
Another factor is of course the initial size of the dictionary ... if
you start with a dictionary sized to the expected number of elements
(and your hash function gives a good distribution) then you will not
encounter the "rebuild problem"
I think the best approach is to run some scaling experiments yourself
with what you consider to be large collections ... the dictionary
rebuild time is roughly equivalent to the time it takes to build the
dictionary from scratch sizing the initial dictionary. Take a look at
the KeyValueDictionary>>statistics method to see if you are getting a
good distribution of entries and run some queries for the elements in
the largest collision buckets ...
If you are not seeing any unreasonable performance then you are good to
go ...
Dale
On 1/4/17 6:23 AM, Smalltalk wrote:
> Dale,
>
> Is there any measure of from what size a Dictionary could have this
> problem ? Maybe it also depends in the length of the key...
>
> (if the problem arise from 1.000.000 entries in a dictionary --> is
> not a problem for me right now)
>
> The keys of my Dictionary is an UUID like:
> 93b1205cec32b42993b9382a9b0d89046fd937c8
>
> At this stage i think i will use 2 Rc collections (one dictionary +
> one bag) in the future maybe this approach has to be changed...
>
> For now i will keep it simple !
>
> regards
>
> bruno
>
> El 03/01/2017 a las 17:46, Dale Henrichs escribió:
>>
>> As the size of the collections gets very large, you have to keep in
>> mind that Dictionary-based structures have to be rebuilt periodically
>> to keep the collision bucket size manageable and some of the
>> dictionaries like RcKeyValueDictionary will rebuild automatically on
>> insertion and depending upon the size of the dictionary that could
>> lead to long and unpredictable delays for end users ... the btree
>> structures used in GemStone indexes do splits and merges on
>> individual leaf nodes limiting the cost of insertions ...
>
>
> ---
> El software de antivirus Avast ha analizado este correo electrónico en
> busca de virus.
> https://www.avast.com/antivirus
>
More information about the Glass
mailing list