[Glass] How to manage large amounts of data ...

Dale Henrichs dale.henrichs at gemtalksystems.com
Thu Aug 14 10:12:54 PDT 2014


Marten,

I inadvertently sent you a private email correcting my previous post... I
shouldn't hit send on an empty stomach while getting ready for a plane trip
in a couple of hours :), but even that mail was not quite correct.

Obviously I missed the bit about having string keys for the UUIDs and I
don't know what I was thinking about with the IdentitySet....

The fastest String indexes (using "basic" classes) use the first 12
characters of the string ... the first 12 characters are encoded as an
integer, so objects are not faulted in while scanning the btrees during the
#= test. If the Strings are longer than 12 characters, the first 12 are used
as a locator and then message sends (and object faulting) are used to finish
the comparison (beyond 12 characters performance falls off, especially if
the Strings share the same first 12 characters)... The effective "collision
bucket" size for a basic index is 500, so a String index with keys of 12
characters or fewer can outperform a StringKeyValueDictionary with
equivalent-sized collision buckets on the basis of avoiding object faults
... No rebuilding is required with btrees (they are balanced on the fly),
while the dictionary will require full rebuilds to manage the size of its
collision buckets. Beyond 12-character keys, the StringKeyValueDictionary
takes the lead and wins running away :)
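
To make the 12-character behaviour concrete, here is a rough sketch in
Python (not GemStone code, and not the actual index encoding). It assumes
ASCII keys without NUL characters; the names encode_prefix and keys_equal
are made up for illustration. It only shows why short keys can be compared
from the encoded integer alone, while longer keys that share a prefix need
the full string (the step that would fault the object in):

```python
PREFIX_LEN = 12  # leading characters packed into the integer, per the text above


def encode_prefix(key):
    """Pack the first PREFIX_LEN bytes of an ASCII key into one integer.

    Illustrative encoding only: big-endian packing with NUL padding, so
    integer order matches byte-wise string order on the prefix.
    """
    raw = key.encode("ascii")[:PREFIX_LEN].ljust(PREFIX_LEN, b"\0")
    return int.from_bytes(raw, "big")


def keys_equal(a, b):
    """Return (equal, needed_full_compare) for two string keys."""
    if encode_prefix(a) != encode_prefix(b):
        # Decided from the integers alone; no string is touched.
        return False, False
    if len(a) <= PREFIX_LEN and len(b) <= PREFIX_LEN:
        # The whole key fits in the prefix, so equal integers mean equal keys.
        return True, False
    # Prefixes match but at least one key is longer: compare the full strings.
    return a == b, True
```

A UUID in its usual 36-character text form is well past 12 characters, so
two UUID keys with the same first 12 characters would always land in the
slow branch of keys_equal.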

If you can convert your key to a unique SmallInteger, you would compare the
performance of a SmallInteger-based index and an
IdentityKeyValueDictionary ... you would not have a key-length limit like
the String-based index and you would not have the faulting penalty in the
IdentityKeyValueDictionary, so the difference will come down to collision
bucket management. For the IdentityKeyValueDictionary case, if you can
predict your maximum SmallInteger range of values, you should be able to
decide how big you want your collision buckets to be, and I would think
that the IdentityKeyValueDictionary would have the advantage.
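
As a rough illustration of the collision bucket point, here is a toy model
in Python (not GemStone's dictionary implementation). BucketedDict, load
and bucket_limit are hypothetical names, and doubling the table on a
rebuild is my assumption. It counts full rebuilds when integer keys are
loaded into a table that starts small versus one sized up front from the
known key range:

```python
class BucketedDict:
    """Toy hash table with size-limited collision buckets (illustration only)."""

    def __init__(self, table_size, bucket_limit):
        self.bucket_limit = bucket_limit
        self.rebuilds = 0
        self.buckets = [[] for _ in range(table_size)]

    def _bucket(self, key):
        return self.buckets[key % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        if len(bucket) > self.bucket_limit:
            self._rebuild()  # bucket limit exceeded: rebuild immediately

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def _rebuild(self):
        # Full rebuild: double the table (assumed policy) and rehash every entry.
        self.rebuilds += 1
        entries = [entry for bucket in self.buckets for entry in bucket]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        for k, v in entries:
            self._bucket(k).append((k, v))


def load(table_size, n_keys, bucket_limit=4):
    """Insert the integer keys 0..n_keys-1 and return the filled table."""
    d = BucketedDict(table_size, bucket_limit)
    for key in range(n_keys):
        d.put(key, str(key))
    return d
```

Comparing load(8, n).rebuilds with load(n // 4, n).rebuilds for some large
n shows the difference: the table that starts small rebuilds repeatedly
during the load, while the table sized from the known range never does.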

Dale


On Thu, Aug 14, 2014 at 8:55 AM, Dale Henrichs <
dale.henrichs at gemtalksystems.com> wrote:

> Marten,
>
> I would say that James's suggestion is correct ... a dictionary is used for
> identity-based indexes.
>
> You probably should use an IdentityKeyValueDictionary for quick access,
> but you will want to keep an eye on the size of the collision buckets in
> the dictionary ... when you hit the collision bucket limit on an at:put:
> the dictionary is immediately rebuilt, which can cause an unreasonable
> delay ... with a large dictionary you would probably want to rebuild the
> dictionary during off hours ...
>
> IdentitySet would be an even better bet (no collision buckets combined
> with quick identity based lookups) but to be able to use the UUID for
> lookup you'd need to store the uuid in the identitySet ... but if you could
> arrange for the uuid to reference the object directly an identity set would
> work ...
>
> Dale
>
>
> On Thu, Aug 14, 2014 at 7:01 AM, itlists at schrievkrom.de <
> itlists at schrievkrom.de> wrote:
>
>> Assuming I have a large number of objects with a unique key attribute
>> (uuid based).
>>
>> Normally I use an instance of class Dictionary (or perhaps better:
>> StringKeyValueDictionary) to store instances of these objects. The
>> initial lookup of these objects is mostly done via their unique
>> key attribute.
>>
>> So the access to an instance was pretty fast:
>>
>>  ^aDictionary at: aKey ifAbsent: [ nil ]
>>
>> But what happens if this dictionary gets very large (say, a
>> million entries)?
>>
>> Is it better to take a different approach then, e.g. a collection with a
>> defined index on that unique attribute?
>>
>>
>> Thanks for answering ...
>>
>> Marten
>>
>>
>> --
>> Marten Feldtmann