[Glass] Cost of includesKey in RcKeyValueDictionary

Martin McClure martin.mcclure at gemtalksystems.com
Wed Feb 26 15:36:28 PST 2020


Hi Bruno,

What does the ID of a B object get used for? Is is needed to look up a B 
object in the A object?

If so, I think using a random SmallInteger key is a reasonable approach, 
likely less trouble than other possibilities. A UUID does not fit in a 
SmallInteger and would take more space, and though a UUID could reduce 
the probability of clashes to the point that you didn't need to test for 
them, it might well be slower than using a SmallInteger and testing for 
clashes.

Lookups with #includesKey: are fairly fast. I don't know what kind of 
speed you need, but using a variant of your test code (included below) I 
see the new keys being added at a rate of about 60,000 per second on an 
8-year-old not very fast desktop machine I have at home. I'm using more 
of the SmallInteger range by using a 56-bit random ID. The increase from 
a 32-bit ID has very little impact on speed, in my quick testing.

Regards,
-Martin

| numbers instances r int milli1 milli2 clashes|
numbers := KeyValueDictionary new.
r := Lag1MwcRandom new.
milli1 := System millisecondsToRun: [
instances := 900000.
1 to: instances do: [:each | int := r integer + ((r integer bitAnd: 
16rFFFFFF) bitShift: 32).
                 numbers at: int put: int]. "already generated collection"].

clashes := 0.
milli2 := System millisecondsToRun: [ "new ids"
1 to: 10000 do: [:each | int := ((r integer bitAnd: 16rFFFFFF) bitShift: 
32).
                 [numbers includesKey: int] whileTrue: [clashes := 
clashes + 1. int :=
int +1].
                 numbers at: int put: int].
].
^ Array with: clashes with: milli1 with: milli2



On 2/26/20 1:50 PM, BrunoBB via Glass wrote:
> Hi,
>
> A domain object (A) has a RcKeyValueDictionary to store large quantity of
> (B) objects.
>
> B objects has a random id generated with:
> id := Lag1MwcRandom new integer. "i think the max number is 4294967295"
>
> The problem here is that with 100.000 (B) objects the probability of id
> clash is pretty high.
>
> It can be fixed with #includesKey: (inside a loop :( ) but which is the cost
> of sending #includesKey: in RcKeyValueDictionary ?
>
> It all reduce to #_at: but is a primitive. As far as i remember in a Martin
> McClure presentation about Hash functions the cost should NOT be to much.
> But just want to confirm this :)
>
> Maybe a better solution it will be to use persistent counters ?
> Or maybe a UUID  ? (which has the same problem but a lower clash
> probability)
>
> regards,
> bruno
>
> PS:
> At first it seems that the loop is NOT a good idea but the collections of
> ids will be already generated and the #includesKey: does not take to long.
>
> | numbers instances r int milli clashes|
> numbers := Dictionary new.
> r := Lag1MwcRandom new.
> instances := 900000.
> 1 to: instances do: [:each | int := r integer.
> 				numbers at: int put: int]. "already generated collection"
>
> clashes := 0.
> milli := Time millisecondsToRun: [ "new ids"
> 1 to: 10000 do: [:each | int := r integer.
> 				[numbers includesKey: int] whileTrue: [clashes := clashes + 1. int :=
> int +1].
> 				numbers at: int put: int].
> ].
> Array with: clashes with: milli.
>
>
>
>
> --
> Sent from: http://forum.world.st/GLASS-f1460844.html
> _______________________________________________
> Glass mailing list
> Glass at lists.gemtalksystems.com
> https://lists.gemtalksystems.com/mailman/listinfo/glass



More information about the Glass mailing list