[Glass] Large collection and common practice

Dale Henrichs via Glass glass at lists.gemtalksystems.com
Tue Jan 3 12:46:01 PST 2017



On 01/03/2017 10:14 AM, BrunoBB via Glass wrote:
> Hi All,
>
> I have a lot of RcKeyValueDictionary instances where the key is the id of the
> object and the value is the object itself.
> Once assigned, this id does NOT change, so far so good :)
>
> The RcKeyValueDictionary is used intensively to add and remove objects (in
> my case OrbeonFormInstance). The dictionary is very useful because the key
> is always given as parameter.
>
> There are also searches by specific inst vars of the OrbeonFormInstance class
> (like username, group, createdTime and so on).
>
> My problem is that I can NOT create an index on an RcKeyValueDictionary.
> So which is the common practice in these cases:
> 1- Change the RcKeyValueDictionary to be an UnorderedCollection ?
> 2- Add a new instance variable to the class that holds the
> RcKeyValueDictionary and have this new variable be an UnorderedCollection ?
>
> 1) This will complicate my direct searches using the ID.
> 2) Extra computation when adding and removing objects (now there are 2
> collections to maintain)
>
> The general question will be something like:
> When Dictionaries are very suitable to store large quantity of objects but
> indexes are also needed which solution should be implemented ?
>
Bruno,

The general answer to your general question is that if you start out 
using a dictionary for lookups on a single field of an object and then 
get to the point where you are interested in doing queries against 
multiple fields of that object, _REPLACING_ your dictionary with an 
indexed collection starts to make sense.

You can create identity indexes on the fields that are identity-based, 
like your id field or group (assuming groups are identified by a Symbol 
or by specific instances of a set of group objects), and use equality 
indexes for fields where you cannot use identity (username) or where you 
are interested in doing queries that involve ranges of values 
(createdTime).
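
As a rough sketch of what that could look like with the GsIndexSpec API 
(GemStone/S 64 3.2 or later), assuming indexedCollection is the collection 
that replaces the dictionary, that the inst var names match the ones 
mentioned above, and that username holds a String and createdTime a 
DateAndTime (both classes are guesses about the model):

   "Sketch only: paths and lastElementClass values are assumptions."
   GsIndexSpec new
     identityIndex: 'each.id';
     identityIndex: 'each.group';
     equalityIndex: 'each.username' lastElementClass: String;
     equalityIndex: 'each.createdTime' lastElementClass: DateAndTime;
     createIndexesOn: indexedCollection.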

If the indexed collection is subject to concurrent additions/deletions, 
then you should use an RcIdentityBag. If the objects themselves are 
subject to concurrent updates to indexed fields, then you can create 
indexes using the `reducedConflict` option.
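
For the second case, a minimal sketch, assuming a GemStone version that 
provides GsIndexOptions and that username is a String (the variable name 
indexedCollection is only illustrative):

   | indexedCollection |
   indexedCollection := RcIdentityBag new.
   "Ask for a reduced-conflict equality index on a field that several
    sessions may update concurrently."
   GsIndexSpec new
     equalityIndex: 'each.username'
       lastElementClass: String
       options: GsIndexOptions reducedConflict;
     createIndexesOn: indexedCollection.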

To do identity-based lookups you would not quite have the convenience of 
using `dictionary at: id`, as you would need to create a GsQuery that in 
its simplest form would look like this (assuming indexedCollection and id 
are set appropriately):

   (('each.id == id' asQueryOn: indexedCollection)
     bind: 'id' to: id)
       queryResult.

The inconvenience of the query would be offset by the fact that you 
would still have only one collection to maintain (the RcIdentityBag) and 
using indexes means that if any of your indexed fields are changed, the 
indexes are automatically updated ...

You can cache a GsQuery instance to avoid the overhead of parsing the 
query on every invocation and you can use a Smalltalk API for creating a 
GsQuery to avoid the complication of creating a string representation of 
queries.
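
A sketch of the caching idea, assuming indexedCollection and anId are set 
appropriately; the names template and lookup are made up for illustration, 
and binding a copy is a precaution to leave the cached query unbound (it 
may not be needed in your version):

   | template lookup |
   "Parse the query string once and keep the result around."
   template := GsQuery fromString: 'each.id == id'.
   template on: indexedCollection.
   "Bind a copy per lookup so the cached template is never modified."
   lookup := [:someId |
     (template copy bind: 'id' to: someId) queryResult ].
   lookup value: anId.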

If you are looking for maximum query and update performance, you might 
find that custom collections or object structures perform better than 
indexes, but the speed advantages have to be weighed against the 
complexity of maintaining these custom collections. Custom collections 
or object structures come into play if you know ahead of time the types 
of queries that you want to run ...

As the size of the collections gets very large, you have to keep in mind 
that Dictionary-based structures have to be rebuilt periodically to keep 
the collision bucket size manageable. Some of the dictionaries, like 
RcKeyValueDictionary, will rebuild automatically on insertion and, 
depending upon the size of the dictionary, that could lead to long and 
unpredictable delays for end users ... the btree structures used in 
GemStone indexes do splits and merges on individual leaf nodes, limiting 
the cost of insertions ...

I am afraid that there is no simple answer ...

Dale

