[Glass] Large collection and common practice
Dale Henrichs via Glass
glass at lists.gemtalksystems.com
Tue Jan 3 12:46:01 PST 2017
On 01/03/2017 10:14 AM, BrunoBB via Glass wrote:
> Hi All,
>
> I have a lot of RcKeyValueDictionary instances where the key is the id of
> the object and the value is the object itself.
> Once assigned, this id does NOT change, so far so good :)
>
> The RcKeyValueDictionary is used intensively to add and remove objects (in
> my case OrbeonFormInstance). The dictionary is very useful because the key
> is always given as parameter.
>
> Also there are searches by specific inst vars of the OrbeonFormInstance class
> (like username, group, createdTime and so on).
>
> My problem is that I can NOT create an index on aRcKeyValueDictionary.
> So which is the common practice in these cases:
> 1- Change the RcKeyValueDictionary to be an UnorderedCollection ?
> 2- Add a new instance variable to the class that holds the
> RcKeyValueDictionary and this new variable to be anUnorderedCollection ?
>
> 1) This will complicate my direct searches using the ID.
> 2) Extra computation when adding and removing objects (now there are 2
> collections to maintain)
>
> The general question will be something like:
> When Dictionaries are very suitable to store large quantity of objects but
> indexes are also needed which solution should be implemented ?
>
Bruno,
The general answer to your general question is that if you start out
using a dictionary for lookups of a single field in an object and then
get to the point where you are interested in doing queries against
multiple fields in your object, _REPLACING_ your dictionary with an
indexed collection starts to make sense.
You can create identity indexes on the fields that are identity-based,
like your id field or group (assuming groups are identified by a Symbol
or by specific instances of a set of group objects), and use equality
indexes for fields where you cannot use identity (username) or where you
are interested in doing queries that involve ranges of values
(createdTime).
If the indexed collection is subject to concurrent additions/deletions,
then you should use an RcIdentityBag. If the objects themselves are
subject to concurrent updates to indexed fields, then you can create
indexes using the `reducedConflict` option.
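A minimal sketch of what such index creation might look like with GsIndexSpec. The variable name `instances` is hypothetical, and the element classes (String for username, DateAndTime for createdTime) are assumptions about the application, as is the choice of which index gets the `reducedConflict` option:

```smalltalk
| instances |
"Hypothetical collection replacing the RcKeyValueDictionary"
instances := RcIdentityBag new.

GsIndexSpec new
    "identity index: assumes ids are compared with =="
    identityIndex: 'each.id';
    "equality index: assumes usernames are Strings"
    equalityIndex: 'each.username' lastElementClass: String;
    "equality index for range queries: assumes createdTime is a DateAndTime
     and that this field is the one updated concurrently"
    equalityIndex: 'each.createdTime'
        lastElementClass: DateAndTime
        options: GsIndexOptions reducedConflict;
    createIndexesOn: instances.
```

Once the indexes exist, adding or removing an OrbeonFormInstance touches only this one collection.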
To do identity-based lookups you would not quite have the convenience of
using `dictionary at: id` as you would need to create a GsQuery that in
its simplest form would look like (assuming indexedCollection and id
set appropriately):
    (('each.id == id' asQueryOn: indexedCollection)
        bind: 'id' to: id)
            queryResult.
The inconvenience of the query would be offset by the fact that you
would still have only one collection to maintain (the RcIdentityBag) and
using indexes means that if any of your fields are changed, the indexes
are automatically updated ...
You can cache a GsQuery instance to avoid the overhead of parsing the
query on every invocation and you can use a Smalltalk API for creating a
GsQuery to avoid the complication of creating a string representation of
queries.
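A rough sketch of the caching idea, under some assumptions: `instances` is the indexed collection, `cachedQuery` and `anId` are hypothetical names, and the explicit `copy` is there so that the cached, unbound query is left untouched regardless of whether `bind:to:` answers a new query or modifies its receiver:

```smalltalk
| cachedQuery result |
"Parse the query string once and keep the unbound query around
 (in practice it would be stored somewhere persistent or session-wide)"
cachedQuery := 'each.id == id' asQueryOn: instances.

"On each lookup, bind the variable on a copy and run it"
result := (cachedQuery copy bind: 'id' to: anId) queryResult.
```

The result is a collection, so a lookup by id would still need to pick out its single element.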
If you are looking for maximum query and update performance, you might
find that custom collections or object structures perform better than
indexes, but the speed advantages have to be weighed against the
complexity of maintaining these custom collections. Custom collections
or object structures come into play if you know ahead of time the types
of queries that you want to run ...
As the size of the collections gets very large, you have to keep in mind
that Dictionary-based structures have to be rebuilt periodically to keep
the collision bucket size manageable. Some of the dictionaries, like
RcKeyValueDictionary, will rebuild automatically on insertion, and
depending upon the size of the dictionary that could lead to long and
unpredictable delays for end users ... the btree structures used in
GemStone indexes do splits and merges on individual leaf nodes, limiting
the cost of insertions ...
I am afraid that there is no simple answer ...
Dale