[Glass] Large collection and common practice

Smalltalk via Glass glass at lists.gemtalksystems.com
Tue Jan 3 15:05:00 PST 2017


Dale,

Thanks for the detailed answer :)

I think replacing the Dictionary with an Rc collection will be the least 
complex option.

Adding an additional inst var (an Rc collection) would not be that 
complex either, though.

Since it will take time to reach 200,000 forms or more, I will implement 
something more complex in the near future :)

I will analyze the problem a little more before selecting another approach.

Thanks to all for your answers!

regards,

bruno

On 03/01/2017 at 17:46, Dale Henrichs wrote:
>
>
> On 01/03/2017 10:14 AM, BrunoBB via Glass wrote:
>> Hi All,
>>
>> I have a lot of RcKeyValueDictionary instances where the key is the
>> id of the object and the value is the object itself.
>> Once assigned, this id does NOT change, so far so good :)
>>
>> The RcKeyValueDictionary is used intensively to add and remove
>> objects (in my case OrbeonFormInstance). The dictionary is very
>> useful because the key is always given as a parameter.
>>
>> There are also searches by specific inst vars of the
>> OrbeonFormInstance class (like username, group, createdTime and so on).
>>
>> My problem is that I can NOT create an index on an RcKeyValueDictionary.
>> So which is the common practice in these cases:
>> 1- Change the RcKeyValueDictionary to be an UnorderedCollection ?
>> 2- Add a new instance variable to the class that holds the
>> RcKeyValueDictionary, with this new variable being an
>> UnorderedCollection ?
>>
>> 1) This will complicate my direct searches using the ID.
>> 2) Extra computation when adding and removing objects (now there are 2
>> collections to maintain)
>>
>> The general question would be something like:
>> When Dictionaries are very suitable for storing a large quantity of
>> objects but indexes are also needed, which solution should be
>> implemented ?
>>
> Bruno,
>
> The general answer to your general question is that if you start out 
> using a dictionary for lookups of a single field in an object and then 
> get to the point where you are interested in doing queries against 
> multiple fields in your object, _REPLACING_ your dictionary with an 
> indexed collection starts to make sense.
>
> You can create identity indexes on the fields that are identity-based 
> like your id field or group (assuming groups are identified by a 
> Symbol or specific instances of a set of group objects) and use 
> equality indexes for fields where you cannot use identity (username) 
> or where you are interested in doing queries that involve ranges of 
> values (createdTime).
>
> If the indexed collection is subject to concurrent 
> additions/deletions, then you should use an RcIdentityBag. If the 
> objects themselves are subject to concurrent updates to indexed 
> fields, then you can create indexes using the `reducedConflict` option.
>
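> As a rough, untested sketch of that setup: assuming `forms` is the 
> RcIdentityBag holding the OrbeonFormInstance objects, that id and 
> group are identity-comparable, that username is a String and that 
> createdTime is a DateAndTime (all assumptions about the model, not 
> taken from the original code), the indexes could be declared with a 
> GsIndexSpec along these lines, in GemStone versions that provide 
> GsIndexSpec and GsIndexOptions:
>
>   | forms |
>   forms := RcIdentityBag new.
>   GsIndexSpec new
>     "identity indexes for fields compared with =="
>     identityIndex: 'each.id';
>     identityIndex: 'each.group';
>     "equality indexes for = and range comparisons"
>     equalityIndex: 'each.username' lastElementClass: String;
>     "reducedConflict only matters if createdTime is updated concurrently"
>     equalityIndex: 'each.createdTime'
>       lastElementClass: DateAndTime
>       options: GsIndexOptions reducedConflict;
>     createIndexesOn: forms.
>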
> To do identity-based lookups you would not quite have the convenience 
> of using `dictionary at: id` as you would need to create a GsQuery 
> that in its simplest form would look like (assuming indexedCollection 
> and id set appropriately):
>
>   (('each.id == id' asQueryOn: indexedCollection)
>     bind: 'id' to: id)
>       queryResult.
>
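> A range query against an equality index follows the same pattern. A 
> sketch, assuming an equality index on createdTime and two hypothetical 
> variables startTime and endTime holding the bounds (the parentheses 
> keep queryResult from being sent to the last bind argument):
>
>   ((('(each.createdTime >= start) & (each.createdTime < end)'
>       asQueryOn: indexedCollection)
>     bind: 'start' to: startTime)
>     bind: 'end' to: endTime)
>       queryResult.
>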
> The inconvenience of the query would be offset by the fact that you 
> would still have only one collection to maintain (the RcIdentityBag) 
> and using indexes means that if any of your fields are changed, the 
> indexes are automatically updated ...
>
> You can cache a GsQuery instance to avoid the overhead of parsing the 
> query on every invocation and you can use a Smalltalk API for creating 
> a GsQuery to avoid the complication of creating a string 
> representation of queries.
>
> If you are looking for maximum query and update performance, you might 
> find that custom collections or object structures might well perform 
> better than using indexes, but the speed advantages have to be offset 
> by the complexity of maintaining these custom collections. Custom 
> collections or object structures come into play if you know ahead of 
> time the types of queries that you want to run ...
>
> As the size of the collections gets very large, you have to keep in 
> mind that Dictionary-based structures have to be rebuilt periodically 
> to keep the collision bucket size manageable and some of the 
> dictionaries like RcKeyValueDictionary will rebuild automatically on 
> insertion and depending upon the size of the dictionary that could 
> lead to long and unpredictable delays for end users ... the btree 
> structures used in GemStone indexes do splits and merges on individual 
> leaf nodes limiting the cost of insertions ...
>
> I am afraid that there is no simple answer ...
>
> Dale
>




