[Glass] Large collection and common practice

Paul Baumann via Glass glass at lists.gemtalksystems.com
Tue Jan 3 15:33:14 PST 2017


Hi Bruno,

Why you might not want to keep objects forever in an RC collection:

RC collections have more overhead. Many are implemented with
session-specific sub-structures that can be modified by the session with
little risk of conflict (the rare conflict is when the RC collection itself
changes). Consider if you have an RC collection that was populated from 100
separate sessions and then the query that the RC collection would need to
do to see if an object/key exists in the RC collection. RC collections are
well implemented and reasonably efficient, they just aren't as efficient at
some operations (like lookup) as some non-RC collections (otherwise all
would be implemented to have RC behavior). You'll find that RC collections
have sometimes unexpected growth and shrinkage behavior. Some grow large
session-specific subcollections that may never be cleaned up unless there
is at least one removal. Some grow in inopportune moments that can affect
time-sensitive operations. I'm not saying that you shouldn't use the RC
collections for root collections, it depends on your application needs.

Regarding indexing for many attributes:

Sounds like you want to create indexes on a common collection like
OrderedCollection that only one session in in charge of updating. I know
that GemTalk had improved their indexing implementation several years back,
but some kinds of practical issues likely still remain. A field index
updates some underlying structure that might also be updated from changes
to other objects by other sessions, updates to indexes used to cause many
commit conflicts. The more indexes a collection has, the higher the odds of
commit conflict. Applications that I've worked on for the past decade or so
didn't use collection indexes like you are about to do. An application that
used a lot of indexes also had some custom code to save and replay changes
to domain objects to compensate for unpredictable commit failures. It is
from experiences like that that the queue-manager approach became useful
despite all the cross-session coordination.

I'd probably implement a query kind of object that wraps that collection to
support collection-specific queries and maintenance operations. The OC (or
whatever you use) would normally be private to the query object. The query
object could even have special behavior for avoiding commit conflicts (like
locking or queueing for example). The query object might for even be clever
enough to do a private/internal RC queue when your application code detects
conflict is possible (like from use of locks). The queue object would
manage the internal RC collection as practical.

You might think of making that query object a subclass of Collection but
any GBS users out there should beware that there would be replication bugs
(I'd reported the bug with workaround code to GemTalk many years ago). I
doubt you'd be doing replication of something like this even if you used
GBS, but just saying there was is a bit of strangeness to be discovered at
the basic/private/primitive levels and unfortunately it means that caution
applies to user-defined subclasses of Collection.

I'm not suggesting you do this, but it is an option. In the time that
indexing was not reliable I'd once resorted to creating my own
application-specific indexes. That query object that I just mentioned could
also have private dictionary instances that can quickly resolve specific
keys (attributes of the objects). The query object has the overhead of also
maintaining the private attribute-key dictionaries as object are added and
removed. I could go into how I implemented these application-defined
indexes even without the query object wrapping it, but no need because you
have good GemTalk supported indexes now anyway.

I've presented ideas more complicated than you'll need, hopefully an
awareness of potential issues and past remedies will save you some effort.

Regards,

Paul Baumann


On Tue, Jan 3, 2017 at 5:04 PM, Smalltalk <smalltalk at adinet.com.uy> wrote:

> Paul,
>
> Thanks for your answer ...
>
> /*  you don't always want to keep the objects in the RC collection
>
> Why you don't always want to keep the objects in the RC collection ?
> This is what i'm doing right now :(  - RcKeyValueDictionary
>
> Thanks for the technique you are explaining.
>
> For now i will keep as simple as i can :) may be in the future (next year)
> i can do something like that but i need to do much more research :)
>
> /* I wonder what kind of indexing you would need besides ID. If you don't
> need to query for anything other than ID then a dictionary would be fine
> with the ID as key.
> This project/system implement a persistence layer (using rest services)
> for a Java Application (www.orbeon.com) which is used to design, publish,
> save and query web forms.
> (https://github.com/brunobuzzi/OrbeonPersistenceLayer/)
>
> When designing/publishing/sending/saving form --> the ID is mostly used.
> Then you have the Summary page. That display all form instances (saved and
> sent forms) of some form definition.
> Here the user can search by a particular field of the defined form.
> A search can be by N different fields depending on the form definition
> (the definition could be a form with 200 nested fields and sections or
> whatever).
>
> In this case indexes are very useful but in the previous cases a
> Dictionary is more suitable using the id (that after assigned is immutable)
>
> regards,
> bruno
>
>
> El 03/01/2017 a las 17:38, Paul Baumann escribió:
>
> Hi Bruno,
>
> Multiple sessions can feed an RC collection with reduced commit conflicts,
> but you don't always want to keep the objects in the RC collection. One
> common technique is to have a manager session dedicated to moving objects
> from RC collections into collections that can be accessed more efficiently.
> Design so that the manager is the only session that will be updating the
> collections (so that commit conflicts will not happen). The manager session
> can do polling for new items and you can add gem-to-gem signaling to wake
> the manager for more timely responses. The challenges with this kind of
> design are related to update timing between sessions. The process involves
> a commit to add to the RC collection, an abort for the manager session to
> see the objects, a commit by the manager to update the root collection
> (with RC collection removal, RcQueues are usually used BTW), and an abort
> by the original session if it needs to see the indexed item was added to
> the root collection. If there is timing sensitivity with this data then
> you'll likely resort to searching first in your indexeded collection and
> then also reviewing objects still in the queue waiting for the manager to
> process them.
>
> A variation of the manager session technique is to send data to the
> manager session without doing a commit, this might be through communication
> between gems or by using session-specific file updates that the manager gem
> reads. Gem-to-gem signaling can be added to this approach later too if you
> need to improve timing. This variation can avoid the intermediate commit,
> but you'd still may need to #continueTransaction to see what the manager
> session updated.
>
> I wonder what kind of indexing you would need besides ID. If you don't
> need to query for anything other than ID then a dictionary would be fine
> with the ID as key. A dictionary can even use a key that is a custom object
> that redefines equality and hash from attributes of what is searched for.
> Merkle tree hashes might also be used as a way to test if some attribute is
> contained, but that is a bit advanced to go into. Another advanced item
> that I once implemented was a custom Dictionary where the key was derived
> from the value by behavior (it was more efficient because it avoided the
> cost of Association creation). So many cool tricks, I loved working with
> GS/S.
>
> Paul Baumann
>
>
>
> On Tue, Jan 3, 2017 at 2:39 PM, Mariano Martinez Peck via Glass <
> glass at lists.gemtalksystems.com> wrote:
>
>>
>> On Tue, Jan 3, 2017 at 3:14 PM, BrunoBB via Glass <
>> glass at lists.gemtalksystems.com> wrote:
>>
>>> Hi All,
>>>
>>> I have a lot RcKeyValueDictionary where the key is the id of the object
>>> and
>>> the value is the object itself.
>>> This id once assigned it does NOT change, so far so good :)
>>>
>>> The RcKeyValueDictionary is used intensively to add and remove objects
>>> (in
>>> my case OrbeonFormInstance). The dictionary is very useful because the
>>> key
>>> is always given as parameter.
>>>
>>> Also there are searchs by specific inst var of OrbeonFormInstance class
>>> (like username,group, createdTime and so on).
>>>
>>> My problem is that i can NOT create an index on aRcKeyValueDictionary.
>>> So which is the commom practice in these cases:
>>> 1- Change the RcKeyValueDictionary to be an UnorderedCollection ?
>>> 2- Add a new instance variable to the class that holds the
>>> RcKeyValueDictionary and this new variable to be anUnorderedCollection ?
>>>
>>> 1) This will complicate my direct searchs using the ID.
>>> 2) Extra computation when adding and removing objects (now there 2
>>> collections to maintain)
>>>
>>> The general question will be something like:
>>> When Dictionaries are very suitable to store large quantity of objects
>>> but
>>> indexes are also needed which solution should be implemented ?
>>>
>>
>>
>> Assuming you do need or get benefits from the RC flavor (else it brings
>> unnecessary overhead), then quickly analyzing the situation (until GemStone
>> have indexed and rc-flavor Dictionary impl), I think I would use a
>> RcIdentityBag. I would create a identity index for #id , and yes, you will
>> have to modify your code that access the dict, to know detect on the
>> collection using the identity index of ID.
>>
>> But...I am sure someone more experienced will come with a better approach!
>>
>> Cheers,
>>
>>
>>
>>> regards
>>> bruno
>>>
>>>
>>>
>>> --
>>> View this message in context: http://forum.world.st/Large-co
>>> llection-and-common-practice-tp4928607.html
>>> Sent from the GLASS mailing list archive at Nabble.com.
>>> _______________________________________________
>>> Glass mailing list
>>> Glass at lists.gemtalksystems.com
>>> http://lists.gemtalksystems.com/mailman/listinfo/glass
>>>
>>
>>
>>
>> --
>> Mariano
>> http://marianopeck.wordpress.com
>>
>> _______________________________________________
>> Glass mailing list
>> Glass at lists.gemtalksystems.com
>> http://lists.gemtalksystems.com/mailman/listinfo/glass
>>
>>
>
>
>
> ------------------------------
> [image: Avast logo]
> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient>
>
> El software de antivirus Avast ha analizado este correo electrónico en
> busca de virus.
> www.avast.com
> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gemtalksystems.com/mailman/private/glass/attachments/20170103/e6865df3/attachment-0001.html>


More information about the Glass mailing list