[Glass] Large collection and common practice

Richard Sargent via Glass glass at lists.gemtalksystems.com
Tue Jan 3 16:06:00 PST 2017


GLASS mailing list wrote
> Hi Bruno,
> 
> Why you might not want to keep objects forever in an RC collection:
> 
> RC collections have more overhead. Many are implemented with
> session-specific sub-structures that can be modified by the session with
> little risk of conflict (the rare conflict is when the RC collection itself
> changes). Consider an RC collection that was populated from 100 separate
> sessions, and then the query that the RC collection would need to do to see
> if an object/key exists in it. RC collections are well implemented and
> reasonably efficient; they just aren't as efficient at some operations
> (like lookup) as some non-RC collections (otherwise all collections would
> be implemented to have RC behavior). You'll find that RC collections
> sometimes have unexpected growth and shrinkage behavior. Some grow large
> session-specific subcollections that may never be cleaned up unless there
> is at least one removal. Some grow at inopportune moments that can affect
> time-sensitive operations. I'm not saying that you shouldn't use RC
> collections for root collections; it depends on your application needs.
> 
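The lookup cost described above can be pictured with a toy model. This is only a sketch in Python, not how GemStone implements its RC classes: `SessionPartitionedSet` and its methods are hypothetical names, and the assumption is simply that each session writes to its own bucket while a membership test has to consult every bucket.

```python
class SessionPartitionedSet:
    """Toy model (an assumption, not GemStone's implementation): each session
    adds only to its own bucket, so concurrent adds never touch the same
    structure, but a lookup may have to visit every bucket."""

    def __init__(self):
        self._buckets = {}  # session id -> set of items added by that session

    def add(self, session_id, item):
        self._buckets.setdefault(session_id, set()).add(item)

    def __contains__(self, item):
        # Membership has to consult each session's bucket in turn.
        return any(item in bucket for bucket in self._buckets.values())

    def buckets_probed(self, item):
        """Count how many buckets are visited before the item is found."""
        probes = 0
        for bucket in self._buckets.values():
            probes += 1
            if item in bucket:
                break
        return probes


rc = SessionPartitionedSet()
for session_id in range(100):
    rc.add(session_id, f"item-{session_id}")

print(rc.buckets_probed("item-0"))   # 1
print(rc.buckets_probed("item-99"))  # 100
```

A plain set would answer the same question with a single hash probe, which is the trade-off the paragraph above points at.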
> Regarding indexing for many attributes:
> 
> Sounds like you want to create indexes on a common collection like
> OrderedCollection that only one session is in charge of updating. I know
> that GemTalk improved their indexing implementation several years back,
> but some kinds of practical issues likely still remain. A field index
> updates some underlying structure that might also be updated from changes
> to other objects by other sessions; updates to indexes used to cause many
> commit conflicts. The more indexes a collection has, the higher the odds
> of commit conflict. Applications that I've worked on for the past decade
> or so didn't use collection indexes like you are about to do. An
> application that used a lot of indexes also had some custom code to save
> and replay changes to domain objects to compensate for unpredictable
> commit failures. It is from experiences like that that the queue-manager
> approach became useful despite all the cross-session coordination.
> 
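The "save and replay changes" idea mentioned above might look roughly like the following. This is a minimal Python sketch under stated assumptions: `ToyStore`, `CommitConflict` and `commit_with_replay` are hypothetical names, the store is an in-memory stand-in with optimistic version checking, and nothing here is GemStone API.

```python
class CommitConflict(Exception):
    """Raised when another session committed since our view was taken."""


class ToyStore:
    """Hypothetical stand-in for a shared repository with optimistic commits."""

    def __init__(self):
        self.version = 0
        self.data = {}

    def begin(self):
        # A session's view: the version it started from plus a private copy.
        return self.version, dict(self.data)

    def commit(self, base_version, new_data):
        if base_version != self.version:
            raise CommitConflict("another session committed first")
        self.data = new_data
        self.version += 1


def commit_with_replay(store, changes, max_attempts=5):
    """Apply the saved changes to a fresh view and commit; on conflict,
    refresh the view and replay the same changes. Returns the attempt
    number that succeeded."""
    for attempt in range(1, max_attempts + 1):
        base_version, view = store.begin()  # fresh view, like an abort
        for change in changes:              # replay the saved changes
            change(view)
        try:
            store.commit(base_version, view)
            return attempt
        except CommitConflict:
            continue
    raise CommitConflict(f"gave up after {max_attempts} attempts")


def increment_count(view):
    view["count"] = view.get("count", 0) + 1


store = ToyStore()
commit_with_replay(store, [increment_count])
print(store.data)  # {'count': 1}
```

The changes are kept as replayable operations rather than as final values, so each retry recomputes them against whatever the other sessions committed in the meantime.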
> I'd probably implement a query kind of object that wraps that collection
> to support collection-specific queries and maintenance operations. The OC
> (or whatever you use) would normally be private to the query object. The
> query object could even have special behavior for avoiding commit
> conflicts (like locking or queueing, for example). The query object might
> even be clever enough to use a private/internal RC queue when your
> application code detects that conflict is possible (like from use of
> locks). The query object would manage the internal RC collection as
> practical.
> 
> You might think of making that query object a subclass of Collection but
> any GBS users out there should beware that there would be replication bugs
> (I'd reported the bug with workaround code to GemTalk many years ago).

Hi Paul,

I tried to look up the bug you describe to report what the outcome of it
was. However, I couldn't find a GBS bug using the search criteria (
'subclass', 'collection', 'replication'). If you can think of anything more
precise that would be found in the bug report and/or the approximate date
range, I can try to find it.

Thanks,
Richard


> I doubt you'd be doing replication of something like this even if you used
> GBS, but just saying there is a bit of strangeness to be discovered at
> the basic/private/primitive levels and unfortunately it means that caution
> applies to user-defined subclasses of Collection.
> 
> I'm not suggesting you do this, but it is an option. In the time that
> indexing was not reliable I'd once resorted to creating my own
> application-specific indexes. That query object that I just mentioned
> could also have private dictionary instances that can quickly resolve
> specific keys (attributes of the objects). The query object has the
> overhead of also maintaining the private attribute-key dictionaries as
> objects are added and removed. I could go into how I implemented these
> application-defined indexes even without the query object wrapping it,
> but there is no need because you have good GemTalk-supported indexes now
> anyway.
> 
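A query object with private per-attribute dictionaries, as described above, could be sketched like this. It is an illustration in Python rather than GemStone Smalltalk; `IndexedStore` and `FormInstance` are hypothetical names, and the fields are loosely modelled on the attributes mentioned elsewhere in this thread.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FormInstance:
    """Hypothetical domain object used only for this illustration."""
    id: str
    username: str
    group: str


class IndexedStore:
    """Wraps a private collection and keeps one dictionary per indexed
    attribute, updated on every add and remove."""

    def __init__(self, indexed_attributes):
        self._items = []  # the private primary collection
        self._indexes = {name: defaultdict(list) for name in indexed_attributes}

    def add(self, obj):
        self._items.append(obj)
        for name, index in self._indexes.items():
            index[getattr(obj, name)].append(obj)

    def remove(self, obj):
        self._items.remove(obj)
        for name, index in self._indexes.items():
            key = getattr(obj, name)
            index[key].remove(obj)
            if not index[key]:
                del index[key]  # drop empty entries so the index can shrink

    def find(self, attribute, value):
        # .get() avoids creating an empty entry for a missing key.
        return list(self._indexes[attribute].get(value, ()))


store = IndexedStore(["username", "group"])
ann = FormInstance("1", "ann", "ops")
bob = FormInstance("2", "bob", "ops")
store.add(ann)
store.add(bob)
print([f.id for f in store.find("group", "ops")])  # ['1', '2']
```

The sketch assumes an indexed attribute does not change while the object is in the store; a real version would need to re-index on change, which is part of the maintenance overhead mentioned above.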
> I've presented ideas more complicated than you'll need; hopefully an
> awareness of potential issues and past remedies will save you some effort.
> 
> Regards,
> 
> Paul Baumann
> 
> 
> On Tue, Jan 3, 2017 at 5:04 PM, Smalltalk <smalltalk at .com> wrote:
> 
>> Paul,
>>
>> Thanks for your answer ...
>>
>> /*  you don't always want to keep the objects in the RC collection
>>
>> Why don't you always want to keep the objects in the RC collection?
>> This is what I'm doing right now :(  - RcKeyValueDictionary
>>
>> Thanks for the technique you are explaining.
>>
>> For now I will keep it as simple as I can :) Maybe in the future (next
>> year) I can do something like that, but I need to do much more research :)
>>
>> /* I wonder what kind of indexing you would need besides ID. If you don't
>> need to query for anything other than ID then a dictionary would be fine
>> with the ID as key.
>> This project/system implements a persistence layer (using REST services)
>> for a Java application (www.orbeon.com) which is used to design, publish,
>> save and query web forms.
>> (https://github.com/brunobuzzi/OrbeonPersistenceLayer/)
>>
>> When designing/publishing/sending/saving a form, the ID is mostly used.
>> Then you have the Summary page, which displays all form instances (saved
>> and sent forms) of some form definition.
>> Here the user can search by a particular field of the defined form.
>> A search can be by N different fields depending on the form definition
>> (the definition could be a form with 200 nested fields and sections or
>> whatever).
>>
>> In this case indexes are very useful, but in the previous cases a
>> Dictionary using the id (which is immutable once assigned) is more
>> suitable.
>>
>> regards,
>> bruno
>>
>>
>> On 03/01/2017 at 17:38, Paul Baumann wrote:
>>
>> Hi Bruno,
>>
>> Multiple sessions can feed an RC collection with reduced commit
>> conflicts, but you don't always want to keep the objects in the RC
>> collection. One common technique is to have a manager session dedicated
>> to moving objects from RC collections into collections that can be
>> accessed more efficiently. Design so that the manager is the only session
>> that will be updating the collections (so that commit conflicts will not
>> happen). The manager session can poll for new items, and you can add
>> gem-to-gem signaling to wake the manager for more timely responses. The
>> challenges with this kind of design are related to update timing between
>> sessions. The process involves a commit to add to the RC collection, an
>> abort for the manager session to see the objects, a commit by the manager
>> to update the root collection (with RC collection removal; RcQueues are
>> usually used, BTW), and an abort by the original session if it needs to
>> see that the indexed item was added to the root collection. If there is
>> timing sensitivity with this data then you'll likely resort to searching
>> first in your indexed collection and then also reviewing objects still in
>> the queue waiting for the manager to process them.
>>
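The flow above (workers enqueue, a single manager drains into the root collection, and lookups fall back to the queue) can be sketched in a single process. This Python sketch leaves out commits, aborts and gem-to-gem signaling entirely; `ManagedCollection` and its method names are hypothetical, and the deque merely stands in for an RC queue.

```python
from collections import deque


class ManagedCollection:
    """Single-process sketch of the manager-session pattern."""

    def __init__(self):
        self.pending = deque()  # stands in for the RC queue any session may feed
        self.by_id = {}         # root collection that only the manager updates

    def submit(self, item_id, item):
        """Called by worker sessions: enqueue only, never touch the root."""
        self.pending.append((item_id, item))

    def drain(self):
        """Called only by the manager: move queued items into the root
        collection and return how many were moved."""
        moved = 0
        while self.pending:
            item_id, item = self.pending.popleft()
            self.by_id[item_id] = item
            moved += 1
        return moved

    def find(self, item_id):
        """Search the root collection first, then anything still queued."""
        if item_id in self.by_id:
            return self.by_id[item_id]
        for queued_id, item in self.pending:
            if queued_id == item_id:
                return item
        return None


forms = ManagedCollection()
forms.submit("f-1", "draft form")
print(forms.find("f-1"))  # found in the queue, before the manager has run
forms.drain()
print(forms.find("f-1"))  # now found in the root collection
```

The fallback scan in `find` is the price of timing sensitivity: it stays cheap only while the manager keeps the queue short.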
>> A variation of the manager session technique is to send data to the
>> manager session without doing a commit. This might be through
>> communication between gems or by using session-specific file updates that
>> the manager gem reads. Gem-to-gem signaling can be added to this approach
>> later too if you need to improve timing. This variation can avoid the
>> intermediate commit, but you may still need to #continueTransaction to
>> see what the manager session updated.
>>
>> I wonder what kind of indexing you would need besides ID. If you don't
>> need to query for anything other than ID then a dictionary would be fine
>> with the ID as key. A dictionary can even use a key that is a custom
>> object that redefines equality and hash from attributes of what is
>> searched for. Merkle tree hashes might also be used as a way to test if
>> some attribute is contained, but that is a bit advanced to go into.
>> Another advanced item that I once implemented was a custom Dictionary
>> where the key was derived from the value by behavior (it was more
>> efficient because it avoided the cost of Association creation). So many
>> cool tricks, I loved working with GS/S.
>>
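The custom-key idea above translates to most languages with hashed dictionaries. As a small Python sketch (the `UserGroupKey` class and the sample values are hypothetical), a key object defines equality and hash from the attributes being searched for, so a freshly built key finds the stored entry:

```python
class UserGroupKey:
    """Hypothetical lookup key built from the attributes being searched for."""

    def __init__(self, username, group):
        self.username = username
        self.group = group

    def __eq__(self, other):
        return (isinstance(other, UserGroupKey)
                and (self.username, self.group) == (other.username, other.group))

    def __hash__(self):
        # Must agree with __eq__: equal keys produce equal hashes.
        return hash((self.username, self.group))


forms = {}
forms[UserGroupKey("ann", "ops")] = "form-1"

# A different key object with the same attributes resolves to the same entry.
lookup = forms.get(UserGroupKey("ann", "ops"))
print(lookup)  # form-1
```

The attributes used for the hash should not change while the key is in the dictionary, otherwise the entry can no longer be found.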
>> Paul Baumann
>>
>>
>>
>> On Tue, Jan 3, 2017 at 2:39 PM, Mariano Martinez Peck via Glass <glass at .gemtalksystems> wrote:
>>
>>>
>>> On Tue, Jan 3, 2017 at 3:14 PM, BrunoBB via Glass <glass at .gemtalksystems> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I have a lot of RcKeyValueDictionary instances where the key is the id
>>>> of the object and the value is the object itself.
>>>> This id, once assigned, does NOT change, so far so good :)
>>>>
>>>> The RcKeyValueDictionary is used intensively to add and remove objects
>>>> (in my case OrbeonFormInstance). The dictionary is very useful because
>>>> the key is always given as a parameter.
>>>>
>>>> There are also searches by specific instance variables of the
>>>> OrbeonFormInstance class (like username, group, createdTime and so on).
>>>>
>>>> My problem is that I can NOT create an index on an RcKeyValueDictionary.
>>>> So which is the common practice in these cases:
>>>> 1- Change the RcKeyValueDictionary to be an UnorderedCollection?
>>>> 2- Add a new instance variable to the class that holds the
>>>> RcKeyValueDictionary, and make this new variable an UnorderedCollection?
>>>>
>>>> 1) This will complicate my direct searches using the ID.
>>>> 2) Extra computation when adding and removing objects (now there are 2
>>>> collections to maintain).
>>>>
>>>> The general question would be something like:
>>>> When Dictionaries are very suitable for storing a large quantity of
>>>> objects but indexes are also needed, which solution should be
>>>> implemented?
>>>>
>>>
>>>
>>> Assuming you do need or get benefits from the RC flavor (else it brings
>>> unnecessary overhead), then quickly analyzing the situation (until
>>> GemStone has an indexed, RC-flavored Dictionary implementation), I think
>>> I would use an RcIdentityBag. I would create an identity index for #id,
>>> and yes, you will have to modify your code that accesses the dict to now
>>> do a detect on the collection using the identity index on ID.
>>>
>>> But...I am sure someone more experienced will come up with a better
>>> approach!
>>>
>>> Cheers,
>>>
>>>
>>>
>>>> regards
>>>> bruno
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context: http://forum.world.st/Large-collection-and-common-practice-tp4928607.html
>>>> Sent from the GLASS mailing list archive at Nabble.com.
>>>> _______________________________________________
>>>> Glass mailing list
>>>> Glass at .gemtalksystems
>>>> http://lists.gemtalksystems.com/mailman/listinfo/glass
>>>>
>>>
>>>
>>>
>>> --
>>> Mariano
>>> http://marianopeck.wordpress.com
>>>
>>>
>>>
>>
>>
>>
>>
> 





--
View this message in context: http://forum.world.st/Large-collection-and-common-practice-tp4928607p4928630.html
Sent from the GLASS mailing list archive at Nabble.com.

