[Glass] Large collection and common practice

Smalltalk via Glass glass at lists.gemtalksystems.com
Tue Jan 3 15:48:55 PST 2017


Paul,

Thank you for all the information you mention; it is very useful. This
application runs with Seaside, not GBS with its replicates.

Time will tell how I will approach this problem, but the information is
very useful. And with GemStone there are plenty of solutions :)

For now I will keep it simple; maybe in the future I will try something
like your solution!

regards

bruno


On 03/01/2017 at 20:33, Paul Baumann wrote:
> Hi Bruno,
>
> Why you might not want to keep objects forever in an RC collection:
>
> RC collections have more overhead. Many are implemented with
> session-specific sub-structures that can be modified by the session
> with little risk of conflict (the rare conflict is when the RC
> collection itself changes). Consider an RC collection that was
> populated from 100 separate sessions, and then the query that the
> collection would need to do to see if an object/key exists in it.
> RC collections are well implemented and reasonably efficient; they
> just aren't as efficient at some operations (like lookup) as some
> non-RC collections (otherwise all collections would be implemented
> to have RC behavior). You'll find that RC collections sometimes have
> unexpected growth and shrinkage behavior. Some grow large
> session-specific subcollections that may never be cleaned up unless
> there is at least one removal. Some grow at inopportune moments that
> can affect time-sensitive operations. I'm not saying that you
> shouldn't use the RC collections for root collections; it depends on
> your application needs.
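
To make the lookup cost described above concrete, here is a minimal toy model, sketched in Python rather than GemStone Smalltalk. It is an assumption-laden illustration and not how any GemStone RC class is actually implemented: the class name `ToyRcDictionary` and its methods are hypothetical. Each writing session gets its own private sub-dictionary, so writers never touch the same structure, but a reader may have to probe every sub-dictionary.

```python
class ToyRcDictionary:
    """Toy model only: one private sub-dictionary per writing session."""

    def __init__(self):
        self._by_session = {}  # session id -> that session's private dict

    def put(self, session_id, key, value):
        # Each session writes only to its own sub-dictionary, so two
        # sessions never modify the same structure.
        self._by_session.setdefault(session_id, {})[key] = value

    def lookup(self, key):
        # A reader does not know which session stored the key, so in this
        # model it probes the sub-dictionaries one by one.
        # Returns (value, number_of_sub_dictionaries_probed).
        probes = 0
        for sub in self._by_session.values():
            probes += 1
            if key in sub:
                return sub[key], probes
        return None, probes


rc = ToyRcDictionary()
for session_id in range(100):          # populated from 100 separate sessions
    rc.put(session_id, f"form-{session_id}", session_id)

value, probes = rc.lookup("form-99")           # stored by the last session
missing, miss_probes = rc.lookup("no-such-form")
```

In this model a hit on the most recently added session, or any miss, costs 100 probes, whereas a plain dictionary would need one.
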
>
> Regarding indexing for many attributes:
>
> Sounds like you want to create indexes on a common collection like
> OrderedCollection that only one session is in charge of updating. I
> know that GemTalk improved their indexing implementation several
> years back, but some kinds of practical issues likely still remain. A
> field index updates some underlying structure that might also be
> updated from changes to other objects by other sessions, so updates
> to indexes used to cause many commit conflicts. The more indexes a
> collection has, the higher the odds of commit conflict. Applications
> that I've worked on for the past decade or so didn't use collection
> indexes like you are about to do. An application that used a lot of
> indexes also had some custom code to save and replay changes to domain
> objects to compensate for unpredictable commit failures. It is from
> experiences like that that the queue-manager approach became useful
> despite all the cross-session coordination.
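
The "save and replay" idea mentioned above could look roughly like the following Python sketch. Everything here is hypothetical and only simulates the pattern: `CommitConflict`, `FakeStore` and `commit_with_replay` are invented names, and the fake store simply fails a fixed number of commits. The changes are saved as callables so they can be re-applied to a fresh view after each failed commit.

```python
class CommitConflict(Exception):
    """Hypothetical signal that a commit was rejected."""


class FakeStore:
    """Stand-in for a transactional store; fails the first N commits."""

    def __init__(self, conflicts_before_success):
        self.committed = {}
        self.working = {}
        self._remaining_conflicts = conflicts_before_success

    def begin(self):
        self.working = dict(self.committed)   # fresh view of committed state

    def commit(self):
        if self._remaining_conflicts > 0:
            self._remaining_conflicts -= 1
            raise CommitConflict("simulated conflict")
        self.committed = dict(self.working)

    def abort(self):
        self.working = dict(self.committed)   # discard uncommitted changes


def commit_with_replay(changes, store, max_attempts=3):
    """Apply the saved changes and commit; on conflict, abort and replay."""
    for attempt in range(1, max_attempts + 1):
        store.begin()
        for change in changes:
            change(store.working)             # re-apply each saved change
        try:
            store.commit()
            return attempt
        except CommitConflict:
            store.abort()
    raise CommitConflict(f"gave up after {max_attempts} attempts")


store = FakeStore(conflicts_before_success=2)
changes = [
    lambda state: state.__setitem__("form-1", "saved"),
    lambda state: state.__setitem__("form-2", "sent"),
]
attempts = commit_with_replay(changes, store)
```

Here the first two commits fail and the third succeeds, so the saved changes are applied three times in total but committed once.
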
>
> I'd probably implement a query kind of object that wraps that 
> collection to support collection-specific queries and maintenance 
> operations. The OC (or whatever you use) would normally be private to 
> the query object. The query object could even have special behavior 
> for avoiding commit conflicts (like locking or queueing for example). 
> The query object might even be clever enough to use a
> private/internal RC queue when your application code detects that a
> conflict is possible (like from use of locks). The queue object would
> manage the internal RC collection as practical.
>
> You might think of making that query object a subclass of Collection,
> but any GBS users out there should beware that there would be
> replication bugs (I reported the bug with workaround code to GemTalk
> many years ago). I doubt you'd be doing replication of something like
> this even if you used GBS; I'm just saying there is a bit of
> strangeness to be discovered at the basic/private/primitive levels, and
> unfortunately it means that caution applies to user-defined subclasses
> of Collection.
>
> I'm not suggesting you do this, but it is an option. Back when
> indexing was not reliable, I once resorted to creating my own
> application-specific indexes. That query object that I just mentioned
> could also have private dictionary instances that can quickly resolve
> specific keys (attributes of the objects). The query object has the
> overhead of also maintaining the private attribute-key dictionaries as
> objects are added and removed. I could go into how I implemented these
> application-defined indexes even without the query object wrapping it,
> but there is no need because you have good GemTalk-supported indexes
> now anyway.
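
As a rough illustration of the query object with private attribute dictionaries described above, here is a minimal Python sketch. It is not the implementation referred to in the message: the class name `FormQuery`, its methods, and the dict-shaped form records are all assumptions made for the example. The wrapped collection and the per-attribute indexes are private, and every add or remove pays the cost of keeping the indexes in step.

```python
from collections import defaultdict


class FormQuery:
    """Hypothetical wrapper that keeps its collection and indexes private."""

    def __init__(self, indexed_attributes):
        self._items = {}  # id -> object (the private main collection)
        # One private dictionary per indexed attribute: value -> set of ids.
        self._indexes = {name: defaultdict(set) for name in indexed_attributes}

    def add(self, obj):
        self._items[obj["id"]] = obj
        for name, index in self._indexes.items():
            index[obj[name]].add(obj["id"])

    def remove(self, obj_id):
        obj = self._items.pop(obj_id)
        for name, index in self._indexes.items():
            ids = index[obj[name]]
            ids.discard(obj_id)
            if not ids:
                del index[obj[name]]      # drop empty buckets

    def by_id(self, obj_id):
        return self._items.get(obj_id)

    def where(self, name, value):
        # Resolve an attribute value to matching ids without scanning.
        return sorted(self._indexes[name].get(value, ()))


forms = FormQuery(indexed_attributes=["username", "group"])
forms.add({"id": "f1", "username": "bruno", "group": "users"})
forms.add({"id": "f2", "username": "paul", "group": "admin"})
forms.add({"id": "f3", "username": "bruno", "group": "admin"})

before = forms.where("username", "bruno")
forms.remove("f1")
after = forms.where("username", "bruno")
```

Lookups by id stay a single dictionary access, while attribute searches go through the private indexes; the price is the extra bookkeeping in `add` and `remove`.
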
>
> I've presented ideas more complicated than you'll need; hopefully an
> awareness of potential issues and past remedies will save you some effort.
>
> Regards,
>
> Paul Baumann
>
>
> On Tue, Jan 3, 2017 at 5:04 PM, Smalltalk <smalltalk at adinet.com.uy> wrote:
>
>     Paul,
>
>     Thanks for your answer ...
>
>     /*  you don't always want to keep the objects in the RC collection
>
>     Why don't you always want to keep the objects in the RC collection?
>     This is what I'm doing right now :(  - RcKeyValueDictionary
>
>     Thanks for the technique you are explaining.
>
>     For now I will keep it as simple as I can :) Maybe in the future
>     (next year) I can do something like that, but I need to do much
>     more research :)
>
>     /* I wonder what kind of indexing you would need besides ID. If
>     you don't need to query for anything other than ID then a
>     dictionary would be fine with the ID as key.
>
>     This project/system implements a persistence layer (using REST
>     services) for a Java application (www.orbeon.com) which is used
>     to design, publish, save and query web forms.
>     (https://github.com/brunobuzzi/OrbeonPersistenceLayer/)
>
>     When designing/publishing/sending/saving a form, the ID is mostly
>     used.
>     Then you have the Summary page, which displays all form instances
>     (saved and sent forms) of some form definition.
>     Here the user can search by a particular field of the defined form.
>     A search can be by N different fields depending on the form
>     definition (the definition could be a form with 200 nested fields
>     and sections or whatever).
>
>     In this case indexes are very useful, but in the previous cases a
>     Dictionary keyed by the id (which is immutable once assigned) is
>     more suitable.
>
>     regards,
>     bruno
>
>
>     On 03/01/2017 at 17:38, Paul Baumann wrote:
>>     Hi Bruno,
>>
>>     Multiple sessions can feed an RC collection with reduced commit
>>     conflicts, but you don't always want to keep the objects in the
>>     RC collection. One common technique is to have a manager session
>>     dedicated to moving objects from RC collections into collections
>>     that can be accessed more efficiently. Design so that the manager
>>     is the only session that will be updating the collections (so
>>     that commit conflicts will not happen). The manager session can
>>     do polling for new items and you can add gem-to-gem signaling to
>>     wake the manager for more timely responses. The challenges with
>>     this kind of design are related to update timing between
>>     sessions. The process involves a commit to add to the RC
>>     collection, an abort for the manager session to see the objects,
>>     a commit by the manager to update the root collection (with RC
>>     collection removal; RcQueues are usually used, BTW), and an abort
>>     by the original session if it needs to see that the indexed item
>>     was added to the root collection. If there is timing sensitivity
>>     with this data then you'll likely resort to searching first in
>>     your indexed collection and then also reviewing objects still in
>>     the queue waiting for the manager to process them.
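
The manager-session flow above can be sketched as a single-process Python simulation. This is only a model of the data movement: the names `ManagedCollection`, `submit`, `manager_drain` and `find` are hypothetical, a `deque` stands in for the RC queue, a plain dictionary stands in for the manager-owned root collection, and the commits and aborts between sessions are deliberately left out.

```python
from collections import deque


class ManagedCollection:
    """Hypothetical model: many writers feed a queue, one manager drains it."""

    def __init__(self):
        self.pending = deque()  # stands in for the RC queue fed by sessions
        self.root = {}          # stands in for the manager-owned collection

    def submit(self, obj_id, obj):
        # What an ordinary session does: enqueue only, never touch root.
        self.pending.append((obj_id, obj))

    def manager_drain(self):
        # What the manager session does: move everything queued into root.
        moved = 0
        while self.pending:
            obj_id, obj = self.pending.popleft()
            self.root[obj_id] = obj
            moved += 1
        return moved

    def find(self, obj_id):
        # Search the efficient root collection first, then review objects
        # still waiting in the queue for the manager to process them.
        if obj_id in self.root:
            return self.root[obj_id]
        for pending_id, obj in self.pending:
            if pending_id == obj_id:
                return obj
        return None


store = ManagedCollection()
store.submit("f1", {"username": "bruno"})

found_before_drain = store.find("f1")   # still in the queue
moved = store.manager_drain()
found_after_drain = store.find("f1")    # now in the root collection
```

The two-step `find` is the part that matters when timing is sensitive: an object is visible to searches whether or not the manager has processed it yet.
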
>>
>>     A variation of the manager session technique is to send data to
>>     the manager session without doing a commit. This might be through
>>     communication between gems or by using session-specific file
>>     updates that the manager gem reads. Gem-to-gem signaling can be
>>     added to this approach later too if you need to improve timing.
>>     This variation can avoid the intermediate commit, but you may
>>     still need to #continueTransaction to see what the manager session
>>     updated.
>>
>>     I wonder what kind of indexing you would need besides ID. If you
>>     don't need to query for anything other than ID then a dictionary
>>     would be fine with the ID as key. A dictionary can even use a key
>>     that is a custom object that redefines equality and hash from
>>     attributes of what is searched for. Merkle tree hashes might also
>>     be used as a way to test if some attribute is contained, but that
>>     is a bit advanced to go into. Another advanced item that I once
>>     implemented was a custom Dictionary where the key was derived
>>     from the value by behavior (it was more efficient because it
>>     avoided the cost of Association creation). So many cool tricks, I
>>     loved working with GS/S.
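
The custom-key idea above (a key object that redefines equality and hash from the attributes being searched for) can be shown with a short Python sketch. The class `UserGroupKey` and the sample data are hypothetical; in Smalltalk the corresponding methods would be `=` and `hash`, while Python uses `__eq__` and `__hash__`.

```python
class UserGroupKey:
    """Hypothetical lookup key built from two attributes of a form."""

    def __init__(self, username, group):
        self.username = username
        self.group = group

    def __eq__(self, other):
        # Two keys are equal when the attributes they were built from match.
        return (isinstance(other, UserGroupKey)
                and (self.username, self.group) == (other.username, other.group))

    def __hash__(self):
        # Hash must be derived from the same attributes used by __eq__.
        return hash((self.username, self.group))


forms = {}
forms[UserGroupKey("bruno", "admin")] = "form-1"

# A freshly built key with the same attributes finds the stored entry.
found = forms.get(UserGroupKey("bruno", "admin"))
missing = forms.get(UserGroupKey("bruno", "guest"))
```

Because equality and hash agree, a dictionary can answer "is there a form for this username and group?" with one lookup and without a separate index.
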
>>
>>     Paul Baumann
>>
>>
>>
>>     On Tue, Jan 3, 2017 at 2:39 PM, Mariano Martinez Peck via Glass
>>     <glass at lists.gemtalksystems.com> wrote:
>>
>>
>>         On Tue, Jan 3, 2017 at 3:14 PM, BrunoBB via Glass
>>         <glass at lists.gemtalksystems.com> wrote:
>>
>>             Hi All,
>>
>>             I have a lot of RcKeyValueDictionary instances where the
>>             key is the id of the object and the value is the object
>>             itself. Once assigned, this id does NOT change, so far so
>>             good :)
>>
>>             The RcKeyValueDictionary is used intensively to add and
>>             remove objects (in my case OrbeonFormInstance). The
>>             dictionary is very useful because the key is always given
>>             as a parameter.
>>
>>             There are also searches by specific inst vars of the
>>             OrbeonFormInstance class (like username, group,
>>             createdTime and so on).
>>
>>             My problem is that I can NOT create an index on
>>             aRcKeyValueDictionary.
>>             So which is the common practice in these cases:
>>             1- Change the RcKeyValueDictionary to be an
>>             UnorderedCollection?
>>             2- Add a new instance variable to the class that holds the
>>             RcKeyValueDictionary, with this new variable being
>>             anUnorderedCollection?
>>
>>             1) This will complicate my direct searches using the ID.
>>             2) Extra computation when adding and removing objects
>>             (there are now 2 collections to maintain).
>>
>>             The general question would be something like:
>>             When Dictionaries are very suitable for storing a large
>>             quantity of objects but indexes are also needed, which
>>             solution should be implemented?
>>
>>
>>
>>         Assuming you do need or get benefits from the RC flavor (else
>>         it brings unnecessary overhead), then quickly analyzing the
>>         situation (until GemStone has an indexed, RC-flavored
>>         Dictionary implementation), I think I would use an
>>         RcIdentityBag. I would create an identity index for #id, and
>>         yes, you will have to modify the code that accesses the dict
>>         so that it now does a detect on the collection using the
>>         identity index on ID.
>>
>>         But... I am sure someone more experienced will come up with a
>>         better approach!
>>
>>         Cheers,
>>
>>
>>
>>             regards
>>             bruno
>>
>>
>>
>>             --
>>             View this message in context:
>>             http://forum.world.st/Large-collection-and-common-practice-tp4928607.html
>>             <http://forum.world.st/Large-collection-and-common-practice-tp4928607.html>
>>             Sent from the GLASS mailing list archive at Nabble.com.
>>             _______________________________________________
>>             Glass mailing list
>>             Glass at lists.gemtalksystems.com
>>             <mailto:Glass at lists.gemtalksystems.com>
>>             http://lists.gemtalksystems.com/mailman/listinfo/glass
>>             <http://lists.gemtalksystems.com/mailman/listinfo/glass>
>>
>>
>>
>>
>>         -- 
>>         Mariano
>>         http://marianopeck.wordpress.com
>>         <http://marianopeck.wordpress.com>
>>
>>
>>
>
>
>
>
>
>
>


