[Glass] Lots of seaside objects not being GCed (need gemstone advice)

Mon Jul 6 14:47:34 PDT 2015

On 07/06/2015 12:28 PM, Mariano Martinez Peck wrote:
>
>
> On Mon, Jul 6, 2015 at 2:33 PM, Dale Henrichs via Glass 
> <glass at lists.gemtalksystems.com 
> <mailto:glass at lists.gemtalksystems.com>> wrote:
>
>     Mariano,
>
>     I've read over your other messages and I guess you are still
>     struggling to clean these guys up ... Rest of my comments in line.
>
>
> Hi Dale.
> Thanks, I answer inline.
>
>     Dale
>
>     On 07/03/2015 08:22 PM, Mariano Martinez Peck via Glass wrote:
>>
>>     Then...I check some #allInstances size and I get this:
>>
>>     DpWebSession allInstances size 32
>>     WACallbackRegistry allInstances size 217
>>     JQueryClass allInstances size 16519
>>     WACache  allInstances size 35
>>     WAApplication allInstances size 3
>>     WARenderVisitor allInstances size 217
>>     WARenderContext allInstances size 217
>>     WAHtmlCanvas allInstances size 909
>>     .....
>>
>     Right off the bat, my observation is that this doesn't seem like
>     a  lot of uncollected objects, presumably you churn through a lot
>     more sessions than this on a regular basis, so these objects
>     appear to be the exception instead of the rule...
>
>
> Of course. All those numbers are from a system which didn't receive a 
> single request in a whole day. And these are the results after all the 
> cleaning I could do. So this is why I expect to have zero instances 
> of those (meaning, not exactly zero, but much fewer in the real system 
> than what I have now).
Okay, you didn't have a single request today, so these objects must be 
hanging around from a previous day. Did you have zero instances the day 
before?

Without any other information, it is possible that these objects got 
left behind because of a voting issue (i.e., reference left in the head 
of a vm) ... did you cycle all of the gems before running the mfc? What 
is your setting for GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE? If I'm not 
mistaken GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE does not guarantee that there 
aren't other references in the gem's head to objects ...
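
For reference, the setting lives in the configuration file used by your gems. A minimal fragment, assuming the usual `NAME = value;` syntax of GemStone config files; the file name and the value shown are only an illustration, not a recommendation for your system:

```
# gem.conf (file name and value are assumptions for illustration)
GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE = 90;
```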

These instances did not appear out of thin air and there is a logical 
reason for them to be still hanging around ... This is a complicated system 
with a number of moving parts and there is no way to rule out bugs 
either ...

Without a detailed accounting of the "starting point" and the gems 
started and stopped between that point and now, we cannot guess what 
might have happened ...

I would suggest that at some point you record the oops of the session 
objects so that we don't end up finding that every time we look we are 
looking at a different set of sessions ...
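
A minimal sketch of recording the oops, assuming DpWebSession is your session class and that #SuspectSessionOops is an unused (made-up) key in UserGlobals. Only the integer oops are stored, so the record itself should not keep the sessions alive:

```smalltalk
| oops |
System abortTransaction.
"Collect the oops (plain integers) of the expired sessions"
oops := (DpWebSession allInstances
  select: [ :each | (each instVarNamed: 'parent') isNil ])
  collect: [ :each | each asOop ].
"#SuspectSessionOops is a hypothetical key, pick any unused name"
UserGlobals at: #SuspectSessionOops put: oops asArray.
System commitTransaction.
oops size
```

Later on, a fresh list can be compared against the recorded one to see whether the same sessions are still around.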

>
>
>>     (just as some examples).
>>
>>     The good news is that ALL the sessions do look expired:
>>
>>     (DpWebSession allInstances select: [ :each | (each instVarNamed:
>>     'parent') isNil ]) size 32
>>
>>     (expired sessions have a nil 'parent').
>     One of the interesting things that came out of Larry's "ordeal" is
>     that we found a bug in WACache>>gemstoneReap, where an error while
>     running this method can result in objects getting stuck in the
>     WACache. Basically objects are marked as expired in the
>     WARcLastAccessExpiryPolicy, but due to the error, they may not be
>     removed from the objectsByKey and keysByObject dictionaries ...
>     thus keeping them alive "forever".
>
>     If you check your maintenance vm logs, you might find an error
>     with WACache>>gemstoneReap (Almost Out of Memory is how we found
>     the bug in Larry's case).
>
>
>
> I grepped, but I found no errors in my maintenance logs.
>
>
>     Since you have so few sessions, we can test whether the object 
>     leak is due to this bug:
>
>       | sessions |
>       System abortTransaction.
>       sessions := WASession allInstances
>         select: [ :each | (each instVarNamed: 'parent') isNil ].
>       System abortTransaction.
>       WAApplication allInstances
>         do: [ :app |
>           | cache keysByObject |
>           cache := app cache.
>           keysByObject := cache instVarNamed: 'keysByObject'.
>           sessions
>             do: [ :session |
>               (keysByObject includesKey: session)
>                 ifTrue: [ self halt ] ] ]
>
>     If you get a halt running the above, then you've been bitten by
>     the bug and you need to arrange to remove the session objects
>     from both dicts. See WACache>>gemstoneReap for example code ...
>
>
> I did not get a Halt in the above code.
Did you replace `WASession allInstances` with DpWebSession?
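
As far as I recall, allInstances does not include instances of subclasses, so the check only means something when it is run against the concrete class. A sketch of the same check with DpWebSession substituted; instead of halting it collects the oops of any expired sessions still registered in a cache, so an empty result would suggest the gemstoneReap bug is not the culprit:

```smalltalk
| sessions stuck |
System abortTransaction.
sessions := DpWebSession allInstances
  select: [ :each | (each instVarNamed: 'parent') isNil ].
stuck := OrderedCollection new.
WAApplication allInstances
  do: [ :app |
    | keysByObject |
    "Same instance variable the original check looks at"
    keysByObject := app cache instVarNamed: 'keysByObject'.
    sessions
      do: [ :session |
        (keysByObject includesKey: session)
          ifTrue: [ stuck add: session asOop ] ] ].
stuck
```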
>
>
>     If the WASessions are not stuck in a WAApplication, then it's
>     likely that you have some accidental reference to the WASession
>     objects and you'll have to trace the reference path back to a
>     persistent root using Repository>>findReferencePathToObject: ...
>     this method only returns one reference path ... In 3.2 we've
>     created Repository>>findAllReferencePathsToObject: that finds and
>     returns all of the reference paths (in a pinch you could upgrade
>     your repository to 3.2.6 just to run the analysis) ...
>
>     [1] https://github.com/GsDevKit/Seaside31/issues/68
>
>
>
> Yes, in fact, earlier today I tried #findReferencePathToObject: with 
> (MySessionSubclass allInstances any) and guess what????
> I get an array of only 2 entries: the first element is the target object 
> and the second element is false. The method comment says this means there 
> is no path to that object. WTF!!!! So then why do they not go away??? As 
> I said, I do run MFC, I do run #reclaimAll... so..... *in which scenario 
> would I hold onto instances (and in fact find them via #allInstances), yet 
> #findReferencePathToObject: would say there is no path?*
If I'm not mistaken, #findReferencePathToObject: scans for references in 
the repository, but does not take into account instances in a vm's 
memory ...
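
For the record, a sketch of the kind of check being discussed, assuming DpWebSession is the session class, that at least one instance exists, and that the result has the shape you describe (two elements, the second being false, when no path is found in the repository):

```smalltalk
| target result |
System abortTransaction.
target := DpWebSession allInstances first.
result := SystemRepository findReferencePathToObject: target.
"Result shape is taken from the description above, not verified here"
(result size = 2 and: [ result last == false ])
  ifTrue: [ 'no persistent reference path found' ]
  ifFalse: [ result ]
```

Note that a script like this puts the session into the memory of the gem that runs it, so that gem should be logged out before the next mfc.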

At this point I don't know whether these objects are staying alive 
because of persistent references or because they are in a vm's head and 
being voted down ...

>
>
>>
>>     However...I cannot explain why I still have all that garbage
>>     above if all sessions are expired. Is that normal? I would expect
>>     to have nothing.
>     It's not normal:)
>
>
> That's cool to hear. So...even if this is a small number of objects, 
> it gives me a small-scale scenario of the real system. If this stone has 
> not received a single request in hours, then I should get ZERO 
> instances of those :) Cool.
>

To know with certainty whether or not an object is considered truly 
dead, you can look at System class>>_deadNotReclaimed and see if the 
oops of the suspect sessions are in it or not (see the comment in the 
method for conditions of use). Barring any nasty bugs they are likely to 
have been voted down ...

If you set STN_TRAN_LOG_DEBUG_LEVEL=3 in your system.conf and restart 
your stone ... it is possible to find the list of objects in the 
possible dead set, the list of objects voted down (and the session id 
that voted them down) and the original list of deadNotReclaimed ...
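
The corresponding entry would look roughly like this, assuming the usual `NAME = value;` syntax of the stone's config file:

```
# system.conf (stone configuration); needs a stone restart to take effect
STN_TRAN_LOG_DEBUG_LEVEL = 3;
```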

Of course if you restart your stone then the heads of the various gems 
will be cleared and it is likely that the objects will go away on the 
next mfc ... Note that in 3.1.0.6, it is possible that the gem doing the 
mfc is hanging onto some objects in its head, so unless you log out 
after the mfc, that might be the reason for voting guys down ...

Dale
