[Glass] Lots of seaside objects not being GCed (need gemstone advice)

Mariano Martinez Peck via Glass glass at lists.gemtalksystems.com
Mon Jul 6 15:09:12 PDT 2015


On Mon, Jul 6, 2015 at 6:47 PM, Dale Henrichs <
dale.henrichs at gemtalksystems.com> wrote:

>
> On 07/06/2015 12:28 PM, Mariano Martinez Peck wrote:
>
>
>
> On Mon, Jul 6, 2015 at 2:33 PM, Dale Henrichs via Glass <
> glass at lists.gemtalksystems.com> wrote:
>
>>  Mariano,
>>
>> I've read over your other messages and I guess you are still struggling
>> to clean these guys up ... Rest of my comments in line.
>>
>>
>  Hi Dale.
> Thanks, I answer inline.
>
>
>>  Dale
>>
>> On 07/03/2015 08:22 PM, Mariano Martinez Peck via Glass wrote:
>>
>>
>>  Then...I check some #allInstances size and I get this:
>>
>>  DpWebSession allInstances size 32
>> WACallbackRegistry allInstances size 217
>> JQueryClass allInstances size 16519
>> WACache  allInstances size 35
>> WAApplication allInstances size 3
>> WARenderVisitor allInstances size 217
>> WARenderContext allInstances size 217
>>  WAHtmlCanvas allInstances size 909
>> .....
>>
>>    Right off the bat, my observation is that this doesn't seem like a
>> lot of uncollected objects, presumably you churn through a lot more
>> sessions than this on a regular basis, so these objects appear to be the
>> exception instead of the rule...
>>
>
>  Of course. All those numbers are from a system which didn't receive a
> single request in a whole day. And these are the results after all the
> cleaning I could do. So this is why I expect to have zero instances of
> those (meaning, if not zero, then far fewer in the real system than what
> I have now).
>
> Okay, you didn't have a single request today, so these objects must be
> hanging around from a previous day. Did you have zero instances the day
> before?
>
> Without any other information, it is possible that these objects got left
> behind because of a voting issue (i.e., reference left in the head of a vm)
> ... did you cycle all of the gems before running the mfc? What is your
> setting for GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE? If I'm not mistaken
> GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE does not guarantee that there aren't other
> references in the gem's head to objects ...
>
> These instances did not appear out of thin air and there is a logical
> reason for them to be still hanging around ... This is a complicated system
> with a number of moving parts and there is no way to rule out bugs either
> ...
>
> Without a detailed accounting of the "starting point" and the gems
> started and stopped between that point and now, we cannot guess what
> might have happened ...
>
> I would suggest that at some point you record the oops of the session
> objects so that we don't end up finding that every time we look we are
> looking at a different set of sessions ...
>


Good point. Thanks. I will remember it for next time: each time I am
dealing with this kind of stuff: cycle all seaside gems first!
Thanks. BTW, my GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE is 100% now to avoid
having to cycle gems.
I will continue with the tests with cycling/killing the gems... but....
continue reading below...
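
A minimal sketch of the "record the oops" suggestion, assuming DpWebSession is the session class in question (as elsewhere in this thread). The UserGlobals key #SuspectSessionOops is a made-up name for illustration only:

```smalltalk
"Sketch only. #SuspectSessionOops is a hypothetical key, not an existing global.
 Only the integer oops are stored, so this adds no new references to the sessions."
| oops |
System abortTransaction.
oops := (DpWebSession allInstances
  select: [ :each | (each instVarNamed: 'parent') isNil ])
    collect: [ :each | each asOop ].
UserGlobals at: #SuspectSessionOops put: oops.
System commitTransaction.
```

After cycling the gems and running the next MFC, the same set can be checked again, so that each look is at the same sessions:

```smalltalk
"Answers the recorded sessions that are still around."
| recorded |
System abortTransaction.
recorded := UserGlobals at: #SuspectSessionOops.
DpWebSession allInstances
  select: [ :each | recorded includes: each asOop ]
```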



>
>
>
>
>>
>>
>>  (just as some examples).
>>
>>  The good news is that ALL the sessions do look expired:
>>
>>  (DpWebSession allInstances select: [ :each | (each instVarNamed:
>> 'parent') isNil ]) size 32
>>
>>  (expired sessions have a nil 'parent').
>>
>>  One of the interesting things that came out of Larry's "ordeal" is that
>> we found a bug in WACache>>gemstoneReap, where an error while running this
>> method can result in objects getting stuck in the WACache. Basically
>> objects are marked as expired in the WARcLastAccessExpiryPolicy, but due to
>> the error, they may not be removed from the objectsByKey and keysByObject
>> dictionaries ... thus keeping them alive "forever".
>>
>> If you check your maintenance vm logs, you might find an error with
>> WACache>>gemstoneReap (Almost Out of Memory is how we found the bug in
>> Larry's case).
>>
>
>
>  I grepped but found no errors in my maintenance logs.
>
>
>
>>
>> Since you have so few sessions, we can test whether the object  leak is
>> due to this bug:
>>
>>   | sessions |
>>   System abortTransaction.
>>   sessions := WASession allInstances
>>     select: [ :each | (each instVarNamed: 'parent') isNil ].
>>   System abortTransaction.
>>   WAApplication allInstances
>>     do: [ :app |
>>       | cache keysByObject |
>>       cache := app cache.
>>       keysByObject := cache instVarNamed: 'keysByObject'.
>>       sessions
>>         do: [ :session |
>>           (keysByObject includesKey: session)
>>             ifTrue: [ self halt ] ] ]
>>
>> If you get a halt running the above, then you've been bitten by the bug
>> and you need to arrange to remove the session objects from both dicts.
>> See WACache>>gemstoneReap for example code ...
>>
>
>  I did not get a Halt in above code.
>
> Did you replace `WASession allInstances` with DpWebSession?
>


hahahahaha how can I keep my respect after this? hahaha. Sorry. What
another pair of eyes can do... Thanks Dale. Sorry. Having a 2-month-old
baby is killing me, ahahah (perfect excuse!)
OK...so yeah, it halted.
So if I understand correctly, a possible fix is what you submitted to
https://github.com/GsDevKit/Seaside31/issues/68 ?
As for the workaround (to remove the existing ones), I am not sure what I
should do. I guess the first step is to apply the above fix. Then... would a
simple
WACache allInstances do: [:each | each gemstoneReap]   not do it?
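
For reference, a rough sketch of what "remove the session objects from both dicts" could look like. This is not the fix from issue 68 and not a tested workaround. It assumes the two instance variable names mentioned earlier in the thread (objectsByKey and keysByObject), and it touches the dictionaries directly via #instVarNamed:, so it is something to try on a backup first:

```smalltalk
"Sketch only: drops expired sessions (nil parent) from both cache dictionaries.
 The instance variable names are taken from this thread, not checked against
 every Seaside version."
| sessions |
System abortTransaction.
sessions := DpWebSession allInstances
  select: [ :each | (each instVarNamed: 'parent') isNil ].
WAApplication allInstances
  do: [ :app |
    | cache keysByObject objectsByKey |
    cache := app cache.
    keysByObject := cache instVarNamed: 'keysByObject'.
    objectsByKey := cache instVarNamed: 'objectsByKey'.
    sessions
      do: [ :session |
        | key |
        key := keysByObject at: session ifAbsent: [ nil ].
        key notNil
          ifTrue: [
            objectsByKey removeKey: key ifAbsent: [ nil ].
            keysByObject removeKey: session ifAbsent: [ nil ] ] ] ].
System commitTransaction.
```

The sessions would still only go away after a later MFC and reclaim, and only if no gem votes them down.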


>
>
>>
>> If the WASessions are not stuck in a WAApplication, then it's likely that
>> you have some accidental reference to the WASession objects and you'll have
>> to trace the reference path back to a persistent root using
>> Repository>>findReferencePathToObject: .. this method only returns one
>> reference path... In 3.2 we've created
>> Repository>>findAllReferencePathsToObject: that finds and returns all of
>> the reference paths (in a pinch you could upgrade your repository to 3.2.6
>> just to run the analysis) ...
>>
>> [1] https://github.com/GsDevKit/Seaside31/issues/68
>>
>
>
>  Yes, in fact, earlier today I tried #findReferencePathToObject:  with
> (MySessionSubclass allInstances any) and guess what????
> I get an array of only 2 entries: the first element is the target object
> and the second element is false. The method comment says this means there
> is no path to that object. WTF!!!! So then why do they not go away??? As
> said, I do run MFC, I do run #reclaimAll... so..... *in which scenario
> would I hold onto instances (and in fact find them via #allInstances), yet
> #findReferencePathToObject: would say there is no path?*
>
> If I'm not mistaken, #findReferencePathToObject: scans for references in
> the repository, but does not take into account instances in a vm's memory
>

Ahhhh!!!! while #allInstances answers both!
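
A small sketch of that comparison, assuming the result shape described above (a 2-element array whose second element is false when no path exists) and assuming at least one DpWebSession instance is found:

```smalltalk
"Sketch only: the instance is found via #allInstances, then the repository is
 asked for a persistent reference path to it."
| suspect path |
System abortTransaction.
suspect := DpWebSession allInstances first.
path := SystemRepository findReferencePathToObject: suspect.
(path size = 2 and: [ (path at: 2) == false ])
  ifTrue: [ 'no persistent path found' ]
  ifFalse: [ path ]
```

If this answers 'no persistent path found' while #allInstances still answers the object, that is at least consistent with the object being kept only by something in a gem's memory rather than by a reference in the repository.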


> ...
>
> At this point I don't know whether these objects are staying alive
> because of persistent references or because they are in a vm's head and
> being voted down ...
>
>
>
>>
>>
>>  However...I cannot explain why I still have all that garbage above if
>> all sessions are expired. Is that normal? I would expect to have nothing.
>>
>>  It's not normal:)
>>
>
>  That's cool to hear. So...even if that is a small number of objects,
> this gives me a small-scale version of the real system. If this stone has
> not received a single request in hours, then I should get ZERO instances of
> those :) Cool.
>
>
>
> To know with certainty whether or not an object is considered truly dead,
> you can look at System class>>_deadNotReclaimed and see if the oops of the
> suspect sessions are in it or not (see the comment in the method for
> conditions of use). Barring any nasty bugs they are likely to have been
> voted down ...
>
> If you set STN_TRAN_LOG_DEBUG_LEVEL=3 in your system.conf and restart your
> stone ... it is possible to find the list of objects in the possible dead
> set, the list of objects voted down (and the session id that voted them
> down) and the original list of deadNotReclaimed ...
>
> Of course if you restart your stone then the heads of the various gems
> will be cleared and it is likely that the objects will go away on the next
> mfc ... Note that in 3.1.0.6, it is possible that the gem doing the mfc is
> hanging onto some objects in its head, so unless you logout after the mfc,
> that might be the reason for voting guys down ...
>
>
OK, thanks for this explanation.



-- 
Mariano
http://marianopeck.wordpress.com