[Glass] Lots of seaside objects not being GCed (need gemstone advise)

Tue Jul 7 11:56:28 PDT 2015

On 07/07/2015 05:49 AM, Mariano Martinez Peck wrote:
> Dale,
>
> I have continued analyzing this in other stones and after some testing 
> it is clear that some sessions (the size would depend on the system 
> usage) are NOT GCed unless I shut all seaside gems down or cycle them. 
> Originally I had GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE at 90% and I 
> was cycling seaside gems once a day as part of GC. Then, I changed it 
> to 100% and stopped restarting gems. Now...it COULD have happened that I 
> did not restart all gems after I modified 
> GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE and so, the system was still running 
> with 90% and yet I was not restarting seaside gems anymore.
Yes. The meaning of GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE=100 is that all 
pomgen spaces are dropped ... this does not mean that all references to 
persistent objects in the vm are dropped ....
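
For reference, a minimal sketch of how the setting would look in the 
configuration file that the Seaside gems read. The file name and its 
location are assumptions, since they vary by installation. A gem reads 
its configuration file at startup, so a gem that was already running when 
the file was edited does not pick up the new value from the file until it 
is restarted:

```
# gem configuration file used by the Seaside gems (name and path are installation-specific)
# 100 = drop all pomgen spaces when the gem votes
GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE = 100;
```
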
> That could explain why I hold onto some instances, right?  Another 
> possibility is the "stale reference" you mention below. *I continue 
> answering below:*
>
>>     Good point. Thanks. I will remember it for next time: each time I
>>     am dealing with this kind of stuff: cycle all seaside gems first!
>>     Thanks. BTW, my GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE is 100% now to
>>     avoid having to cycle gems.
>>     I will continue with the tests with cycling/killing the gems...
>>     but.... continue reading below...
>     Do you also have the marksweep guy running?
>
>
> The guy that every 30 minutes performs the "System 
> _generationScavenge_vmMarkSweep."?  Then yes. Why do you ask? How could 
> this guy affect it? He does not hold any seaside session as far as I 
> know...it simply sends "System _generationScavenge_vmMarkSweep.". Could 
> it be that the #wait: freezes the gem and therefore it does not answer 
> the voting?
No, if a gem is busy, the stone patiently waits for the gem to hit a 
transaction boundary - the vote happens on a transaction boundary. This 
is one of the factors that causes reclaimAll to be non-deterministic 
(our goal is for reclaimAll to be deterministic, but the system _is_ a 
complex state machine). Gems can be busy doing a long running 
transaction, or a gem can be idle sitting in transaction - like an idle 
topaz or GemTools - and unless the system triggers an event to cause the 
gem to wake up, like hitting the commit record limit thresholds, the 
system patiently waits for the Gem to finish its "work".
>
> Mmmmm now I read in the sysadmin guide: *"Gems do not vote until they 
> complete their current transaction. If a Gem is sleeping or otherwise 
> engaged in a long transaction, the vote cannot be*
> *finalized and garbage collection pauses at this point. Commit records 
> accumulate, garbage accumulates, and a variety of problems can ensue."*
>
> Uffff maybe since this guy practically sleeps all the time and yet 
> does not do a commit nor abort in each iteration of the loop...maybe 
> this guy is preventing the vote?
Recall the little process in which you installed the vm marksweep code? This 
particular process is there so that a Seaside gem is guaranteed to have 
a Smalltalk process ready and available to respond to the SigAbort ... 
The SigAbort is sent by the stone when commit records accumulate ...
>
> Even more......the sysadmin guide also says: *"If a committed object 
> in the pom area has been modified, it is copied to the old area if a 
> scavenge occurs before the change is committed."*
> If it is not that ...maybe you asked because....if it happened that I 
> modified the session (by any seaside request) and the 
> _generationScavenge_vmMarkSweep happened before the request processing 
> finished, then the session would have been moved to "old" space? But 
> even in this case, when the request processing finishes, it would 
> commit the "modified persistent object" (seaside session)...
>
>     GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE only dumps the pomgen on the
>     floor ... a stale direct reference to one of these in the TOC
>     can also keep these guys alive ... I think that was why I marveled
>     in the earlier message about it only being 32 instances ... with
>     the referencePath method not finding anything you should be able
>     to declare a victory:)
>
>
> What do you mean by a stale direct reference?
>
The case I was thinking about is that you could create a reference to a 
piece of session state that ultimately refers back to a session from a 
purely temporary object. If that temporary object is alive in the TOC 
when it's time to vote and no sweep has been run, then that reference 
_could_ cause the WASession to be voted down (i.e. kept alive) ...

At the end of the day, when we are talking about a handful of sessions 
being kept alive for an mfc or two,  I don't think it is a major problem 
... you can survive with this temporary leakage and it shouldn't become 
necessary to shut the entire system down and restart to make sure that 
the last crumb is swept from the table ...

All of these checks and balances ensure that we do not garbage collect 
an object that shouldn't be garbage collected and in a dynamic system 
that means that we have to err on the side of caution.

It might be worth verifying that we don't have a bug in the system, by 
copying the extent into a sandbox, where you can do a clinical attempt 
to run an mfc and satisfy yourself that these 50 session objects can 
indeed be collected in a tightly controlled system ... if you are still 
unable to collect these objects under controlled conditions and the 
reference paths are empty, we are looking at the real possibility of a 
bug and we will want to get to the bottom of it...
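
A rough sketch of what such a sandbox check might look like, run from a 
single topaz or GemTools session logged in to the copied extent with no 
Seaside gems or other sessions attached. It assumes a user with garbage 
collection privileges and that the selectors shown are available in your 
GemStone version (check them against your image). Note that 
listInstances: reports instances of the exact class only, so substitute 
your application's own session subclass if it has one:

```smalltalk
| before after |
"Refresh this session's view so it does not pin an old commit record."
System abortTransaction.
before := (SystemRepository listInstances: (Array with: WASession)) first size.

"Run the mfc, then ask the reclaim gems to finish their work."
SystemRepository markForCollection.
SystemRepository reclaimAll.

"Abort again so the count reflects the state after reclaim."
System abortTransaction.
after := (SystemRepository listInstances: (Array with: WASession)) first size.

"Answer both counts for comparison."
Array with: before with: after
```

If the second count does not drop in this controlled setting and the 
reference paths are still empty, that is the situation worth reporting.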

Dale