[Glass] Lots of seaside objects not being GCed (need GemStone advice)

Sat Jul 11 14:28:19 PDT 2015

On Tue, Jul 7, 2015 at 3:56 PM, Dale Henrichs <
dale.henrichs at gemtalksystems.com> wrote:

>
>
> On 07/07/2015 05:49 AM, Mariano Martinez Peck wrote:
>
>   Dale,
>
>  I have continued analyzing this in other stones and, after some testing, it
> is clear that some sessions (how many depends on system usage) are
> NOT GCed unless I shut all seaside gems down or cycle them. Originally I
> had GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE set to 90% and I was cycling the
> seaside gems once a day as part of GC. Then I changed it to 100% and stopped
> restarting gems. Now... it COULD be that I did not restart all the
> gems after I modified GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE, so the
> system was still running with 90% and yet I was no longer restarting the
> seaside gems.
>
> Yes. The meaning of GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE=100 is that all
> pomgen spaces are dropped ... this does not mean that all references to
> persistent objects in the vm are dropped ....
>

Indeed. That's why, to be 100% sure that all references to persistent
objects are dropped, you likely need to recycle the seaside gems (even with
GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE=100).

>   That could explain why I hold onto some instances, right?  Another
> possibility is the "stale reference" you mention below. *I continue
> answering below:*
>
>
>>     Good point. Thanks. I will remember it for next time: whenever I am
>> dealing with this kind of stuff, cycle all seaside gems first!
>> Thanks. BTW, my GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE is 100% now to avoid
>> having to cycle gems.
>> I will continue with the tests with cycling/killing the gems... but....
>> continue reading below...
>>
>>  Do you also have the marksweep guy running?
>>
>
>  The guy that performs "System _generationScavenge_vmMarkSweep." every 30
> minutes? Then yes. Why do you ask? How could this guy affect it? As far as
> I know it does not hold any seaside session... it
> simply sends "System _generationScavenge_vmMarkSweep.". Could it be that
> the #wait freezes the gem and therefore it does not answer the vote?
>
> No. If a gem is busy, the stone patiently waits for the gem to hit a
> transaction boundary; the vote happens on a transaction boundary.
>

Dale, given this comment, I do not understand the passage from the sysadmin
guide I pasted below: "*Gems do not vote until they complete their
current transaction. If a Gem is sleeping or otherwise engaged in a long
transaction, the vote cannot be finalized and garbage collection pauses
at this point.*"

> This is one of the factors that causes reclaimAll to be non-deterministic
> (our goal is for reclaimAll to be deterministic, but the system _is_ a
> complex state machine). Gems can be busy doing a long-running transaction,
> or a gem can be idle, sitting in a transaction, like an idle topaz or
> GemTools, and unless the system triggers an event that causes the gem to
> wake up, like hitting the commit record limit thresholds, the system
> patiently waits for the Gem to finish its "work".
>

Ok... so it will wait. Ok, I got that.

>
>  Mmmmm, now I read in the sysadmin guide: *"Gems do not vote until they
> complete their current transaction. If a Gem is sleeping or otherwise
> engaged in a long transaction, the vote cannot be finalized and garbage
> collection pauses at this point. Commit records
> accumulate, garbage accumulates, and a variety of problems can ensue."*
>
>  Uffff... maybe, since this guy practically sleeps all the time and never
> does a commit or abort in any iteration of the loop, this guy is
> preventing the vote?
>
> Recall the little process in which you installed the vm marksweep code? This
> particular process is there so that a Seaside gem is guaranteed to have a
> Smalltalk process ready and available to respond to the SigAbort ... The
> SigAbort is sent by the stone when commit records accumulate ...
>
>
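To make the explanation above concrete, here is a toy model in Python. It is NOT GemStone code and not how the stone is actually implemented; the names `Gem`, `run_vote` and `sig_abort_threshold` are hypothetical. It only encodes the rules stated in this thread: a gem votes only at a transaction boundary, the stone cannot finalize the vote until every gem has voted, and once commit records pass a threshold the stone sends a SigAbort, which a gem with a ready Smalltalk process (the role of the little marksweep process) answers by aborting.

```python
# Toy model (NOT GemStone code) of voting on transaction boundaries.
# Assumptions, all hypothetical: a gem votes only when it commits or aborts;
# the vote is finalized only when every gem has voted; past a commit-record
# threshold the stone sends SigAbort, and a gem with a process ready to
# handle it aborts, which is a transaction boundary, so it votes.
from dataclasses import dataclass


@dataclass
class Gem:
    name: str
    in_long_transaction: bool  # busy, or idle sitting in a transaction
    handles_sig_abort: bool    # has a Smalltalk process ready to respond
    voted: bool = False

    def reach_boundary(self) -> None:
        # A commit or abort is a transaction boundary: the gem votes there.
        self.voted = True


def run_vote(gems, commit_records, sig_abort_threshold):
    """Return (vote_finalized, names_of_gems_still_blocking)."""
    for gem in gems:
        if not gem.in_long_transaction:
            gem.reach_boundary()  # finishes its work and votes normally
    if commit_records >= sig_abort_threshold:
        for gem in gems:
            if not gem.voted and gem.handles_sig_abort:
                gem.reach_boundary()  # SigAbort -> abort -> vote
    blocking = [g.name for g in gems if not g.voted]
    return (not blocking, blocking)
```

In this toy, a gem sitting in a transaction delays the vote until the threshold is crossed; after that, only a gem with no process able to respond keeps blocking. The model deliberately stops at SigAbort and does not try to describe anything the stone might do afterwards.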
Well. Here is where I have my last question. That little process we are
talking about runs this code:

 [
   | count minutesToForceGemGC |
   count := 0.
   minutesToForceGemGC := 30.
   [ true ] whileTrue: [
     "Sleep 30 seconds per iteration, i.e. 2 ticks per minute"
     (Delay forSeconds: 30) wait.
     count := count + 1.
     "Every (minutesToForceGemGC * 2) ticks, i.e. every 30 minutes, force a VM mark/sweep"
     (count \\ (minutesToForceGemGC * 2)) = 0 ifTrue: [
       System _generationScavenge_vmMarkSweep.
       count := 0.
     ].
   ].
 ] forkAt: Processor lowestPriority.

So my question is: as you can see, that code NEVER does a commit or an
abort. So I don't see how it can ever reach what you describe as "the
vote happens on a transaction boundary". That code spends 99.9% of its time
in a #wait, doing neither a commit nor an abort. So wouldn't that make the
voting process wait for it forever? Or is the SigAbort what prevents that?

>  Even more... the sysadmin guide also says: *"If a committed object in
> the pom area has been modified, it is copied to the old area if a scavenge
> occurs before the change is committed."*
>
>   If it is not that... maybe you asked because, if I happened to
> modify the session (through any seaside request) and the
> _generationScavenge_vmMarkSweep ran before the request processing
> finished, then the session would have been moved to "old" space? But even
> in that case, when the request processing finishes, it would commit the
> "modified persistent object" (the seaside session)...
>
>
>
>>  GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE only dumps the pomgen on the floor ...
>> a stale direct reference to one of these in the TOC can also keep these
>> guys alive ... I think that was why I marveled in the earlier message about
>> it only being 32 instances ... with the referencePath method not finding
>> anything you should be able to declare victory :)
>>
>
>  What do you mean by a stale direct reference?
>
>
> The case I was thinking about is that you could create a reference to a
> piece of session state that ultimately refers back to a session from a
> purely temporary object, and if that temporary object is alive in the TOC
> when it's time to vote and no sweep has been run, then that reference
> _could_ cause the WASession to be voted down ...
>
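A toy reachability sketch in Python of this stale-reference case. Again, this is NOT GemStone's garbage collector; the object names and the helpers `reachable` and `collectable` are hypothetical. The old session is no longer reachable from the persistent roots, but a temporary object still alive in the TOC reaches it through session state, so it is kept (voted down). In the toy, pruning the pomgen is not modeled as removing the temporary roots, matching the point that the prune only dumps the pomgen.

```python
# Toy reachability model (NOT GemStone's GC) of the stale-reference case.
# Hypothetical setup: an old WASession is unreachable from persistent roots,
# but a temporary object alive in a gem's TOC still references session
# state that points back to the session.
from collections import deque


def reachable(graph, roots):
    """Return the set of object ids reachable from roots (breadth-first)."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        for ref in graph.get(obj, ()):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen


def collectable(candidates, graph, persistent_roots, vm_temp_roots):
    """Candidates safe to collect: reachable neither from persistent roots
    nor from temporary objects a gem still holds in its TOC."""
    live = reachable(graph, list(persistent_roots) + list(vm_temp_roots))
    return sorted(c for c in candidates if c not in live)


graph = {
    "app_root": ["live_session"],
    "temp_in_toc": ["session_state"],
    "session_state": ["old_session"],
    "old_session": ["session_state"],  # cycle back through session state
}
```

While `temp_in_toc` is passed as a temporary root, `old_session` is not collectable; once that temporary object is gone (for example after a sweep, or after the gem is cycled), the same call reports `old_session` as garbage.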

OK, I got that scenario. I think I prefer this over cycling gems.

> At the end of the day, when we are talking about a handful of sessions
> being kept alive for an mfc or two,  I don't think it is a major problem
> ... you can survive with this temporary leakage and it shouldn't become
> necessary to shut the entire system down and restart to make sure that the
> last crumb is swept from the table ...
>

Indeed. Fully agree. I just wanted to confirm that the sessions hanging
around were due to THIS scenario and not a major leak.

>
>
> All of these checks and balances ensure that we do not garbage collect an
> object that shouldn't be garbage collected and in a dynamic system that
> means that we have to err on the side of caution.
>
> It might be worth verifying that we don't have a bug in the system by
> copying the extent into a sandbox, where you can make a clinical attempt to
> run an mfc and satisfy yourself that these 50 session objects can indeed be
> collected in a tightly controlled system ... if you're still unable to
> collect these objects under controlled conditions and the reference paths
> are empty, we are looking at the real possibility of a bug and we will want
> to get to the bottom of it...
>

Yes, I will also run BackScan on the servers and check that.

Thanks Dale,

-- 
Mariano
http://marianopeck.wordpress.com