[GemStone-Smalltalk] Slow Reclaims during Recovery

Fri Jul 18 11:20:50 PDT 2014

Ken,

It is indeed a known bug:

"Well observed on the impact of reclaimSleepTime. This parameter is not
supposed to impact active reclaim, as Bill notes; it should only apply when
reclaim is not busy and ready to go to sleep. We've reproduced the problem,
however; it turns out that there is a bug that causes this to have an
impact similar to sleepTimeBetweenReclaimMs, and so, significantly slows
down reclaim. This has been reported as bug 44292."

The bug _has_ been fixed in 3.2.1 and is queued up for release in 3.1.0.7,
but we do not have a release date for 3.1.0.7. Here's a bugnote[1]
describing the bug and workarounds...

Dale

[1] http://gemtalksystems.com/data/bugnotes/44292.html

On Fri, Jul 18, 2014 at 9:29 AM, Ken Treis <ken at miriamtech.com> wrote:

> I made a major mistake yesterday and had to restore from a backup copy of
> the extent file. Thankfully, the copy was quite recent, so I expected
> recovery to be more or less instant because there were so few transactions
> to restore.
>
> But.. recovery took over 9 hours:
>
>     Recovery waited 0.630 seconds for freeFrameCount to recover.
>     Recovery took 34749.732 seconds
>       6.770 seconds spent waiting for free frames in shared page cache
>       34350.070 seconds spent in readThread waiting for free buffers
>
>
> I watched the DeadNotReclaimedObjs stat during this time (thanks Dale for
> pointing me to that), and I watched it slowly reclaim 65 million dead
> objects. Once the reclaim was done, things came alive really quickly. I'd
> finished a MFC earlier in the day, so apparently we had a long list of
> un-reclaimed objects.
>
> While the whole world was waiting for reclaim, the server appeared to be
> idle. Load average was about 0.01, GemStone processes were using very
> little CPU and doing no measurable I/O. Even after the entire 9-hour
> process, the reclaim gem had only accumulated 33 seconds of CPU time.
>
> During this agonizing process, I read up on tuning reclaim, and I fully
> expected to find that my system had a non-zero #sleepTimeBetweenReclaimMs:
>
> #sleepTimeBetweenReclaimMs
> The minimum amount of time in milliseconds that the process will sleep
> between reclaims, even when work is scheduled. The default is 0
> milliseconds, maximum 3600000.
>
>
> ... because that's what it was acting like. But no, after recovery
> finished, I checked and found that parameter to be 0.
>
> What else could cause reclaim-during-recovery to run so slowly? The
> reclaim gem didn't seem to be working any harder than it does during normal
> operation. I kind of appreciate that it doesn't over-tax the system
> normally, though but during recovery I want it to work its tail off.
>
> This is a GLASS app, so we have a small SPC (2 GB) but our extent is large
> (77 GB). But again, I didn't see that this work was either processor- or
> I/O-bound. We're running on an i2.xlarge EC2 instance with SSD storage, so
> we can pull all of that data through the small SPC pretty quickly. MFCs
> take us less than 20 minutes.
>
> --
> Ken Treis
> Miriam Technologies, Inc.
> (866) 652-2040 x221
>
>
> _______________________________________________
> GemStone-Smalltalk mailing list
> GemStone-Smalltalk at lists.gemtalksystems.com
> http://lists.gemtalksystems.com/mailman/listinfo/gemstone-smalltalk
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gemtalksystems.com/mailman/private/gemstone-smalltalk/attachments/20140718/df78e359/attachment-0001.html>