[GemStone-Smalltalk] Slow Reclaims during Recovery
Ken Treis
ken at miriamtech.com
Fri Jul 18 11:29:53 PDT 2014
Great, thanks Dale. I'll drop the reclaimSleepTime to 1 as the workaround suggests.
--
Ken Treis
Miriam Technologies, Inc.
(866) 652-2040 x221
On Jul 18, 2014, at 11:20 AM, Dale Henrichs <dale.henrichs at gemtalksystems.com> wrote:
> Ken,
>
> It is indeed a known bug:
>
> "Well observed on the impact of reclaimSleepTime. This parameter is not supposed to impact active reclaim, as Bill notes; it should only apply when reclaim is not busy and ready to go to sleep. We've reproduced the problem, however; it turns out that there is a bug that causes this to have an impact similar to sleepTimeBetweenReclaimMs, and so, significantly slows down reclaim. This has been reported as bug 44292."
>
> The bug _has_ been fixed in 3.2.1 and is queued up for release in 3.1.0.7, but we do not have a release date for 3.1.0.7. Here's a bugnote[1] describing the bug and workarounds...
>
> Dale
>
> [1] http://gemtalksystems.com/data/bugnotes/44292.html
>
>
> On Fri, Jul 18, 2014 at 9:29 AM, Ken Treis <ken at miriamtech.com> wrote:
> I made a major mistake yesterday and had to restore from a backup copy of the extent file. Thankfully, the copy was quite recent, so I expected recovery to be more or less instant because there were so few transactions to restore.
>
> But.. recovery took over 9 hours:
>
>> Recovery waited 0.630 seconds for freeFrameCount to recover.
>> Recovery took 34749.732 seconds
>> 6.770 seconds spent waiting for free frames in shared page cache
>> 34350.070 seconds spent in readThread waiting for free buffers
>
> I watched the DeadNotReclaimedObjs stat during this time (thanks Dale for pointing me to that), and I watched it slowly reclaim 65 million dead objects. Once the reclaim was done, things came alive really quickly. I'd finished a MFC earlier in the day, so apparently we had a long list of un-reclaimed objects.
>
> While the whole world was waiting for reclaim, the server appeared to be idle. Load average was about 0.01, GemStone processes were using very little CPU and doing no measurable I/O. Even after the entire 9-hour process, the reclaim gem had only accumulated 33 seconds of CPU time.
>
> During this agonizing process, I read up on tuning reclaim, and I fully expected to find that my system had a non-zero #sleepTimeBetweenReclaimMs:
>
>> #sleepTimeBetweenReclaimMs
>> The minimum amount of time in milliseconds that the process will sleep between reclaims, even when work is scheduled. The default is 0 milliseconds, maximum 3600000.
>
> ... because that's what it was acting like. But no, after recovery finished, I checked and found that parameter to be 0.
>
> What else could cause reclaim-during-recovery to run so slowly? The reclaim gem didn't seem to be working any harder than it does during normal operation. I kind of appreciate that it doesn't over-tax the system normally, though but during recovery I want it to work its tail off.
>
> This is a GLASS app, so we have a small SPC (2 GB) but our extent is large (77 GB). But again, I didn't see that this work was either processor- or I/O-bound. We're running on an i2.xlarge EC2 instance with SSD storage, so we can pull all of that data through the small SPC pretty quickly. MFCs take us less than 20 minutes.
>
> --
> Ken Treis
> Miriam Technologies, Inc.
> (866) 652-2040 x221
>
>
> _______________________________________________
> GemStone-Smalltalk mailing list
> GemStone-Smalltalk at lists.gemtalksystems.com
> http://lists.gemtalksystems.com/mailman/listinfo/gemstone-smalltalk
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gemtalksystems.com/mailman/private/gemstone-smalltalk/attachments/20140718/7e03f5d5/attachment.html>
More information about the GemStone-Smalltalk
mailing list