[GemStone-Smalltalk] Slow Reclaims during Recovery

Dale Henrichs dale.henrichs at gemtalksystems.com
Fri Jul 18 09:35:33 PDT 2014


Ken,

I'm pretty sure that there are some known bugs in this area with regards to
slow recovery ... and we do have fixes for these bugs, but I'm not sure
which version they have been fixed in, but if they are fixed they should be
in 3.2.1....

I'll check with the guys in engineering to see if they recognize this
problem, but there are some folks on vacation so I won't necessarily be
getting quick responses:)

Dale


On Fri, Jul 18, 2014 at 9:29 AM, Ken Treis <ken at miriamtech.com> wrote:

> I made a major mistake yesterday and had to restore from a backup copy of
> the extent file. Thankfully, the copy was quite recent, so I expected
> recovery to be more or less instant because there were so few transactions
> to restore.
>
> But.. recovery took over 9 hours:
>
>     Recovery waited 0.630 seconds for freeFrameCount to recover.
>     Recovery took 34749.732 seconds
>       6.770 seconds spent waiting for free frames in shared page cache
>       34350.070 seconds spent in readThread waiting for free buffers
>
>
> I watched the DeadNotReclaimedObjs stat during this time (thanks Dale for
> pointing me to that), and I watched it slowly reclaim 65 million dead
> objects. Once the reclaim was done, things came alive really quickly. I'd
> finished a MFC earlier in the day, so apparently we had a long list of
> un-reclaimed objects.
>
> While the whole world was waiting for reclaim, the server appeared to be
> idle. Load average was about 0.01, GemStone processes were using very
> little CPU and doing no measurable I/O. Even after the entire 9-hour
> process, the reclaim gem had only accumulated 33 seconds of CPU time.
>
> During this agonizing process, I read up on tuning reclaim, and I fully
> expected to find that my system had a non-zero #sleepTimeBetweenReclaimMs:
>
> #sleepTimeBetweenReclaimMs
> The minimum amount of time in milliseconds that the process will sleep
> between reclaims, even when work is scheduled. The default is 0
> milliseconds, maximum 3600000.
>
>
> ... because that's what it was acting like. But no, after recovery
> finished, I checked and found that parameter to be 0.
>
> What else could cause reclaim-during-recovery to run so slowly? The
> reclaim gem didn't seem to be working any harder than it does during normal
> operation. I kind of appreciate that it doesn't over-tax the system
> normally, though but during recovery I want it to work its tail off.
>
> This is a GLASS app, so we have a small SPC (2 GB) but our extent is large
> (77 GB). But again, I didn't see that this work was either processor- or
> I/O-bound. We're running on an i2.xlarge EC2 instance with SSD storage, so
> we can pull all of that data through the small SPC pretty quickly. MFCs
> take us less than 20 minutes.
>
> --
> Ken Treis
> Miriam Technologies, Inc.
> (866) 652-2040 x221
>
>
> _______________________________________________
> GemStone-Smalltalk mailing list
> GemStone-Smalltalk at lists.gemtalksystems.com
> http://lists.gemtalksystems.com/mailman/listinfo/gemstone-smalltalk
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gemtalksystems.com/mailman/private/gemstone-smalltalk/attachments/20140718/0402b123/attachment.html>


More information about the GemStone-Smalltalk mailing list