[GemStone-Smalltalk] Slow Reclaims during Recovery

Fri Jul 18 11:29:53 PDT 2014

Great, thanks Dale. I'll drop the reclaimSleepTime to 1 as the workaround suggests.

--
Ken Treis
Miriam Technologies, Inc.
(866) 652-2040 x221

On Jul 18, 2014, at 11:20 AM, Dale Henrichs <dale.henrichs at gemtalksystems.com> wrote:

> Ken,
> 
> It is indeed a known bug:
> 
> "Well observed on the impact of reclaimSleepTime. This parameter is not supposed to impact active reclaim, as Bill notes; it should only apply when reclaim is not busy and ready to go to sleep. We've reproduced the problem, however; it turns out that there is a bug that causes this to have an impact similar to sleepTimeBetweenReclaimMs, and so, significantly slows down reclaim. This has been reported as bug 44292."
> 
> The bug _has_ been fixed in 3.2.1 and is queued up for release in 3.1.0.7, but we do not have a release date for 3.1.0.7. Here's a bugnote[1] describing the bug and workarounds...
> 
> Dale
> 
> [1] http://gemtalksystems.com/data/bugnotes/44292.html
> 
> 
> On Fri, Jul 18, 2014 at 9:29 AM, Ken Treis <ken at miriamtech.com> wrote:
> I made a major mistake yesterday and had to restore from a backup copy of the extent file. Thankfully, the copy was quite recent, so I expected recovery to be more or less instant because there were so few transactions to restore.
> 
> But... recovery took over 9 hours:
> 
>>     Recovery waited 0.630 seconds for freeFrameCount to recover.
>>     Recovery took 34749.732 seconds
>>       6.770 seconds spent waiting for free frames in shared page cache
>>       34350.070 seconds spent in readThread waiting for free buffers
> 
> I watched the DeadNotReclaimedObjs stat during this time (thanks, Dale, for pointing me to that) as the system slowly reclaimed 65 million dead objects. Once the reclaim was done, things came alive really quickly. I'd finished an MFC (markForCollection) earlier in the day, so apparently we had a long list of unreclaimed objects.
> 
> While the whole world was waiting for reclaim, the server appeared to be idle. Load average was about 0.01, GemStone processes were using very little CPU and doing no measurable I/O. Even after the entire 9-hour process, the reclaim gem had only accumulated 33 seconds of CPU time.
> 
> During this agonizing process, I read up on tuning reclaim, and I fully expected to find that my system had a non-zero #sleepTimeBetweenReclaimMs:
> 
>> #sleepTimeBetweenReclaimMs
>> The minimum amount of time in milliseconds that the process will sleep between reclaims, even when work is scheduled. The default is 0 milliseconds, maximum 3600000.
> 
> ... because that's what it was acting like. But no, after recovery finished, I checked and found that parameter to be 0.
> 
> What else could cause reclaim-during-recovery to run so slowly? The reclaim gem didn't seem to be working any harder than it does during normal operation. I appreciate that it doesn't overtax the system normally, but during recovery I want it to work its tail off.
> 
> This is a GLASS app, so we have a small SPC (2 GB) but our extent is large (77 GB). But again, I didn't see that this work was either processor- or I/O-bound. We're running on an i2.xlarge EC2 instance with SSD storage, so we can pull all of that data through the small SPC pretty quickly. MFCs take us less than 20 minutes. 
> 
> --
> Ken Treis
> Miriam Technologies, Inc.
> (866) 652-2040 x221
> 
> 
> 
> 
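For scale, the figures quoted in this thread can be turned into rates. The following is only a rough Python sketch of that arithmetic: the input numbers are copied from the recovery log and the stats reported above, the 65 million count is the approximate figure given in the message, and the variable names are illustrative rather than GemStone statistic names.

```python
# Inputs copied from the recovery log and stats quoted in this thread.
recovery_seconds = 34749.732    # "Recovery took 34749.732 seconds"
read_wait_seconds = 34350.070   # readThread waiting for free buffers
dead_objects = 65_000_000       # approximate count of dead objects reclaimed
reclaim_cpu_seconds = 33        # CPU time accumulated by the reclaim gem

# Derived figures (names are illustrative, not GemStone stat names).
recovery_hours = recovery_seconds / 3600
wait_fraction = read_wait_seconds / recovery_seconds
objects_per_second = dead_objects / recovery_seconds
cpu_fraction = reclaim_cpu_seconds / recovery_seconds

print(f"recovery time:          {recovery_hours:.2f} hours")
print(f"time waiting on buffers: {wait_fraction:.1%}")
print(f"average reclaim rate:    {objects_per_second:.0f} objects/second")
print(f"reclaim gem CPU usage:   {cpu_fraction:.3%} of wall-clock time")
```

On these inputs, nearly all of the recovery time was spent waiting for free buffers, while the reclaim gem was on the CPU for well under a tenth of a percent of the elapsed time. That matches the description of a process that is mostly asleep rather than processor- or I/O-bound.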
