[Glass] extent0.dbf grows
Dale Henrichs via Glass
glass at lists.gemtalksystems.com
Thu Aug 6 16:29:10 PDT 2015
On 08/05/2015 02:33 PM, Mariano Martinez Peck wrote:
>
>
> Dale, it is not clear to me what the checkpoint interval is. I
> understood it was at abort/commit/logout ... so how is this interval
> related? Is there a way I can check the stone configuration parameter
> for this (the 5 minutes)?
A checkpoint is a GemStone system operation where all data pages in the
SPC (shared page cache) that were written before a given commit are
flushed to disk. After a checkpoint, GemStone guarantees that all data
for that commit has been written to the extents ... Until a commit is
checkpointed, the system relies on the records in the tranlog for
recovery ... The checkpoint is not necessarily the primary point at
which pages are freed up in an extent, but it does represent a
bookkeeping boundary that triggers other processing that can lead to
freeing up extent pages ....
When you recover from a crash, the system looks at the commit that was
last checkpointed, scans the tranlogs looking for that commit, and then
replays the tranlogs from that point forward to complete recovery ....
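To poke at the mechanics above from a logged-in session, you can request
a checkpoint explicitly. The sketch below assumes the
System class>>startCheckpointSync selector (present in recent
GemStone/S 64 Bit releases, but verify against your version's System
class documentation):

```smalltalk
"Request a synchronous checkpoint from a logged-in session.
 The selector name is an assumption -- verify it against your
 GemStone/S release before relying on it."
| ok |
ok := System startCheckpointSync.   "blocks until the checkpoint completes"
"true  -> all commits up to this point are now safely in the extents
 false -> a checkpoint could not be started (one may be in progress)"
ok
```

After this returns true, crash recovery would only need to replay
tranlog records written after this point.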
STN_CHECKPOINT_INTERVAL in the system.conf file is used to customize the
checkpoint interval ... Given the above, the checkpoint interval
controls how much data would have to be recovered from the tranlogs in
the event of a system crash. With the default checkpoint interval of 5
minutes, at most 5 minutes' worth of tranlog data will have to be
recovered; on the other end of the spectrum, that same interval means
that every 5 minutes the SPC will be scanned for pages that have not
been written to disk ... Tuning the checkpoint interval involves finding
a balance between scan cpu consumption, disk i/o, recovery time, SPC
size, and commit rates.
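As a concrete sketch, the relevant line in the stone's config file
(conventionally $GEMSTONE/data/system.conf) looks like this; the value
is in seconds, so 300 is the 5-minute default:

```
# system.conf -- stone configuration (excerpt)
# Seconds between checkpoints; 300 = the 5-minute default.
STN_CHECKPOINT_INTERVAL = 300;
```

At runtime you should also be able to read the active value from a
session, e.g. `System stoneConfigurationReport` returns a dictionary of
parameter names to values - though that selector and the exact key name
(something like #StnCheckpointInterval) are assumptions to check against
your GemStone/S release.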
If the checkpoint interval is too short, you may consume a lot of cpu
time doing checkpoints without actually writing many dirty pages. If the
checkpoint interval is too long, it may take a long time to replay
tranlogs during crash recovery ... There's a third inflection point
where the checkpoint interval is shorter than the time it takes to write
all of the dirty pages that have accumulated, and you end up in
perpetual checkpoint mode - in that case you just have to live with the
exposure to longer recovery times (or take steps to improve disk i/o
or ....)
At the end of the day, tuning the checkpoint interval only comes into
play at higher commit rates ...
>
> So for the "system data" to turn into free space, the other gems need
> to abort/commit/logout AND only after 5 minutes will it turn into free
> space? I ask because I am scheduling some batch jobs to run at night,
> and as soon as they all finish, I run the GC ... and sometimes it
> seems I do not really GC as much as I should have ...
>
The actual process for freeing up a page is pretty complicated. Dirty
pages cannot be reclaimed, so I mentioned the checkpoint interval in
relation to free pages because the checkpoint kicks off processing (in a
manner of speaking) that leads to the possible creation of free pages
... Different types of data are stored on pages, and the rules for
reclaiming a page depend upon what type of data is on the page ...
Data pages cannot be reclaimed until there are no active views that
could possibly reference an object on that page (the object table maps
oops to pages), so a single live object on a page can keep an entire
data page from being reclaimed ... So in your case, when you run the gc,
you are not guaranteed that all of the pages housing the dead objects
will be reclaimed. In the worst case, each dead object may be on a
separate page and you end up with no additional free pages ... The
reclaim gems do look around for "scavengable pages" and will do a
certain amount of automatic data consolidation, and you can get
information about "data fragmentation" by using
Repository>>pagesWithPercentFree: or
Repository>>fastPagesWithPercentFree:. But, as I think I've mentioned
before, the system is very dynamic: over time a system that is running
at constant rates will achieve an equilibrium in terms of free pages,
but the actual number of free pages will fluctuate within a range that
is dictated by quite a few different factors.
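To make that inspection concrete, a session-side sketch along these
lines could run after the nightly batch jobs commit.
pagesWithPercentFree: is the message mentioned above, and
markForCollection is GemStone's standard MFC entry point on the
repository; still, treat the exact selectors and the shape of the result
as assumptions to verify against your GemStone/S release:

```smalltalk
"After the nightly batch jobs have committed, look at fragmentation
 and then kick off garbage collection.  Selectors are hedged -- check
 them against your GemStone/S version's Repository protocol."
| fragmentationReport |
"Report on pages that are at least 90% free: candidates for
 consolidation/reclaim by the reclaim gems."
fragmentationReport := SystemRepository pagesWithPercentFree: 90.
"Start a mark-for-collection sweep.  Dead objects it finds only become
 reclaimable pages once all concurrent sessions commit or abort, per
 the active-view rule described above."
SystemRepository markForCollection.
fragmentationReport
```

Running the fragmentation report both before and after the reclaim gems
have caught up gives you a feel for how much of the "missing" free space
is page fragmentation rather than live data.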
It's good to keep an eye on things to recognize when unreasonable growth
is occurring, but I don't think that it is reasonable to expect that the
system always stay within some strict size limits ...
Of course the challenge is to differentiate between unreasonable growth
and growth due to real data accumulation, so it's worth diving deep on
these different subjects so that you can learn about the normal rhythms
of your own system ...
Dale