[Glass] extent0.dbf grows
Dale Henrichs via Glass
glass at lists.gemtalksystems.com
Thu Aug 6 16:29:10 PDT 2015
On 08/05/2015 02:33 PM, Mariano Martinez Peck wrote:
>
>
> Dale, it is not clear to me what the checkpoint interval is. I
> understood it was at abort/commit/logout ... so how is this interval
> related? Is there a way I can check the stone configuration parameter
> for this (the 5 minutes)?
A checkpoint is a GemStone system operation where all data pages in the
SPC (shared page cache) that were written before a given commit are
flushed to disk. After a checkpoint, GemStone guarantees that all data
for that commit has been written to the extents ... Until a commit is
checkpointed, the system relies on the records in the tranlog for
recovery ... The checkpoint is not necessarily the primary point at
which pages are freed up in an extent, but it does represent a
bookkeeping boundary that triggers other processing that can lead to
freeing up extent pages ....
When you recover from a crash, the system looks at the commit that was
last checkpointed, scans the tranlogs looking for that commit, and then
replays the tranlogs from that point forward to complete recovery ....
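To poke at the mechanics above from a logged-in session, you can request
a checkpoint explicitly. The sketch below assumes the
System class>>startCheckpointSync selector (present in recent
GemStone/S 64 Bit releases, but verify against your version's System
class documentation):

```smalltalk
"Request a synchronous checkpoint from a logged-in session.
 The selector name is an assumption -- verify it against your
 GemStone/S release before relying on it."
| ok |
ok := System startCheckpointSync.   "blocks until the checkpoint completes"
"true  -> all commits up to this point are now safely in the extents
 false -> a checkpoint could not be started (one may be in progress)"
ok
```

After this returns true, crash recovery would only need to replay
tranlog records written after this point.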
STN_CHECKPOINT_INTERVAL in the system.conf file is used to customize the
checkpoint interval ... Given the above, the checkpoint interval
controls how much data would have to be recovered from the tranlogs in
the event of a system crash. With the default checkpoint interval of 5
minutes, at most 5 minutes' worth of tranlog data will have to be
recovered; on the other end of the spectrum, that same interval means
that every 5 minutes the SPC will be scanned for pages that have not
been written to disk ... Tuning the checkpoint interval involves finding
a balance between scan cpu consumption, disk i/o, recovery time, SPC
size, and commit rates.
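As a concrete sketch, the relevant line in the stone's config file
(conventionally $GEMSTONE/data/system.conf) looks like this; the value
is in seconds, so 300 is the 5-minute default:

```
# system.conf -- stone configuration (excerpt)
# Seconds between checkpoints; 300 = the 5-minute default.
STN_CHECKPOINT_INTERVAL = 300;
```

At runtime you should also be able to read the active value from a
session, e.g. `System stoneConfigurationReport` returns a dictionary of
parameter names to values - though that selector and the exact key name
(something like #StnCheckpointInterval) are assumptions to check against
your GemStone/S release.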
If the checkpoint interval is too short, you may consume a lot of cpu
time doing checkpoints without actually writing many dirty pages. If the
checkpoint interval is too long, it may take a long time to replay
tranlogs during crash recovery ... There's a third inflection point
where the checkpoint interval is shorter than the time it takes to write
all of the dirty pages that have accumulated, and you end up in
perpetual checkpoint mode - in that case you just have to live with the
exposure to longer recovery times (or take steps to improve disk i/o
or ....)
At the end of the day, tuning the checkpoint interval only comes into
play at higher commit rates ...
>
> So for the "system data" to turn into free space, the other gems need
> to abort/commit/logout AND only after 5 minutes will it turn into free
> space? I ask because I am scheduling some batch jobs to run at night,
> and as soon as they all finish, I run the GC ... and sometimes it
> seems I do not really GC as much as I should have ...
>
The actual process for freeing up a page is pretty complicated. Dirty
pages cannot be reclaimed, so I mentioned the checkpoint interval in
relation to free pages because the checkpoint kicks off processing (in a
manner of speaking) that leads to the possible creation of free pages
... Different types of data are stored on pages, and the rules for
reclaiming a page depend upon what type of data is on the page ...
Data pages cannot be reclaimed until there are no active views that
could possibly reference an object on that page (the object table maps
oops to pages), so a single live object on a page can keep an entire
data page from being reclaimed ... So in your case, when you run the gc,
you are not guaranteed that all of the pages housing the dead objects
will be reclaimed. In the worst case, each dead object may be on a
separate page and you end up with no additional free pages ... The
reclaim gems do look around for "scavengable pages" and will do a
certain amount of automatic data consolidation, and you can get
information about "data fragmentation" by using
Repository>>pagesWithPercentFree: or
Repository>>fastPagesWithPercentFree:. But, as I think I've mentioned
before, the system is very dynamic: over time a system that is running
at constant rates will achieve an equilibrium in terms of free pages,
but the actual number of free pages will fluctuate within a range that
is dictated by quite a few different factors.
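To make that inspection concrete, a session-side sketch along these
lines could run after the nightly batch jobs commit.
pagesWithPercentFree: is the message mentioned above, and
markForCollection is GemStone's standard MFC entry point on the
repository; still, treat the exact selectors and the shape of the result
as assumptions to verify against your GemStone/S release:

```smalltalk
"After the nightly batch jobs have committed, look at fragmentation
 and then kick off garbage collection.  Selectors are hedged -- check
 them against your GemStone/S version's Repository protocol."
| fragmentationReport |
"Report on pages that are at least 90% free: candidates for
 consolidation/reclaim by the reclaim gems."
fragmentationReport := SystemRepository pagesWithPercentFree: 90.
"Start a mark-for-collection sweep.  Dead objects it finds only become
 reclaimable pages once all concurrent sessions commit or abort, per
 the active-view rule described above."
SystemRepository markForCollection.
fragmentationReport
```

Running the fragmentation report both before and after the reclaim gems
have caught up gives you a feel for how much of the "missing" free space
is page fragmentation rather than live data.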
It's good to keep an eye on things to recognize when unreasonable growth
is occurring, but I don't think that it is reasonable to expect that the
system always stay within some strict size limits ...
Of course the challenge is to differentiate between unreasonable growth
and growth due to real data accumulation, so it's worth diving deep on
these different subjects so that you can learn about the normal rhythms
of your own system ...
Dale