[Glass] Out of space and now cannot start stone again :(

Wed May 21 13:05:33 PDT 2014

On Wed, May 21, 2014 at 3:47 PM, James Foster <
james.foster at gemtalksystems.com> wrote:

> Hi Mariano,
>
> When a transaction log full situation happens the stone will still be
> running. You should be able to make room (typically by moving existing
> files elsewhere and then deleting), and then the system will recognize that
> there is space available and resume operations. Alternatively you can log
> in as DataCurator and add another transaction log directory (presumably on
> another disk) and the system will resume operations. See the System
> Administration Guide (SAG), section 8.4, page 190 for details on these
> options.
>
>
Hi James, thanks for this great answer!!

Oh wow... I clearly should not have rebooted the machine right away to
add more disk space. Next time: first remove some files/directories so
that operations resume, then do a clean shutdown, then fix the low disk
space for real, and finally restart the stone.
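
To avoid getting into the tranlog-full state in the first place, the
free space on the tranlog disk can be checked periodically. Below is a
minimal sketch in Python, not anything GemStone provides: the function
names and the 10% threshold are hypothetical, and the directory to check
is whatever your stone is configured to use for tranlogs.

```python
import shutil


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of the filesystem holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def is_low_on_space(free_bytes, total_bytes, min_free_fraction=0.10):
    """Return True when free space is below the threshold.

    The 10% default is an arbitrary, hypothetical value; pick one that
    leaves room for at least a few full tranlogs on your system.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction
```

Run from cron against the tranlog directory, something like
`is_low_on_space(u.free, u.total)` with `u = shutil.disk_usage(dir)`
could send a warning while the stone is still able to write.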

> It is best to try to keep the stone running so that you do not lose data.
>

OK, I didn't know that. I learned something today.

> If you stop the stone and it was not able to shut down cleanly, then it
> will attempt to replay transaction logs when it next starts. If a
> transaction log is corrupt (e.g., has an incomplete record), then the
> replay will fail (as shown in your stone log).
>
>
Indeed, that's exactly what happened in my case.
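
For anyone wanting to spot this situation automatically, here is a rough
Python sketch that scans a stone log for the three messages that appear
in the log quoted further down in this message. The marker strings are
copied from that log; whether other GemStone versions word them the same
way is an assumption, and the function name is hypothetical.

```python
# Marker strings copied from the stone log quoted in this thread.
# Other GemStone versions may phrase these differently (assumption).
UNCLEAN_SHUTDOWN = "was not cleanly shutdown"
TRANLOG_EOF = "EOF while reading log record"
STARTUP_FAILED = "Stone startup has failed"


def diagnose_stone_log(text):
    """Return a list of findings for the stone log contents given as a string."""
    findings = []
    if UNCLEAN_SHUTDOWN in text:
        findings.append("unclean shutdown")
    if TRANLOG_EOF in text:
        findings.append("truncated tranlog record")
    if STARTUP_FAILED in text:
        findings.append("startup failed")
    return findings
```

All three findings together match the case described here: the stone was
not shut down cleanly, replay hit an incomplete record, and startup
aborted.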

> If you start the stone with the -N option (SAG p. 326), then the database
> will be consistent as of the last checkpoint, which is generally not the
> most recent transaction. Checkpoints can be triggered manually (System
> class>>#startCheckpointAsync and System class>>#startCheckpointSync) and
> they will happen automatically based on the STN_CHECKPOINT_INTERVAL
> configuration (SAG p. 289).
>
>
Thanks, I have just checked and yes, it is 5 minutes (300 seconds).

> Each checkpoint is recorded in the transaction log and you can get
> information using the copydbf command (SAG pp. 310-11). In particular, the
> -I (capital eye) option lists all checkpoint times. Tools are available for
> doing further analysis of the transaction log (SAG Chapter F), so you can
> see something about the transactions that happened after the last
> checkpoint.
>
>
OK, good to know. In my case it was not that critical (I could afford to
lose 5 minutes of edits). I have just tried it, and yes, I can see the
checkpoint list.
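
In case it helps someone script this, a small Python sketch that shells
out to `copydbf -I` on a tranlog and returns whatever it prints. Only
the `-I` option comes from this thread; that `copydbf` is on the PATH
and writes its report to standard output are assumptions, and the
output is returned unparsed because its format is not shown here. The
function names are hypothetical.

```python
import subprocess


def copydbf_info_command(tranlog_path):
    """Build the argument list for `copydbf -I <tranlog>`."""
    return ["copydbf", "-I", tranlog_path]


def list_checkpoints(tranlog_path):
    """Run copydbf and return its raw text output (no parsing attempted).

    Assumes copydbf is on the PATH and reports on standard output.
    Raises subprocess.CalledProcessError if copydbf exits with an error.
    """
    result = subprocess.run(
        copydbf_info_command(tranlog_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```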

> It is possible that using copydbf (SAG p. 310) you can copy all but the
> last (invalid) transaction record. (I took a valid transaction log that had
> 100899 records, used ‘truncate’ to shorten it, used copydbf, and the result
> had 100898 records.) To try this rename the existing (bad) log file, and
> copydbf to the former name and try to restart the stone. If that works then
> everything except the last transaction is fine.
>
>
OK, I tried. I made the new one and yes, it was a few bytes smaller than
the "broken" one, but my stone would not launch anyway, reporting the same
error. So I proceeded with the -N option. It was a good shot, though; I
don't know why it didn't work.
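
For the record, the rename-then-copydbf attempt can be sketched in
Python as below. This is only an illustration of the steps James
describes, to be run with the stone stopped; it did not rescue my
tranlog, so it is no guarantee. The `.bad` suffix and the function names
are hypothetical, and I am assuming copydbf takes the source file
followed by the destination file.

```python
import os
import subprocess


def plan_tranlog_repair(tranlog_path, suffix=".bad"):
    """Return (renamed_path, copydbf_command) without touching any files."""
    renamed = tranlog_path + suffix
    # Assumption: copydbf takes source then destination as positional arguments.
    command = ["copydbf", renamed, tranlog_path]
    return renamed, command


def repair_tranlog(tranlog_path):
    """Rename the damaged log, then copy it back under its former name.

    Run only while the stone is stopped. The renamed original is kept so
    nothing is lost if the copy does not help.
    """
    renamed, command = plan_tranlog_repair(tranlog_path)
    if os.path.exists(renamed):
        raise FileExistsError(renamed + " already exists; refusing to overwrite")
    os.rename(tranlog_path, renamed)
    subprocess.run(command, check=True)
    return renamed
```

Calling `plan_tranlog_repair` first shows what would be done before
anything is renamed.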

> Otherwise, by starting with the -N option you will have lost transactions
> since the last checkpoint (typically every five minutes). If you want us to
> try to recover more then it might be time for a consulting engagement. ;-)
>
>
:) I appreciate your detailed answer. Starting with -N was enough for the
moment. Next time: do not reboot the machine or kill the stone; first try
to shut it down cleanly.

Thanks, James!

> Let us know how it goes,
>
> James Foster
>
>
> On May 21, 2014, at 11:02 AM, Mariano Martinez Peck <marianopeck at gmail.com>
> wrote:
>
> Hi guys,
>
> My server ran out of disk space and GemStone reported the following error
> while I was trying to connect via topaz:
>
> *Login denied to other than SystemUser or DataCurator. All tranlog*
> *directories are full. Stone process waiting for operator to archive*
> *tranlogs or add more directories.,*
>
> OK, I enlarged the virtual hard disk, expanded the partitions, etc. Now the
> OS reports 400GB free. But when I try to start the stone again, it fails
> to recover from the last tranlog.
>
> What am I supposed to do? startstone -N? If so, what info do I lose? What
> "info" is included up to the last checkpoint? In my case this is a
> Seaside app with GLASS transaction management. Would the loss be just the
> last request?
>
>
> Thanks in advance,
>
>
>
>
> ========================================================================
>     Now starting GemStone monitor.
>
> Write to /proc/2468/oom_score_adj failed with EACCES , linux user does not
> have CAP_SYS_RESOURCE
> No server process protection from OOM killer
>
>  _____________________________________________________________________________
> |     SESSION CONFIGURATION: The maximum number of concurrent sessions is 41. |
> |_____________________________________________________________________________|
>
>     Attaching the Shared Cache using Stone name: XXX
>     Successfully started 1 free frame page servers.
>
>     -------------------------------------------------------
>     Summary of Configured Transaction Logs
>       Directory   0:
>         configured name /XXX/Sites/XXX/gemstone/data
>         expanded name /XXX/Sites/XXX/gemstone/data/
>         configuredSize 1000 MB
>       Directory   1:
>         configured name /XXX/Sites/XXX/gemstone/data
>         expanded name /XXX/Sites/XXX/gemstone/data/
>         configuredSize 1000 MB
>     -------------------------------------------------------
>
>     Extent #0
>     -----------
>     Filename = !#dbf!/XXX/Sites/XXX/gemstone/data/extent0.dbf
>     Maximum size = NONE
>     File size = 10586 Mbytes = 677504 pages
>     Space available = 7927 Mbytes = 507353 pages
>
>     Totals
>     ------
>     Repository Size = 10586 Mbytes = 677504 pages
>     Free Space = 7927 Mbytes = 507353 pages
>     ---------------------------------------------------
>     Extent 0 was not cleanly shutdown.
>
>
>     Repository startup statistics:
>         Pages Need Reclaiming =10
>         Free Oops=133758598
>         Oop Number High Water Mark=157752737
>         Possible Dead Objects=12405000
>         Dead Objects=0
>         Epoch Transaction Count=0
>         Epoch New Objects Union=0
>         Epoch Written Objects Union=0
>         Epoch DependencyMap Objects Union=0
>
>     Repository startup is from checkpoint = (fileId 13, blockId 388232)
>
>    SearchForMostRecentLog did not find any tranlogs
>
>  :: (wildcard) found in listening addresses, ignoring other addresses
> created listening socket for ::, on :: port 43405
>
>
>     Opened a transaction log file for read_nolocks.
>        filename = /XXX/Sites/XXX/gemstone/data/tranlog13.dbf
> EOF while reading log record.
> EOF encountered while reading log record.    Unable to read log record
> 13.388232 for current checkpoint
>     Waiting for all tranlog writes to complete
>
>     Stone startup has failed.
>
>
>
>
> --
> Mariano
> http://marianopeck.wordpress.com
>  _______________________________________________
> Glass mailing list
> Glass at lists.gemtalksystems.com
> http://lists.gemtalksystems.com/mailman/listinfo/glass
>
>
>

-- 
Mariano
http://marianopeck.wordpress.com