[Glass] Out of space and now cannot start stone again :(

Mariano Martinez Peck marianopeck at gmail.com
Thu May 22 07:25:36 PDT 2014


On Wed, May 21, 2014 at 7:02 PM, James Foster <
james.foster at gemtalksystems.com> wrote:

> On a production system I would replace “delete the tranlog file” with
> “move the tranlog file elsewhere”!
>
>
Indeed, that is what I did. Now I wonder: say copydbf fails to make the
tranlog work again (as in my case), is the tranlog still useful?
What kind of black magic could you do with it? I guess there is something,
since you are saying to keep it instead of removing it.
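
For the record, "move the tranlog elsewhere" can be sketched roughly as below. This is only an illustration in Python: the helper name `set_aside_tranlog` and the archive directory are made up for this example and are not GemStone tools.

```python
import shutil
from pathlib import Path


def set_aside_tranlog(tranlog: Path, archive_dir: Path) -> Path:
    """Move a tranlog out of the stone's directory instead of deleting it.

    Both arguments are hypothetical paths chosen by the caller.
    Returns the new location of the file.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / tranlog.name
    if target.exists():
        # Never overwrite an earlier copy: it may be the only one left.
        raise FileExistsError(f"{target} already exists")
    shutil.move(str(tranlog), str(target))
    return target
```

Refusing to overwrite an existing file in the archive is a deliberate choice here, since the whole point is to keep every copy around for later analysis.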

Thanks!


>
> On May 21, 2014, at 2:59 PM, Mariano Martinez Peck <marianopeck at gmail.com>
> wrote:
>
>
>
>
> On Wed, May 21, 2014 at 6:26 PM, James Foster <
> james.foster at gemtalksystems.com> wrote:
>
>> Thanks for the report on how it went. At least you learned something and
>> maybe others will have gained a lesson as well.
>>
>
> Ohhh, I remembered something: once you start the stone again with -R, it
> won't start and the log will say something like:
>
>     Opened a transaction log file for read_nolocks.
>        filename = /XXX/Sites/XXX/gemstone/data/tranlog24.dbf
>     Starting up without recovery and the log file 24 exists
>       Remove log files and restart.
>     Terminating stone.
>
> So I needed to delete the tranlog file and start again. And there it
> worked.
>
> Thanks!
>
>
>
>
>> James
>>
>>
>> On May 21, 2014, at 1:05 PM, Mariano Martinez Peck <marianopeck at gmail.com>
>> wrote:
>>
>>
>>
>>
>>  On Wed, May 21, 2014 at 3:47 PM, James Foster <
>> james.foster at gemtalksystems.com> wrote:
>>
>>> Hi Mariano,
>>>
>>> When a transaction log full situation happens the stone will still be
>>> running. You should be able to make room (typically by moving existing
>>> files elsewhere and then deleting), and then the system will recognize that
>>> there is space available and resume operations. Alternatively you can log
>>> in as DataCurator and add another transaction log directory (presumably on
>>> another disk) and the system will resume operations. See the System
>>> Administration Guide (SAG), section 8.4, page 190 for details on these
>>> options.
>>>
>>>
>> Hi James, thanks for this great answer!!
>>
>> Ohh wow... clearly I should not have rebooted the machine right away to
>> add more hard disk space. Next time: first remove some files/directories
>> so that operations resume, and then do a clean shutdown. Then fix the low
>> disk space for real and then restore operations.
>>
>>
>>
>>> It is best to try to keep the stone running so that you do not lose data.
>>>
>>
>> OK...didn't know. I learned today.
>>
>>
>>> If you stop the stone and it was not able to shut down cleanly, then it
>>> will attempt to replay transaction logs when it next starts. If a
>>> transaction log is corrupt (e.g., has an incomplete record), then the
>>> replay will fail (as shown in your stone log).
>>>
>>>
>> Indeed, that's exactly what happened in my case.
>>
>>
>>> If you start the stone with the -N option (SAG p. 326), then the
>>> database will be consistent as of the last checkpoint, which is generally
>>> not the most recent transaction. Checkpoints can be triggered manually
>>> (System class>>#’startCheckpointAsync’ and System
>>> class>>#’startCheckpointSync’) and they will happen automatically based on
>>> the STN_CHECKPOINT_INTERVAL configuration (SAG p. 289).
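
To put a number on what "as of the last checkpoint" means, the lost span is just the time between the last checkpoint and the failure. A tiny Python sketch, where the function name `loss_window` and the timestamps are invented for illustration:

```python
from datetime import datetime, timedelta


def loss_window(last_checkpoint: datetime, failure: datetime) -> timedelta:
    """Span of time after the last checkpoint.

    Work committed in this span is what a start with -N gives up.
    """
    if failure < last_checkpoint:
        raise ValueError("failure time precedes the last checkpoint")
    return failure - last_checkpoint


# Hypothetical times: checkpoint at 10:00:00, disk full at 10:03:20.
lost = loss_window(datetime(2014, 5, 21, 10, 0, 0),
                   datetime(2014, 5, 21, 10, 3, 20))
print(lost.total_seconds())  # 200.0
```
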
>>>
>>>
>> Thanks, I have just checked and yes, it is 5 minutes (300 seconds).
>>
>>
>>> Each checkpoint is recorded in the transaction log and you can get
>>> information using the copydbf command (SAG pp. 310-11). In particular, the
>>> -I (capital eye) option lists all checkpoint times. Tools are available for
>>> doing further analysis of the transaction log (SAG Chapter F), so you can
>>> see something about the transactions that happened after the last
>>> checkpoint.
>>>
>>>
>> OK, good to know. In my case it was not that critical (I could lose 5
>> minutes of edits). But good to know. I have just tried, and yes, I can see
>> the checkpoint lists.
>>
>>
>>> It is possible that using copydbf (SAG p. 310) you can copy all but the
>>> last (invalid) transaction record. (I took a valid transaction log that had
>>> 100899 records, used ‘truncate’ to shorten it, used copydbf, and the result
>>> had 100898 records.) To try this rename the existing (bad) log file, and
>>> copydbf to the former name and try to restart the stone. If that works then
>>> everything except the last transaction is fine.
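
The rename-and-copy steps above could be scripted roughly as follows. This is a sketch only: it assumes `copydbf` is on the PATH and is invoked as `copydbf <source> <destination>`; the helper name `recopy_tranlog`, the `.bad` suffix and the `run` parameter are invented for this example.

```python
import subprocess
from pathlib import Path


def recopy_tranlog(tranlog: Path, run=subprocess.run) -> Path:
    """Rename a suspect tranlog, then copydbf it back to its former name.

    Returns the path of the renamed original, which is kept untouched.
    """
    bad = tranlog.with_name(tranlog.name + ".bad")
    if bad.exists():
        raise FileExistsError(f"{bad} already exists")
    tranlog.rename(bad)
    # Assumed invocation: copydbf <source> <destination>
    run(["copydbf", str(bad), str(tranlog)], check=True)
    return bad
```

The `run` parameter exists only so the steps can be tried with a stand-in function before pointing them at a real tranlog. Whether the stone then starts is a separate question.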
>>>
>>>
>> OK, I tried. I made the new one and yes, it was a few bytes smaller than
>> the "broken" one, but my stone would not launch anyway, giving the same
>> error. So I proceeded with the -N option. It was a good shot, though. I
>> don't know why it didn't work.
>>
>>
>>> Otherwise, by starting with the -N option you will have lost
>>> transactions since the last checkpoint (typically every five minutes). If
>>> you want us to try to recover more then it might be time for a consulting
>>> engagement. ;-)
>>>
>>>
>> :) I appreciate your detailed answer. Starting with a -N was enough for
>> the moment. Next time: do not reboot nor kill stone. First try to shut it
>> down cleanly.
>>
>> Thanks, James!
>>
>>
>>> Let us know how it goes,
>>>
>>> James Foster
>>>
>>>
>>> On May 21, 2014, at 11:02 AM, Mariano Martinez Peck <
>>> marianopeck at gmail.com> wrote:
>>>
>>> Hi guys,
>>>
>>> My server ran out of disk space and GemStone gave the following error
>>> while I was trying to connect via topaz:
>>>
>>> *Login denied to other than SystemUser or DataCurator. All tranlog*
>>> *directories are full. Stone process waiting for operator to archive*
>>> *tranlogs or add more directories.,*
>>>
>>> OK, I enlarged the virtual hard disk, expanded the partitions, etc. Now
>>> the OS shows 400GB of free disk space. But when I try to start the stone
>>> again, it fails to start from the last tranlog.
>>>
>>> What am I supposed to do?  startstone -N ?   If so, what info do I
>>> lose? What "info" is there up to the last checkpoint?  In my case this is
>>> a Seaside app with GLASS transaction management. Would the loss be just
>>> the last request?
>>>
>>>
>>> Thanks in advance,
>>>
>>>
>>>
>>>
>>> ========================================================================
>>>     Now starting GemStone monitor.
>>>
>>> Write to /proc/2468/oom_score_adj failed with EACCES , linux user does not have CAP_SYS_RESOURCE
>>> No server process protection from OOM killer
>>>
>>>  _____________________________________________________________________________
>>> |     SESSION CONFIGURATION: The maximum number of concurrent sessions is 41. |
>>> |_____________________________________________________________________________|
>>>
>>>     Attaching the Shared Cache using Stone name: XXX
>>>     Successfully started 1 free frame page servers.
>>>
>>>     -------------------------------------------------------
>>>     Summary of Configured Transaction Logs
>>>       Directory   0:
>>>         configured name /XXX/Sites/XXX/gemstone/data
>>>         expanded name /XXX/Sites/XXX/gemstone/data/
>>>         configuredSize 1000 MB
>>>       Directory   1:
>>>         configured name /XXX/Sites/XXX/gemstone/data
>>>         expanded name /XXX/Sites/XXX/gemstone/data/
>>>         configuredSize 1000 MB
>>>     -------------------------------------------------------
>>>
>>>     Extent #0
>>>     -----------
>>>     Filename = !#dbf!/XXX/Sites/XXX/gemstone/data/extent0.dbf
>>>     Maximum size = NONE
>>>     File size = 10586 Mbytes = 677504 pages
>>>     Space available = 7927 Mbytes = 507353 pages
>>>
>>>     Totals
>>>     ------
>>>     Repository Size = 10586 Mbytes = 677504 pages
>>>     Free Space = 7927 Mbytes = 507353 pages
>>>     ---------------------------------------------------
>>>     Extent 0 was not cleanly shutdown.
>>>
>>>
>>>     Repository startup statistics:
>>>         Pages Need Reclaiming =10
>>>         Free Oops=133758598
>>>         Oop Number High Water Mark=157752737
>>>         Possible Dead Objects=12405000
>>>         Dead Objects=0
>>>         Epoch Transaction Count=0
>>>         Epoch New Objects Union=0
>>>         Epoch Written Objects Union=0
>>>         Epoch DependencyMap Objects Union=0
>>>
>>>     Repository startup is from checkpoint = (fileId 13, blockId 388232)
>>>
>>>    SearchForMostRecentLog did not find any tranlogs
>>>
>>>  :: (wildcard) found in listening addresses, ignoring other addresses
>>> created listening socket for ::, on :: port 43405
>>>
>>>
>>>     Opened a transaction log file for read_nolocks.
>>>        filename = /XXX/Sites/XXX/gemstone/data/tranlog13.dbf
>>> EOF while reading log record.
>>> EOF encountered while reading log record.    Unable to read log record 13.388232 for current checkpoint
>>>     Waiting for all tranlog writes to complete
>>>
>>>     Stone startup has failed.
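
To see which tranlog the stone wants at startup, the checkpoint line in the log above can be picked apart with a few lines of Python. A minimal sketch: the function name `checkpoint_tranlog` is made up, and the `tranlog<fileId>.dbf` naming is only inferred from this log (fileId 13 and tranlog13.dbf).

```python
import re

CHECKPOINT_RE = re.compile(
    r"Repository startup is from checkpoint = \(fileId (\d+), blockId (\d+)\)"
)


def checkpoint_tranlog(stone_log_text: str):
    """Return (tranlog file name, blockId) for the startup checkpoint, or None."""
    match = CHECKPOINT_RE.search(stone_log_text)
    if match is None:
        return None
    file_id, block_id = int(match.group(1)), int(match.group(2))
    # Naming pattern inferred from this thread: fileId 13 -> tranlog13.dbf
    return f"tranlog{file_id}.dbf", block_id


line = "    Repository startup is from checkpoint = (fileId 13, blockId 388232)"
print(checkpoint_tranlog(line))  # ('tranlog13.dbf', 388232)
```
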
>>>
>>>
>>>
>>>
>>> --
>>> Mariano
>>> http://marianopeck.wordpress.com
>>> _______________________________________________
>>> Glass mailing list
>>> Glass at lists.gemtalksystems.com
>>> http://lists.gemtalksystems.com/mailman/listinfo/glass


-- 
Mariano
http://marianopeck.wordpress.com