[Glass] Out of space and now cannot start stone again :(

James Foster james.foster at gemtalksystems.com
Wed May 21 15:02:07 PDT 2014


On a production system I would replace “delete the tranlog file” with “move the tranlog file elsewhere”!
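
A minimal sketch of the "move it elsewhere" advice, written in Python rather than with any GemStone tool. It relocates one tranlog file into an archive directory and refuses to overwrite an existing copy. The function name `archive_tranlog` and the archive location are hypothetical. Run something like this only while the stone is stopped, and keep the archived file until you are sure it is no longer needed.

```python
import shutil
from pathlib import Path


def archive_tranlog(tranlog: Path, archive_dir: Path) -> Path:
    """Move a tranlog file into archive_dir instead of deleting it (hypothetical helper).

    Refuses to overwrite a file of the same name already in archive_dir,
    so an earlier archived copy is never clobbered.
    """
    if not tranlog.is_file():
        raise FileNotFoundError(tranlog)
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / tranlog.name
    if target.exists():
        raise FileExistsError(target)
    # shutil.move falls back to copy-then-delete when crossing filesystems,
    # which is the usual case when the archive lives on another disk.
    return Path(shutil.move(str(tranlog), str(target)))
```

For example, `archive_tranlog(Path("/XXX/Sites/XXX/gemstone/data/tranlog24.dbf"), Path("/mnt/tranlog-archive"))`, where `/mnt/tranlog-archive` is a made-up path on another disk.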

On May 21, 2014, at 2:59 PM, Mariano Martinez Peck <marianopeck at gmail.com> wrote:

> 
> 
> 
> On Wed, May 21, 2014 at 6:26 PM, James Foster <james.foster at gemtalksystems.com> wrote:
> Thanks for the report on how it went. At least you learned something and maybe others will have gained a lesson as well. 
> 
> Ohhh, I remembered something... once you start the stone again with -R, it won't start and the log will say something like:
> 
>     Opened a transaction log file for read_nolocks.
>        filename = /XXX/Sites/XXX/gemstone/data/tranlog24.dbf
>     Starting up without recovery and the log file 24 exists
>       Remove log files and restart.
>     Terminating stone.
> 
> So I needed to delete the tranlog file and start again, and then it worked.
> 
> Thanks!
> 
> 
>  
> James
> 
> 
> On May 21, 2014, at 1:05 PM, Mariano Martinez Peck <marianopeck at gmail.com> wrote:
> 
>> 
>> 
>> 
>> On Wed, May 21, 2014 at 3:47 PM, James Foster <james.foster at gemtalksystems.com> wrote:
>> Hi Mariano,
>> 
>> When a transaction log full situation happens the stone will still be running. You should be able to make room (typically by moving existing files elsewhere and then deleting), and then the system will recognize that there is space available and resume operations. Alternatively you can log in as DataCurator and add another transaction log directory (presumably on another disk) and the system will resume operations. See the System Administration Guide (SAG), section 8.4, page 190 for details on these options. 
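
A rough pre-flight check in the spirit of the advice above, assuming Python 3 and a hand-maintained list of tranlog directories. The function name `tranlog_dir_report` and the `min_free_bytes` threshold are hypothetical, not GemStone configuration. It reports free space per directory and whether all of them sit on one filesystem. In the startup log quoted further down, Directory 0 and Directory 1 both expand to /XXX/Sites/XXX/gemstone/data/, so a full disk fills both at once.

```python
import os
import shutil
from pathlib import Path


def tranlog_dir_report(dirs, min_free_bytes):
    """Summarize free space for each tranlog directory (hypothetical helper).

    Returns a list of (directory, free_bytes, low_space) tuples, plus a flag
    that is True when every directory is on the same filesystem (same st_dev),
    in which case adding another of these directories adds no real space.
    """
    rows = []
    devices = set()
    for d in dirs:
        p = Path(d)
        usage = shutil.disk_usage(p)
        devices.add(os.stat(p).st_dev)
        rows.append((str(p), usage.free, usage.free < min_free_bytes))
    same_filesystem = len(devices) <= 1
    return rows, same_filesystem
```

A `same_filesystem` result of True suggests that another directory on that same disk will not buy any room; per the advice above, the new directory should presumably live on another disk.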
>> 
>> 
>> Hi James, thanks for this great answer!! 
>> 
>> Ohh wow... I clearly should not have rebooted the machine right away to add more hard disk space. Next time: first remove some files/directories so that operations resume, and then do a clean shutdown. Then fix the low space for real and restore operations.
>> 
>>  
>> It is best to try to keep the stone running so that you do not lose data.
>> 
>> OK... I didn't know that. I learned something today.
>>  
>> If you stop the stone and it was not able to shut down cleanly, then it will attempt to replay transaction logs when it next starts. If a transaction log is corrupt (e.g., has an incomplete record), then the replay will fail (as shown in your stone log).
>> 
>> 
>> Indeed, that's exactly what happened in my case. 
>>  
>> If you start the stone with the -N option (SAG p. 326), then the database will be consistent as of the last checkpoint, which is generally not the most recent transaction. Checkpoints can be triggered manually (Stone class>>#'startCheckpointAsync' and Stone class>>#'startCheckpointSync'), and they will happen automatically based on the STN_CHECKPOINT_INTERVAL configuration (SAG p. 289).
>> 
>> 
>> Thanks, I have just checked and yes, it is 5 minutes (300 seconds). 
>>  
>> Each checkpoint is recorded in the transaction log and you can get information using the copydbf command (SAG pp. 310-11). In particular, the -I (capital eye) option lists all checkpoint times. Tools are available for doing further analysis of the transaction log (SAG Chapter F), so you can see something about the transactions that happened after the last checkpoint. 
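
To make the trade-off of -N concrete: the restarted repository reflects the last checkpoint, so the transactions at risk are those committed between that checkpoint and the moment the stone stopped. A small Python sketch, assuming you have already transcribed the checkpoint times (for example from the copydbf -I listing) into `datetime` values. It does not parse copydbf output, and `loss_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def loss_window(checkpoints: Iterable[datetime], stopped_at: datetime) -> Tuple[datetime, timedelta]:
    """Return (last_checkpoint, window) for a restart from the last checkpoint.

    Work committed after last_checkpoint and before stopped_at is what a
    restart without tranlog replay would not contain.
    """
    earlier = [c for c in checkpoints if c <= stopped_at]
    if not earlier:
        raise ValueError("no checkpoint at or before stopped_at")
    last = max(earlier)
    return last, stopped_at - last
```

With checkpoints every 300 seconds, as in Mariano's configuration, the window should stay under roughly five minutes, assuming checkpoints actually run on schedule. For instance, checkpoints at 10:00, 10:05 and 10:10 with a stop at 10:07:30 give a last checkpoint of 10:05 and a window of 2 minutes 30 seconds (times made up for illustration).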
>> 
>> 
>> OK, good to know. In my case it was not that critical (I could afford to lose 5 minutes of edits). I have just tried it, and yes, I can see the list of checkpoints.
>>  
>> It is possible that using copydbf (SAG p. 310) you can copy all but the last (invalid) transaction record. (I took a valid transaction log that had 100899 records, used ‘truncate’ to shorten it, used copydbf, and the result had 100898 records.) To try this, rename the existing (bad) log file, copydbf it to the former name, and try to restart the stone. If that works, then everything except the last transaction is fine.
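
A toy illustration of the idea behind the copydbf experiment: keep every complete record and drop a trailing one that was cut short. The record layout here (a 4-byte big-endian length followed by the payload) is invented for the example and is NOT the GemStone tranlog format; on a real log, use copydbf as described above. `copy_complete_records` is a hypothetical helper.

```python
import struct
from typing import Tuple


def copy_complete_records(data: bytes) -> Tuple[bytes, int]:
    """Return the longest prefix of data made of complete records, and their count.

    Assumes a hypothetical layout: 4-byte big-endian length, then payload.
    """
    offset = 0
    count = 0
    while offset + 4 <= len(data):
        (length,) = struct.unpack_from(">I", data, offset)
        end = offset + 4 + length
        if end > len(data):
            break  # trailing record is incomplete (e.g. the disk filled mid-write)
        offset = end
        count += 1
    return data[:offset], count
```

Truncating the last byte of a three-record stream leaves two complete records, analogous to the 100899 to 100898 result above. As Mariano's follow-up shows, a cleanly copied log does not guarantee that the stone will then start.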
>> 
>> 
>> OK, I tried. I made the new one and yes, it was a few bytes smaller than the "broken" one, but my stone still would not launch, reporting the same error. So I proceeded with the -N option. It was a good shot, though. I don't know why it didn't work.
>>  
>> Otherwise, by starting with the -N option you will have lost transactions since the last checkpoint (typically every five minutes). If you want us to try to recover more then it might be time for a consulting engagement. ;-)
>> 
>> 
>> :) I appreciate your detailed answer. Starting with -N was enough for the moment. Next time: do not reboot or kill the stone; first try to shut it down cleanly.
>> 
>> Thanks, James!
>>  
>> Let us know how it goes,
>> 
>> James Foster
>> 
>> 
>> On May 21, 2014, at 11:02 AM, Mariano Martinez Peck <marianopeck at gmail.com> wrote:
>> 
>>> Hi guys,
>>> 
>>> My server ran out of disk space, and GemStone gave the following error when I tried to connect via topaz:
>>> 
>>> Login denied to other than SystemUser or DataCurator. All tranlog
>>> directories are full. Stone process waiting for operator to archive
>>> tranlogs or add more directories.,
>>> 
>>> OK, I enlarged the virtual hard disk, expanded the partitions, etc., and now the OS shows 400GB free. But when I try to start the stone again, it fails to recover from the last tranlog.
>>> 
>>> What am I supposed to do? startstone -N? If so, what info do I lose? What "info" is up to the last checkpoint? In my case this is a Seaside app with GLASS transaction management. Would the loss be just the last request?
>>> 
>>> 
>>> Thanks in advance,
>>> 
>>> 
>>> 
>>> 
>>> ========================================================================
>>>     Now starting GemStone monitor.
>>> 
>>> Write to /proc/2468/oom_score_adj failed with EACCES , linux user does not have CAP_SYS_RESOURCE
>>> No server process protection from OOM killer
>>>  _____________________________________________________________________________
>>> |     SESSION CONFIGURATION: The maximum number of concurrent sessions is 41. |
>>> |_____________________________________________________________________________|
>>> 
>>>     Attaching the Shared Cache using Stone name: XXX
>>>     Successfully started 1 free frame page servers.
>>> 
>>>     -------------------------------------------------------
>>>     Summary of Configured Transaction Logs
>>>       Directory   0:
>>>         configured name /XXX/Sites/XXX/gemstone/data
>>>         expanded name /XXX/Sites/XXX/gemstone/data/
>>>         configuredSize 1000 MB
>>>       Directory   1:
>>>         configured name /XXX/Sites/XXX/gemstone/data
>>>         expanded name /XXX/Sites/XXX/gemstone/data/
>>>         configuredSize 1000 MB
>>>     -------------------------------------------------------
>>> 
>>>     Extent #0
>>>     -----------
>>>     Filename = !#dbf!/XXX/Sites/XXX/gemstone/data/extent0.dbf
>>>     Maximum size = NONE
>>>     File size = 10586 Mbytes = 677504 pages
>>>     Space available = 7927 Mbytes = 507353 pages
>>> 
>>>     Totals
>>>     ------
>>>     Repository Size = 10586 Mbytes = 677504 pages
>>>     Free Space = 7927 Mbytes = 507353 pages
>>>     ---------------------------------------------------
>>>     Extent 0 was not cleanly shutdown.
>>> 
>>> 
>>>     Repository startup statistics:
>>>         Pages Need Reclaiming =10
>>>         Free Oops=133758598
>>>         Oop Number High Water Mark=157752737
>>>         Possible Dead Objects=12405000
>>>         Dead Objects=0
>>>         Epoch Transaction Count=0
>>>         Epoch New Objects Union=0
>>>         Epoch Written Objects Union=0
>>>         Epoch DependencyMap Objects Union=0
>>> 
>>>     Repository startup is from checkpoint = (fileId 13, blockId 388232)
>>> 
>>>    SearchForMostRecentLog did not find any tranlogs
>>> 
>>>  :: (wildcard) found in listening addresses, ignoring other addresses
>>> created listening socket for ::, on :: port 43405
>>> 
>>> 
>>>     Opened a transaction log file for read_nolocks.
>>>        filename = /XXX/Sites/XXX/gemstone/data/tranlog13.dbf
>>> EOF while reading log record.
>>> EOF encountered while reading log record.    Unable to read log record 13.388232 for current checkpoint
>>>     Waiting for all tranlog writes to complete
>>> 
>>>     Stone startup has failed.
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Mariano
>>> http://marianopeck.wordpress.com
>>> _______________________________________________
>>> Glass mailing list
>>> Glass at lists.gemtalksystems.com
>>> http://lists.gemtalksystems.com/mailman/listinfo/glass
>> 
>> 
>> 
>> 
>> -- 
>> Mariano
>> http://marianopeck.wordpress.com
> 
> 
> 
> 
> -- 
> Mariano
> http://marianopeck.wordpress.com
