[GemStone-Smalltalk] Starting a repository fails after shutdown of server.
James Foster via GemStone-Smalltalk
gemstone-smalltalk at lists.gemtalksystems.com
Mon Sep 25 11:37:06 PDT 2017
Hi Ezequiel,
From the initial stone log we see that a listening socket was created, the AIO pgsvrs were started, but the reconnect failed:
created listening socket for :: on :: port 39755
...
--- 09/21/17 11:54:24 -03 ---
Starting AF_INET reconnect to AIO pgsvrs
AioServerReconnect pgsvrPid 6590 old fd 12 new fd -1
reconnect failed(B) for AIO pgsvr pid 6590
--- 09/21/17 11:56:25 -03 ---
RDbfAioServersReconnect failed
Note the 121-second gap and the ‘-1’ value for the new fd (file descriptor or socket ID), indicating an error.
From the later page server log we see the process start date/time and the connection failure date/time:
| PROCESS ID: 7937 DATE: 09/25/2017 12:57:17 -03
…
--- 09/25/2017 12:59:18.753 -03 Connection failure detected:
Note again a 121-second gap.
As part of stone startup, the stone spawns additional processes and waits for them to connect back to itself. The system waits 120 seconds for the connection to complete and then reports a failure. So, the page server is unable to open a socket to the stone (that is, a networking problem).
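The gap can be checked directly from the timestamps in the two log excerpts above. The following is only a small sketch: the helper names `parse_stamp` and `gap_seconds` are made up for illustration, and the time zone suffix (-03) is dropped because both ends of each interval share it.

```python
from datetime import datetime

# Timestamp layouts seen in the logs quoted in this thread
# (time zone suffix removed before parsing).
FORMATS = (
    "%m/%d/%Y %H:%M:%S.%f",  # e.g. 09/25/2017 12:59:18.753
    "%m/%d/%Y %H:%M:%S",     # e.g. 09/25/2017 12:57:17
    "%m/%d/%y %H:%M:%S",     # e.g. 09/21/17 11:54:24
)


def parse_stamp(text):
    """Parse a log timestamp, trying each known layout in turn."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized timestamp: " + text)


def gap_seconds(start, end):
    """Return the number of seconds between two log timestamps."""
    return (parse_stamp(end) - parse_stamp(start)).total_seconds()


# Stone log: start of the reconnect versus the failure report.
stone_gap = gap_seconds("09/21/17 11:54:24", "09/21/17 11:56:25")
# Page server log: process start versus the connection failure.
pgsvr_gap = gap_seconds("09/25/2017 12:57:17", "09/25/2017 12:59:18.753")
print(stone_gap, pgsvr_gap)
```

Both intervals come out just over the 120-second wait, which is consistent with a timeout rather than an immediate refusal.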
Is this machine heavily overloaded, such that a new process would fail to run in two minutes? Have you ever run GemStone successfully on this host? What sort of internal firewall does it have?
During the two-minute delay in the startup, it would be interesting to see if you can communicate with the stone using another process. From another shell try the following:
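As a rough, GemStone-independent sanity check of local networking, one could confirm that a process on this host can open a TCP connection to a listening socket on the same host. This sketch is not what the stone or page server does internally; the function names `can_connect` and `loopback_check` are invented here, and the default address 127.0.0.1 is an assumption (substitute the host name to test the path a firewall rule might affect).

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refusals, timeouts, and name-resolution failures.
        return False


def loopback_check(host="127.0.0.1"):
    """Open a throwaway listener on an ephemeral port and connect to it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind((host, 0))  # port 0 lets the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        return can_connect(host, port)


print("local TCP connect works:", loopback_check())
```

If this prints False, or hangs until the timeout, the problem is likely in the host's networking or firewall configuration rather than in GemStone.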
$ gslist -cvl
If the shell doesn’t recognize gslist, then you don’t have $GEMSTONE/bin in your path. If it doesn’t find any servers, then you haven’t started the NetLDI process yet and it would be good to do so (if only to show that gslist is working!). If it finds the stone and reports that the stone is in ‘startup’ mode, then gslist has successfully communicated with the stone, suggesting that internal networking is not completely broken. If gslist recognizes that there is a stone, but can’t communicate with it, then we’ve confirmed an internal networking problem.
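The first case above (the shell not recognizing gslist) can be checked before running anything. This is a minimal sketch, assuming the usual layout in which the executable lives in $GEMSTONE/bin; the function name `gslist_status` and its return strings are hypothetical.

```python
import os
import shutil


def gslist_status(env):
    """Classify why gslist might not be found, given an environment mapping."""
    gemstone = env.get("GEMSTONE")
    if not gemstone:
        return "gemstone-unset"
    bin_dir = os.path.normpath(os.path.join(gemstone, "bin"))
    path_value = env.get("PATH", "")
    path_dirs = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    if bin_dir not in path_dirs:
        return "bin-not-on-path"
    # shutil.which searches the given PATH string for an executable file.
    if shutil.which("gslist", path=path_value) is None:
        return "gslist-missing"
    return "ok"


print(gslist_status(os.environ))
```

A result other than "ok" points at the environment setup, not at the stone; only once gslist runs do its findings say anything about networking.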
To further debug networking problems, export GEMSTONE_SOCKET_DEBUG=1 (as you did with GEMSTONE_KEEP_LOG; good job reading the manual!), and then provide full logs on the stone and on a failing page server process.
James Foster
> On Sep 25, 2017, at 10:59 AM, brianstone via GemStone-Smalltalk <gemstone-smalltalk at lists.gemtalksystems.com> wrote:
>
> Hi James,
>
> There was no log for that PID.
> I was reading the SysAdminGuide, which says that the log is deleted
> automatically on a normal exit.
> In this case the exit was not normal, but the log files were deleted anyway.
>
> So I modified the file runpgsvrmain to keep log files (uncommenting
> "export GEMSTONE_KEEP_LOG=1") and tried to start the repository again.
>
> Here are the results:
>
> *An extract from repository.log*
>
> Starting AF_INET reconnect to AIO pgsvrs
> Recovery took 0.028 seconds
> Waiting for Recovery Reader thread to stop
> AioServerReconnect pgsvrPid 7940 old fd 12 new fd -1
> reconnect failed(B) for AIO pgsvr pid 7940
>
> --- 09/25/17 12:59:18 -03 ---
> RDbfAioServersReconnect failed
>
> Terminating stone.
>
> *Log file for PID 7940 in file named "repository_7940pgsvraio.log"*
>
> _____________________________________________________________________________
> | GemStone Child Task
> |
> |
> |
> | VERSION: 3.2.8.1, Fri Aug 28 08:43:23 2015
> |
> | BUILD: gss64_3_2_x_branch-37291
> |
> | BUILT FOR: x86-64 (Linux)
> |
> | MODE: 64 bit
> |
> | RUNNING ON: 12-CPU SRVR23 x86_64 (Linux 2.6.32-642.15.1.el6.x86_64 #1 SMP
> Fri
> | Feb 24 14:31:22 UTC 2017) 32059MB
> |
> | PROCESS ID: 7940 DATE: 09/25/17 12:57:17 -03
> |
> | USER IDS: REAL=gemst643281 (1045) EFFECTIVE=gemst643281 (1045)
> LOGIN=gemst643281
> | (1045)
> |
> | COMMAND: /usr/local/gemstone643281/sys/runpgsvrmain TCP 13 90
> |
> |_____________________________________________________________________________|
> runpgsvr[Info]: Description of arguments:
> the hostname is: SRVR23
> GEMSTONE is: /usr/local/gemstone643281
> pgsvr arguments are: TCP 13 90
>
> _____________________________________________________________________________
> | GemStone/S64 Object-Oriented Data Management System
> |
> | Copyright (C) GemTalk Systems 1986-2015
> |
> | All rights reserved.
> |
> +-----------------------------------------------------------------------------+
> | PROGRAM: PGSVRSHR, GemStone Networked DBF I/O Service (shared library)
> |
> | VERSION: 3.2.8.1, Fri Aug 28 08:43:23 2015
> |
> | BUILD: gss64_3_2_x_branch-37291
> |
> | BUILT FOR: x86-64 (Linux)
> |
> | MODE: 64 bit
> |
> | RUNNING ON: 12-CPU SRVR23 x86_64 (Linux 2.6.32-642.15.1.el6.x86_64 #1 SMP
> Fri
> | Feb 24 14:31:22 UTC 2017) 32059MB
> |
> | PROCESS ID: 7940 DATE: 09/25/2017 12:57:17 -03
> |
> | USER IDS: REAL=gemst643281 (1045) EFFECTIVE=gemst643281 (1045)
> LOGIN=gemst643281
> | (1045)
> |
> |_____________________________________________________________________________|
>
>
> command line is:
> /usr/local/gemstone643281/sys/pgsvrmain TCP 13 90
>
> The hostname is SRVR23
> createNetConnection: SocketFamily_UNIX
> Network connection has been inherited.
> Entering Service Loop
> [Info]: ClientPid: 7928
> [Info]: Client SessionId: -2
> [Info]: Client Host: SRVR23
> [Info]: My cache slot: 3
> [Info]: My cache name: AioPgsvr3
> Write to /proc/7940/oom_score_adj failed with EACCES , linux user does not
> have CAP_SYS_RESOURCE
> No server process protection from OOM killer
> --- 09/25/2017 12:59:18.753 -03 Connection failure detected:
>
>
> --- 09/25/2017 12:59:18.753 -03 entering pgsShrExit
> mainThread Detaching cache.--- 09/25/2017 12:59:18.753 -03 [Info]:
> Detaching Shared Page Cache.
> --- 09/25/2017 12:59:18.791 -03 mainThread: pgsShrExit with status: 0
>
>
> *Additionally, here is the content of another file named
> "repository_7937pgsvrff.log"*
>
>
> _____________________________________________________________________________
> | GemStone Child Task
> |
> |
> |
> | VERSION: 3.2.8.1, Fri Aug 28 08:43:23 2015
> |
> | BUILD: gss64_3_2_x_branch-37291
> |
> | BUILT FOR: x86-64 (Linux)
> |
> | MODE: 64 bit
> |
> | RUNNING ON: 12-CPU SRVR23 x86_64 (Linux 2.6.32-642.15.1.el6.x86_64 #1 SMP
> Fri
> | Feb 24 14:31:22 UTC 2017) 32059MB
> |
> | PROCESS ID: 7937 DATE: 09/25/17 12:57:17 -03
> |
> | USER IDS: REAL=gemst643281 (1045) EFFECTIVE=gemst643281 (1045)
> LOGIN=gemst643281
> | (1045)
> |
> | COMMAND: /usr/local/gemstone643281/sys/runpgsvrmain
> Newstone~7663a27bab8c7a96
> | 0 1 -1 TCP 10 90
> |
> |_____________________________________________________________________________|
> runpgsvr[Info]: Description of arguments:
> the hostname is: SRVR23
> GEMSTONE is: /usr/local/gemstone643281
> pgsvr arguments are: Newstone~7663a27bab8c7a96 0 1 -1 TCP 10 90
>
> _____________________________________________________________________________
> | GemStone/S64 Object-Oriented Data Management System
> |
> | Copyright (C) GemTalk Systems 1986-2015
> |
> | All rights reserved.
> |
> +-----------------------------------------------------------------------------+
> | PROGRAM: PGSVRSHR, GemStone Networked DBF I/O Service (shared library)
> |
> | VERSION: 3.2.8.1, Fri Aug 28 08:43:23 2015
> |
> | BUILD: gss64_3_2_x_branch-37291
> |
> | BUILT FOR: x86-64 (Linux)
> |
> | MODE: 64 bit
> |
> | RUNNING ON: 12-CPU SRVR23 x86_64 (Linux 2.6.32-642.15.1.el6.x86_64 #1 SMP
> Fri
> | Feb 24 14:31:22 UTC 2017) 32059MB
> |
> | PROCESS ID: 7937 DATE: 09/25/2017 12:57:17 -03
> |
> | USER IDS: REAL=gemst643281 (1045) EFFECTIVE=gemst643281 (1045)
> LOGIN=gemst643281
> | (1045)
> |
> |_____________________________________________________________________________|
>
>
> command line is:
> /usr/local/gemstone643281/sys/pgsvrmain Newstone~7663a27bab8c7a96 0 1 -1 TCP
> 10 90
>
> The hostname is SRVR23
> Write to /proc/7937/oom_score_adj failed with EACCES , linux user does not
> have CAP_SYS_RESOURCE
> No server process protection from OOM killer
>
> Free Frame Page Server startup was successful.
> Target Free Frame Limit is 7000
> Entering Free List Service Loop.
> createNetConnection: SocketFamily_UNIX
> Network connection has been inherited.
> Entering Service Loop
> [Info]: ClientPid: 7928
> --- 09/25/2017 12:59:18.753 -03 Connection failure detected:
>
>
> --- 09/25/2017 12:59:18.753 -03 entering pgsShrExit
> mainThread Detaching cache.--- 09/25/2017 12:59:18.753 -03 [Info]:
> Detaching Shared Page Cache.
> --- 09/25/2017 12:59:18.753 -03 mainThread: pgsShrExit with status: 0
>
>