[GemStone-Smalltalk] GbsGsErrStnNetLost every two hours?
Richard Sargent
richard.sargent at gemtalksystems.com
Wed Nov 17 10:41:41 PST 2021
On Wed, Nov 17, 2021 at 10:40 AM David Shaffer via GemStone-Smalltalk <
gemstone-smalltalk at lists.gemtalksystems.com> wrote:
> Following up in case future GemStoners go this direction. The
> disconnection problem is inherent to the virtual IP (VIP) system that
> Docker swarm uses in its routing mesh. The assumption is that long-running
> connections result from bugs or DoS attempts, so they are cleared
> periodically (every 2 hours by default… seems silly… what DoS would be
> mitigated by something that takes 2 hours to detect a problem?). To solve
> the problem, launch GemStone in such a way that it isn’t part of the swarm
> routing mesh (the option is “dnsrr”, which stands for DNS round robin, but
> I’m not suggesting taking advantage of this… only one GemStone server = one
> DNS entry). If you still want to publish your GemStone port so that it can
> be accessed from outside the swarm, you must use “host” mode (the port is
> published only on the host that is running GemStone, rather than on all
> hosts in the swarm). I’ll paste an example docker compose/swarm YAML file
> after my sig. I’ve been running for 12 hours now without problems.
>
I'm glad you resolved this, and especially pleased that you shared the
solution!
> David
>
> services:
>   gemstone:
>     image: <your-gemstone-image>
>     ports:
>       - target: 40055
>         published: <port that non-swarm participants should use>
>         mode: host # published port only available on /this/ host
>     deploy:
>       replicas: 1
>       endpoint_mode: dnsrr # use DNS Round Robin to avoid VIP disconnects every 2 hours
>     volumes:
>       - type: volume
>         source: gemstone-data
>         target: /gemstone-data
>       - type: volume
>         source: gemstone-backup
>         target: /gemstone-backup
>       - type: volume
>         source: gemstone-log
>         target: /gemstone-log
>     shm_size: '1gb'
>
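The excerpt above attaches three named volumes to the service but does not show a top-level `volumes:` section. As far as I know, `docker stack deploy` rejects a service that refers to an undeclared named volume, so a complete stack file would likely also need something like the following. This is a minimal sketch, assuming the default `local` volume driver with no driver options; if the volumes are created outside the stack, they would instead be marked `external: true`.

```yaml
# Top-level section, at the same indentation level as `services:`.
# Assumption: default local driver, no driver options.
volumes:
  gemstone-data:
  gemstone-backup:
  gemstone-log:
```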
>
> On Nov 14, 2021, at 7:24 PM, David Shaffer <shaffer at SHAFFER-CONSULTING.COM>
> wrote:
>
> I just tried updating one ivar of an object on every commit (a timestamp
> stored in one of my “root” objects), so now there should be some traffic
> with every commit (wouldn’t there be traffic even if my commits didn’t have
> data to push? At the very least, VW would need to sync with the gem to get
> the list of newly modified objects?). Anyway, no dice: it still dies every
> 2 hours.
>
> I am knee deep in google hits right now…I’ll post back if anything pans
> out.
>
> -D
>
> On Nov 14, 2021, at 1:53 PM, James Foster <smalltalk at jgfoster.net> wrote:
>
> David,
>
> The error “socket read EOF” indicates that the Gem attempted to read from
> a socket and received an EOF response.
>
> Given that the client and the server are in different Docker containers,
> they are effectively on separate hosts, and there is a strong indication
> that the socket between them was closed. Given the duration and
> consistency of the timing, my first guess is that the socket is being
> closed due to inactivity. If you added a commit every minute (say, on the
> last pass through the loop each minute), would that change the behavior?
>
> James
>
> On Nov 14, 2021, at 10:40 AM, David Shaffer <
> shaffer at shaffer-consulting.com> wrote:
>
> The host is an AWS EC2 instance running Ubuntu 20.04.2 (kernel
> 5.11.0-1020-aws) with Docker (version 20.10.7, build
> 20.10.7-0ubuntu5~20.04.2). GemStone and the VisualWorks client are running
> in separate Ubuntu containers (“latest” on Docker Hub, which is
> 20.04/focal). Docker is running in “swarm mode” on this host, and both
> client and server are swarm services.
>
> The sleep is 1 second and there is not always work to do. In fact, most
> loop iterations complete without committing any changes.
>
> I’m currently pursuing some sketchy syslog entries on the EC2 host that
> seem to correspond to the network errors. I’ll share them as soon as I’ve
> pruned them down a bit. This same setup ran for 6 years (this EC2
> instance for 2 years; my transition to swarm mode was about 1 year ago)
> with GOODS as the backend without network-related errors, though.
>
> -D
>
> On Nov 14, 2021, at 1:29 PM, James Foster <smalltalk at jgfoster.net> wrote:
>
> David,
>
> Tell us a bit more about your configuration. Are you running Windows,
> macOS, or Linux? Is the client inside the Docker container (you mentioned
> the “entire system”)? How long is the sleep? Is there always work to do?
>
> James
>
>
> On Nov 14, 2021, at 10:22 AM, David Shaffer via GemStone-Smalltalk <
> gemstone-smalltalk at lists.gemtalksystems.com> wrote:
>
> Hey folks:
>
> I’ve (finally) deployed a server using GemStone 3.6.2, GemBuilder 8.5 and
> VW 9.0. My server’s main loop is essentially:
>
> Abort
> Do work
> Commit
> Sleep
>
> Every two hours (I’m not sure it is exactly two hours but it seems pretty
> reliable), I get the following during the abort call:
>
> GS Server Error - GbsGsErrStnNetLost - The session has lost its connection
> to the Stone Repository monitor.
>
>
> The entire system runs on a single host in Docker, so it can’t possibly be
> a network hiccup. The gemnetobject logs are pasted below… they make it look
> like a network error but, again, this is highly unlikely. Has anyone else
> run into this, or does anyone have troubleshooting advice?
>
> -David
>
> --- 11/13/21 22:56:18.959 UTC Login
> [Info]: Gave this process preference for OOM killer: wrote to
> /proc/460/oom_score_adj value 250
> [Info]: User ID: Trader
> [Info]: Repository: gs64stone
> [Info]: Session ID: 5 login at 11/13/21 22:56:18.964 UTC
> [Info]: GCI Client Host:
> [Info]: Page server PID: -1
> [Info]: using libicu version 58.2
> -----------------------------------------------------
> GemStone: Error Fatal
> Network error - text follows:
> , socket read EOF
> Error Category: 231169 [GemStone] Number: 4137 Arg Count: 1 Context : 20
> exception : 20
> Arg 1: 20
> --- 11/13/21 23:03:16.322 UTC Logging out
>
>
> *****************************************************
> ****** Abnormal Shutdown at 11/13/21 23:03:16.824 UTC
> *****************************************************
> -----------------------------------------------------
> GemStone: Error Fatal
> Network error - text follows:
> , socket read EOF
> Error Category: 231169 [GemStone] Number: 4137 Arg Count: 1 Context : 20
> exception : 20
> Arg 1: 20
>
> _______________________________________________
> GemStone-Smalltalk mailing list
> GemStone-Smalltalk at lists.gemtalksystems.com
> https://lists.gemtalksystems.com/mailman/listinfo/gemstone-smalltalk
>