[GemStone-Smalltalk] GbsGsErrStnNetLost every two hours?

Richard Sargent richard.sargent at gemtalksystems.com
Wed Nov 17 10:41:41 PST 2021


On Wed, Nov 17, 2021 at 10:40 AM David Shaffer via GemStone-Smalltalk <
gemstone-smalltalk at lists.gemtalksystems.com> wrote:

> Following up in case future GemStoners go this direction.  The
> disconnection problem is inherent to the virtual IP (VIP) system that
> Docker swarm uses in its routing mesh.  The assumption is that long-running
> connections result from bugs or DoS attempts, so they are cleared
> periodically (every 2 hours by default…seems silly…what DoS would be
> mitigated by something that takes 2 hrs to detect a problem?).  To solve
> the problem, launch GemStone in such a way that it isn’t part of the swarm
> routing mesh (the option is “dnsrr”, which stands for DNS round robin, but
> I’m not suggesting taking advantage of the round robin…only one GemStone
> server = 1 DNS entry).  If you still want to publish your GemStone port so
> that it can be accessed from outside the swarm, you must use “host” mode
> (the port is published only on the host that is running GemStone, rather
> than on all hosts in the swarm).  I’ll paste an example docker
> compose/swarm YAML file after my sig.  I’ve been running for 12 hours now
> without problems.
>

I'm glad you resolved this, but especially pleased with your sharing the
solution!



> David
>
> services:
>   gemstone:
>     image: <your-gemstone-image>
>     ports:
>       - target: 40055
>         published: <port that non-swarm participants should use>
>         mode: host    # published port only available on /this/ host
>     deploy:
>       replicas: 1
>       endpoint_mode: dnsrr    # use DNS Round Robin to avoid VIP disconnects every 2 hours
>     volumes:
>       - type: volume
>         source: gemstone-data
>         target: /gemstone-data
>       - type: volume
>         source: gemstone-backup
>         target: /gemstone-backup
>       - type: volume
>         source: gemstone-log
>         target: /gemstone-log
>     shm_size: '1gb'
>
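The two settings that matter in the YAML above are `endpoint_mode: dnsrr` under `deploy` and `mode: host` on the published port. As a rough way to check them, here is a minimal Python sketch that inspects a service definition already loaded into a plain dict. The function name `routing_mesh_problems`, the sample dict, and the published port value 40055 are illustrative assumptions, not part of David's setup or of any Docker tooling.

```python
from typing import Dict, List


def routing_mesh_problems(service: Dict) -> List[str]:
    """Return reasons why a service would still use the swarm routing mesh."""
    problems = []
    deploy = service.get("deploy", {})
    # Swarm falls back to endpoint_mode "vip" when nothing is specified.
    if deploy.get("endpoint_mode", "vip") != "dnsrr":
        problems.append("deploy.endpoint_mode is not 'dnsrr'")
    for port in service.get("ports", []):
        # Short-form entries such as "40055:40055" are plain strings and
        # cannot request host mode; long-form entries default to "ingress".
        if not isinstance(port, dict) or port.get("mode", "ingress") != "host":
            problems.append("port %r is not published in host mode" % (port,))
    return problems


# Hypothetical service definition mirroring the YAML above.
gemstone = {
    "ports": [{"target": 40055, "published": 40055, "mode": "host"}],
    "deploy": {"replicas": 1, "endpoint_mode": "dnsrr"},
}
print(routing_mesh_problems(gemstone))  # []
```

An empty list means neither setting would route the service through the VIP; a short-form port entry or a missing `endpoint_mode` each produce one message.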
>
> On Nov 14, 2021, at 7:24 PM, David Shaffer <shaffer at SHAFFER-CONSULTING.COM>
> wrote:
>
> I just tried updating one ivar of an object every commit (a timestamp
> stored in one of my “root” objects) so now there should be some traffic
> with every commit (wouldn’t there be traffic even if my commits didn’t have
> data to push?  At the very least VW would need to sync with the gem to get
> the list of new modified objects?).  Anyway, no dice, still dies every 2
> hours.
>
> I am knee deep in google hits right now…I’ll post back if anything pans
> out.
>
> -D
>
> On Nov 14, 2021, at 1:53 PM, James Foster <smalltalk at jgfoster.net> wrote:
>
> David,
>
> The error “socket read EOF” indicates that the Gem attempted to read from
> a socket and received an EOF response.
>
> Given that the client and the server are in different Docker containers,
> they are effectively on separate hosts and there is a strong indication
> that the socket closed between them. Given the timing duration and
> consistency, my first guess is that a socket is being closed due to
> inactivity. If you added a commit every minute (say, the last time through
> the loop), would that change the behavior?
>
> James
>
> On Nov 14, 2021, at 10:40 AM, David Shaffer <
> shaffer at shaffer-consulting.com> wrote:
>
> The host is an AWS EC2 instance running Ubuntu 20.04.2 (kernel
> 5.11.0-1020-aws) with Docker (version 20.10.7, build
> 20.10.7-0ubuntu5~20.04.2).  GemStone and the VisualWorks client are running
> in separate Ubuntu containers (“latest” on Dockerhub, which is
> 20.04/focal).  Docker is running in “swarm mode” on this host and both
> client and server are swarm services.
>
> The sleep is 1 second and there is not always work to do.  In fact, most
> loop iterations complete without committing any changes.
>
> I’m currently pursuing some sketchy syslog entries on the EC2 host that
> seem to correspond to the network errors.  I’ll share them as soon as I’ve
> pruned them down a bit.  This same setup ran for 6 years (this EC2
> instance for 2 years, my transition to swarm mode was about 1 year ago) with
> GOODS as backend without network-related errors, though.
>
> -D
>
> On Nov 14, 2021, at 1:29 PM, James Foster <smalltalk at jgfoster.net> wrote:
>
> David,
>
> Tell us a bit more about your configuration. Are you running Windows,
> macOS, or Linux? Is the client inside the Docker container (you mentioned
> the “entire system”)? How long is the sleep? Is there always work to do?
>
> James
>
>
> On Nov 14, 2021, at 10:22 AM, David Shaffer via GemStone-Smalltalk <
> gemstone-smalltalk at lists.gemtalksystems.com> wrote:
>
> Hey folks:
>
> I’ve (finally) deployed a server using GemStone 3.6.2, GemBuilder 8.5 and
> VW 9.0.  My server’s main loop is essentially:
>
> Abort
> Do work
> Commit
> Sleep
>
> Every two hours (I’m not sure it is exactly two hours but it seems pretty
> reliable), I get the following during the abort call:
>
> GS Server Error - GbsGsErrStnNetLost - The session has lost its connection
> to the Stone Repository monitor.
>
>
> The entire system runs on a single host in Docker so it can’t possibly be
> a network hiccup.  The gemnetobject logs are pasted below…they make it seem
> like a network error but, again, this is highly unlikely.  Has anyone else
> run into this or have any troubleshooting advice?
>
> -David
>
> --- 11/13/21 22:56:18.959 UTC Login
> [Info]: Gave this process preference for OOM killer: wrote to
> /proc/460/oom_score_adj value 250
> [Info]: User ID: Trader
> [Info]: Repository: gs64stone
> [Info]: Session ID: 5 login at 11/13/21 22:56:18.964 UTC
> [Info]: GCI Client Host:
> [Info]: Page server PID: -1
> [Info]: using libicu version 58.2
> -----------------------------------------------------
> GemStone: Error         Fatal
> Network error - text follows:
> , socket read EOF
> Error Category: 231169 [GemStone] Number: 4137  Arg Count: 1 Context : 20
> exception : 20
> Arg 1:   20
> --- 11/13/21 23:03:16.322 UTC Logging out
>
>
> *****************************************************
> ****** Abnormal Shutdown at 11/13/21 23:03:16.824 UTC
> *****************************************************
> -----------------------------------------------------
> GemStone: Error         Fatal
> Network error - text follows:
> , socket read EOF
> Error Category: 231169 [GemStone] Number: 4137  Arg Count: 1 Context : 20
> exception : 20
> Arg 1:   20
>
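To pin down how long a session lived before the EOF, the `Login` and `Logging out` stamps in a gem log like the one above can be subtracted. The following is a minimal Python sketch that assumes the stamps always have the `--- MM/DD/YY HH:MM:SS.mmm UTC` shape seen in this excerpt; the names `STAMP` and `session_seconds` are made up for illustration.

```python
import re
from datetime import datetime

# Matches the "--- <date> <time> UTC Login" / "Logging out" marker lines,
# whether or not the line carries a leading "> " quote prefix.
STAMP = re.compile(
    r"--- (\d\d/\d\d/\d\d \d\d:\d\d:\d\d\.\d+) UTC (Login|Logging out)"
)


def session_seconds(log_text):
    """Return seconds elapsed between the Login and Logging out stamps."""
    times = {}
    for line in log_text.splitlines():
        match = STAMP.search(line)
        if match:
            times[match.group(2)] = datetime.strptime(
                match.group(1), "%m/%d/%y %H:%M:%S.%f"
            )
    return (times["Logging out"] - times["Login"]).total_seconds()


sample = """\
--- 11/13/21 22:56:18.959 UTC Login
[Info]: Session ID: 5 login at 11/13/21 22:56:18.964 UTC
--- 11/13/21 23:03:16.322 UTC Logging out
"""
print(session_seconds(sample))  # 417.363
```

Running this over each gem log from a failed session gives the actual session lifetimes, which can then be compared against the suspected two-hour interval.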
> _______________________________________________
> GemStone-Smalltalk mailing list
> GemStone-Smalltalk at lists.gemtalksystems.com
> https://lists.gemtalksystems.com/mailman/listinfo/gemstone-smalltalk