[Glass] upgrade speed sanity check

Dale Henrichs dale.henrichs at gemtalksystems.com
Tue Sep 24 09:43:38 PDT 2019


On 9/23/19 6:21 PM, Paul DeBruicker via Glass wrote:
> Thanks Dale
>
> After the load script finished it showed
>
>
> ...finished baseline
> 	Migrated 0 instances for 125 classes.
>
>
>
>
> So it doesn't look like there was a massive number of migrations.  The SPC
> cache was only 1GB, so that likely had something to do with the time taken.
> I'll bump it to 2GB and the # of threads to 2, which I think is the limit of
> the license.

That's a good guess ... 125 classes means that list instances should 
have only scanned the db once (GsDeployer scans for 2000 classes at a 
time) ... with no instances to migrate, the migration process shouldn't 
have done any writes at all, so it seems that some other operation might 
be the culprit for triggering the writes that you are seeing ...
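
If you do bump the SPC, something like the following is the minimal sketch (the conf file location and stone layout are assumptions about a standard GsDevKit_home stone; SHR_PAGE_CACHE_SIZE_KB is the parameter that sizes the shared page cache):

    # in the stone's conf file (location assumed:
    # $GS_HOME/server/stones/<stoneName>/system.conf); value is in KB, so 2GB is:
    SHR_PAGE_CACHE_SIZE_KB = 2097152;
    # the stone must be restarted for the new cache size to take effect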

Running statmon (edit the upgradeStone script to launch statmon after the 
stone is started, before upgradeImage runs) might help pinpoint where the disk 
i/o is happening ... the upgrade scripts have quite a few timestamps and 
they can be used to correlate with high disk i/o ...
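
A rough sketch of the kind of edit I mean (the stone name below is hypothetical, and any sample-interval or output options should be taken from statmonitor's own usage message):

    # in upgradeStone, after the stone is started and before upgradeImage runs:
    statmonitor myUpgradeStone &    # hypothetical stone name; runs in the background
                                    # and records statistics that can be correlated
                                    # with the timestamps in the upgrade logs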

Dale

>
>
> Paul
>
>
>
> GLASS mailing list wrote
>> On 9/23/19 3:04 PM, Paul DeBruicker via Glass wrote:
>>> Thanks Dale,
>>>
>>> I did use the upgradeStone script.  That script finished relatively quickly
>>> (~10-15 mins IIRC).  Reloading my project code is what was taking a really
>>> long time.
>> Ahhh, okay ... then I suspect that you are getting hit by a bunch of
>> migrations while loading your application code ...
>>> Sorry I neglected to mention that in my prior email.   It
>>> finished about an hour ago and seems to have worked OK.
>> I saw the copy of the script that you used to upgrade your application
>> code and you _are_ using GsDeployer>>deploy:, so that is a good thing ...
>> presumably the migration time would have been worse if you hadn't
>> used GsDeployer>>deploy:
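
For the archive, the usual shape of that call looks roughly like the sketch below; the baseline name and repository URL are placeholders, and whether the convenience method is class-side or needs `GsDeployer new` depends on the GLASS version you have loaded:

    "wrap the project load so that instance migrations are batched by GsDeployer"
    GsDeployer deploy: [
        Metacello new
            baseline: 'MyProject';                                   "placeholder"
            repository: 'github://me/MyProject:master/repository';   "placeholder"
            load ].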
>>
>> When the deploy block completes it should have written to the log the
>> number of classes and instances that were migrated ... and presumably
>> this was fairly high, given how long it took ... if you look at
>> MCPlatformSupport class>>listInstances:, you'll see that a single thread
>> is being used to do the migration ... if you have a multi-core machine
>> you could bump up this number to match the number of cores you have and
>> do the list instances step in parallel ...
>>
>>> I did use a fresh install of GsDevKit_home for the upgrade, but the 8GB stone
>>> has come from 2.4.4.1 thus far so it could be wonky in interesting ways.
>> Ahhh, this will be a pretty old version of GLASS and friends, so I can
>> imagine that you probably ended up with a lot of classes with shape
>> changes, and they are scanned for instances whether or not there are any
>> ... listInstances is a read-only operation, so the fact that you ended up
>> with a large volume of bytes written to disk indicates that you ended
>> up with a number of instance migrations ... I'm curious whether you'd done an
>> MFC before doing the upgrade ... if not, you might have migrated a fair
>> number of dead objects ...
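
If an MFC before the next attempt is practical, the minimal form from a topaz session logged in as an administrative user is roughly:

    "reclaim dead objects before upgrading so they are not scanned or migrated"
    SystemRepository markForCollection.
    "let the reclaim gems finish their work before shutting the stone down"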
>>
>>> In the upgradeLog directory topaz.out was the last edited file and its last
>>> line is "GsDevKit Upgrade completed (GLASS/GLASS1/GsDevKit/tODE). No errors
>>> detected".
>>>
>>>
>>> Looking in the logs for the stone, the only thing that stands out is that the
>>> reclaim gc gem log for the session where I upgraded the project code is
>>> 241,400 lines long (~21MB).  The last 10 lines of it are:
>>>
>>> --- 09/23/2019 13:17:55.601 PDT reclaim abort: 239093 viewCr 754 numDp 36 deadNotRecl 0
>>> --- 09/23/2019 13:19:56.128 PDT reclaim abort: 239094 viewCr 754 numDp 36 deadNotRecl 0
>>> --- 09/23/2019 13:21:56.650 PDT reclaim abort: 239095 viewCr 754 numDp 36 deadNotRecl 0
>>> --- 09/23/2019 13:23:57.170 PDT reclaim abort: 239096 viewCr 754 numDp 36 deadNotRecl 0
>>> --- 09/23/2019 13:25:57.688 PDT reclaim abort: 239097 viewCr 754 numDp 36 deadNotRecl 0
>>> --- 09/23/2019 13:27:58.232 PDT reclaim abort: 239098 viewCr 754 numDp 36 deadNotRecl 0
>>> --- 09/23/2019 13:29:58.788 PDT reclaim abort: 239099 viewCr 754 numDp 36 deadNotRecl 0
>>> --- 09/23/2019 13:31:59.339 PDT reclaim abort: 239100 viewCr 754 numDp 36 deadNotRecl 0
>>> --- 09/23/2019 13:33:59.888 PDT reclaim abort: 239101 viewCr 754 numDp 36 deadNotRecl 0
>>> --- 09/23/2019 13:36:00.435 PDT reclaim abort: 239102 viewCr 754 numDp 36 deadNotRecl 0
>>>
>>>
>>> The reclaimgcgem.log for the session the other day where I ran the
>>> upgradeStone command is 112,397 lines long (~10MB).
>>>
>>> --- 09/20/2019 16:58:03.711 PDT reclaim abort: 112234 viewCr 12443 numDp 31 deadNotRecl 0
>>> --- 09/20/2019 17:02:47.350 PDT reclaim abort: 112235 viewCr 12443 numDp 31 deadNotRecl 0
>>> --- 09/20/2019 17:04:47.412 PDT reclaim abort: 112236 viewCr 12443 numDp 31 deadNotRecl 0
>>> --- 09/20/2019 17:06:47.743 PDT reclaim abort: 112237 viewCr 12443 numDp 31 deadNotRecl 0
>>> --- 09/20/2019 17:08:48.056 PDT reclaim abort: 112238 viewCr 12443 numDp 31 deadNotRecl 0
>>> --- 09/20/2019 17:10:48.351 PDT reclaim abort: 112239 viewCr 12443 numDp 31 deadNotRecl 0
>>> --- 09/20/2019 17:12:48.808 PDT reclaim abort: 112240 viewCr 12443 numDp 31 deadNotRecl 0
>>> --- 09/20/2019 17:14:49.282 PDT reclaim abort: 112241 viewCr 12443 numDp 31 deadNotRecl 0
>>> --- 09/20/2019 17:15:33.616 PDT Logging out
>>> --- 09/20/2019 17:15:37.437 PDT Exiting
>>>
>>>
>>> I'm happy to send you whatever else you want that might help.
>> Since it did finish and appears to be in good shape, I'm
>> not sure there is too much left to be done ...
>>
>> In general, if you have a tight upgrade window for a production stone, it
>> is prudent to run the upgrade on a backup of production on another
>> machine to work out these kinds of issues ahead of time ... running
>> statmon during these test runs would also supply information that would
>> tell you whether or not you are disk i/o bound ... bumping up the
>> number of threads used for list instances is also a good idea if that's
>> where you're spending time ...
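
A sketch of that kind of dry run; the stone names below are placeholders, and the exact arguments should be checked against the usage printed by $GS_HOME/bin/upgradeStone itself:

    # restore a copy of the production extent into a scratch 3.3.1 stone, then:
    # (argument order assumed: <source stone> <target stone> <target version>)
    $GS_HOME/bin/upgradeStone testStone_331 testStone_350 3.5.0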
>>
>> Dale
>>
>>> Paul
>>>
>>>
>>>
>>> GLASS mailing list wrote
>>>> On 9/23/19 12:36 PM, PAUL DEBRUICKER via Glass wrote:
>>>>> Hi -
>>>>>
>>>>> I'm trying to upgrade an 8GB and a 30GB stone from 3.3.1 to 3.5, on a Mac
>>>>> first (while troubleshooting my upgrade process), then possibly on Ubuntu
>>>>> 18.04 if need be.  Both machines have NVMe SSDs with plenty of room and
>>>>> plenty of RAM and CPU.
>>>>>
>>>>> Right now the extent and tranlogs are both on the same device, in the
>>>>> default GsDevKit_home structure.  On the Mac I could move the tranlogs to
>>>>> an SDHC card, and on the Ubuntu machine I could move them to a SATA drive.
>>>>>
>>>>> While upgrading the 8GB stone on the Mac I'm seeing i/o speeds in line with
>>>>> the drive's published specs of around 250-400MB/s.  And it has
>>>>> read 1.35TB of data and written 105GB of data in about 40 minutes.
>>>>>
>>>>> Right now I assume things are fine and going smoothly, but because it
>>>>> might be taking longer than it should I wanted to check:
>>>>>
>>>>> 1. Should it take more than 40 minutes to upgrade an 8GB stone?
>>>>> 2. Should the 30GB stone (which has different objects/code/etc.) take ~4x
>>>>> as long?
>>>>> 3. Is there any way to tell ahead of time how long an upgrade will take?
>>>> No ... which should answer questions 1 and 2 ... however, it does seem
>>>> that it is taking a long time ...
>>>>
>>>> What scripts are you using to do the upgrade? I'll assume that you are
>>>> using $GS_HOME/bin/upgradeStone to do the upgrade ... if you are running
>>>> different scripts that would be useful to know ...
>>>>
>>>> Assuming upgradeStone, there should be an upgradeLog directory in the
>>>> directory of the stone that you are upgrading to ... in that directory
>>>> there should be a bunch of log files, and an `ls -l`
>>>> will give us a clue of where you are in the upgrade process.
>>>>
>>>> Here's an example directory from an upgrade of a 500M 3.4.3 extent to
>>>> 3.5.0:
>>>>
>>>>       rogue:upgradeLog>ls -altr
>>>>       total 11456
>>>>       drwxr-xr-x 11 dhenrich smalltalk     4096 Sep 23 13:23 ..
>>>>       -rw-r--r--  1 dhenrich smalltalk        0 Sep 23 13:23 topazerrors.log
>>>>       -rw-r--r--  1 dhenrich smalltalk 11286737 Sep 23 13:24 upgradeImage.out
>>>>       -rw-r--r--  1 dhenrich smalltalk       33 Sep 23 13:24 conversion.conf
>>>>       -rw-r--r--  1 dhenrich smalltalk     1254 Sep 23 13:24 installGsDevKit_upgrade.topaz
>>>>       -rw-r--r--  1 dhenrich smalltalk      253 Sep 23 13:24 createGsDevKit_upgrade.topaz
>>>>       -rw-r--r--  1 dhenrich smalltalk     3696 Sep 23 13:24 createGsDevKit_upgrade.out
>>>>       -rw-r--r--  1 dhenrich smalltalk     5527 Sep 23 13:24 installGsDevKit_upgrade_topaz.log
>>>>       -rw-r--r--  1 dhenrich smalltalk   112384 Sep 23 13:24 installGsDevKit_upgrade.out
>>>>       -rw-r--r--  1 dhenrich smalltalk      533 Sep 23 13:24 prepareGsDevKitImage_pragma_user.topaz
>>>>       -rw-r--r--  1 dhenrich smalltalk     6942 Sep 23 13:24 prepareGsDevKitImage_pragma_user.log
>>>>       -rw-r--r--  1 dhenrich smalltalk      462 Sep 23 13:24 prepareImage_pragma_systemuser.topaz
>>>>       -rw-r--r--  1 dhenrich smalltalk     6874 Sep 23 13:24 prepareImage_pragma_systemuser_topaz.log
>>>>       -rw-r--r--  1 dhenrich smalltalk      551 Sep 23 13:24 prepareGsDevKitImage.topaz
>>>>       -rw-r--r--  1 dhenrich smalltalk   118768 Sep 23 13:25 prepareGsDevKitImage.log
>>>>       -rw-r--r--  1 dhenrich smalltalk      525 Sep 23 13:25 cleanupGsDevKitImage.topaz
>>>>       drwxr-xr-x  2 dhenrich smalltalk     4096 Sep 23 13:25 .
>>>>       -rw-r--r--  1 dhenrich smalltalk   118379 Sep 23 13:25 upgradeSeasideImage.out
>>>>       -rw-r--r--  1 dhenrich smalltalk     6436 Sep 23 13:25 cleanupGsDevKitImage.log
>>>>       -rw-r--r--  1 dhenrich smalltalk    13452 Sep 23 13:25 topaz.out
>>>>
>>>> Sending me the newest log file and the `ls -l` listing should be enough
>>>> information for me to figure out where you are in the upgrade process,
>>>> and we can go from there ...
>>>>
>>>>> 4. Are there any optimizations I should look at, or a pre-upgrade checklist
>>>>> I should go through, that aren't covered in the install guide?
>>>> No ... I'm guessing that you are getting hit by "excessive migrations",
>>>> and that you _might_ be using an older version of GsDevKit_home (the SHA
>>>> of the $GS_HOME commit should answer that question ... I can also tell by
>>>> the list of files in the upgradeLog directory) ... the "modern" upgrade
>>>> from 3.3.1 to 3.5.0 should avoid most if not all of the migrations
>>>> that would cause long upgrade times, but that will actually depend upon
>>>> whether or not you are running with a "modern" version of GLASS/tODE in
>>>> your 3.3.1 stone (the SHAs of the various projects you have
>>>> loaded in your image will help figure that out) ... let's start by
>>>> looking at the log files and go from there ...
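
Getting the GsDevKit_home SHA is quick:

    # report the commit SHA of the GsDevKit_home checkout
    cd $GS_HOME && git rev-parse HEAD
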
>>>>> 5. Would moving the tranlogs to a separate disk prior to upgrading help
>>>>> substantially?
>>>> Maybe, but first we want to determine what's happening ... if the
>>>> migrations cannot be avoided, it might make sense to arrange for the
>>>> repository scans to use multi-threaded list instances before resorting
>>>> to moving tranlogs around, but I first want to see if you are getting
>>>> unnecessary migrations.
>>>>
>>>>
>>>> Dale
>>>>
>>>>