[Glass] Experiences migrating a Gemstone/S database

Fri Dec 20 09:45:50 PST 2019

Great article Marten, thank you for posting.  This would make a great 
experience report to present at ESUG.

Norm

On 12/20/2019 3:56 AM, Marten Feldtmann via Glass wrote:
> Migration und Updating Databases
>
> I would like to post some experiences I had while migrating a
> Gemstone database.
>
> The customer database - my tests are based on - has a size
> of about 420GB. The database has been copied to our reference
> system - an old Thinkpad W520 (I7 based) with 16GB of RAM
> and ONE SSD and the tests were done on this machine. The stone
> is working with 8GM shared cache page.
>
> Between v70 and v71 of our product there were several
> changes to the domain model we were developing. The model
> is defined by 197 domain classes.
>
> In v71 39 of these classes have been changed and theses changes
> are the reason for 119.000.000 objects to be migrated. One class
> had 66.000.000 instances, another one 49.000.000 instances and
> the other classes have around 4.000.000 instances.
>
> *** The origial way ***
>
> The old traditional way had been written in the early state
> of this product, where databases were not that big and migration
> speed was not that critical.
>
> It worked more or less the following way (shame on me):
>
> a) Scan the repository for ONE (!) changed class
>
> b) For each instance do a migration and on demand (no memory)
>     make a commit.
>
> This was ok in the past. I could start update process on saturday
> and finish the update on sunday remote.
>
> Now the database became too large and this way of updating the
> database would take from Thursday, 11:00 to Monday afternoon (so
> more or less 4 days !)
>
> *** Repository Scanning ***
>
> The next evolution in this topic had been done:
>
> a) now ONE repository scan (FOR ALL changed classes) is done - using
> fastAllInstances and GsBitmap instances.
>
> b) For each instance do a migration and on demand make a commit.
>
> With this step the multiple scanning of the repository has been removed
> and the largest time is now the base migration code execution. But
> for 119 millions objects this still takes much time. I did not make a full
> test but an initial test over some hours suggested, that this would take
> around 2 days.
>
> *** Indices ***
>
> More than satisfied with the benefits of ONE scan, I had to look to the migration
> code. The base migration code was generated by our code generator and I did
> not want to change that (because it is general and would cover all model versions),
> but actually knowing the specific model I want to migrate from, would cut the
> to be executed code to 1/4 of the originial code. So here would be possibilities
> for enhancements.
>
> So, what about starting multiple processes, do the step (b) in parallel ? I stored
> the GsBitmap in page order on the disc and that file became around 600 MByte of data.
>
> And wrote processes to do the migration in parallel based on that GsBitmap file
> ... and it did not work.
> Over and over commit conflicts. No way to go ... speed was pretty bad.
>
> Actually only one process was running more or less without problems - the other
> processes sometime did a little work, but most of the time they did an abort transaction.
>
> So, somehow these conflicts were based on. As a first step I decided to remove
> ALL indices used in the database. I had luck, that this application had an execution
> path to find all used indices to remove them, to build them etc.
>
>
> That script to remove all indices were started before migration (and it took at
> least 1-2 hours).
>
> Then I started the parallel migration code and now the stuff was working. The I7
> had 8 execution threads and I started 8 of these processes and they work without
> problems. The topaz script were started with "-t 500000" and that fit very well to
> the machine above. 100% usage of the available RAM und minimal swapping.
>
> The code itself had a sliding transaction size (from 1 to max. of 20000 objects between
> each commit. This limit is adapted according to conflicts/successes) - but the logs showed,
> that the processes are working with the upper value of 20000 each commit.
>
>
> So to summarize:
>   
> a) Scanning the objects with fastAllInstances in ONE scan (1-2 hours)
> b) Removing the indices (1-2) hours
> c) Starting the migration code in 8 tasks (8 hours)
> d) Scanning the objects with fastAllInstances in ONE scan (1-2 hours) - to reassure
> e) Clean the history
> f) Building the indices (3) hours
>
> So, now I am at 17 hours and that is ok. I think, that (b) and (f) could also be
> done in parallel execution mode.
>
> *** Workload ***
>
> So removing indices in concurrent tasks leads to very strange exception errors, so
> I gave that up.
>
> Creating indices in concurrent tasks work - so the 3h above can be reduced to 40
> minutes and the overall time is now 15 hours.
>
> *** Equal Workload up to the end ***
>
> The next point shown up in this work was, that the work of creating indices vary
> very much and so some task have much more to do than others ... and the
> parallel workidea is not done up to the end. (creating indices task: 37 minutes (longest)
> against 11 minutes (fastest)). So rearranging this work could still improve the
> time needed to create the indices.
>
>
> Marten
> _______________________________________________
> Glass mailing list
> Glass at lists.gemtalksystems.com
> https://lists.gemtalksystems.com/mailman/listinfo/glass