[Glass] Experiences migrating a Gemstone/S database

James Foster Smalltalk at JGFoster.net
Fri Dec 20 05:55:51 PST 2019


Marten,

Thank you very much for describing your experience. At each step it seems to me like you made good choices and exhibited a strong understanding of how GemStone works and can be optimized. I appreciate that you arrived at something that is adequate but recognize that more could be done if necessary.

James

> On Dec 20, 2019, at 3:56 AM, Marten Feldtmann via Glass <glass at lists.gemtalksystems.com> wrote:
> 
> Migration and Updating Databases
> 
> I would like to post some experiences I had while migrating a
> GemStone database.
> 
> The customer database my tests are based on has a size
> of about 420 GB. The database was copied to our reference
> system - an old Thinkpad W520 (i7 based) with 16 GB of RAM
> and ONE SSD - and the tests were done on this machine. The stone
> runs with an 8 GB shared page cache.
> 
> Between v70 and v71 of our product there were several 
> changes to the domain model we were developing. The model 
> is defined by 197 domain classes. 
> 
> In v71, 39 of these classes were changed, and these changes
> are the reason why 119,000,000 objects have to be migrated. One class
> had 66,000,000 instances, another one 49,000,000 instances, and
> the remaining classes around 4,000,000 instances each.
> 
> *** The original way ***
> 
> The old, traditional way had been written in the early days
> of this product, when databases were not that big and migration
> speed was not that critical.
> 
> It worked more or less the following way (shame on me):
> 
> a) Scan the repository for ONE (!) changed class at a time.
> 
> b) For each instance do a migration and commit on demand
>   (when temporary memory runs low) - see the sketch below.
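> 
> In code, the old loop looked roughly like this (a minimal sketch using
> GemStone's standard migrationDestination:/migrate protocol; the class
> names are placeholders, the real migration code is generated):
> 
>   | instances |
>   "OldCustomer stands for the previous version of the class,
>    e.g. taken from its class history"
>   OldCustomer migrationDestination: Customer.
>   "ONE full repository scan, for this ONE class"
>   instances := (SystemRepository listInstances: { OldCustomer }) first.
>   instances do: [:each |
>       each migrate.
>       "commit when temporary object memory fills up"
>       System _tempObjSpacePercentUsed > 80
>           ifTrue: [System commitTransaction]].
>   System commitTransaction.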
> 
> This was OK in the past: I could start the update process on Saturday
> and finish it remotely on Sunday.
> 
> Now the database has become too large, and this way of updating the
> database would take from Thursday, 11:00, to Monday afternoon (so
> more or less 4 days!).
> 
> *** Repository Scanning ***
> 
> The next evolution of this approach was:
> 
> a) Now ONE repository scan (for ALL changed classes) is done - using
> fastAllInstances and GsBitmap instances.
> 
> b) For each instance do a migration and commit on demand - see the
> sketch below.
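> 
> A sketch of this step (assuming fastAllInstances: answers one GsBitmap
> per class and that GsBitmap >> removeCount: hands back instances in
> chunks; exact selectors may differ between GemStone versions):
> 
>   | changed bitmaps work chunk |
>   "really all 39 old class versions, each with its
>    migrationDestination: already set up"
>   changed := { OldCustomer. OldOrder. OldInvoice }.
>   "ONE repository scan for ALL changed classes"
>   bitmaps := SystemRepository fastAllInstances: changed.
>   work := GsBitmap new.
>   bitmaps do: [:each | work addAll: each].
>   [work size > 0] whileTrue: [
>       chunk := work removeCount: 20000.
>       chunk do: [:each | each migrate].
>       System commitTransaction].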
> 
> With this step the multiple scans of the repository have been removed,
> and the largest cost is now the execution of the base migration code. But
> for 119 million objects this still takes a lot of time. I did not run a full
> test, but an initial test over some hours suggested that this would take
> around 2 days.
> 
> *** Indices ***
> 
> More than satisfied with the benefits of ONE scan, I had to look at the migration
> code. The base migration code is generated by our code generator, and I did
> not want to change that (because it is general and covers all model versions),
> but actually knowing the specific model I migrate from would cut the
> code to be executed to 1/4 of the original code. So there would be possibilities
> for enhancements.
> 
> So, what about starting multiple processes and doing step (b) in parallel? I stored
> the GsBitmap in page order on disc, and that file became around 600 MB of data.
> 
> I wrote processes to do the migration in parallel based on that GsBitmap file
> ... and it did not work.
> Commit conflicts over and over. No way to go ... speed was pretty bad.
> 
> Actually only one process was running more or less without problems - the other
> processes sometimes did a little work, but most of the time they did an abort transaction.
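> 
> For the record, the partitioning idea itself was simple: every session
> reads the same bitmap file and only touches every 8th chunk. A sketch
> (the file name, the WORKER_INDEX convention and the readFromFile:
> selector are assumptions of mine; disjointness relies on every session
> draining the bitmap in the same page order):
> 
>   | work nWorkers myIndex index chunk |
>   nWorkers := 8.
>   "0..7, set per session by the launch script"
>   myIndex := (System gemEnvironmentVariable: 'WORKER_INDEX') asNumber.
>   work := GsBitmap new.
>   "the bitmap saved to disc after the scan"
>   work readFromFile: '/var/tmp/migrate.bm'.
>   index := 0.
>   [work size > 0] whileTrue: [
>       chunk := work removeCount: 20000.
>       index \\ nWorkers = myIndex ifTrue: [
>           chunk do: [:each | each migrate].
>           System commitTransaction].
>       index := index + 1].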
> 
> So these conflicts had to be caused by something shared. As a first step I decided
> to remove ALL indices used in the database. I was lucky that this application has
> an execution path to find all used indices, to remove them, to rebuild them, etc.
> 
> 
> The script to remove all indices was started before the migration (and it took at
> least 1-2 hours).
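> 
> For a single collection the usual pair of calls is something like this
> sketch (the collection and path are hypothetical; our application
> simply does this for every indexed collection it knows about):
> 
>   "drop everything on this collection"
>   myCustomers removeAllIndexes.
>   "later: rebuild"
>   myCustomers createEqualityIndexOn: 'lastName' withLastElementClass: String.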
> 
> Then I started the parallel migration code, and now it worked. The i7
> has 8 execution threads, so I started 8 of these processes, and they worked
> without problems. The topaz scripts were started with "-t 500000", and that fit
> the machine above very well: 100% usage of the available RAM and minimal swapping.
> 
> The code itself uses a sliding transaction size (from 1 to a maximum of 20,000
> objects between commits; this limit is adapted according to conflicts/successes) -
> and the logs showed that the processes were working at the upper value of
> 20,000 objects per commit.
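> 
> The sliding transaction size is just a grow-on-success / back-off-on-
> conflict loop. A sketch (my simplification, not the generated code):
> 
>   | work size chunk |
>   work := GsBitmap new.             "filled by the scan shown earlier"
>   size := 1.                        "start carefully"
>   [work size > 0] whileTrue: [
>       chunk := work removeCount: size.
>       chunk do: [:each | each migrate].
>       System commitTransaction
>           ifTrue: [size := (size * 2) min: 20000]  "grow after a clean commit"
>           ifFalse: [
>               System abortTransaction.
>               work addAll: chunk.                  "retry this chunk later"
>               size := (size // 2) max: 1]].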
> 
> 
> So to summarize:
> 
> a) Scanning the objects with fastAllInstances in ONE scan (1-2 hours)
> b) Removing the indices (1-2 hours)
> c) Running the migration code in 8 tasks (8 hours)
> d) Scanning the objects with fastAllInstances in ONE scan again, as a check (1-2 hours)
> e) Cleaning the history
> f) Rebuilding the indices (3 hours)
> 
> So now I am at 17 hours, and that is OK. I think that (b) and (f) could also be
> done in parallel.
> 
> *** Workload ***
> 
> Removing indices in concurrent tasks leads to very strange exceptions, so
> I gave that up.
> 
> Creating indices in concurrent tasks works - so the 3 hours above can be reduced
> to 40 minutes, and the overall time is now 15 hours.
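> 
> The parallel creation uses the same modulo trick as the migration;
> a sketch (the job objects with collection/path/lastElementClass
> accessors are my own convention):
> 
>   | jobs nWorkers myIndex |
>   jobs := OrderedCollection new.    "one entry per index to create"
>   nWorkers := 8.
>   myIndex := (System gemEnvironmentVariable: 'WORKER_INDEX') asNumber.
>   jobs doWithIndex: [:job :i |
>       (i - 1) \\ nWorkers = myIndex ifTrue: [
>           job collection
>               createEqualityIndexOn: job path
>               withLastElementClass: job lastElementClass.
>           System commitTransaction]].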
> 
> *** Equal Workload up to the end ***
> 
> The next point that showed up in this work was that the cost of creating the
> individual indices varies greatly, so some tasks have much more to do than others
> ... and the parallel-work idea is not carried through to the end (index creation
> tasks: 37 minutes (longest) versus 11 minutes (fastest)). So rearranging this work
> could still improve the time needed to create the indices - see the sketch below.
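> 
> A simple way to rearrange it: sort the jobs by their measured duration
> (taken from the previous run's logs) and always hand the next-largest
> job to the least-loaded worker. A sketch (estimatedMinutes is a
> hypothetical accessor on the job objects):
> 
>   | indexJobs jobs buckets |
>   indexJobs := OrderedCollection new.   "one entry per index, as above"
>   jobs := indexJobs asSortedCollection: [:a :b |
>       a estimatedMinutes > b estimatedMinutes].
>   buckets := (1 to: 8) collect: [:i | OrderedCollection new].
>   jobs do: [:job |
>       | best bestLoad load |
>       best := nil. bestLoad := nil.
>       buckets do: [:b |
>           load := b inject: 0 into: [:sum :j | sum + j estimatedMinutes].
>           (bestLoad isNil or: [load < bestLoad])
>               ifTrue: [best := b. bestLoad := load]].
>       best add: job].
>   "each worker session then processes exactly one bucket"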
> 
> 
> Marten