[Glass] Experiences migrating a Gemstone/S database

Fri Dec 20 06:36:12 PST 2019

Thanks for sharing your story ... impressive improvements .... 

I see that you had “very strange exception errors” while attempted to concurrently remove indexes ... could you share more details about the errors you hit? In theory you should be able to do concurrent index removal without errors, so perhaps you ran into some fixable bugs?

Dale

Sent from my iPhone

> On Dec 20, 2019, at 4:01 AM, Marten Feldtmann via Glass <glass at lists.gemtalksystems.com> wrote:
> 
> Migration und Updating Databases
> 
> I would like to post some experiences I had while migrating a
> Gemstone database. 
> 
> The customer database - my tests are based on - has a size
> of about 420GB. The database has been copied to our reference
> system - an old Thinkpad W520 (I7 based) with 16GB of RAM 
> and ONE SSD and the tests were done on this machine. The stone 
> is working with 8GM shared cache page.
> 
> Between v70 and v71 of our product there were several 
> changes to the domain model we were developing. The model 
> is defined by 197 domain classes. 
> 
> In v71 39 of these classes have been changed and theses changes
> are the reason for 119.000.000 objects to be migrated. One class
> had 66.000.000 instances, another one 49.000.000 instances and
> the other classes have around 4.000.000 instances.
> 
> *** The origial way ***
> 
> The old traditional way had been written in the early state
> of this product, where databases were not that big and migration
> speed was not that critical.
> 
> It worked more or less the following way (shame on me):
> 
> a) Scan the repository for ONE (!) changed class
> 
> b) For each instance do a migration and on demand (no memory)
>   make a commit.
> 
> This was ok in the past. I could start update process on saturday
> and finish the update on sunday remote.
> 
> Now the database became too large and this way of updating the
> database would take from Thursday, 11:00 to Monday afternoon (so 
> more or less 4 days !)
> 
> *** Repository Scanning ***
> 
> The next evolution in this topic had been done:
> 
> a) now ONE repository scan (FOR ALL changed classes) is done - using 
> fastAllInstances and GsBitmap instances.
> 
> b) For each instance do a migration and on demand make a commit.
> 
> With this step the multiple scanning of the repository has been removed
> and the largest time is now the base migration code execution. But
> for 119 millions objects this still takes much time. I did not make a full 
> test but an initial test over some hours suggested, that this would take
> around 2 days.
> 
> *** Indices ***
> 
> More than satisfied with the benefits of ONE scan, I had to look to the migration
> code. The base migration code was generated by our code generator and I did
> not want to change that (because it is general and would cover all model versions),
> but actually knowing the specific model I want to migrate from, would cut the
> to be executed code to 1/4 of the originial code. So here would be possibilities
> for enhancements.
> 
> So, what about starting multiple processes, do the step (b) in parallel ? I stored 
> the GsBitmap in page order on the disc and that file became around 600 MByte of data.
> 
> And wrote processes to do the migration in parallel based on that GsBitmap file 
> ... and it did not work. 
> Over and over commit conflicts. No way to go ... speed was pretty bad. 
> 
> Actually only one process was running more or less without problems - the other 
> processes sometime did a little work, but most of the time they did an abort transaction.
> 
> So, somehow these conflicts were based on. As a first step I decided to remove
> ALL indices used in the database. I had luck, that this application had an execution
> path to find all used indices to remove them, to build them etc. 
> 
> 
> That script to remove all indices were started before migration (and it took at 
> least 1-2 hours).
> 
> Then I started the parallel migration code and now the stuff was working. The I7 
> had 8 execution threads and I started 8 of these processes and they work without 
> problems. The topaz script were started with "-t 500000" and that fit very well to
> the machine above. 100% usage of the available RAM und minimal swapping.
> 
> The code itself had a sliding transaction size (from 1 to max. of 20000 objects between 
> each commit. This limit is adapted according to conflicts/successes) - but the logs showed, 
> that the processes are working with the upper value of 20000 each commit.
> 
> 
> So to summarize:
> 
> a) Scanning the objects with fastAllInstances in ONE scan (1-2 hours)
> b) Removing the indices (1-2) hours
> c) Starting the migration code in 8 tasks (8 hours)
> d) Scanning the objects with fastAllInstances in ONE scan (1-2 hours) - to reassure
> e) Clean the history
> f) Building the indices (3) hours
> 
> So, now I am at 17 hours and that is ok. I think, that (b) and (f) could also be 
> done in parallel execution mode.
> 
> *** Workload ***
> 
> So removing indices in concurrent tasks leads to very strange exception errors, so 
> I gave that up. 
> 
> Creating indices in concurrent tasks work - so the 3h above can be reduced to 40 
> minutes and the overall time is now 15 hours.
> 
> *** Equal Workload up to the end ***
> 
> The next point shown up in this work was, that the work of creating indices vary 
> very much and so some task have much more to do than others ... and the
> parallel workidea is not done up to the end. (creating indices task: 37 minutes (longest) 
> against 11 minutes (fastest)). So rearranging this work could still improve the
> time needed to create the indices.
> 
> 
> Marten
> _______________________________________________
> Glass mailing list
> Glass at lists.gemtalksystems.com
> https://lists.gemtalksystems.com/mailman/listinfo/glass