[Glass] Instance variable handling and performance questions

Martin McClure via Glass glass at lists.gemtalksystems.com
Sun Feb 8 06:39:34 PST 2015


On 02/08/2015 02:32 AM, FrankB via Glass wrote:
> Hello,

Hi Frank,

I'll give some very quick answers below. I expect I or others will have 
more complete answers later. Do keep asking any questions you might have 
on this forum!

Regards,

-Martin

>
> I am still in the process of reading up on and trying to understand
> GemStone, with the final goal of being able to decide whether we shall use
> GemStone for a new product family currently under development, targeting a
> mass market and requiring server installations with a potentially very high
> number of users and a lot of traffic.
>
> This means setting the course for our technical environment for the next 10+
> years.
>
> I currently have two questions, and I apologize upfront that it takes quite
> a long text to explain the details and that I did not have the time to make
> it shorter (which, in German, is a famous quote from the great writer and
> poet Johann Wolfgang von Goethe):
>
> 1) My first major concern arises from our special way of using application
> data instance variables for searching and selecting objects in GemStone:
>
> In my comprehensive Smalltalk application classes (VW5), ALL user-data
> related instance variables:
> - are kept by data model instances in attribute dictionaries
> - are typically accessed by just sending the instVar name via
> doesNotUnderstand (in 98% of cases there are no getter/setter methods and
> never ever any direct use of instVars inside methods)
> - are created from and defined by version-dependent and partially
> user-defined definition data, making them totally dynamic at run time,
> because all application data instance variables are created from this
> definition data and held in dictionaries.
>
> Converting this form of dynamically defined instance variables to
> 'classical' hard-coded instance variables would not only be a major step
> back, in my view, causing many severe and unacceptable disadvantages, but it
> would be practically impossible, because it would require an almost total
> rewrite of my entire software and its underlying complex framework. No
> chance in this life!
>
> Therefore, my question is whether or not such use of instance variables in
> dictionaries is supported by GemStone.
>
> In other words, would or could GemStone access, search, filter, etc. these
> instVars via doesNotUnderstand just as all of our code does, or does
> GemStone require hard-coded instVars?

GemStone runs Smalltalk, and #doesNotUnderstand: works as it normally
does in Smalltalk, so your approach should work just fine. Some of
GemStone's special indexing features are geared towards named instance
variables, though, so you would get less benefit from those features.
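
To make that concrete, here is a minimal sketch of the kind of
doesNotUnderstand:-based accessor you describe; the 'attributes' instVar
and the read/write convention are my assumptions rather than your actual
code, and I'm assuming the usual Message protocol (#selector,
#arguments). The same method compiles and runs unchanged in GemStone
Smalltalk:

  doesNotUnderstand: aMessage
    "Unary selector means a read; a one-argument keyword selector
     means a write. The attribute dictionary is assumed to live in
     an 'attributes' instVar of the data model class."
    aMessage arguments isEmpty
      ifTrue: [^attributes at: aMessage selector ifAbsent: [nil]].
    ^attributes
      at: (aMessage selector copyWithout: $:) asSymbol
      put: aMessage arguments first

The indexing caveat is that GemStone's equality indexes and selection
blocks traverse named instance variable paths on its unordered
collections, so they cannot see attributes that are only reachable via
doesNotUnderstand:. Roughly (the collection and path names are just for
illustration):

  people createEqualityIndexOn: 'lastName' withLastElementClass: String.
  people select: {:each | each.lastName = 'Smith'}.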

GemStone does, however, have dynamic instance variables, which you might
find attractive as an eventual replacement for your instance variable
dictionaries.
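
If I remember the selectors correctly (please check the Programming Guide
for your GemStone version, as I'm writing this from memory), using them
looks roughly like this; Person and #salutation are placeholders of my own:

  | person |
  person := Person new.
  person dynamicInstVarAt: #salutation put: 'Dr.'.
  person dynamicInstVarAt: #salutation.      "answers 'Dr.'"
  person dynamicInstanceVariables.           "answers an Array of the names"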

>
> In case the answer is NO, you can also forget about the next question,
> because this would mean the end for all of my considerations on using
> GemStone.
>
> 2) Performance of loading complex objects through GemStone versus from
> MariaDB

I haven't benchmarked MariaDB, but generally I'd expect GemStone to have 
better performance loading complex objects, and especially complex 
object graphs. GemStone usually outperforms relational DBs in this area.

>
> My concerns regarding this subject relate to two very different projects and
> products, which have different and in some respects even oppositional
> requirements. I therefore divide this into two sections:
>
> a) Loading very large numbers of moderately complex objects under extreme
> traffic on web servers, for generating data for client-side UI and html
> generation
>
> I am currently preparing an already existing but previously desktop-only
> application in VW5 to act as a web server that must support these
> conditions, which are best described using a well-known example:
> - serve a potentially extremely high number of simultaneous users, very
> similar in type and numbers to those accessing LinkedIn
> - deliver to them data objects somewhat similar to the person profile
> structures of LinkedIn but much more complex, because they support multiple
> content languages and have mandator (tenant) related parts, user-defined
> data, and object history for every master data object
> - in the beginning, support for a few tens of millions of such data objects
> is required (we have 5 million already), with the potential need for a
> couple of hundred million (in the end, and hopefully)
> - support a large number of searching and filtering criteria, which offer
> substantially more choices than the relatively simple so-called advanced
> search features of LinkedIn
> - of course, users expect answers with Google-like performance
>
> I should mention that besides this project I have another two different
> application scenarios to follow with similar requirements and all three are
> truly new and filling huge market niches. Of course, I am trying to develop
> the still missing software parts so that all can be re-used later as much as
> possible. All three projects are supporting the Freemium concept and will be
> implemented and marketed without any involvement of venture capital sharks,
> legally organized criminal Mafia, generally called "banks", or any other
> capital parasites who all together have disastrous control over our
> economies, societies, and politics - but not over me and my ideas!!! (I am
> NOT a socialist BTW)
>
> As for the storage technology, I strongly favour the combination of MariaDB
> for mere content storage and Sphinx for queries: for searching, filtering,
> sorting, grouping etc. I do not see any viable alternative to Sphinx, which
> seems to be the best if not the only player in this top league of
> highest-performance web sites. For the near future the application logic
> will remain in Smalltalk, but I keep in mind the option of later moving
> parts or even all of it to node.js and server-side JavaScript, should
> Smalltalk not cope well enough with the performance requirements.
> Experience will show!

As far as I know, integration between Sphinx and GemStone has not already
been built. I believe that Sphinx will talk the PostgreSQL wire protocol,
and there have been some experiments in GemStone with understanding the
PostgreSQL wire protocol, so this may not be too difficult.


>
> Now the decisive question:
>
> There are two ways of loading the master data objects that result from a
> Sphinx query and are needed to generate the data for answering the requests
> from the browser clients:
> - have Smalltalk collect the data object fragments from relational MariaDB
> tables, their ids being already known; one master object and the instances
> of its dependent sub-classes must be collected from about 5 to 8 tables,
> mostly with one access per table, and in 2 to 3 cases with typically 2 and
> up to 10 dependent instances per sub-class / table
> - or use GemStone to load these moderately complex objects by their id,
> with no further querying needed.
>
> Would you expect substantial performance advantages deriving from GemStone
> in such cases?

Yes, this is exactly the kind of case where GemStone will make things 
much simpler, and usually faster as well.
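
As a rough sketch of what that second option can look like (the names
here are my assumptions, not an API you must use): keep one persistent
dictionary mapping your existing ids to the master objects, and the
"load" is then just a lookup plus ordinary message sends, with the rest
of the object graph faulted in as you touch it:

  "one-time setup, committed to the repository"
  UserGlobals at: #MasterObjectsById put: Dictionary new.
  System commitTransaction.

  "whenever a new master object is created"
  (UserGlobals at: #MasterObjectsById) at: newMaster id put: newMaster.
  System commitTransaction.

  "answering a request: the ids came back from the Sphinx query"
  masters := sphinxIds collect: [:id |
      (UserGlobals at: #MasterObjectsById) at: id].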

>
> Regarding this, I should mention that experience shows that most of the CPU
> time is consumed not by the database accesses but rather by converting the
> fields from these ugly rectangular records into nice round objects and their
> dictionary-based instance variables (you certainly noticed my analogy to
> this famous old Byte [?] article on "How to squeeze nice round objects into
> an ugly square database"; I have been in OOD since 1986).

Right. This conversion is precisely what GemStone does *not* have to do, 
since objects are stored on disk in essentially the same format as they 
have in memory.

>
> b) Loading rather few extremely complex objects
>
> The other potential use case for GemStone is my old but currently still
> unmarketed database publishing software. It was developed more than ten
> years ago, also in VW5, with an effort of around 7 person-years, and it has
> so far been used only by one world-wide known, very large Dutch electrical
> company, where it generates a couple of hundred thousand product catalogue
> pages with very complex layout for print, PDF and html. The data came from
> my product management software, which stores over 1 million items.

Interesting application!

>
> This software suffered severely from performance problems when loading the
> 500 to more than 2,000 little components that one page was made up of. This
> resulted in loading times of between 1 and 5 minutes per page from MySQL,
> even in single-user mode.
>
> This was the major reason why this software was never marketed beyond this
> one large customer. That is a pity, not only because of the investment but
> primarily because, to the best of my knowledge, there still is no similar
> solution available on the market. And the software also generated html,
> which makes it suitable for many more purposes today.
>
> Of course, both DBMS software and the available hardware (SSDs or RAM disks)
> are many times faster today than back in 2002 to 2007. Despite this, I would
> still expect that GemStone could substantially improve the loading times of
> these very complex objects compared to MariaDB.

I would think that yes, you would see performance improvements with 
GemStone.

>
> Any general comments?
>
> Now comes the great BUT:
>
> Instead of targeting the professional publishing market, I would prefer to
> first cover a different and newly developing mass market via a browser-based
> solution (I have a couple of unique ideas and new features in mind),
> primarily because such an application perfectly fits into and substantially
> adds value to two of my above-mentioned server projects.
>
> Therefore, I will have to expect and prepare for up to a couple of thousand
> simultaneous users accessing their own private or group-wide private data
> (no shared data beyond group borders, but a few simultaneous users per
> group) stored on web servers.
>
> Here the main two questions:
>
> I know that this is a very difficult guess, but what would you expect one
> GemStone server to be able to support in terms of simultaneous users and
> requests (mostly page loads)?

It varies by many factors. Experimentation would be required. Individual 
GemStone databases have been scaled up to run many thousands of VMs on 
hundreds of multi-core machines, so it's likely that it will be possible 
to scale up to the level you need.

>
> And my last question:
>
> Would you consider a GemStone-only server installation suitable? Or would
> you recommend offloading the non-DB-related work to clients like Pharo
> ('work' here is essentially extracting data chunks to be sent to the fat
> clients, which generate the UI and html for themselves)?
>
> I would very much appreciate a comment.
>

For web-based solutions, the simplest approach is to split the application
between the GemStone server and the browser. Depending on the
sophistication of the UI, the browser can either just use HTML generated
on GemStone or use its JavaScript engine to run Amber or JavaScript code.

GemStone can either be the web server itself or talk FCGI to a web server
such as Nginx.
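
If you go the FCGI route, the front-end configuration stays small. Here is
a minimal sketch of an Nginx server block, assuming a GemStone gem
listening for FastCGI connections on port 9001 (the port and layout are
assumptions; adjust them for your own deployment):

  server {
      listen 80;
      location / {
          # forward every request to the GemStone gem over FastCGI
          include fastcgi_params;
          fastcgi_pass 127.0.0.1:9001;
      }
  }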


> Thank you very much for taking the time to read my long novel.
> Frank
>

You're welcome!

