[Glass] Instance variable handling and performance questions

Sun Feb 8 02:32:44 PST 2015

Hello,

I am still in the process of reading into and trying to understand GemStone
with the final goal of becoming able to decide if we shall use GemStone for
a new product family currently under development targeting a mass-market as
well as requiring server installations with a potentially very high number
of users and traffic. 

This means setting the course for our technical environment for the next 10+
years.

I currently have two questions, and I apologize upfront that explaining the
details takes a rather long text and that I did not have the time to make it
shorter (which in German is a famous quote attributed to the great writer
and poet Johann Wolfgang von Goethe):

1) My first major concern arises from our special way of using application
data instance variables for searching and selecting objects in GemStone:

In my comprehensive Smalltalk application classes (VW5) ALL user data
related instance variables:
- are kept by data model instances in attribute dictionaries 
- are typically accessed by just sending the instVar name via
doesNotUnderstand (in 98% of cases there are no getter/setter methods, and
instVars are never used directly inside methods)
- are created from and defined by version dependent and partially
user-defined definition data making them totally dynamic at run-time,
because all application data instance variables are created from this
definition data and held in dictionaries.
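
As an illustration only, the pattern in the list above can be sketched in Python, whose `__getattr__` hook plays a role comparable to doesNotUnderstand. This is not the VW5 code; the class name `DataModel`, the attribute names, and the shape of the definition data are all hypothetical.

```python
class DataModel:
    """Hypothetical data model that keeps all user data in an attribute dictionary."""

    def __init__(self, definitions):
        # 'definitions' stands in for the version-dependent definition data:
        # a mapping of attribute name -> initial value (assumed shape).
        object.__setattr__(self, "_attributes", dict(definitions))

    def __getattr__(self, name):
        # Invoked only when normal attribute lookup fails, comparable to
        # doesNotUnderstand: the attribute name is resolved in the dictionary.
        attributes = object.__getattribute__(self, "_attributes")
        try:
            return attributes[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Only attributes declared in the definition data may be assigned.
        attributes = object.__getattribute__(self, "_attributes")
        if name not in attributes:
            raise AttributeError(f"undefined attribute: {name}")
        attributes[name] = value


person = DataModel({"first_name": None, "city": None})
person.first_name = "Ada"
person.city = "Berlin"

# Selecting objects by a dynamically defined attribute works like any other
# attribute access, because the lookup goes through the dictionary.
people = [person, DataModel({"first_name": "Bob", "city": "Hamburg"})]
in_berlin = [p.first_name for p in people if p.city == "Berlin"]
```

The sketch says nothing about how GemStone's own query and indexing machinery treats such attributes; that is exactly the open question below.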

Converting this form of dynamically defined instance variables to
'classical' hard-coded instance variables would not only be a major step
back, in my view, causing many severe and unacceptable disadvantages, but it
would just be practically impossible, because it would require an almost
total rewrite of my entire software and its underlying complex framework. No
chance in this life!

Therefore, my question is whether or not such use of instance variables in
dictionaries is supported by GemStone? 

In other words, could GemStone access, search, filter etc. these instVars
via doesNotUnderstand just as all of our code does, or does GemStone require
hard-coded instVars?

In case the answer is NO, you can also forget about the next question,
because this would mean the end for all of my considerations on using
GemStone.

2) Performance of loading complex objects through GemStone versus from
MariaDB

My concerns regarding this subject relate to two very different projects and
products, which have different and in some respects even opposing
requirements. I therefore divide this into two sections:

a) Loading very large numbers of moderately complex objects under extreme
traffic on web servers for generating data for client-side UI and html
generation

I am currently preparing an existing VW5 application, so far desktop-only,
to act as a web server that must support these conditions, which are best
described using a well-known example:
- serve a potentially extremely high number of simultaneous users very
similar in type and numbers to those accessing LinkedIn
- delivering to them data objects somewhat similar to the person profile
structures of LinkedIn but much more complex, because they support multiple
content languages, have mandator-related parts, user-defined data, and
object history, for every master data object
- in the beginning, support for a few tens of millions of such data objects
is required (we have 5 million already) with the potential need for a couple
of hundred million (in the end and hopefully)
- support for a large number of searching and filtering criteria, which
offer substantially more choices than the relatively simple so-called
advanced search features of LinkedIn.
- of course, users expect answers with Google-like performance

I should mention that besides this project I have another two different
application scenarios to follow with similar requirements and all three are
truly new and filling huge market niches. Of course, I am trying to develop
the still missing software parts so that all can be re-used later as much as
possible. All three projects are supporting the Freemium concept and will be
implemented and marketed without any involvement of venture capital sharks,
legally organized criminal Mafia, generally called "banks", or any other
capital parasites who all together have disastrous control over our
economies, societies, and politics - but not over me and my ideas!!! (I am
NOT a socialist BTW)

As for the storage technology, I strongly favour the combination of MariaDB
for mere content storage and Sphinx for queries, for searching, filtering,
sorting, grouping etc. I do not see any viable alternative to Sphinx, which
seems to be the best if not the only player in this top league of highest
performance web sites. For the near future the application logic will remain
in Smalltalk but I keep the option in mind of later moving parts or even all
of it to node.js and server-side JavaScript provided that Smalltalk cannot
cope well enough with the performance requirements. Experience will show!

Now the decisive question:

There are two ways of loading the master data objects resulting from a
Sphinx query and needed to generate the data for answering the requests by
the browser clients:
- have Smalltalk collect the data object fragments from relational MariaDB
tables, with their ids already known; one master object and its dependent
sub-class instances must be collected from about 5 to 8 tables with mostly
one access per table, in 2 to 3 cases with typically 2 and up to 10
dependent instances per sub-class / table
- or use GemStone to load these moderately complex objects by their id with
no further querying needed.
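
The two variants above can be sketched as follows, purely to show the difference in shape and not to predict performance. The sketch uses Python's built-in `sqlite3` as a stand-in for MariaDB and a plain dictionary as a stand-in for an object database; it does not use or imitate GemStone's API. The table names, column names, and the three-table layout are hypothetical (the real model spans about 5 to 8 tables).

```python
import sqlite3

# Stand-in relational store (hypothetical schema, far simpler than the real one).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_text (person_id INTEGER, lang TEXT, headline TEXT);
    CREATE TABLE person_history (person_id INTEGER, changed_on TEXT, note TEXT);
    INSERT INTO person VALUES (1, 'Ada');
    INSERT INTO person_text VALUES (1, 'en', 'Engineer'), (1, 'de', 'Ingenieurin');
    INSERT INTO person_history VALUES (1, '2015-02-08', 'created');
""")


def load_from_tables(conn, person_id):
    """Variant 1: one access per table, then convert the rows into one object."""
    (name,) = conn.execute(
        "SELECT name FROM person WHERE id = ?", (person_id,)
    ).fetchone()
    texts = conn.execute(
        "SELECT lang, headline FROM person_text WHERE person_id = ?", (person_id,)
    ).fetchall()
    history = conn.execute(
        "SELECT changed_on, note FROM person_history"
        " WHERE person_id = ? ORDER BY changed_on",
        (person_id,),
    ).fetchall()
    # The row-to-object conversion happens here on every load.
    return {
        "id": person_id,
        "name": name,
        "headlines": dict(texts),
        "history": [{"changed_on": d, "note": n} for d, n in history],
    }


# Variant 2: the assembled object is stored whole and fetched by id.
# A dict is only a placeholder for an object database.
object_store = {1: load_from_tables(conn, 1)}


def load_from_object_store(store, person_id):
    """Variant 2: a single lookup by id, with no per-table assembly."""
    return store[person_id]
```

Both functions return the same structure; the difference is that the first repeats the per-table access and the conversion on every request, while the second only pays for it when the object is stored.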

Would you expect substantial performance advantages deriving from GemStone
in such cases?

Regarding this I should mention that experience shows that most of the CPU
time is consumed not by the database accesses but rather by converting the
fields from these ugly rectangle records into nice round objects and their
dictionary based instance variables (you certainly noticed my analogy to
this famous old Byte [?] article on "How to squeeze nice round objects into
an ugly square database"; I have been in OOD since 1986).

b) Loading rather few extremely complex objects

The other potential use-case for GemStone is my old but still unmarketed
database publishing software. It was developed more than ten years ago, also
in VW5, with an effort of around 7 man-years, and so far it has been used
only by one very large, world-famous Dutch electrical company, where it
generates a couple of 100,000 product catalogue pages with very complex
layout for print, PDF and html. Data came from my product management
software storing over 1 million items.

This software suffered from severe performance problems when loading the 500
to more than 2,000 little components that one page was made up of. This
resulted in loading times of between 1 and 5 minutes per page from MySQL,
even in single-user mode.
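
To put these figures into perspective, here is a rough back-of-the-envelope calculation of the loading time per component. It is only a sketch: which page sizes went with which loading times is not stated above, so all four combinations of the quoted extremes are computed, and the function name is made up for the illustration.

```python
def per_component_ms(page_seconds, components):
    """Average loading time per component in milliseconds."""
    return page_seconds * 1000 / components


# Quoted extremes: 1 to 5 minutes per page, 500 to 2,000 components per page.
fast_large_page = per_component_ms(60, 2000)   # 1 minute for 2,000 components
slow_small_page = per_component_ms(300, 500)   # 5 minutes for 500 components
fast_small_page = per_component_ms(60, 500)    # 1 minute for 500 components
slow_large_page = per_component_ms(300, 2000)  # 5 minutes for 2,000 components

for label, value in [
    ("1 min / 2,000 components", fast_large_page),
    ("1 min / 500 components", fast_small_page),
    ("5 min / 2,000 components", slow_large_page),
    ("5 min / 500 components", slow_small_page),
]:
    print(f"{label}: {value:.0f} ms per component")
```

Depending on the pairing, this works out to somewhere between 30 ms and 600 ms per component on average.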

This was the major reason why this software was never marketed beyond this
one large customer. That is a pity not only because of the investment but
primarily, because there still is no similar solution available on the
market to the best of my knowledge. The software also generated html, which
makes it suitable for many more purposes today.

Of course, both DBMS software and the available hardware (SSD or RAM disks)
are many times faster today than they were between 2002 and 2007. Despite
this I would still expect that GemStone could improve loading times of these
very complex objects substantially compared to MariaDB.

Any general comments?

Now comes the great BUT:

Instead of targeting the professional publishing market, I would prefer to
first cover a different and newly developing mass market via a browser-based
solution (I have a couple of unique ideas and new features in mind),
primarily because such an application fits perfectly into, and adds
substantial value to, two of my above-mentioned server projects.

Therefore, I will have to expect and prepare for up to a couple of thousand
simultaneous users accessing their own private or group-wide private data
(no shared data beyond group borders, but a few simultaneous users per
group) stored on web servers. 

Here are the two main questions:

I know that this is a very difficult guess, but what would you expect one
GemStone server to be able to support in terms of simultaneous users and
requests (mostly pages loaded)?

And my last question:

Would you consider a GemStone-only server installation suitable? Or would
you recommend offloading the non-DB-related work to clients like Pharo
('work' here is essentially extracting data chunks to be sent to the fat
clients, which generate UI and html for themselves)?

I would very much appreciate a comment.

Thank you very much for taking the time to read my long novel.
Frank
