[Glass] Instance variable handling and performance questions - Sphinx interface
FrankB via Glass
glass at lists.gemtalksystems.com
Sun Feb 8 07:43:39 PST 2015
Hello Martin,
thank you very much for going through my novel and for your quick answers.
For the moment, I just want to clarify that interfacing with Sphinx is NOT an
issue, because Sphinx is available through the MariaDB SQL interface, which is
100% compatible with MySQL.
Such an interface already exists in my old VW5 setup, and it is most certainly
possible in GemStone as well (if not in a better way, then at least via the C
interface).
Dynamic instance variables
I read about them in the manual. I see no major difference from our instVars
held in dictionaries, except that dictionaries do not limit us to 255 instance
variables, a limit that a couple of our model classes exceed.
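For readers less familiar with this pattern, here is a minimal Python analog of the attribute-dictionary approach; it is not GemStone or VW5 code. Python's `__getattr__`, which runs only when normal attribute lookup fails, stands in for Smalltalk's `#doesNotUnderstand:`. The class name `AttributeDictModel` and the sample definition data are hypothetical. The sketch shows attribute names being taken from run-time definition data and values being held in a dictionary, which imposes no fixed limit on the number of attributes.

```python
from typing import Any, Iterable


class AttributeDictModel:
    """Holds all user-data attributes in a dictionary (hypothetical sketch).

    __getattr__ is only invoked when normal lookup fails, so it plays the
    role that #doesNotUnderstand: plays in Smalltalk.
    """

    def __init__(self, definition: Iterable[str]) -> None:
        # Attribute names come from run-time definition data, not from the
        # class declaration; bypass our own __setattr__ for bookkeeping slots.
        object.__setattr__(self, "_defined", frozenset(definition))
        object.__setattr__(self, "_attributes", {})

    def __getattr__(self, name: str) -> Any:
        # Reached only for names that are not real attributes of the object.
        defined = object.__getattribute__(self, "_defined")
        if name in defined:
            # Defined but never assigned: answer None, like an unset instVar.
            return object.__getattribute__(self, "_attributes").get(name)
        raise AttributeError(name)

    def __setattr__(self, name: str, value: Any) -> None:
        # Only names present in the definition data may be assigned.
        if name not in self._defined:
            raise AttributeError(f"{name!r} is not defined for this model")
        self._attributes[name] = value


# Usage: 302 attributes, all defined at run time.
definition = ["name", "city"] + [f"extra{i}" for i in range(300)]
person = AttributeDictModel(definition)
person.name = "Frank"
person.extra299 = 42
print(person.name, person.extra299, person.city)  # Frank 42 None
```

Because values live in a dictionary, adding a field only means changing the definition data, not the class. The trade-off, as Martin notes in the quoted reply, is that such a design may benefit less from features geared towards named instance variables.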
I would, however, appreciate a pointer to more information about this
limitation (I found nothing in the docs):
> Some of GemStone's special indexing features are geared towards instance
> variables, so you'd get less benefit from those features.
Greetings
Frank
GLASS mailing list wrote
> On 02/08/2015 02:32 AM, FrankB via Glass wrote:
>> Hello,
>
> Hi Frank,
>
> I'll give some very quick answers below. I expect I or others will have
> more complete answers later. Do keep asking any questions you might have
> on this forum!
>
> Regards,
>
> -Martin
>
>>
>> I am still in the process of reading up on and trying to understand
>> GemStone, with the final goal of being able to decide whether we should
>> use GemStone for a new product family currently under development, which
>> targets a mass market and requires server installations with a
>> potentially very high number of users and high traffic.
>>
>> This means setting the course for our technical environment for the next
>> 10+ years.
>>
>> I currently have two questions, and I apologize up front that it takes
>> rather long text to explain the details, and also that I did not have
>> the time to make it shorter (which in German is a famous quote from the
>> great writer and poet Johann Wolfgang von Goethe):
>>
>> 1) My first major concern arises from our special way of using
>> application data instance variables for searching and selecting objects
>> in GemStone:
>>
>> In my comprehensive Smalltalk application classes (VW5), ALL instance
>> variables related to user data:
>> - are kept by data model instances in attribute dictionaries
>> - are typically accessed by just sending the instVar name, which is
>> handled via doesNotUnderstand (in 98% of cases there are no getter/setter
>> methods, and there is never any direct use of instVars inside methods)
>> - are created from and defined by version-dependent and partially
>> user-defined definition data, which makes them totally dynamic at
>> run-time, because all application data instance variables are created
>> from this definition data and held in dictionaries.
>>
>> Converting these dynamically defined instance variables to 'classical'
>> hard-coded instance variables would not only be a major step back, in my
>> view, causing many severe and unacceptable disadvantages, but it would
>> also be practically impossible, because it would require an almost total
>> rewrite of my entire software and its underlying complex framework. No
>> chance in this life!
>>
>> Therefore, my question is whether such use of instance variables held in
>> dictionaries is supported by GemStone.
>>
>> In other words, could GemStone access, search, filter, etc. these
>> instVars via doesNotUnderstand, just as all of our code does, or does
>> GemStone require hard-coded instVars?
>
> GemStone runs Smalltalk, and #doesNotUnderstand works as it normally
> does in Smalltalk, so your approach should work just fine. Some of
> GemStone's special indexing features are geared towards instance
> variables, so you'd get less benefit from those features.
>
> GemStone does, however, have dynamic instance variables, which you might
> find attractive to eventually replace your instance variable dictionaries.
>
>>
>> If the answer is NO, you can skip the next question as well, because
>> that would mean the end of all my considerations about using GemStone.
>>
>> 2) Performance of loading complex objects from GemStone versus from MariaDB
>
> I haven't benchmarked MariaDB, but generally I'd expect GemStone to have
> better performance loading complex objects, and especially complex
> object graphs. GemStone usually outperforms relational DBs in this area.
>
>>
>> My concerns regarding this subject relate to two very different projects
>> and products, which have different and in some respects even opposing
>> requirements. I therefore divide this into two sections:
>>
>> a) Loading very large numbers of moderately complex objects under
>> extreme traffic on web servers, to supply data for client-side UI and
>> HTML generation
>>
>> I am currently preparing an existing, previously desktop-only
>> application in VW5 to act as a web server that must support the
>> following conditions, which are best described using a well-known
>> example:
>> - serve a potentially extremely high number of simultaneous users, very
>> similar in type and number to those accessing LinkedIn
>> - deliver to them data objects somewhat similar to LinkedIn's person
>> profile structures, but much more complex, because every master data
>> object supports multiple content languages and has tenant-specific
>> parts, user-defined data, and object history
>> - initially support a few tens of millions of such data objects (we
>> have 5 million already), with the potential need for a couple of hundred
>> million (eventually, and hopefully)
>> - support a large number of search and filter criteria, offering
>> substantially more choices than LinkedIn's relatively simple so-called
>> advanced search features
>> - of course, users expect Google-like response times
>>
>> I should mention that besides this project I have two other application
>> scenarios with similar requirements, and all three are truly new and
>> fill huge market niches. Of course, I am trying to develop the still
>> missing software parts so that everything can be reused later as much
>> as possible. All three projects follow the freemium model and will be
>> implemented and marketed without any involvement of venture capital
>> sharks, legally organized criminal Mafia, generally called "banks", or
>> any other capital parasites, who together have disastrous control over
>> our economies, societies, and politics - but not over me and my ideas!!!
>> (I am NOT a socialist, BTW.)
>>
>> As for the storage technology, I strongly favour the combination of
>> MariaDB for mere content storage and Sphinx for queries: searching,
>> filtering, sorting, grouping, etc. I do not see any viable alternative
>> to Sphinx, which seems to be the best, if not the only, player in this
>> top league of highest-performance web sites. For the near future the
>> application logic will remain in Smalltalk, but I keep in mind the
>> option of later moving parts or even all of it to node.js and
>> server-side JavaScript, should Smalltalk prove unable to cope with the
>> performance requirements. Experience will show!
>
> Integration between Sphinx and GemStone is not already built, as far as
> I know. I believe that Sphinx will talk the PostgreSQL wire protocol, and
> there have been some experiments in GemStone with understanding the
> PostgreSQL wire protocol, so this may not be too difficult.
>
>
>>
>> Now the decisive question:
>>
>> There are two ways of loading the master data objects that result from a
>> Sphinx query and are needed to generate the data for answering the
>> browser clients' requests:
>> - have Smalltalk collect the data object fragments from relational
>> MariaDB tables, with their ids already known; one master object and the
>> instances of its dependent sub-classes must be collected from about 5 to
>> 8 tables, mostly with one access per table, and in 2 to 3 cases with
>> typically 2 and up to 10 dependent instances per sub-class / table
>> - or use GemStone to load these moderately complex objects by their id,
>> with no further querying needed.
>>
>> Would you expect substantial performance advantages from GemStone in
>> such cases?
>
> Yes, this is exactly the kind of case where GemStone will make things
> much simpler, and usually faster as well.
>
>>
>> Regarding this, I should mention that experience shows that most of the
>> CPU time is consumed not by the database accesses but rather by
>> converting the fields from these ugly rectangular records into nice
>> round objects and their dictionary-based instance variables (you
>> certainly noticed my allusion to the famous old Byte [?] article "How to
>> squeeze nice round objects into an ugly square database"; I have been
>> doing OOD since 1986).
>
> Right. This conversion is precisely what GemStone does *not* have to do,
> since objects are stored on disk in essentially the same format as they
> have in memory.
>
>>
>> b) Loading relatively few, extremely complex objects
>>
>> The other potential use case for GemStone is my old, but not yet
>> marketed, database publishing software. It was developed more than ten
>> years ago, also in VW5, with an effort of around 7 man-years, and so far
>> it has been used only by one world-famous, very large Dutch electrical
>> company, where it generates a couple of hundred thousand product
>> catalogue pages with very complex layouts for print, PDF, and HTML. The
>> data came from my product management software, which stores over 1
>> million items.
>
> Interesting application!
>
>>
>> This software suffered severely from performance problems when loading
>> the 500 to more than 2,000 small components that a single page was made
>> up of. This resulted in loading times of 1 to 5 minutes per page from
>> MySQL, even in single-user mode.
>>
>> This was the major reason why this software was never marketed beyond
>> this one large customer. That is a pity, not only because of the
>> investment but primarily because, to the best of my knowledge, there is
>> still no similar solution available on the market. And the software also
>> generated HTML, which makes it suitable for many more purposes today.
>>
>> Of course, both DBMS software and the available hardware (SSDs or RAM
>> disks) are many times faster today than they were from 2002 to 2007.
>> Despite this, I would still expect GemStone to improve the loading times
>> of these very complex objects substantially compared to MariaDB.
>
> I would think that yes, you would see performance improvements with
> GemStone.
>
>>
>> Any general comments?
>>
>> Now comes the great BUT:
>>
>> Instead of targeting the professional publishing market, I would prefer
>> to first cover a different, newly developing mass market via a
>> browser-based solution (I have a couple of unique ideas and new features
>> in mind), primarily because such an application fits perfectly into and
>> substantially adds value to two of my server projects mentioned above.
>>
>> Therefore, I have to expect and prepare for up to a couple of thousand
>> simultaneous users accessing their own private or group-private data
>> stored on web servers (no data shared beyond group borders, but a few
>> simultaneous users per group).
>>
>> Here are my two main questions:
>>
>> I know this is a very difficult guess, but how many simultaneous users
>> and requests (mostly page loads) would you expect one GemStone server to
>> support?
>
> It varies by many factors. Experimentation would be required. Individual
> GemStone databases have been scaled up to run many thousands of VMs on
> hundreds of multi-core machines, so it's likely that it will be possible
> to scale up to the level you need.
>
>>
>> And my last question:
>>
>> Would you consider a GemStone-only server installation suitable? Or
>> would you recommend offloading the non-DB-related work to clients such
>> as Pharo ('work' here is essentially extracting data chunks to be sent
>> to the fat clients, which generate the UI and HTML themselves)?
>>
>> I would very much appreciate a comment.
>>
>
> For web-based solutions, the simplest is to split the application
> between the GemStone server and the browser. Depending on the
> sophistication of the UI, the browser can either just use HTML generated
> on GemStone or use its JavaScript engine to run Amber or JavaScript
> code.
>
> GemStone can either be the web server or can talk FCGI to a webserver
> such as Nginx.
>
>
>> Thank you very much for taking the time to read my long novel.
>> Frank
>>
>
> You're welcome!
> _______________________________________________
> Glass mailing list
> Glass at .gemtalksystems
> http://lists.gemtalksystems.com/mailman/listinfo/glass
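The conversion cost described in the quoted exchange (reassembling one master object from several tables, then turning each row into objects) can be made concrete with a minimal Python sketch. It uses the standard-library `sqlite3` module purely as a stand-in for MariaDB; the schema, the `Master`/`Label`/`HistoryEntry` classes, and `load_master` are hypothetical and much smaller than the 5 to 8 tables Frank describes. It illustrates the mapping work only and makes no performance claim.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass
class Label:
    language: str
    text: str


@dataclass
class HistoryEntry:
    version: int
    note: str


@dataclass
class Master:
    id: int
    name: str
    labels: List[Label] = field(default_factory=list)
    history: List[HistoryEntry] = field(default_factory=list)


def build_sample_db() -> sqlite3.Connection:
    # Hypothetical schema: one master table plus two dependent tables.
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE master  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE label   (master_id INTEGER, language TEXT, label_text TEXT);
        CREATE TABLE history (master_id INTEGER, version INTEGER, note TEXT);
        INSERT INTO master  VALUES (1, 'Widget');
        INSERT INTO label   VALUES (1, 'en', 'Widget'), (1, 'de', 'Bauteil');
        INSERT INTO history VALUES (1, 1, 'created'), (1, 2, 'renamed');
        """
    )
    return con


def load_master(con: sqlite3.Connection, master_id: int) -> Master:
    # One query per table, then row-by-row conversion of the "rectangular"
    # records into nested objects.
    row = con.execute(
        "SELECT id, name FROM master WHERE id = ?", (master_id,)
    ).fetchone()
    if row is None:
        raise KeyError(master_id)
    obj = Master(id=row[0], name=row[1])
    for language, text in con.execute(
        "SELECT language, label_text FROM label WHERE master_id = ? ORDER BY language",
        (master_id,),
    ):
        obj.labels.append(Label(language, text))
    for version, note in con.execute(
        "SELECT version, note FROM history WHERE master_id = ? ORDER BY version",
        (master_id,),
    ):
        obj.history.append(HistoryEntry(version, note))
    return obj


widget = load_master(build_sample_db(), 1)
print(widget.name, [lbl.language for lbl in widget.labels], len(widget.history))
# prints: Widget ['de', 'en'] 2
```

In an object database such as GemStone, the equivalent step would be to look up the stored `Master` and follow its references; per Martin's reply, objects are kept on disk in essentially their in-memory format, so this row-to-object mapping is the part GemStone does not have to perform.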