[Glass] Where are indices really useful ?

Dale Henrichs via Glass glass at lists.gemtalksystems.com
Sat Jan 28 12:02:03 PST 2017


I suggest that you wait for 3.4 before making up your mind about 
indexing support.

The work that we are doing for 3.4 should significantly speed up the 
execution of queries, but I don't think that query speedup is the only 
thing that can be done ...

The main thrust of my efforts for 3.4 involves eliminating the need for 
the RcIndexDictionary ... Without going into gory details, eliminating 
the RcIndexDictionary means that we can eliminate quite a bit of object 
faulting and execution time associated with **each object in a query 
result**. In my initial prototype, I saw an overall speedup of 2x in 
running one of the major index tests in our test suite ... this test 
spends most of its time in query evaluation, but it also sequences 
through the whole life cycle of an index: index creation/removal; 
adding, updating, and removing indexed objects ... and my initial 
prototype is written entirely in Smalltalk ... there are several areas 
where we can add new primitives to speed things up even more.

You have been making several general observations about index 
performance, but unfortunately you haven't provided any examples or test 
cases to back up your observations ...

I am not claiming that your observations are wrong, but without an 
example or test case I cannot know what the root cause may be, whether 
the performance issues you are seeing in your particular test cases 
will be addressed by the new work, or what additional work _could_ be 
done to improve things for you.

Some of your observations, like the order of predicate execution, are 
unlikely to be directly affected by my work.

As you have observed, currently you can only "hope to reduce the size 
of your result set". One of the features we are adding is a counted 
B+ tree implementation ... with a counted B+ tree, it should be 
possible to calculate the size of a result set without executing the 
entire query ... depending upon the exact nature of your queries, it may 
be possible to define a query optimizer that uses the estimated 
result set sizes to determine predicate evaluation order ... on the 
other hand, without knowing the details of your exact use case, we may 
not implement the optimizer in a way that addresses your use case ...
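
To make the counted tree idea concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not the 3.4 implementation: the names build, rank, count_range and order_predicates are hypothetical, the tree is static (bulk-built, no insert or remove), and each predicate is simplified to an inclusive key range on one index. Every interior node stores the number of keys beneath it, so the size of a range can be computed by walking one path from the root instead of visiting every matching entry.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys              # sorted slice of the indexed keys
        self.count = len(keys)
        self.min_key = keys[0]


class Interior:
    def __init__(self, children):
        self.children = children
        self.child_mins = [c.min_key for c in children]
        self.count = sum(c.count for c in children)  # the "counted" part
        self.min_key = children[0].min_key


def build(sorted_keys, fanout=4):
    """Bulk-build a static counted tree from a non-empty sorted list."""
    if not sorted_keys:
        raise ValueError("need at least one key")
    level = [Leaf(sorted_keys[i:i + fanout])
             for i in range(0, len(sorted_keys), fanout)]
    while len(level) > 1:
        level = [Interior(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]


def rank(node, key, inclusive=False):
    """Count keys < key (or <= key when inclusive) along a single path."""
    pick = bisect_right if inclusive else bisect_left
    total = 0
    while isinstance(node, Interior):
        j = pick(node.child_mins, key)
        if j == 0:
            return total              # every key below this node is too large
        # Children before j-1 lie entirely below the bound: add stored counts.
        total += sum(c.count for c in node.children[:j - 1])
        node = node.children[j - 1]   # only this child can straddle the bound
    return total + pick(node.keys, key)


def count_range(root, low, high):
    """Number of keys k with low <= k <= high."""
    return rank(root, high, inclusive=True) - rank(root, low)


def order_predicates(predicates):
    """Sort (label, index_root, low, high) tuples, smallest estimate first."""
    return sorted(predicates, key=lambda p: count_range(p[1], p[2], p[3]))


# Hypothetical sample data: 1000 objects, indexed on age and on salary.
age_index = build(sorted(i % 50 + 18 for i in range(1000)))
salary_index = build(sorted(30000 + 100 * i for i in range(1000)))

predicates = [
    ("salary >= 50000", salary_index, 50000, 129900),
    ("age = 30", age_index, 30, 30),
]
plan = order_predicates(predicates)
```

With this sample data the age predicate is estimated to be far more selective than the salary predicate, so an optimizer built along these lines would evaluate it first and apply the remaining predicate to the smaller intermediate result.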

If you are willing, I am sure that we could make an alpha release of 3.4 
available to you so that you could give us better feedback. In fact, I would 
really like to get a closer look at your use cases. Ideally I would get 
a copy of your development extent so that I can work with exactly the 
same sample data and code that you are using ... If you are unable to 
share extents, then a set of test cases that exhibit the same behavior 
as you are seeing would be much appreciated.

This happens to be the perfect time to be looking at index performance 
issues ... I am a couple of weeks from finishing up the initial work on 
the new implementation and being able to get direct feedback from you 
would help us focus and prioritize any additional work that might be 
needed to round out the usability and performance of the new implementation.


On 1/28/17 9:27 AM, Marten Feldtmann via Glass wrote:
> After testing the index support in GemStone I have come (for myself) 
> to the following conclusions:
> a) try to define a query with only one predicate to get a streamable 
> result set (and hope to reduce the size of your result set)
> b) if you have more than one predicate -> build your own block and 
> then execute this block against the result set from (a).
> c) Defining more than one index on a set may result in strange 
> (slower) query execution times - the query is executed faster if no 
> index is defined at all.
> Puuuh, lots of new things to learn. Perhaps it would be nice to be 
> able to ask GsQuery NOT to use indices at all - so you have ONE 
> programming interface for all cases.
> Marten
>> Marten Feldtmann via Glass <glass at lists.gemtalksystems.com> wrote 
>> on 28 January 2017 at 18:09:
>> But using streamable result sets raises another interface problem: 
>> they may only be used if an equalityIndex is defined on that 
>> attribute. In the example I create an index only on larger sets 
>> (>50000 items) - that means sometimes the index is available, but 
>> sometimes not ... so to do it right, you should ask GsQuery (if this 
>> is possible at all) whether the query can be executed at all and 
>> then, according to the result, use the streaming interface - 
>> otherwise the do: interface.
>> Marten
>>> First, streamable queries assume equality indices, and I assume that 
>>> comparing numerical values (with something other than =) cannot 
>>> benefit from identity indices.
>>> Marten
>>>> Mariano Martinez Peck <marianopeck at gmail.com> wrote on 
>>>> 28 January 2017 at 17:25:
>>>> Via phone. Quick answer: SmallDoubles are immediate objects, so 
>>>> equal values are always identical. Can you measure performance 
>>>> with an identity index?
>>> _______________________________________________ Glass mailing list 
>>> Glass at lists.gemtalksystems.com 
>>> http://lists.gemtalksystems.com/mailman/listinfo/glass
