[Glass] Query results as Streams and indexes question

Fri Jan 6 10:31:08 PST 2017

On 01/06/2017 07:21 AM, BrunoBB via Glass wrote:
> Dale,
>
> My mail was a little confusing, I think...
>
> Yes readStream is sent to aGsQuery (using GS 3.3.0):
> ('each.username = ''admin''' asQueryOn: instancesSet) readStream.
> "where <instancesSet> is an RcIdentityBag"
Okay and the complete error message would have been:

   a GsMalformedQueryExpressionError occurred (error 2710), 
reason:acceptPredicate:, Query may not be streamed. Predicate: 
'(each.key = ''admin'')' must use an equality index. 
(#queryIsNotStreamable).

so the error was directly complaining about the lack of an index ... 
there were too many possible error messages ... sorry about that
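
If you need to cope with the missing index at code level, something along 
these lines might work. This is only a sketch: it assumes the error can be 
trapped with on:do: under the class name shown in the message above, and 
it falls back to do:, which works with or without an index. The variable 
names are just for illustration.

   | query stream results |
   query := 'each.username = ''admin''' asQueryOn: instancesSet.
   results := OrderedCollection new.
   "Assumption: the streaming failure is signalled as GsMalformedQueryExpressionError"
   stream := [query readStream]
      on: GsMalformedQueryExpressionError
      do: [:ex | ex return: nil].
   stream isNil
      ifTrue: ["no usable equality index, so iterate with do: instead"
         query do: [:each | results add: each]]
      ifFalse: [[stream atEnd] whileFalse: [results add: stream next]].

Creating the index up front, as you did, is still the simpler route when 
you control the collection.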
> The error arises when there is NO index on the collection <instancesSet>.
> After executing the following, I get <aRangeIndexReadStream> from the
> previous statement (no error):
> GsIndexSpec new
> equalityIndex: 'each.username' lastElementClass: String;
> equalityIndex: 'each.groupname' lastElementClass: String;
> createIndexesOn: instancesSet.
>
> I was trying to figure out how to deal with indexes at code level in a very
> specific situation.
>
> But I think it is solved by what you said:
> "#do: and #readStream both attempt to avoid scanning the entire result
> set and can both be used if you don't intend to scan the entire
> result set"
In fact, do: uses a stream underneath the covers for indexed queries. 
For non-indexed queries each element is passed to the do: block instead 
of adding it to the result set. As a result the order of result elements 
encountered while using a do: block may differ between an indexed query 
and a non-indexed query ... indexed queries using equality indexes will 
produce results in "sort order" while non-indexed queries will produce 
results in "collection order"
> I thought (I do not know why) that #do: would load all objects into memory :(
> If that is NOT the case then #do: over aGsQuery will do the job for me :)
Yeah, do: was implemented so that you could get an early exit from a 
query that may have a lot of results.
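
As a rough illustration of the early exit, a method along these lines 
returns out of the do: block once it has enough results. The selector 
firstAdmins:in: and its argument names are made up for the example, and 
the order of the elements you get depends on whether an index is present, 
as described above.

   firstAdmins: limit in: instancesSet
      "Hypothetical helper: answer at most limit matching objects"
      | found |
      found := OrderedCollection new.
      ('each.username = ''admin''' asQueryOn: instancesSet) do: [:each |
         "the ^ leaves the do: loop without visiting the remaining results"
         found size >= limit ifTrue: [^found].
         found add: each].
      ^found
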
> Also, from that large result I need to copy a small segment of objects (for
> paging purposes on a web page).
> The result may have 100,000 objects and I want to get the objects from
> position 20 to 30.
>
> There is no #copyFrom:to: in GsQuery. What is the best thing to do?
> aGsQuery queryResult copyFrom: 20 to: 30.
> "this will load all result to memory ?, size use #queryResult so it should
> not load all objects to memory"
Yes, this will load all results into memory, and you will also get back 
another UnorderedCollection, for which copyFrom:to: isn't implemented.
> Or use: aGsQuery readStream position: 20.
The BtreeReadStreams are not PositionableStreams, so you can't use position:

So the best bet (for now) would be to do something like:

   19 timesRepeat: [stream next].
   10 timesRepeat: [col add: stream next]

Currently this is the best that can be done for skipping about within a 
stream ...
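
A slightly more defensive version of the same idea, in case the result 
set turns out to be shorter than expected, could look like the following. 
It is only a sketch: it assumes the equality index on each.username 
exists (so readStream works) and uses atEnd to avoid reading past the end 
of the stream; the variable names are illustrative.

   | stream page |
   stream := ('each.username = ''admin''' asQueryOn: instancesSet) readStream.
   page := OrderedCollection new.
   "skip the first 19 results, stopping early if the stream runs out"
   19 timesRepeat: [stream atEnd ifFalse: [stream next]].
   "collect up to 10 results starting at position 20"
   [stream atEnd or: [page size >= 10]]
      whileFalse: [page add: stream next].

The skip still walks the stream one element at a time, so the cost grows 
with the starting position of the page.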

For 3.4 we are planning on replacing the current btree implementation 
using something similar to a B+tree structure (linked list of leaf 
nodes) and in the process we may also introduce a counted B+tree 
implementation that would then allow us to introduce an efficient skip 
--- when the result set is large and you need to copy from the end of 
the result set ...
> Sorry for my previous confusing mail ...
No problem ... I think we're now on the same page and that is what is 
important

Dale