[Glass] Position method for BtreePlusReadStream classes

Sat Jun 13 16:10:14 PDT 2020

Marten,
(see my answer in-lined)
For me this point is not critical because the user can NOT go from page 
1 to page 50.000. Only can advance by 1 page or go  to the last (in this 
case reverseReadStream is enough). Although it will very good to have a 
primitive to go position 50.000 without faulting previous "btree" 
structure into memory.
In my case a user usually do a search by field value to find a desired 
form or groups of forms. A very common case is to add "state" field and 
different roles are only interested in forms with a specific "state" 
value (let say "started", "in process", "finished"). With an index of 
that field it seems enough. Other approach is to add an "id" field on 
the form but all of these cases are heavily dependent on your business.

> a) Between each REST call to demand a new page of data, there might be 
> transactional changes in the viewed set - so you have no stable set, 
> but an always changing set. In some domain areas this is not 
> acceptable, in others its ok. In our domain area we do not talk about 
> that :-)
REST is stateless so is impossible to avoid this. But you can do some 
kind of AJAX call for a refresh at client level. Anyway if the user stay 
too much time in a page it sure that it is watching an outdated list of 
objects.
> b) We are thinking about implementing a dual approach - paging on the 
> typical page mechanism and with additional gop help, but we have not 
> done further work there. The idea is, that we also deliver the gop of 
> the first and the last object shown in the current page and then on 
> the server side we find  the starting point of the current page (and 
> the the next page) and a predicatable way. We have sets with more than 
> 500000 elements and paging in such large set is not useable ... the 
> iterations to find the starting object (via the btree-structure) is 
> too expensive (time consuming), the UI is too slow.
In your domain does the time stamp is changed often ? or once a time 
stamp assigned to an object is rarely changed ?
And what kind of queries do you execute on this objects ? Because 
sometime is better to address this with your own object structure rather 
than the use of an index.
> c) Another apporach is to use a SortedCollection ... which gives you 
> one pyhsical sorted collection and then you get the speed to do even 
> buffering paging, which is the highest art in Javascript UI - but of 
> course a linear speed decrease when changing that collection at the 
> server or strange stuff when not being careful when changing object data.
>
> SortedCollection against IdentitySet 
> <https://schrievkrom.wordpress.com/2018/04/14/gemstone-s-3-4-0-speed-comparision-sortedcollection-against-identityset/> 
>
> Problems with SortedCollection 
> <https://schrievkrom.wordpress.com/2017/03/21/gemstones-a-dead-item-in-a-sortedcollection-what-the-hell-is-going-on/>
Good to know this !.  No idea about this !
>
>
> One may like it or not, our users are table oriented persons ... and 
> they like paging a lot - even though it might not be very clever to 
> manual look through hundreds of data items.
>
>
> Marten
>
>> bruno buzzi brassesco via Glass < glass at lists.gemtalksystems.com 
>> <mailto:glass at lists.gemtalksystems.com>> hat am 12. Juni 2020 um 
>> 22:49 geschrieben:
>>
>>
>> Marten,
>>
>> It is a REST layer in GS
>> ( https://github.com/brunobuzzi/OrbeonPersistenceLayer) for a Java
>> Application (www.orbeon.com).
>> Orbeon display forms instances as summaries
>> ( https://doc.orbeon.com/form-builder/summary-page) sorted by 
>> modifiedTime.
>>
>> The summaries has paging buttons to display next bunch of forms. If the
>> collection is small it is ok to fault the entire collection into memory
>> and send #asSortedCollection. But if the collection is large an index by
>> modifiedTime must be used.
>>
>> When a user click on a summary (next/previous button) Orbeon call the
>> REST layer with: form name, form version, page size (forms per page to
>> display) and page number (the index of the current page).
>>
>> In GS i must able to do something like: aBtreePlusReadStream skip:
>> pageSize * pageNumber, in order to read the forms in modifiedTime order.
>>
>> Right now i do not have the numbers for 500.000 position but at some
>> point i going to test the project with a large quantity of forms.
>> I will post the results here when is done.
>>
>> regards,
>> bruno
>>
>>
>>> Bruno,
>>>
>>> would you be able to talk about , why and how do you use this feature
>>> and how the speed is (if you want to go to 500000 position). Paging ?
>>>
>>> I always like to discuss/support enhancements in the GsQuery structure
>>>
>>> Marten
>> On 9/6/2020 13:59, Dale Henrichs via Glass wrote:
>>> Bruno,
>>>
>>> I've submitted an internal feature request (48811), so keep your eyse
>>> peeled.
>>>
>>> Dale
>>>
>>> On 6/9/20 9:20 AM, smalltalk--- via Glass wrote:
>>>> Dale,
>>>>
>>>> That’s exactly was I looking for. A public method in next release it
>>>> will good too,
>>>> Thank very much...
>>>>
>>>> ----- Mensaje original -----
>>>> De: Dale Henrichs via Glass < glass at lists.gemtalksystems.com 
>>>> <mailto:glass at lists.gemtalksystems.com>>
>>>> Para: glass at lists.gemtalksystems.com 
>>>> <mailto:glass at lists.gemtalksystems.com>
>>>> Enviado: Mon, 08 Jun 2020 19:35:37 -0300 (UYT)
>>>> Asunto: Re: [Glass] Position method for BtreePlusReadStream classes
>>>>
>>>> Bruno,
>>>>
>>>> Sorry, I wasn't sure what you were asking ... but now that you mention
>>>> it, there is already a method that will advance the stream cursor,
>>>> without accessing the object at that position (_btreeNextNoValue)
>>>>
>>>>      [ stream atEnd not and: [ pos < collectionSize ] ]
>>>>         whileTrue: [
>>>>           pos := pos + 1.
>>>>           stream _btreeNextNoValue ]
>>>>
>>>> If you want to count backward from the end using a reversedReadStream,
>>>> then you'd have to implement _btreePreviousNoValue:
>>>>
>>>>      _btreePreviousNoValue
>>>>         "Returns the next value on a stream of B-tree values and root
>>>> objects.  Updates the current
>>>>        stack for a subsequent 'next'."
>>>>
>>>>         | val |
>>>>         " get the index into the leaf node and see if it has reached
>>>> the end "
>>>>         currentIndex == 0
>>>>           ifTrue: [ ^ self _errorEndOfStream ].    " get the leaf and
>>>> the value within the leaf "
>>>>         (currentNode == endNode and: [ endIndex == currentIndex ])
>>>>           ifTrue: [
>>>>             currentIndex := 0.
>>>>             ^ self ].    " see if index refers to first entry in this
>>>> leaf "
>>>>         currentIndex == 1
>>>>           ifTrue: [
>>>>             " must look down the stack for the next leaf node "
>>>>             self _previousLeaf ]
>>>>           ifFalse: [ currentIndex := currentIndex - 
>>>> currentEntrySize ].
>>>>
>>>> _btreeNextNoValue and _btreePreviousNoValue both avoid faulting the
>>>> values into the image, just the interior and leaf nodes would be 
>>>> faulted
>>>> in gut that is unavoidable ...
>>>>
>>>> If the these would work for you I can see adding skip: to both
>>>> BtreePlusGsIndexReadStream and BtreePlusGsReversedIndexReadStream to
>>>> make it official ... let me know if this is what you are looking for,
>>>>
>>>> Dale
>>>>
>>>> On 6/8/20 1:11 PM, bruno buzzi brassesco via Glass wrote:
>>>>> Dale,
>>>>>
>>>>> Which is the difference of your solution with the following ?
>>>>> btreePlusReadStream := gsQuery reversedReadStream.
>>>>> position := 1.
>>>>> [btreePlusReadStream atEnd not and: [position < collectionSize]]
>>>>> whileTrue: [btreePlusReadStream next. position := position + 1].
>>>>>
>>>>> What i want is to have a very large (a millon ?) GsQuery result set
>>>>> (in index order) and go to a position (K) without faulting into 
>>>>> memory
>>>>> objects previous to the (K) position.
>>>>>
>>>>> regards
>>>>> bruno
>>>>>
>>>>> On 8/6/2020 16:18, Dale Henrichs via Glass wrote:
>>>>>> Bruno,
>>>>>>
>>>>>> There is no backing collection for the BtreePlusReadStream, so being
>>>>>> able to go to a certain position is not possible without 
>>>>>> counting....
>>>>>>
>>>>>> We should be able to quickly produce a result set of the entire 
>>>>>> query
>>>>>> results, but it would be a set not an ordered collection:( And to 
>>>>>> get
>>>>>> results _in order_ the streaming API is the only solution... To get
>>>>>> the kind of performance that you would want, I would think that it
>>>>>> should be possible to create a primitive that would produce the
>>>>>> result set in the form of an Array (in order) instead of a Set.
>>>>>>
>>>>>> For now you would have to produce the Array yourself using:
>>>>>>
>>>>>>      | result |
>>>>>>      result := {}.
>>>>>>      gsQuery do: [:each | result add: each]
>>>>>>
>>>>>> #do: uses the BtreePlusReadStream api underneath covers, so the #do:
>>>>>> elements are processed in order ...
>>>>>>
>>>>>> Let me know if you you would need a primitive for performance and I
>>>>>> can submit a feature request ...
>>>>>>
>>>>>> Dale
>>>>>>
>>>>>> On 6/8/20 11:53 AM, smalltalk--- via Glass wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> aRcIdentitySet has an index on 'each.modifiedTime' and it can 
>>>>>>> have a
>>>>>>> lot of instances.
>>>>>>>
>>>>>>> In order to get a list of sorted instances (by modifiedTime) i do:
>>>>>>> |gsQuery|
>>>>>>> gsQuery := GsQuery fromString: 'each.modifiedTime <= timeNow'.
>>>>>>> gsQuery bind: 'timeNow' to: TimeStamp now.
>>>>>>> gsQuery on: aRcIdentitySet .
>>>>>>>
>>>>>>> Now i want to 'jump' to a given position in this stream...
>>>>>>> It is possible to use some kind of #position: message in
>>>>>>> aBtreePlusReadStream ?
>>>>>>> (position: does no exist in BtreePlusReadStream)
>>>>>>>
>>>>>>> I could use #next to 'jump' to a given position, but the query can
>>>>>>> be very very large.
>>>>>>> At the end it show a paging web page to a user that can click to 
>>>>>>> get
>>>>>>> the next bunch of objects.
>>>>>>> So i do not want to do #next over a large collection.
>>>>>>>
>>>>>>> regards,
>>>>>>> bruno
>>>>>>> 2.11.0.0
>>>>>>> 2.11.0.0
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Glass mailing list
>>>>>>> Glass at lists.gemtalksystems.com 
>>>>>>> <mailto:Glass at lists.gemtalksystems.com>
>>>>>>> https://lists.gemtalksystems.com/mailman/listinfo/glass
>>>>>> _______________________________________________
>>>>>> Glass mailing list
>>>>>> Glass at lists.gemtalksystems.com 
>>>>>> <mailto:Glass at lists.gemtalksystems.com>
>>>>>> https://lists.gemtalksystems.com/mailman/listinfo/glass
>>>>> _______________________________________________
>>>>> Glass mailing list
>>>>> Glass at lists.gemtalksystems.com 
>>>>> <mailto:Glass at lists.gemtalksystems.com>
>>>>> https://lists.gemtalksystems.com/mailman/listinfo/glass
>>>>
>>>> _______________________________________________
>>>> Glass mailing list
>>>> Glass at lists.gemtalksystems.com <mailto:Glass at lists.gemtalksystems.com>
>>>> https://lists.gemtalksystems.com/mailman/listinfo/glass
>>> _______________________________________________
>>> Glass mailing list
>>> Glass at lists.gemtalksystems.com <mailto:Glass at lists.gemtalksystems.com>
>>> https://lists.gemtalksystems.com/mailman/listinfo/glass
>> _______________________________________________
>> Glass mailing list
>> Glass at lists.gemtalksystems.com <mailto:Glass at lists.gemtalksystems.com>
>> https://lists.gemtalksystems.com/mailman/listinfo/glass
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.gemtalksystems.com/mailman/private/glass/attachments/20200613/a91862a6/attachment-0001.htm>