[Glass] Position method for BtreePlusReadStream classes

Richard Sargent richard.sargent at gemtalksystems.com
Fri Jun 12 14:55:00 PDT 2020


Bruno,

I am responding to the human factors issues that arise from large
collections, not your need for the API. I agree with the need for the API.

See below.
Edit: in retrospect, you may be using the query API poorly with that design.


GLASS mailing list wrote
> Marten,
> 
> It is a REST layer in GS 
> (https://github.com/brunobuzzi/OrbeonPersistenceLayer) for a Java 
> Application (www.orbeon.com).
> Orbeon displays form instances as summaries 
> (https://doc.orbeon.com/form-builder/summary-page) sorted by modifiedTime.
> 
> The summaries have paging buttons to display the next batch of forms. If the 
> collection is small, it is fine to fault the entire collection into memory 
> and send #asSortedCollection. But if the collection is large, an index on 
> modifiedTime must be used.
> 
> When a user clicks a paging button (next/previous) on a summary, Orbeon 
> calls the REST layer with: form name, form version, page size (forms per 
> page to display) and page number (the index of the current page).
> 
> In GS I must be able to do something like: aBtreePlusReadStream skip: 
> pageSize * pageNumber, in order to read the forms in modifiedTime order.

Depending on the sort direction of the index, you will encounter different
concerns. If the index is descending from the most recent time stamp, then
once you have paged past the first page you will miss any form modified in
the meantime, because it moves to the front of the index. If the index is
ascending, new modifications are not a problem in themselves, as any newly
modified form goes to the end of the index. But each form that moves to the
end also shifts the later forms down by one, so your next starting offset
will skip one form that you haven't seen. More than one if several
modifications occur in the interval.
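
A toy illustration of both cases, using a plain OrderedCollection to stand in
for the index (it is not the B-tree stream itself). The symbols are made-up
form names and a page size of 2 is assumed; page 2 is read at offset
pageSize * pageNumber, as in your skip: idea.

```smalltalk
| descending ascending page1 page2 |

"Descending index: most recently modified form first."
descending := OrderedCollection withAll: #( #E #D #C #B #A ).
page1 := descending copyFrom: 1 to: 2.      "E D"
descending addFirst: #F.                    "form F is modified between requests"
page2 := descending copyFrom: 3 to: 4.      "D C: D repeats, F is never shown"

"Ascending index: oldest modification first."
ascending := OrderedCollection withAll: #( #A #B #C #D #E ).
page1 := ascending copyFrom: 1 to: 2.       "A B"
ascending remove: #A; addLast: #A.          "form A is modified again and moves to the end"
page2 := ascending copyFrom: 3 to: 4.       "D E: C was never shown"
```

In both directions the offset is computed against a collection that has
moved since the previous request.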

Assuming you are reviewing "recently modified forms", you might want to use
the modifiedTime of the last one shown to start the query for the next page.

As a really simplified example, let's say that you are looking at
changes since yesterday. So you start with a query giving you everything
since yesterday at perhaps 16:00. The first page has changes ranging from
shortly after 16:00 through 17:30. When you page, you use 17:30 as the
starting time and get the next page. And so on.
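
A minimal sketch of that idea, assuming an ascending index on modifiedTime
and the query API already shown in this thread (GsQuery fromString:,
bind:to:, on:). It also assumes that the query is streamable and that
readStream is the forward counterpart of the reversedReadStream mentioned
earlier. The block name nextPage and its three arguments are hypothetical,
not part of any GemStone API.

```smalltalk
"Hypothetical helper: answer up to pageSize forms whose modifiedTime is at or
 after lastSeen, in ascending index order. forms is the indexed collection
 (aRcIdentitySet in this thread)."
| nextPage |
nextPage := [ :forms :lastSeen :pageSize | | query stream page |
    query := GsQuery fromString: 'each.modifiedTime >= lastSeen'.
    query bind: 'lastSeen' to: lastSeen.
    query on: forms.
    stream := query readStream.
    page := OrderedCollection new.
    "The stream starts at lastSeen, so no earlier entries are counted past."
    [ stream atEnd not and: [ page size < pageSize ] ]
        whileTrue: [ page add: stream next ].
    page ].
```

For the following page you would evaluate something like
nextPage value: aRcIdentitySet value: lastPage last modifiedTime value: 25,
where lastPage is the (hypothetical) collection answered for the previous
page. With >= the last form already shown comes back as the first entry of
the new page, so the caller would drop it.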

(The time stamps probably have much finer granularity than 1 second, so the
number of modified forms with the same time stamp is probably very low. It
/might/ be > 1, in which case you get more than just one repeat for the next
page. Millisecond or microsecond precision on the time stamps will greatly
reduce the chances of that. FYI, the upcoming GemStone 3.6 will introduce
specials for a number of common classes, such as Date and Time, as well as
DateAndTime. The latter has a range from 2001 through 2072 with microsecond
precision. "Specials" means the OOP fully encodes the value, if that wasn't
already clear.)


> Right now I do not have the numbers for position 500,000, but at some 
> point I am going to test the project with a large quantity of forms.
> I will post the results here when it is done.

500k forms sounds like a horror to interact with, as a user. Years ago, I
had to deal with 12k employees in a "drop list" and 1,200 bank branches
likewise. These sizes were unusable from a user interface perspective.
Hopefully, you have already considered how to make user selection of a form
manageable when the number of forms is large.

In my cases, both tables were small enough to have in memory, so providing a
filtered search capability for the drop down list was entirely manageable.


> regards,
> bruno
> 
> 
>> Bruno,
>>
>> would you be able to talk about why and how you use this feature, 
>> and how fast it is (if you want to go to position 500000)? Paging?
>>
>> I always like to discuss/support enhancements in the GsQuery structure
>>
>> Marten
> On 9/6/2020 13:59, Dale Henrichs via Glass wrote:
>> Bruno,
>>
>> I've submitted an internal feature request (48811), so keep your eyes 
>> peeled.
>>
>> Dale
>>
>> On 6/9/20 9:20 AM, smalltalk--- via Glass wrote:
>>> Dale,
>>>
>>> That’s exactly what I was looking for. A public method in the next 
>>> release would be good too.
>>> Thank you very much...
>>>
>>> ----- Original message -----
>>> From: Dale Henrichs via Glass <glass at .gemtalksystems>
>>> To: glass at .gemtalksystems
>>> Sent: Mon, 08 Jun 2020 19:35:37 -0300 (UYT)
>>> Subject: Re: [Glass] Position method for BtreePlusReadStream classes
>>>
>>> Bruno,
>>>
>>> Sorry, I wasn't sure what you were asking ... but now that you mention
>>> it, there is already a method that will advance the stream cursor,
>>> without accessing the object at that position (_btreeNextNoValue)
>>>
>>>      [ stream atEnd not and: [ pos < collectionSize ] ]
>>>         whileTrue: [
>>>           pos := pos + 1.
>>>           stream _btreeNextNoValue ]
>>>
>>> If you want to count backward from the end using a reversedReadStream,
>>> then you'd have to implement _btreePreviousNoValue:
>>>
>>>      _btreePreviousNoValue
>>>         "Returns the next value on a stream of B-tree values and root
>>>          objects.  Updates the current stack for a subsequent 'next'."
>>>
>>>         | val |
>>>         " get the index into the leaf node and see if it has reached the end "
>>>         currentIndex == 0
>>>           ifTrue: [ ^ self _errorEndOfStream ].
>>>         " get the leaf and the value within the leaf "
>>>         (currentNode == endNode and: [ endIndex == currentIndex ])
>>>           ifTrue: [
>>>             currentIndex := 0.
>>>             ^ self ].
>>>         " see if index refers to first entry in this leaf "
>>>         currentIndex == 1
>>>           ifTrue: [
>>>             " must look down the stack for the next leaf node "
>>>             self _previousLeaf ]
>>>           ifFalse: [ currentIndex := currentIndex - currentEntrySize ].
>>>
>>> _btreeNextNoValue and _btreePreviousNoValue both avoid faulting the
>>> values into the image; just the interior and leaf nodes would be faulted
>>> in, but that is unavoidable ...
>>>
>>> If these would work for you I can see adding skip: to both
>>> BtreePlusGsIndexReadStream and BtreePlusGsReversedIndexReadStream to
>>> make it official ... let me know if this is what you are looking for,
>>>
>>> Dale
>>>
>>> On 6/8/20 1:11 PM, bruno buzzi brassesco via Glass wrote:
>>>> Dale,
>>>>
>>>> What is the difference between your solution and the following?
>>>> btreePlusReadStream := gsQuery reversedReadStream.
>>>> position := 1.
>>>> [btreePlusReadStream atEnd not and: [position < collectionSize]]
>>>> whileTrue: [btreePlusReadStream next. position := position + 1].
>>>>
>>>> What I want is to have a very large (a million?) GsQuery result set
>>>> (in index order) and go to a position (K) without faulting into memory
>>>> the objects before position (K).
>>>>
>>>> regards
>>>> bruno
>>>>
>>>> On 8/6/2020 16:18, Dale Henrichs via Glass wrote:
>>>>> Bruno,
>>>>>
>>>>> There is no backing collection for the BtreePlusReadStream, so being
>>>>> able to go to a certain position is not possible without counting....
>>>>>
>>>>> We should be able to quickly produce a result set of the entire query
>>>>> results, but it would be a set not an ordered collection:( And to get
>>>>> results _in order_ the streaming API is the only solution... To get
>>>>> the kind of performance that you would want, I would think that it
>>>>> should be possible to create a primitive that would produce the
>>>>> result set in the form of an Array (in order) instead of a Set.
>>>>>
>>>>> For now you would have to produce the Array yourself using:
>>>>>
>>>>>      | result |
>>>>>      result := {}.
>>>>>      gsQuery do: [:each | result add: each]
>>>>>
>>>>> #do: uses the BtreePlusReadStream API underneath the covers, so the
>>>>> #do: elements are processed in order ...
>>>>>
>>>>> Let me know if you would need a primitive for performance and I
>>>>> can submit a feature request ...
>>>>>
>>>>> Dale
>>>>>
>>>>> On 6/8/20 11:53 AM, smalltalk--- via Glass wrote:
>>>>>> Hi,
>>>>>>
>>>>>> aRcIdentitySet has an index on 'each.modifiedTime' and it can have a
>>>>>> lot of instances.
>>>>>>
>>>>>> In order to get a list of sorted instances (by modifiedTime) I do:
>>>>>> |gsQuery|
>>>>>> gsQuery := GsQuery fromString: 'each.modifiedTime <= timeNow'.
>>>>>> gsQuery bind: 'timeNow' to: TimeStamp now.
>>>>>> gsQuery on: aRcIdentitySet .
>>>>>>
>>>>>> Now I want to 'jump' to a given position in this stream...
>>>>>> Is it possible to use some kind of #position: message in
>>>>>> aBtreePlusReadStream?
>>>>>> (#position: does not exist in BtreePlusReadStream)
>>>>>>
>>>>>> I could use #next to 'jump' to a given position, but the query can
>>>>>> be very, very large.
>>>>>> In the end it shows a paged web page to a user, who can click to get
>>>>>> the next batch of objects.
>>>>>> So I do not want to do #next over a large collection.
>>>>>>
>>>>>> regards,
>>>>>> bruno
>>>>>>
>>>>>> _______________________________________________
>>>>>> Glass mailing list
>>>>>> Glass at .gemtalksystems

>>>>>> https://lists.gemtalksystems.com/mailman/listinfo/glass

--
Sent from: http://forum.world.st/GLASS-f1460844.html

