[Glass] There is really no way to have an ordered/sorted collection together with indexes?
Mariano Martinez Peck via Glass
glass at lists.gemtalksystems.com
Wed Aug 9 07:19:25 PDT 2017
Thank you all for your ideas and discussions. I learned a lot:
1) I wasn't aware of the streaming API of GsQuery, much less that results
could be sorted using the index (until now I was only using the { } kind
of syntax).
2) I didn't know that #sortAscending: and friends could also take advantage
of the index for sorting.
Both things are very cool.
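
As a rough illustration of the two points above, here is a minimal workspace sketch. It assumes price -> date associations and an equality index on 'value' (the date), as in Dale's example quoted further down; the sample prices, the dates, and the variable names (startDate, inDateOrder) are made up for illustration, and streaming over a query is expected to need an index on the queried path.

```smalltalk
| nsc query stream inDateOrder |
nsc := IdentityBag new.
"Equality index on the association's value, i.e. the date."
GsIndexSpec new
    equalityIndex: 'value' lastElementClass: Date;
    createIndexesOn: nsc.
"Hypothetical sample data: price -> date."
nsc add: (ScaledDecimal for: 10.5 scale: 2) -> (Date newDay: 200 year: 2017).
nsc add: (ScaledDecimal for: 9.75 scale: 2) -> (Date newDay: 100 year: 2017).
"1) Stream over a range query; the results come back in index (date) order."
query := (GsQuery fromString: 'each.value >= startDate' on: nsc)
    bind: 'startDate'
    to: (Date newDay: 1 year: 2017).
stream := query readStream.
inDateOrder := OrderedCollection new.
[ stream atEnd ] whileFalse: [ inDateOrder add: stream next ].
"2) Or ask the collection itself for a sorted copy."
{inDateOrder. nsc sortAscending: 'value'}
```

Both expressions in the final array should list the same associations in ascending date order.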
For my particular case, I finally decided to keep both collections: a
SortedCollection for getting "last price" and "all price history sorted"
very fast, and a "Bag with indexes" for querying prices for a given date or
closest to it (using both queries as suggested by Dale, with the
reversedReadStream and even caching the parsed query, etc.). Of course, it
made my code a little more complex and made the extent grow some more,
but it's worth it, as I need both scenarios to be fast.
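
A minimal sketch of that two-collection arrangement, not the actual application code: the variable names (sorted, indexed, price) and the sample values are hypothetical, and the association layout (price -> date) simply follows Dale's example quoted below.

```smalltalk
| sorted indexed price |
"Kept in date order for 'last price' and 'all price history sorted'."
sorted := SortedCollection sortBlock: [ :a :b | a value <= b value ].
"Unordered, indexed on the date for exact and nearest-date queries."
indexed := IdentityBag new.
GsIndexSpec new
    equalityIndex: 'value' lastElementClass: Date;
    createIndexesOn: indexed.
"Every price has to be added to (and removed from) both collections."
price := (ScaledDecimal for: 12.34 scale: 2) -> (Date newDay: 250 year: 2011).
sorted add: price.
indexed add: price.
"Newest price, and the collection that already holds the sorted history."
{sorted last. sorted}
```

The cost is the one mentioned above: two collections to keep in sync and a somewhat larger extent.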
Thank you very much to all of you.
On Tue, Aug 8, 2017 at 12:46 PM, Richard Sargent via Glass <
glass at lists.gemtalksystems.com> wrote:
> GLASS mailing list wrote
> > On Mon, Aug 7, 2017 at 10:47 PM, Dale Henrichs via Glass <glass at .gemtalksystems> wrote:
> >
> >>
> >>
> >> On 8/7/17 1:04 PM, Richard Sargent via Glass wrote:
> >>
> >>> GLASS mailing list wrote
> >>>
> >>>> Hi guys,
> >>>>
> >>>> I am storing huge lists of "prices". For this, it really helps me to
> >>>> store things ordered (by date), either in a SequenceableCollection
> >>>> or a SortedCollection. On the other hand, I do want to use indexes
> >>>> to speed up my query to find the price for a given date (equality
> >>>> index).
> >>>>
> >>>> But I have found no way to have them both. The only workaround I found
> >>>> is to keep 2 collections for each of these collections, one
> >>>> sorted/ordered and the other an unordered one for querying via index.
> >>>> But this is a pain from a "development" point of view, and it also
> >>>> causes unnecessary repository growth.
> >>>>
> >>>> Am I missing something?
> >>>>
> >>> Mariano, have you read chapter 7 in the GemStone/S 64 Programming
> >>> Guide? It's all about indexing.
> >>>
> >>> It looks like you could define a "range" query over the unordered
> >>> collection to get the sorted sequence needed to traverse all the
> >>> prices in date order. The index would be on the date and the range
> >>> query would be from some "least date" through some "greatest date".
> >>> You would then use the streaming results (or the #do: message, I
> >>> think) to iterate over the query result in date order.
> >>> There is a section in chapter 7.2 discussing result order.
> >>>
> >>>
> >>> It might be helpful to discuss the use cases for the two collections.
> >>> When would you iterate over all the dates versus when would you
> >>> search for specific dates or ranges of dates?
> >>>
> >>>
> >> Actually I think you can just set up two queries: one to search for
> >> the exact date:
> >>
> >> detectQuery := (GsQuery fromString: 'each.value = targetDate' on: nsc)
> >>
> >> and one query to find the first date prior to your targetDate:
> >>
> >> nearestQuery := (GsQuery fromString: 'each.value < targetDate' on: nsc)
> >>
> >>
> >>
> >
> > Hi Dale,
> >
> > Thanks for your answers. I am headed to bed right now, but I'll ask a
> > quick question so as to have as much info as possible for tomorrow...
> >
> > Aside from both queries above, I would still need these 2 more APIs:
> >
> > 3) get the whole price history sorted.
>
> Mariano, given the size of this collection, how would having all 1 million
> prices (or 10 million?) be usable? #do: would allow you to iterate across
> the collection indexed by date and get the prices in date order as would
> streaming over the result. Dale points out that you can get the result set
> as a collection, but what is it you would do with it such that you need the
> entire sorted collection at one time?
>
>
> > 4) get the newest available price of the collection
> >
> > How can I make those fast too, taking advantage of the index? For 3) I can
> > simply do the nearestQuery with a future day (like tomorrow), but maybe
> > there is a cleaner way? And for 4)?
> >
> > In summary, I am comparing whether to store things sorted (so that 3 and 4
> > are fast: 3 is simply the same collection, nothing to do, and 4 is a
> > simple #last) and do a binary search for the exact query and the nearest
> > query, vs. storing things in a bag and doing everything via index
> > (querying will be fast, but I have doubts about 3) and 4), mostly
> > about 3)).
> >
> >
> > Thanks a lot!
> >
> >
> >
> >> Then you can use detect:ifNone: sent to the query object itself to
> >> either find the first matching date using the index or the first date
> >> prior to the target date using an index. In both queries you are using
> >> very efficient btree lookups and avoiding the need to scan your
> >> collection (key is the price and value is the date):
> >>
> >> | nsc random detectQuery targetDate result |
> >> nsc := IdentityBag new.
> >> random := HostRandom new.
> >> "Equality index on the association's value (the date)."
> >> GsIndexSpec new
> >>     equalityIndex: 'value' lastElementClass: Date;
> >>     createIndexesOn: nsc.
> >> "Populate with 100 random price -> date associations."
> >> 1 to: 100 do: [ :index |
> >>     nsc
> >>         add: (ScaledDecimal for: random float scale: 2) ->
> >>             (Date
> >>                 newDay: (random integerBetween: 1 and: 365)
> >>                 year: (random integerBetween: 2000 and: 2017)) ].
> >> targetDate := Date newDay: 250 year: 2011.
> >> detectQuery := (GsQuery fromString: 'each.value = targetDate' on: nsc)
> >>     bind: 'targetDate'
> >>     to: targetDate.
> >> "Exact match first; otherwise fall back to the latest earlier date."
> >> result := detectQuery
> >>     detect: [ :each | true ]
> >>     ifNone: [ | nearestQuery |
> >>         nearestQuery := (GsQuery fromString: 'each.value < targetDate' on: nsc)
> >>             bind: 'targetDate'
> >>             to: targetDate.
> >>         nearestQuery reversedReadStream next ].
> >> {nsc. nsc sortAscending: 'value'. targetDate. result}
> >>
> >> I'm returning the sorted nsc to make it easy to validate that the
> >> result is correct, since we're generating random dates.
> >> The parsed query can be persisted (with or without the nsc attached) to
> >> avoid the overhead of parsing the query string each time you run the
> >> query ...
> >>
> >>
> >> _______________________________________________
> >> Glass mailing list
> >> Glass at .gemtalksystems
> >> http://lists.gemtalksystems.com/mailman/listinfo/glass
> >>
> >
> >
> >
> > --
> > Mariano
> > http://marianopeck.wordpress.com
> >
--
Mariano
http://marianopeck.wordpress.com