[GemStone-Smalltalk] Fw: Canonicalization + 2 Spaces

Paul Baumann plbaumann at gmail.com
Sun Jun 28 09:13:45 PDT 2020

Hi Keith,

Nice to hear from you.

I had also implemented Date instance canonicalization much as you
described. It was a big improvement for the project at the time because our
application model had previously replicated many dates into GBS caches.

Didn't GemTalk make Date instances immediate objects, maybe around GS
3.0? For DateTime replication tuning, storing integers was better than
canonical instances but required consistent use of conversion accessors.
Application code in GS could potentially consume many oops for the
transient DateTime instances that might get created, but in practice that
was not a problem where code was tuned this way. Having DateTime instances
as immediates would reduce the chance that the accessor trick wouldn't be
most efficient for some application usage scenarios in the gem.

Oh, speaking of replication tuning, one of the big improvements I achieved
was through self generating custom replication specs. In tuning mode the
replication was one level deep (except some collections) and stubs that got
faulted for a declared context/replication would get recorded into
contextual replication specs. It was an iterative tuning process but the
result was that replication would include no more than was needed and had
only deliberate stub faulting later. Avoiding growth of GBS caches had
become mission critical, and this achieved it until application code could
be reimplemented to run in gems alone. The loss of efficient copy
replication had made this necessary, but it was also used to tune VW+GBS
applications to consistently offer near-instant response times.
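
A minimal sketch of the "record what got faulted" idea, in Python rather
than Smalltalk and with no GBS calls. The names SpecRecorder, record_fault
and spec_for are hypothetical, and a spec is reduced here to a set of
(class name, instance variable) pairs per declared context.

```python
from collections import defaultdict


class SpecRecorder:
    """Collects, per declared context, which stubs had to be faulted in tuning mode."""

    def __init__(self):
        # context name -> set of (class name, instance variable) pairs
        self._specs = defaultdict(set)

    def record_fault(self, context, class_name, inst_var):
        # Called (hypothetically) whenever a stub is faulted during a tuning run.
        self._specs[context].add((class_name, inst_var))

    def spec_for(self, context):
        # The generated spec: exactly what this context needed, in a stable order.
        return sorted(self._specs.get(context, ()))


recorder = SpecRecorder()
recorder.record_fault("tradeBlotter", "Trade", "counterparty")
recorder.record_fault("tradeBlotter", "Trade", "settleDate")
recorder.record_fault("tradeBlotter", "Trade", "counterparty")  # repeat faults collapse
```

Each tuning pass would add the recorded entries to the spec, so the next
pass faults fewer stubs, until only the deliberate ones remain.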

Looking back, one of the biggest problems with Smalltalk in general was
that people could too easily write inefficient code. For me it became a
full-time job cleaning and tuning code that got produced at a fast rate for
releases. At least the rigors of C coding brought attention to efficiency
from the start. I'm happy to be retired now.

Paul Baumann

On Sun, Jun 28, 2020, 10:35 AM Keith Piraino via GemStone-Smalltalk <
gemstone-smalltalk at lists.gemtalksystems.com> wrote:

> Greetings fellow ghosts of GemStone past. It's been nice seeing some of
> your names in my inbox again after all these years.
> SmallDateAndTime sounds like a great idea. Back in 2004 I posted on the
> predecessor to this list about canonicalizing dates and curve descriptors.
> I don't think there's an archive available anywhere for those older posts,
> so I'm including it again below in case anyone's interested.
> BTW, the work described was on JPMorgan's Kapital system. See slide 8 of
> the ESUG talk given a few months after the post below -
> http://www.esug.org/data/ESUG2004/ValueOfSmalltalk.pdf
> -Keith
> ----- Forwarded Message -----
> *From:* Keith Piraino <keith_piraino at yahoo.com>
> *To:* "gemstone-smalltalk at earth.lyris.net" <
> gemstone-smalltalk at earth.lyris.net>
> *Sent:* Monday, January 12, 2004, 05:52:22 PM EST
> *Subject:* Canonicalization + 2 Spaces
> I’ve been working on canonicalizing some objects recently in a
> GemStone/VisualWorks system. I haven’t seen discussion in the past
> about some of the 2 space issues that come up in this context when
> you’re dealing with both GemStone and a client image. I’ll describe the
> work we’ve done, and I’d be interested in comments from anyone about
> how they’ve tackled similar issues…
> The first phase of this work involved dates. We ran a scan and found
> that we had 15 million date instances in one of our databases, but they
> really only represented 17,000 different days. The few hundred MB
> wasted in our (much larger) databases isn’t great, but our bigger
> concern was the tens of MB of memory these duplicate instances took up
> in images when we faulted them in. The canonicalization on each side
> was simple enough. The range of dates we’re interested in is 200 years,
> amounting to about 70K instances. In GS we pre-build an array of the
> canonical instances and the # days since January 1, 1901 is the index
> into the array. In VW we have a similar array that is lazily populated
> as needed.
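
A minimal sketch of the lazily populated table described above, in Python
rather than Smalltalk. The epoch comes from the post; the class name, the
RANGE_END bound and the 0-based indexing are assumptions for illustration
(a Smalltalk Array would be 1-based).

```python
from datetime import date

EPOCH = date(1901, 1, 1)        # index origin named in the post
RANGE_END = date(2101, 1, 1)    # hypothetical end of the ~200-year range
SLOTS = (RANGE_END - EPOCH).days


class CanonicalDates:
    """One shared date object per day, created on first use (the VW-style variant)."""

    def __init__(self):
        self._table = [None] * SLOTS

    def index_of(self, d):
        # Days since the epoch double as the slot index.
        return (d - EPOCH).days

    def canonical(self, d):
        i = self.index_of(d)
        if not 0 <= i < SLOTS:
            raise ValueError("date outside the canonical range")
        if self._table[i] is None:
            self._table[i] = d      # first instance seen becomes the canonical one
        return self._table[i]


dates = CanonicalDates()
first = dates.canonical(date(2004, 1, 12))
second = dates.canonical(date(2004, 1, 12))   # a different object going in, same one out
```

The pre-built GS variant would differ only in filling every slot up front.
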
> The tricky part is the mapping between the two and supporting
> “independent creation” in VW. We don’t want to have to fault in all 70K
> dates up front or worse have our VW date creation code forwarding into
> the gem at arbitrary points to find the right instance. Tests faulting
> all the dates added 30 seconds to our login time, which is definitely
> not desirable. Instead we override the faulting and flushing behavior
> on dates. We override #newFromGSObjectReport: and parse the report to
> get the offset into the canonical array. If a corresponding VW date has
> already been created we map to that instead of the instance in the
> report. If it’s a new instance to that image we just ensure it ends up
> in the local canonical array.
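
The fault-side rule above can be sketched as follows, again in Python. This
is not the GBS API: date_from_report stands in for the overridden fault
hook, the report is reduced to a plain dict with a hypothetical
"day_offset" field, and a dict stands in for the local canonical array.

```python
from datetime import date, timedelta

EPOCH = date(1901, 1, 1)
local_canonical = {}    # day offset -> the single local date object


def local_date(year, month, day):
    """Independent creation on the client: register, or reuse what is registered."""
    candidate = date(year, month, day)
    offset = (candidate - EPOCH).days
    return local_canonical.setdefault(offset, candidate)


def date_from_report(report):
    """Stand-in for the fault hook: map an incoming date to the local canonical one."""
    offset = report["day_offset"]
    existing = local_canonical.get(offset)
    if existing is not None:
        return existing                      # already created locally, reuse it
    fresh = EPOCH + timedelta(days=offset)   # new to this image
    local_canonical[offset] = fresh
    return fresh


created = local_date(2004, 1, 12)
offset = (date(2004, 1, 12) - EPOCH).days
faulted = date_from_report({"day_offset": offset})
```

Either order of creation and faulting ends with one object per day in the
image, which is the property the override is there to protect.
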
> We hook into flushing by using #asGSObjectInSession:. During the first
> flush we use #privatePerform: to retrieve the encoded oops of all 70K
> canonical dates. As individual dates are flushed we can then create the
> appropriate GbsObjects and map them. This way even if the date instance
> is created locally in VW it will always end up resolving to the single
> corresponding canonical instance in GS. Faulting 70K encoded oops only
> takes about a second since they’re SmallIntegers. We process the report
> ourselves to avoid intermediate GbsObjects which speeds things up a
> little more.
> The next phase of this work involved objects that function as
> multi-part keys in our application. They hold more complex data but are
> always uniquely identified by a name (Symbol). Years of application
> code have relied on the fact that these objects are canonical, and
> you’ll never have more than one with the same name. Comparisons use ==,
> not =. Until recently this canonicalization was maintained by storing
> the instances in multi-level dictionaries that were faulted into the
> image. This approach became problematic as the number of instances
> increased and a new requirement came along to allow new keys to be
> generated at any time, not at defined points.
> Some of the basics of our new solution are similar to the date
> approach. There’s a canonical structure on each side (dictionary) that
> is not replicated. When we fault an object we check for an existing VW
> instance and if necessary map to that instead. One new wrinkle on the
> faulting side is stubs. Since the application relies so heavily on
> identity comparison we have to handle the case where the object was
> created locally and registered in the image side dictionary, and later
> we attempt to create and map a stub for the real persistent object. If
> we allow the stub to be created we effectively have two of our keys
> with the same name that are no longer identical. To prevent this we’ve
> hacked even more deeply into core replication methods like
> #clientObject:namedBuffer:indexableBuffer:slot:lookupOop:forwarder:secondPassLog:cached:keeper:.
> If we’re about to create a stub for an instance of one of our key classes
> we first use fetch operations to retrieve the name, which is one of the
> inst var values. (Note that #privateExecute: at this point can cause
> moreTraversal errors). We then check the local dictionary and if an
> instance with that name has already been created we resolve the
> replication to that instance instead of creating a stub. Otherwise we
> allow the stub to be created, but then add the stub to the local
> dictionary.
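
A sketch of the stub rule above, in Python with hypothetical names
throughout. resolve_replicate stands in for the hook inside the core
replication method, and fetch_name stands in for the fetch operation that
reads the name inst var without faulting the whole object.

```python
class Key:
    """A locally created key object, identified by its name."""

    def __init__(self, name):
        self.name = name


class Stub:
    """Placeholder for a persistent key that has not been faulted in yet."""

    def __init__(self, oop, name):
        self.oop = oop
        self.name = name


local_keys = {}     # name -> the one local object (real key or stub) for that name


def new_local_key(name):
    # Independent creation in the image: register so later stubs resolve here.
    return local_keys.setdefault(name, Key(name))


def resolve_replicate(oop, fetch_name):
    name = fetch_name(oop)              # fetch only the name, not the whole object
    existing = local_keys.get(name)
    if existing is not None:
        return existing                 # resolve to the local instance, no stub made
    stub = Stub(oop, name)
    local_keys[name] = stub             # a new stub becomes the canonical local entry
    return stub


names_by_oop = {101: "USD-LIBOR", 202: "EUR-SWAP"}   # made-up server-side data
mine = new_local_key("USD-LIBOR")
same = resolve_replicate(101, names_by_oop.get)
stub = resolve_replicate(202, names_by_oop.get)
```

Because every path goes through the one dictionary, == comparisons keep
working whether the key arrived by local creation or by replication.
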
> On the flushing side we use #privateExecute: to see if the object
> exists already in the persistent dictionary. If it does we return the
> encoded oop without actually reading the object’s data page using
> #_instVarAsEncodedOop: (thanks Norm). From there we can just create a
> GbsObject and map just like for dates. If the object doesn’t exist
> things get trickier. We have to ensure that it gets added to the
> persistent GS dictionary. To get this right in the face of things like
> concurrency conflicts and various failure scenarios, knowledge of these
> lazily flushed instances had to be embedded into our transaction
> framework.
> The end result is that these objects can just be created on the fly in
> any image (or gem for that matter) and we always guarantee
> canonicalization in both spaces. We’re happy with the result but
> curious if anyone has addressed this in a way that involved diving less
> deeply into GBS…
> Thanks - Keith