[GemStone-Smalltalk] Fw: Canonicalization + 2 Spaces

Keith Piraino keith_piraino at yahoo.com
Sun Jun 28 07:30:46 PDT 2020


Greetings, fellow ghosts of GemStone past. It's been nice seeing some of your names in my inbox again after all these years.
SmallDateAndTime sounds like a great idea. Back in 2004 I posted on the predecessor to this list about canonicalizing dates and curve descriptors. I don't think there's an archive of those older posts available anywhere, so I'm including it again below in case anyone's interested.
BTW, the work described was on JPMorgan's Kapital system. See slide 8 of the ESUG talk from a few months after the post below: http://www.esug.org/data/ESUG2004/ValueOfSmalltalk.pdf
-Keith

----- Forwarded Message -----
From: Keith Piraino <keith_piraino at yahoo.com>
To: "gemstone-smalltalk at earth.lyris.net" <gemstone-smalltalk at earth.lyris.net>
Sent: Monday, January 12, 2004, 05:52:22 PM EST
Subject: Canonicalization + 2 Spaces

I’ve been working on canonicalizing some objects recently in a
GemStone/VisualWorks system. I haven’t seen much past discussion
of the two-space issues that come up in this context when
you’re dealing with both GemStone and a client image. I’ll describe the
work we’ve done, and I’d be interested in comments from anyone about
how they’ve tackled similar issues…

The first phase of this work involved dates. We ran a scan and found
that we had 15 million date instances in one of our databases, but they
really only represented 17,000 different days. The few hundred MB
wasted in our (much larger) databases isn’t great, but our bigger
concern was the tens of MB of memory these duplicate instances took up
in images when we faulted them in. The canonicalization on each side
was simple enough. The range of dates we’re interested in is 200 years,
amounting to about 70K instances. In GS we pre-build an array of the
canonical instances, and the number of days since January 1, 1901 is the
index into the array. In VW we have a similar array that is lazily
populated as needed.
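A minimal sketch of the two canonical date tables described above, written in Python purely as a neutral illustration (the real system is GemStone and VisualWorks Smalltalk). `CanonicalDate`, `PrebuiltDateTable`, `LazyDateTable`, and `day_offset` are hypothetical names, and the 73,050-day span (200 × 365.25) is an assumption standing in for "200 years". Both tables index by days since January 1, 1901; one builds every instance up front as on the GS side, the other fills slots on first use as on the VW side.

```python
from datetime import date, timedelta
from typing import List, Optional

EPOCH = date(1901, 1, 1)   # index origin named in the post
SPAN_DAYS = 73_050         # assumption: ~200 years of days (200 * 365.25)


class CanonicalDate:
    """Hypothetical stand-in for a Date; instances come only from a table."""
    __slots__ = ("offset",)

    def __init__(self, offset: int) -> None:
        self.offset = offset

    def as_date(self) -> date:
        return EPOCH + timedelta(days=self.offset)


def day_offset(d: date, span: int) -> int:
    """Index into the canonical table: days since EPOCH, range-checked."""
    offset = (d - EPOCH).days
    if not 0 <= offset < span:
        raise ValueError(f"{d} is outside the canonical range")
    return offset


class PrebuiltDateTable:
    """Server-side style: every canonical instance is built up front."""

    def __init__(self, span: int = SPAN_DAYS) -> None:
        self.span = span
        self._table = [CanonicalDate(i) for i in range(span)]

    def lookup(self, d: date) -> CanonicalDate:
        return self._table[day_offset(d, self.span)]


class LazyDateTable:
    """Client-side style: a slot is filled the first time it is asked for."""

    def __init__(self, span: int = SPAN_DAYS) -> None:
        self.span = span
        self._table: List[Optional[CanonicalDate]] = [None] * span

    def lookup(self, d: date) -> CanonicalDate:
        i = day_offset(d, self.span)
        inst = self._table[i]
        if inst is None:
            inst = CanonicalDate(i)
            self._table[i] = inst
        return inst
```

The range check matters in a language with negative indexing, and it makes out-of-range dates fail loudly instead of quietly producing a non-canonical instance.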

The tricky part is the mapping between the two and supporting
“independent creation” in VW. We don’t want to have to fault in all 70K
dates up front or worse have our VW date creation code forwarding into
the gem at arbitrary points to find the right instance. Tests faulting
all the dates added 30 seconds to our login time, which is definitely
not desirable. Instead we override the faulting and flushing behavior
on dates. We override #newFromGSObjectReport: and parse the report to
get the offset into the canonical array. If a corresponding VW date has
already been created we map to that instead of the instance in the
report. If the date is new to that image, we just ensure it ends up
in the local canonical array.
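The faulting override can be sketched the same way. Here a hypothetical `ObjectReport` stands in for the already-parsed report that #newFromGSObjectReport: receives, reduced to the two values the post relies on: the server object's oop and the offset into the canonical array. `DateFaultHandler` and its methods are illustrative names, not GBS API; the point is that a date created locally first and the same date faulted in later resolve to one object.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ObjectReport:
    """Hypothetical, already-parsed form of the report sent when faulting."""
    oop: int         # server-side object identifier
    day_offset: int  # days since 1901-01-01, decoded from the report


class LocalDate:
    __slots__ = ("offset",)

    def __init__(self, offset: int) -> None:
        self.offset = offset


class DateFaultHandler:
    """Maps faulted server dates onto the image's canonical instances."""

    def __init__(self, span: int) -> None:
        self.canonical: List[Optional[LocalDate]] = [None] * span
        self.oop_to_local: Dict[int, LocalDate] = {}
        # Keyed by id(); safe here because self.canonical keeps every
        # mapped instance alive.
        self.local_to_oop: Dict[int, int] = {}

    def _canonical_for(self, offset: int) -> LocalDate:
        if not 0 <= offset < len(self.canonical):
            raise ValueError(f"offset {offset} out of range")
        inst = self.canonical[offset]
        if inst is None:
            inst = LocalDate(offset)
            self.canonical[offset] = inst
        return inst

    def create_locally(self, offset: int) -> LocalDate:
        """Independent creation in the image; no server round trip."""
        return self._canonical_for(offset)

    def new_from_report(self, report: ObjectReport) -> LocalDate:
        """Fault hook: reuse an existing local date instead of a new copy."""
        existing = self.oop_to_local.get(report.oop)
        if existing is not None:
            return existing
        local = self._canonical_for(report.day_offset)
        self.oop_to_local[report.oop] = local
        self.local_to_oop[id(local)] = report.oop
        return local
```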

We hook into flushing by using #asGSObjectInSession:. During the first
flush we use #privatePerform: to retrieve the encoded oops of all 70K
canonical dates. As individual dates are flushed we can then create the
appropriate GbsObjects and map them. This way even if the date instance
is created locally in VW it will always end up resolving to the single
corresponding canonical instance in GS. Faulting 70K encoded oops only
takes about a second since they’re SmallIntegers. We process the report
ourselves to avoid intermediate GbsObjects which speeds things up a
little more. 
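The flush side reduces to one bulk fetch followed by table lookups. In this hypothetical sketch, the `fetch_all_oops` callable stands in for the single #privatePerform: round trip that returns every canonical date's encoded oop, and `DateFlushMapper.mapped` stands in for creating and registering the GbsObject; neither name comes from GBS.

```python
from typing import Callable, Dict, List, Optional


class DateFlushMapper:
    """Resolves a locally created date to its canonical server object."""

    def __init__(self, fetch_all_oops: Callable[[], List[int]]) -> None:
        # Hypothetical callable standing in for the single bulk request;
        # returns the encoded oop of every canonical date, by day offset.
        self._fetch_all_oops = fetch_all_oops
        self._oops: Optional[List[int]] = None
        self.mapped: Dict[int, int] = {}  # day offset -> oop

    def oop_for(self, day_offset: int) -> int:
        if self._oops is None:
            # First flush only: one round trip for the whole table.
            self._oops = self._fetch_all_oops()
        oop = self._oops[day_offset]
        # Stands in for creating and registering the proxy object.
        self.mapped[day_offset] = oop
        return oop
```

Because the fetch happens only on the first flush, each later flush is an in-memory list lookup.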

The next phase of this work involved objects that function as
multi-part keys in our application. They hold more complex data but are
always uniquely identified by a name (Symbol). Years of application
code have relied on the fact that these objects are canonical, and
you’ll never have more than one with the same name. Comparisons use ==,
not =. Until recently this canonicalization was maintained by storing
the instances in multi-level dictionaries that were faulted into the
image. This approach became problematic as the number of instances
increased and a new requirement came along to allow new keys to be
generated at any time, not at defined points. 
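The invariant the application relies on (one object per name, so identity comparison is safe) is the same one Symbols provide. A single-space sketch in Python with a hypothetical `Key` class shows its shape; what the post is really about is keeping this invariant when instances can appear independently in both the image and the gem.

```python
from typing import Any, Dict


class Key:
    """Hypothetical multi-part key that is canonical by name.

    Construction returns the existing instance for a name, so application
    code can compare keys with `is` (identity), as the post describes.
    """

    _registry: Dict[str, "Key"] = {}

    def __new__(cls, name: str, **parts: Any) -> "Key":
        existing = cls._registry.get(name)
        if existing is not None:
            # Later constructor arguments are ignored for an existing name.
            return existing
        inst = super().__new__(cls)
        inst.name = name
        inst.parts = dict(parts)
        cls._registry[name] = inst
        return inst
```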

Some of the basics of our new solution are similar to the date
approach. There’s a canonical structure on each side (dictionary) that
is not replicated. When we fault an object we check for an existing VW
instance and if necessary map to that instead. One new wrinkle on the
faulting side is stubs. Since the application relies so heavily on
identity comparison we have to handle the case where the object was
created locally and registered in the image side dictionary, and later
we attempt to create and map a stub for the real persistent object. If
we allow the stub to be created we effectively have two of our keys
with the same name that are no longer identical. To prevent this we’ve
hacked even more deeply into core replication methods like
#clientObject:namedBuffer:indexableBuffer:slot:lookupOop:forwarder:secondPassLog:cached:keeper:.
If we’re about to create a stub for an instance of one of our key classes,
we first use fetch operations to retrieve the name, which is one of the
inst var values. (Note that calling #privateExecute: at this point can cause
moreTraversal errors.) We then check the local dictionary, and if an
instance with that name has already been created we resolve the
replication to that instance instead of creating a stub. Otherwise we
allow the stub to be created, but then add the stub to the local
dictionary. 
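The stub-time check might look roughly like this. `fetch_name` is a hypothetical callable standing in for the fetch operations that read only the name inst var, and `LocalKey`, `Stub`, and `KeyReplicator` are illustrative stand-ins rather than GBS classes; in the real system the hook lives inside the replication method named above.

```python
from typing import Callable, Dict, Union


class LocalKey:
    """A key created in the image before its server object was seen."""

    def __init__(self, name: str) -> None:
        self.name = name


class Stub:
    """Placeholder for a server object that has not been faulted yet."""

    def __init__(self, oop: int, name: str) -> None:
        self.oop = oop
        self.name = name


KeyLike = Union[LocalKey, Stub]


class KeyReplicator:
    def __init__(self, fetch_name: Callable[[int], str]) -> None:
        # Hypothetical: reads only the name inst var of a server object.
        self._fetch_name = fetch_name
        self.by_name: Dict[str, KeyLike] = {}  # image-side canonical dict
        self.by_oop: Dict[int, KeyLike] = {}

    def create_local(self, name: str) -> KeyLike:
        """Create (or reuse) a key in the image without asking the server."""
        existing = self.by_name.get(name)
        if existing is None:
            existing = LocalKey(name)
            self.by_name[name] = existing
        return existing

    def replicate(self, oop: int) -> KeyLike:
        """Called where a stub would normally be created for `oop`."""
        if oop in self.by_oop:
            return self.by_oop[oop]
        name = self._fetch_name(oop)
        local = self.by_name.get(name)
        if local is None:
            # No key with this name yet: allow the stub, then register it.
            local = Stub(oop, name)
            self.by_name[name] = local
        # Either way, the server object now resolves to one local object.
        self.by_oop[oop] = local
        return local
```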

On the flushing side we use #privateExecute: to see if the object
exists already in the persistent dictionary. If it does we return the
encoded oop without actually reading the object’s data page using
#_instVarAsEncodedOop: (thanks Norm). From there we can just create a
GbsObject and map it, just as for dates. If the object doesn’t exist,
things get trickier: we have to ensure that it gets added to the
persistent GS dictionary. To get this right in the face of
concurrency conflicts and various failure scenarios,
knowledge of these lazily flushed instances had to be embedded into our
transaction framework.
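A rough sketch of the flush-side logic, with an in-memory `FakeStore` standing in for the persistent GS dictionary. Retrying after a commit conflict and then mapping to the winning session's object is an assumption about what the transaction framework needs to do; the post does not detail it. All names here are hypothetical.

```python
from typing import Dict, Optional


class CommitConflict(Exception):
    """Another session committed the same name first."""


class FakeStore:
    """In-memory stand-in for the persistent name -> object dictionary."""

    def __init__(self, racing_writes: Optional[Dict[str, int]] = None) -> None:
        self.committed: Dict[str, int] = {}
        # Names another session will commit just before our own commit,
        # used to simulate a concurrency conflict.
        self._racing = dict(racing_writes or {})
        self._next_oop = 1000

    def lookup_oop(self, name: str) -> Optional[int]:
        # Cheap existence check: returns the identifier only.
        return self.committed.get(name)

    def commit_new(self, name: str) -> int:
        if name in self._racing:
            self.committed[name] = self._racing.pop(name)
        if name in self.committed:
            raise CommitConflict(name)
        oop = self._next_oop
        self._next_oop += 1
        self.committed[name] = oop
        return oop


def flush_key(store: FakeStore, name: str, max_attempts: int = 3) -> int:
    """Return the server oop for `name`, adding it if no session has yet."""
    for _ in range(max_attempts):
        oop = store.lookup_oop(name)
        if oop is not None:
            return oop  # already persistent: just map to it
        try:
            return store.commit_new(name)
        except CommitConflict:
            # Lost the race: loop and map to the winner's object instead.
            continue
    raise RuntimeError(f"could not flush key {name!r}")
```

The `racing_writes` argument exists only to simulate another session adding the same name between our lookup and our commit.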

The end result is that these objects can just be created on the fly in
any image (or gem for that matter) and we always guarantee
canonicalization in both spaces. We’re happy with the result but
curious if anyone has addressed this in a way that involved diving less
deeply into GBS…

Thanks - Keith


