[GemStone-Smalltalk] objectForOop: question about some errors we get + how to implement a weak reference?

Mon Jul 25 08:29:40 PDT 2022

Hi Martin,

Thanks for that extensive feedback!
I think this is going to help us find a solution here.

Besides checking for oopIsDead: first, we’re thinking of completely avoiding the problem by keeping the ‘deleted’ objects connected for the time that the user log displays actions in the past. We can then remove them later on when it’s safe to do so.
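
To make the second idea concrete, here is a minimal sketch of such a holding area. It is a toy model in Python, not GemStone code: the class name Graveyard, its methods, and the 24-hour window are all hypothetical, and in the real application the entries would live in a persistent collection reachable from a root.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Graveyard:
    """Toy holding area that keeps 'deleted' objects strongly referenced.

    Hypothetical helper for illustration only, not a GemStone class.
    """
    window: timedelta                             # how far back the user log looks
    entries: list = field(default_factory=list)   # (deleted_at, obj) pairs

    def bury(self, obj, deleted_at: datetime) -> None:
        # Instead of disconnecting obj right away, keep a reference to it.
        self.entries.append((deleted_at, obj))

    def purge(self, now: datetime) -> list:
        # Release everything the log can no longer display and return it.
        cutoff = now - self.window
        released = [obj for ts, obj in self.entries if ts < cutoff]
        self.entries = [(ts, obj) for ts, obj in self.entries if ts >= cutoff]
        return released


# Usage: two deletions, then a purge 30 hours after the first one.
t0 = datetime(2022, 7, 25, 8, 0)
yard = Graveyard(window=timedelta(hours=24))
yard.bury("order-1", t0)
yard.bury("order-2", t0 + timedelta(hours=20))
released = yard.purge(t0 + timedelta(hours=30))
```

After the purge, only "order-1" has been released; "order-2" is still inside the window and stays connected, so a log lookup for it cannot hit a dead object.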

Since I noticed that the problem went away without a gem crash or restart: do you think it would also be OK to simply catch the ObjDoesNotExist error and ignore it?
That would allow us to prevent an entire crash of the application session while we create a solution to prevent the error altogether.
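
What I have in mind is roughly the following. This is a Python sketch of the idea only, with hypothetical names (resolve_or_none, visible_entries, and a stand-in ObjDoesNotExist exception class); it is not our actual Smalltalk code. The point is that the handler encloses the whole lookup, including any check that touches the retrieved object, and treats a failure as "object no longer available".

```python
class ObjDoesNotExist(Exception):
    """Stand-in for the GemStone error; hypothetical Python class."""


def resolve_or_none(lookup, oop):
    """Run lookup(oop) and treat a vanished object as 'no longer available'.

    `lookup` is assumed to perform the retrieval AND any validation that
    reads the object's state, so that both are covered by the handler.
    """
    try:
        return lookup(oop)
    except ObjDoesNotExist:
        return None


def visible_entries(oops, lookup):
    """Return the objects that could still be resolved, skipping the rest."""
    found = (resolve_or_none(lookup, oop) for oop in oops)
    return [obj for obj in found if obj is not None]
```

With a lookup that fails for one oop, visible_entries simply leaves that entry out of the displayed log instead of letting the error reach the Seaside error handler.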

Because it does look like we are doing things in a way that makes this case more likely to appear.

The objectForOop: is used in code that is continuously executed (i.e. every minute for every user session in our Seaside app) because it displays the list of user actions that happened in the last xxx hours in the application.
Because ‘delete actions’ are definitely on that ‘user action log’, the oops that are retrieved can be of disconnected/dead objects.
We notice that many of our users just keep their app session logged-in overnight and therefore this gets executed many times concurrently with an MFC+reclaim run.
Finally, the Seaside error handler logs the continuation of the error to the objectLog, so a commit happens when the dead object retrieved via the objectForOop: is on the stack. In fact, because the continuation is saved, it is probably even committed again?
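
To check my understanding of the sequence you describe (quoted below), I modelled it as a toy in Python. Everything here is an assumption made for illustration: ToyStone and ToyGem are hypothetical classes, "view" and "memory" are plain dictionaries, and the real object table, voting and reclaim are far more involved. The step numbers in the comments refer to your list.

```python
class ObjDoesNotExist(Exception):
    """Toy stand-in for the GemStone error of the same name."""


class ToyStone:
    """Shared state: the committed object table plus the set of dead oops."""

    def __init__(self, objects):
        self.object_table = dict(objects)  # oop -> {field name: referenced oop}
        self.dead = set()

    def promote_to_dead(self, *oops):
        self.dead.update(oops)

    def reclaim(self, *oops):
        for oop in oops:
            self.object_table.pop(oop, None)


class ToyGem:
    """One session: a snapshot view of the object table plus private memory."""

    def __init__(self, stone):
        self.stone = stone
        self.view = dict(stone.object_table)
        self.memory = {}

    def abort(self):
        # Refresh the view; objects already in private memory stay there.
        self.view = dict(self.stone.object_table)

    def object_for_oop(self, oop):
        if oop not in self.memory:
            if oop not in self.view:
                return None
            self.memory[oop] = self.view[oop]
        return self.memory[oop]

    def checked_object_for_oop(self, oop):
        # Ask the stone whether the oop is dead before looking it up.
        if oop in self.stone.dead:
            return None
        return self.object_for_oop(oop)

    def read_field(self, obj, name):
        target = obj[name]
        found = self.object_for_oop(target)
        if found is None:
            raise ObjDoesNotExist(target)
        return found


def run_scenario():
    stone = ToyStone({"B": {"created": "C"}, "C": {}})
    gem = ToyGem(stone)
    gem.object_for_oop("B")            # step 3: B enters private memory, C does not
    stone.promote_to_dead("B", "C")    # step 4
    stone.reclaim("C")                 # step 5
    gem.abort()                        # step 6
    b = gem.object_for_oop("B")        # step 7: succeeds from private memory
    try:
        gem.read_field(b, "created")   # steps 8-9
        unchecked = "ok"
    except ObjDoesNotExist as err:
        unchecked = "ObjDoesNotExist(%s)" % err.args[0]
    checked = gem.checked_object_for_oop("B")
    return unchecked, checked
```

In this model run_scenario() returns ("ObjDoesNotExist(C)", None): the unchecked lookup of B succeeds and then fails on C, which matches what we see (the oop in the error is not the one we asked for), while the dead-check before the lookup ends the sequence early. The model does not cover the case you mention where a reference obtained in step 3 is simply kept across the abort.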

Best regards
Johan

> On 23 Jul 2022, at 23:52, Martin McClure <martin.mcclure at gemtalksystems.com> wrote:
> 
> Hi Johan,
> 
> This does sound very much like our internal issue 50024. It was just a couple of months ago that I first suspected such a scenario was possible, and just a couple of weeks ago that I filed the issue. I'm very sorry to hear that it is not just theoretical.
> 
> From what you've said, what may be happening is a sequence like this:
> 1) MFC completes, and voting starts
> 2) Gem A votes
> 3) Gem A retrieves disconnected object B with objectForOop:.
> 4) Object B is promoted to dead, since it was disconnected and not voted down by any session. Object C referenced by object B is also promoted to dead at the same time.
> 5) The reclaim gem commits the removal of object C (and possibly also object B) from the object table.
> 6) Gem A aborts or commits. Object C no longer exists in Gem A's view of the object table. Object B still exists in Gem A's private memory, but object C does not.
> 7) Gem A once again retrieves object B with objectForOop:. This succeeds because it is still in Gem A's private memory.
> 8) Gem A sends a message to object B. 
> 9) The resulting Smalltalk execution retrieves an instance variable of object B that refers to object C. Object C does not exist in Gem A's private memory or in Gem A's view of the object table, so the result is an ObjDoesNotExistError.
> 
> This specific scenario can be avoided by sending _oopIsDead: before sending objectForOop: in step 7, and ending the sequence there if the object has been promoted to dead. Sending _oopIsDead: only after _objectForOop: answers nil may not help -- I haven't tested this but I think that in the case where the object is still in the gem's memory the _objectForOop: will succeed even if the object is dead. As is documented, _oopIsDead: is slower than objectForOop: since it must consult the stone. Whether this will noticeably affect your application depends on how often you are executing objectForOop:. Solutions that we're considering will also have a similar slowdown for objectForOop:, though probably only during the period between the end of MFC and end of reclaim.
> 
> However, sending _oopIsDead: before each objectForOop: does not make it *entirely* safe. You could leave step 7 out of the scenario above and still have the same outcome. At step 3, _oopIsDead: would answer false, and the objectForOop: would succeed. If that reference to object B was stored in a transient object or a stack temporary variable that is used to send the message in step 8, you could still get the ObjDoesNotExist error.
> To avoid this, you could, besides sending _oopIsDead: before each objectForOop:, ensure that no references to an object retrieved through objectForOop: are retained on the stack or in a transient object across a commit or abort.
> 
> I hope this helps you reach a solution that works in your application. We're still discussing ways to fix this in future GemStone versions.
> 
> Regards,
> -Martin
> 
> On 7/22/22 05:29, Johan Brichau via GemStone-Smalltalk wrote:
>> Hi,
>> 
>> We use the objectForOop: method to implement weak references but are hitting some errors while using it. 
>> I want to better understand what we are seeing.
>> The following is in GemStone 3.4.5.
>> 
>> First off: we understand the dangers mentioned in the method's comment, well… to some extent, otherwise I would not ask this question :-). To counter trouble with oop reuse, we also store the object creation time together with the oop (and in the referenced object). A check on class and creation timestamp lets us verify whether the object we get back from objectForOop: is the one we originally referenced.
>> 
>> The reason we use this concept is to implement weak references. We store log items in the database which reference a series of objects. The log item will stay in the database for a very long time but should not keep the referenced objects from being garbage collected. Hence, we store the oop (+ the creation timestamp, as mentioned) instead of the object in the log items, which would otherwise keep these from being collected in an MFC.
>> 
>> The trouble is we sometimes encounter an error after an MFC cycle has concluded.
>> The _objectForOop: method throws a ‘primitive failed’ that the object with object ID xxxx does not exist. See stack trace below.
>> 
>> What is interesting is that the oop in the error message is _not_ the argument passed to _objectForOop:. Instead, it is the oop of an object held in an instVar of the object we try to retrieve.
>> It is an instance of DateAndTime. So it seems we retrieve a dead object that can still be looked up, but whose referenced objects have already been collected, and that this makes the primitive fail?
>> We tried to counter the problem of retrieving dead objects by using _oopIsDead: but only do so _after_ invoking objectForOop:. 
>> Perhaps the straightforward solution is to first check _oopIsDead: and only then use _objectForOop: ? For now, we have wrapped it with an exception handler to handle the error.
>> 
>> Next, we saw this problem in GemStone 2.4.4.1 as well, but there it would crash the entire gem. Since our upgrade to GemStone 3.4.5, this error no longer crashes the gem and is captured by the error handler.
>> This, however, has now shown that because the gem continues working, the problem may occur even many hours after the MFC cycle. Is this to be expected? I understand how in between a collect and a reclaim, this may happen but not how it can still occur many hours later?
>> 
>> And finally… we are aware this is all rather hacky. Now that we are finally moving up towards the latest version of GemStone: are there better ways to implement weak references in GemStone? ;-)
>> 
>> Thanks for any thoughts!
>> Johan
>> 
>> ----------- Continuation saved to object log ERROR Encountered: 2022-07-22T07:29:21.16192007064819+02:00
>> a InternalError occurred (error 2101), The object with object ID 5754897665 does not exist.
>> 1 GRGemStonePlatform >> logError:title:shouldCommit: @3 line 4  [GsNMethod 551003905]
>> 2 GRGemStonePlatform >> logError:title: @2 line 3  [GsNMethod 550999041]
>> 3 NPGemStoneErrorHandler (WAErrorHandler) >> saveExceptionContinuation: @9 line 6  [GsNMethod 603180545]
>> 4 NPGemStoneErrorHandler >> handleDefault: @6 line 7  [GsNMethod 632646401]
>> 5 NPGemStoneErrorHandler (WAErrorHandler) >> handleError: @2 line 2  [GsNMethod 603181313]
>> 6 NPGemStoneErrorHandler (WAErrorHandler) >> handleGemStoneException: @5 line 4  [GsNMethod 603182081]
>> 7 NPGemStoneErrorHandler (WAHtmlHaltAndErrorHandler) >> handleException: @2 line 2  [GsNMethod 613492481]
>> 8 NPGemStoneErrorHandler >> handleException: @6 line 5  [GsNMethod 632646145]
>> 9 [] in WAExceptionHandler >> handleExceptionsDuring: @11 line 5  [GsNMethod 1320949761]
>> 10 ExecBlock0 (ExecBlock) >> on:do: @3 line 44  [GsNMethod 53734657]
>> 11 [] in WAExceptionHandler >> handleExceptionsDuring: @7 line 8  [GsNMethod 968955649]
>> 12 [] in ExecBlock >> on:do: @16 line 53  [GsNMethod 65134849]
>> 13 InternalError (AbstractException) >> _executeHandler: @8 line 11  [GsNMethod 61046017]
>> 14 InternalError (AbstractException) >> _signalFromPrimitive: @1 line 1  [GsNMethod 61048321]
>> 15 NPLogEntry >> privateObjectAtUniqueIdentifier:in:ifAbsent: @1 line 1  [GsNMethod 651626241]
>> 16 NPLogEntry >> originalObjectInDB:ifAbsent: @3 line 3  [GsNMethod 651630337]
>> 17 NPLogEntry >> originalObjectInDB: @2 line 2  [GsNMethod 651613953]
>> 18 NPLogEntry >> concernedObjectsInDB: @5 line 6  [GsNMethod 651637249]
>> 19 NPLoggingManagerForGemStone (NPLoggingManager) >> isCurrentUserFollowingEntry: @3 line 3  [GsNMethod 623065857]
>> …
>> 
>> 
>> 
>> _______________________________________________
>> GemStone-Smalltalk mailing list
>> GemStone-Smalltalk at lists.gemtalksystems.com
>> https://lists.gemtalksystems.com/mailman/listinfo/gemstone-smalltalk
> 
