[Glass] Grrrr cannot migrate (class rename with subclasses and with a name of a deleted class)

Mariano Martinez Peck via Glass glass at lists.gemtalksystems.com
Wed Sep 9 06:24:41 PDT 2015


On Tue, Sep 8, 2015 at 7:00 PM, Dale Henrichs <
dale.henrichs at gemtalksystems.com> wrote:

> Mariano,
>
> I just talked with engineering and they concur that this is likely to be a
> malloc failure and that this area of the code has been substantially
> reworked in recent releases to attempt to reduce the amount of RAM consumed
> during list instances ...
>
> So for 3.1.0.6, you might try this operation with more RAM available or
> perhaps just adding more swap space will allow the malloc to complete ...
> running statmon with a 1 second interval and looking at the heap
> consumption of the gem, might show growth and a "sudden decline" when the
> malloc fails ...
>

Hi Dale,

Just for the record, I tried with this scenario:

[marianopeck at quuveserver1 ~]$ free -m
              total        used        free      shared  buff/cache   available
Mem:           8014         388        6850         359         775        7205
Swap:         16639           0       16639

And it still didn't work. Note that I had about 7GB of RAM free. At the end,
when the system crashed, this was the resulting state:

[marianopeck at quuveserver1 ~]$ free -m
              total        used        free      shared  buff/cache   available
Mem:           8014         338        1316         973        6359        6639
Swap:         16639           0       16639
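
To compare the two snapshots, here is a minimal sketch that pulls the "free"
and "buff/cache" columns out of `free -m` output and prints how much each one
changed. The helper names `mem_columns` and `mem_delta` are made up for this
illustration, and the two `printf` lines simply replay the Mem: rows shown
above.

```shell
#!/bin/sh
# Hypothetical helper: read `free -m` output on stdin and print the
# "free" and "buff/cache" columns (in MB) from the Mem: row.
mem_columns() {
    awk '/^Mem:/ { print $4, $6 }'
}

# Hypothetical helper: given two "free buff/cache" pairs (before, after),
# print the change in each column.
mem_delta() {
    # Split both pairs into four positional parameters.
    set -- $1 $2
    echo "$(( $3 - $1 )) $(( $4 - $2 ))"
}

# Mem: rows replayed from the two snapshots in this message.
before=$(printf 'Mem: 8014 388 6850 359 775 7205\n' | mem_columns)
after=$(printf 'Mem: 8014 338 1316 973 6359 6639\n' | mem_columns)

mem_delta "$before" "$after"
```

With these numbers it prints `-5534 5584`: free memory dropped by roughly
5.5GB while buff/cache grew by about the same amount, and "used" barely moved.
A snapshot taken after the crash presumably no longer includes the gem's own
heap, so sampling while the scan is running (as suggested with statmon) is
more likely to show the growth.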


Anyway, no problem, I would assume this is a problem in 3.1.0.6 and
hopefully I will never need to list instances / migrate this class until I
am in 3.2/3.3...

Thanks for the effort!



>
>
> Dale
>
>
> On 09/08/2015 02:51 PM, Dale Henrichs wrote:
>
> Thanks Mariano - yeah the args look okay - At this point, I'm suspicious
> that we're running out of memory during the scan and not failing
> "gracefully", but no evidence of that quite yet ...
>
> Dale
>
> On 09/08/2015 02:00 PM, Mariano Martinez Peck wrote:
>
> OK Dale, I found out what the problem was: the printing code should have
> been placed inside the scanBlock. Anyway, I did that, and it still did not
> work, because the gem was crashing and so I couldn't see the log from
> GemTools. So I then replaced "Transcript show:" with "GsFile gciLogServer:"
> and now I got the log:
>
> --LIST-FAILURE--_scanPomWithMaxThreads failure: 1 95 anIdentitySet(
> FaSecurityAdjustedClosingPriceRecord) 0 0 nil
>
> That doesn't look wrong, does it?
>
> Cheers,
>
>
> On Tue, Sep 8, 2015 at 5:03 PM, Dale Henrichs <
> dale.henrichs at gemtalksystems.com> wrote:
>
>> Just rename the temps to ones that compile:)
>>
>> This time around we are not suspecting that blockClosures and block temps
>> are the problem, we are just trying to get the args to the primitive call
>> when it fails, so we can trace things further in the C code and try to
>> determine the code path that leads to a nil return value ...
>>
>> Dale
>>
>>
>> On 09/08/2015 12:49 PM, Mariano Martinez Peck wrote:
>>
>>
>>
>> On Tue, Sep 8, 2015 at 4:26 PM, Dale Henrichs <
>> dale.henrichs at gemtalksystems.com> wrote:
>>
>>> Mariano,
>>>
>>> Sorry for the delay, but I'm back in the office today and what we would
>>> like to do is capture the args that are being used for the primitive, so
>>> replacing the `memOnlyBool` block logic in the listInstances:.... method
>>> with the following will help us get them:
>>>
>>>
>> Hi Dale, no worries, thanks for pushing!
>>
>>
>>>   memOnlyBool
>>>     ifFalse: [
>>>       scanBlk := [ :scanSetThisTime | | ret sKind |
>>>       sKind := (directoryString ifNotNil:[ 2 ] ifNil:[ 0 ]).
>>>       ret := self
>>>         _scanPomWithMaxThreads: maxThreads
>>>         waitForLock: 60
>>>         pageBufSize: 8
>>>         percentCpuActiveLimit: aPercentage
>>>         identSet: scanSetThisTime
>>>         limit: aSmallInt
>>>         scanKind: sKind
>>>         toDirectory: directoryString ].
>>>     ret ifNil: [
>>>        Transcript cr; show: '_scanPomWithMaxThreads failure: ',
>>>                 maxThreads printString, ' ',
>>>                 aPercentage printString, ' ',
>>>                 scanSetThisTime printString, ' ',
>>>                 aSmallInt printString, ' ',
>>>                 sKind printString, ' ',
>>>                 directoryString printString ].
>>>     ret ].
>>>
>>>
>> This doesn't compile because 'sKind' was defined inside the 'scanBlk' and
>> 'scanSetThisTime' is the argument to the closure. Since this problem was
>> related to temp vars, I am not sure which is the correct solution.
>>
>> Let me know,
>>
>>
>>
>>> We thought the problem might have been related to the method temp
>>> reference for `(directoryString ifNotNil:[ 2 ] ifNil:[ 0 ])`, but since the
>>> prim is still failing with that expression inlined there must be a
>>> different (less obvious) failure mechanism.
>>>
>>> Dale
>>>
>>>
>>> On 09/01/2015 11:45 AM, Mariano Martinez Peck wrote:
>>>
>>> OK then. Perfect. Let me know.
>>> Thanks!
>>>
>>> On Tue, Sep 1, 2015 at 3:28 PM, Dale Henrichs <
>>> dale.henrichs at gemtalksystems.com> wrote:
>>>
>>>>
>>>>
>>>> On 9/1/15 10:59 AM, Mariano Martinez Peck wrote:
>>>>
>>>>
>>>> On Tue, Sep 1, 2015 at 2:14 PM, Dale Henrichs <
>>>> dale.henrichs at gemtalksystems.com> wrote:
>>>>
>>>>> Could you arrange to get a stack trace from your most recent error and
>>>>> a listing of the method that you used ... I want to make sure that we
>>>>> understand the failure mechanism ... if it is related to block temps then
>>>>> it is fixed in 3.2.x, but if it is not related to block temps then it could
>>>>> be present in later versions of GemStone and we'll want to characterize the
>>>>> problem .... Obviously, this particular call doesn't reproduce very
>>>>> frequently (I wasn't able to make it break with trivial examples) so there
>>>>> is likely to be something a little more complex going on ...
>>>>>
>>>>>
>>>> Dale, the exception I get is the one I originally shared with you, and
>>>> you got to the same conclusion as I did.
>>>>
>>>> What I can offer is this: I log the error (continuation) in the object
>>>> log and then provide you with a web user for our app. From there I can
>>>> let you open a kind of Seaside debugger/inspector, which will be much
>>>> richer than a plain string stack, and you can at least also
>>>> print/inspect from there. I cannot send you the extent because it's
>>>> quite big.
>>>>
>>>> If you think this is OK, then I need to ask you to share the login info
>>>> only with the GemTalk engineer. Since the site is somewhat in use (but
>>>> with a working extent), I must recover from backup, and so the system
>>>> will be running with a "broken" extent for a while. That is no problem
>>>> as long as it is only for a couple of hours, or 1-2 days max. So if we
>>>> do this, I would appreciate it if you let me know when you (or the
>>>> engineer) would be available to take a look.
>>>>
>>>> Let me know if you want this.
>>>>
>>>> Thanks for the offer ... we might want to instrument up the method a
>>>> bit more instead of looking at a continuation ... so I will get back to you
>>>> ... I won't be in the office until Thursday, and that's when I will talk
>>>> things over with the engineer ...
>>>>
>>>> Dale
>>>>
>>>
>>>
>>>
>>> --
>>> Mariano
>>> http://marianopeck.wordpress.com
>>>
>>>
>>>
>>
>>
>> --
>> Mariano
>> http://marianopeck.wordpress.com
>>
>>
>>
>
>
> --
> Mariano
> http://marianopeck.wordpress.com
>
>
>
>


-- 
Mariano
http://marianopeck.wordpress.com