[Glass] case insensitive search broken for Unicode7

Dale Henrichs dale.henrichs at gemtalksystems.com
Wed Apr 16 15:46:46 PDT 2014


Mariano,

You are correct that Unicode to Unicode should work and the error that you
are seeing is definitely a bug ...

You are also correct that for GLASS one should use 'Unicode comparison
mode' ...

The gotcha is that in 3.1 and earlier we allowed Unicode* and *String
instances to be intermixed. Basically we ignored the unicode-ness of
instances and used simple code point comparisons in certain circumstances
... the main implication of this is that sorted collections may not be
sorted correctly, so one will have to be aware of this when upgrading to
3.2 ...
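
To make the sorted-collection concern concrete, here is a minimal sketch in
Python rather than GemStone Smalltalk. It assumes that str.casefold() is an
acceptable stand-in for a real collator such as ICU (which applies richer,
locale-aware rules), and the word list is made up for illustration. It shows
that a collection sorted by plain code point comparison is generally not in
order once a collator-style comparison is applied to it.

```python
# Illustration only: Python, not GemStone Smalltalk. str.casefold() is an
# assumed stand-in for a real collator (such as ICU).
words = ["apple", "Zebra", "banana", "Apple"]  # hypothetical sample data

# Plain code point comparison: every uppercase ASCII letter sorts before
# every lowercase one.
by_code_point = sorted(words)

# Collator-like comparison: case differences are ignored when ordering.
by_folded_key = sorted(words, key=str.casefold)


def is_sorted(items, key=lambda s: s):
    """Return True if items are in non-decreasing order under key."""
    keys = [key(s) for s in items]
    return all(a <= b for a, b in zip(keys, keys[1:]))


print(by_code_point)   # ['Apple', 'Zebra', 'apple', 'banana']
print(by_folded_key)   # ['apple', 'Apple', 'banana', 'Zebra']

# The code-point-sorted list is no longer "sorted" under the new comparison.
print(is_sorted(by_code_point, key=str.casefold))  # False
```

A collection that was built under the old comparison would presumably need
to be re-sorted under the new one before lookups that rely on its order can
be trusted.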

I will be doing a number of experiments with GLASS upgrades to 3.2 in the
next week or so to see if I can identify any issues and come up with ways
to compensate ...

Dale


On Wed, Apr 16, 2014 at 1:24 PM, Mariano Martinez Peck <
marianopeck at gmail.com> wrote:

>
>
>
> On Fri, Mar 28, 2014 at 2:55 PM, Dale Henrichs <
> dale.henrichs at gemtalksystems.com> wrote:
>
>> Pieter,
>>
>> The engineer responsible for the ICU implementation has been on vacation
>> all week, so I haven't had a chance to discuss the
>> #_findString:startingAt:ignoreCase: issue with him ...
>>
>> With that said we have been "discovering" things about mixed
>> Unicode[7|16|32] (where a collator is always used) and
>> [DoubleByte|QuadByte]String classes and the basic conclusion that we've
>> come to is that:
>>
>>    "it does not make sense to attempt to perform mixed comparisons
>>     between Unicode* and *String instances"
>>
>>
>>
> Dale,
>
> While I understand such a conclusion, I think this is a different
> discussion than the one I originally pasted, isn't it?
>
> ('Newmont' asUnicodeString _findString: 'newm' asUnicodeString startingAt:
> 1 ignoreCase: true) > 0
>
> answers false, and in this case I am not mixing anything: both are
> Unicode7. So what I mean is that this is broken even without mixing.
>
> Also, note that users may be comparing/mixing Unicode* and *String
> WITHOUT knowing. For example, in my case, I don't know how, but I get a
> Unicode7 from a combo list in a Magritte form (of course, in Pharo I get
> a String) and then I search over that result. So if you were to choose
> "Legacy comparison mode" for GLASS, then we should at least avoid using
> Unicode classes in Seaside/Magritte. Otherwise it would be a pain to
> maintain a working system for both Pharo and GemStone. So this is just to
> agree that "Unicode comparison mode" would be better for GLASS?
>
> Thanks,
>
>
>
>> Dale
>>
>>
>> On Thu, Mar 27, 2014 at 5:19 AM, Pieter Nagel <pieter at nagel.co.za> wrote:
>>
>>> The currently buggy UnicodeX >> #_findString:startingAt:ignoreCase:
>>> delegates to the ICU collator only in the case where ignoreCase is true.
>>> Is this correct?
>>>
>>> The reasoning seems to be that a case-sensitive match in Unicode can be
>>> done by just comparing the byte values of the two strings for identity,
>>> as the super implementation presumably does. But since some letters can
>>> be decomposed into multiple codepoints in canonical and non-canonical
>>> ways[1], that's not true. And I suppose surrogate escapes factor in here
>>> as well.
>>>
>>> To be honest, I'm not familiar enough with ICU to know whether it
>>> (optionally?) takes character decomposition into account on comparison,
>>> but I would guess that issues like these are precisely why an
>>> industrial-strength Unicode handling library was needed in the first
>>> place.
>>>
>>> What does this look like in 3.2? Does your testing include comparisons
>>> where the two strings have characters decomposed in different ways?
>>>
>>> [1] E.g. U+00E9 LATIN SMALL LETTER E WITH ACUTE can also be decomposed
>>> as U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT
>>>
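
The decomposition issue in footnote [1] can be checked in a few lines. This
is a minimal sketch in Python, not GemStone code: it assumes the standard
library's unicodedata.normalize as a stand-in for whatever ICU does
internally, and find_normalized is a hypothetical helper written only for
this illustration. It shows that a code-point-for-code-point comparison
treats the two spellings of the same letter as different, while comparing
after canonical normalization (NFC) treats them as equal.

```python
import unicodedata

composed = "\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 + U+0301 COMBINING ACUTE ACCENT

# Comparing code points directly: the two spellings differ.
raw_equal = composed == decomposed

# Comparing after canonical composition (NFC): the two spellings agree.
nfc_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)


def find_normalized(haystack, needle):
    """Hypothetical helper: case-sensitive search after NFC normalization.

    The returned index refers to the normalized haystack, or -1 if the
    needle does not occur.
    """
    h = unicodedata.normalize("NFC", haystack)
    n = unicodedata.normalize("NFC", needle)
    return h.find(n)


print(raw_equal)                                   # False
print(nfc_equal)                                   # True
print("caf\u00e9".find("e\u0301"))                 # -1
print(find_normalized("caf\u00e9", "e\u0301"))     # 3
```

So even a case-sensitive search can miss a match unless both strings are
normalized, or a collator is used, before they are compared.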
>>> _______________________________________________
>>> Glass mailing list
>>> Glass at lists.gemtalksystems.com
>>> http://lists.gemtalksystems.com/mailman/listinfo/glass
>>>
>>
>>
>
>
> --
> Mariano
> http://marianopeck.wordpress.com
>