[Glass] case insensitive search broken for Unicode7

Pieter Nagel pieter at nagel.co.za
Mon Mar 31 05:15:52 PDT 2014


Hi Dale,

>    "it does not make sense to attempt to perform mixed comparisons
>     between Unicode* and *String instances"

I've been trying to wrap my head around that statement, and around the way
"Legacy comparison mode" and "Unicode comparison mode" seem to hinge on
whether a collator is specified explicitly or defaulted.

To my mind the choice of collator is entirely orthogonal to whether mixed
comparisons "make sense". Whether a comparison is well-defined depends only
on whether the _encoding_ of the *String instances is known.

Unicode's character repertoire is a superset of the repertoires of the
legacy string encodings, so a mixed comparison between a *String and a
Unicode* instance should be a well-defined operation, *assuming* the
encoding used by the *String instance is known. Whatever character appears
in such a *String instance, it corresponds to some Unicode character and
can therefore take part in a comparison with a Unicode string.

The choice of collator has no bearing on whether comparison is a
well-defined operation or not. It only affects issues like
language-dependent conventions where, for example, Germans want "ß" to
sort equal to "ss".
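The following rough Python sketch models a "collator" as a key function, which is a simplification and not how ICU or GemStone collators actually work. The names `code_point`, `caseless` and `collate_equal` are hypothetical. Unicode case folding (`str.casefold`) maps "ß" to "ss", so it serves here as a crude stand-in for the German convention. Both comparisons are well-defined; the collator only changes the answer.

```python
from typing import Callable

# Simplification: a "collator" is modelled as a function mapping a string
# to a comparison key. Real collators are considerably more involved.
Collator = Callable[[str], str]


def code_point(s: str) -> str:
    # No language conventions at all: compare raw code points.
    return s


def caseless(s: str) -> str:
    # Full Unicode case folding maps "ß" to "ss"; a crude stand-in for the
    # German convention, not a real locale-aware collation.
    return s.casefold()


def collate_equal(a: str, b: str, collator: Collator) -> bool:
    return collator(a) == collator(b)


print(collate_equal("Straße", "STRASSE", code_point))  # False
print(collate_equal("Straße", "STRASSE", caseless))    # True
```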

I don't see how passing in a collator turns a mixed String/Unicode
comparison from an error into a valid operation. If the String instance's
encoding was known, the comparison was already well-defined and never
really an error in the first place. If the String's encoding is not known
(or not specified by the programmer), then the collator will be fed garbage
input; how would it know whether a byte 0xBC in the input was supposed to
mean "Vulgar Fraction One Quarter" (ISO-8859-1) or "Latin Capital Ligature
OE" (ISO-8859-15)?
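The ambiguity can be demonstrated with a short Python sketch using only the standard `codecs` names and the `unicodedata` module. The same single byte decodes to two different characters depending on which encoding is assumed, and no collator applied afterwards can tell which one was meant.

```python
import unicodedata

RAW = bytes([0xBC])  # the single byte from the example above

for enc in ("iso8859_1", "iso8859_15"):
    ch = RAW.decode(enc)
    # Print the encoding, the resulting code point, its Unicode name, and
    # whether it equals the Unicode string "Œ" (U+0152).
    print(enc, f"U+{ord(ch):04X}", unicodedata.name(ch), ch == "\u0152")
```

The first line reports U+00BC VULGAR FRACTION ONE QUARTER and `False`; the second reports U+0152 LATIN CAPITAL LIGATURE OE and `True`. A comparison against the Unicode string "Œ" therefore has a different answer under each encoding.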



