[Glass] Possible Bug: String>>#= treats nulls as a terminator

Dale Henrichs via Glass glass at lists.gemtalksystems.com
Sat Jan 27 09:18:27 PST 2018


Monty,

Good points ... this "unexpected" behavior of Unicode strings with 
respect to control characters has been hard for us to grapple with 
internally as well, but this is Unicode being Unicode. I did notice that, 
with the exception of code point 173, all of the code points you list 
are indeed control characters according to the Unicode character table[1].

Code point 173 is a "Soft Hyphen"[2] and doesn't really seem to fit the 
description of a control character, so I'm now curious whether we might 
have a bug here, either in our implementation, the implementation of 
libICU, or my understanding :)
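
As a quick cross-check outside GemStone, here is a minimal Python sketch 
(not GemStone code; it assumes only Python's standard unicodedata module, 
and the helper name general_categories is made up for illustration). It 
groups the code points from the list quoted below by Unicode general 
category. In the Unicode data, 173 is classified as Cf (format) rather 
than Cc (control), which matches the observation that it is the odd one 
out. Whether ICU skips it for the same reason it skips the control 
characters is an assumption this sketch does not test.

```python
import unicodedata

# Code points reported in the quoted message as being skipped by String>>#=.
reported = (
    list(range(0, 9)) + list(range(14, 32)) + list(range(127, 133))
    + list(range(134, 160)) + [173]
)


def general_categories(code_points):
    """Group code points by Unicode general category (e.g. 'Cc', 'Cf')."""
    groups = {}
    for cp in code_points:
        groups.setdefault(unicodedata.category(chr(cp)), []).append(cp)
    return groups


groups = general_categories(reported)
for category, members in sorted(groups.items()):
    print(category, members)
```

Running it prints one line per category, so any code point that is not 
a plain control character stands out immediately.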

How did you run across this behavior? Control characters wouldn't 
seem to be a normal part of strings intended for display ...

I'm asking because, if there is a use case for providing the old literal 
byte comparison operators, we can make them available.

Dale

[1] https://unicode-table.com/en/#control-character
[2] https://unicode-table.com/en/00AD/

On 01/27/2018 01:57 AM, monty via Glass wrote:
> My example and thread title were wrong. It skips null *and* various control chars entirely when comparing:
> (0 to: 255) select: [:each |
> 	(String with: $a with: $b) =
> 		(String with: $a with: each asCharacter with: $b)]
>
> which yields:
> anArray( 0, 1, 2, 3, 4, 5, 6, 7, 8, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 127, 128, 129, 130, 131, 132, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 173)
>
> The GS Prog Guide (p. 77) says the ICU lib handles string comparisons internally, and it seems to ignore these characters for the sake of normalization.
>
> But that means it's possible for two Strings to be #= while having different #sizes and indexable characters, and that comparisons between Strings containing binary data aren't reliable, and that other String methods aren't consistent with #=:
> | one two |
> one := String with: $a with: 0 asCharacter with: $b.
> two := String with: $a with: $b.
> one = two
> 	and: [(one at: 1 equals: two) not
> 		and: [(two at: 1 equals: one) not]]
>
> And since GsFile #next and #contents are character based:
> (GsFile open: 'bin.one' mode: 'wb' onClient: false)
> 	nextPutAll: #[100 25 200];
> 	close.
> (GsFile open: 'bin.two' mode: 'wb' onClient: false)
> 	nextPutAll: #[100 200];
> 	close.
> (GsFile open: 'bin.one' mode: 'rb' onClient: false) contents =
> 	(GsFile open: 'bin.two' mode: 'rb' onClient: false) contents.
>
> Consider this more as a "heads-up" for users than a bug report, since this is apparently the intended, documented behavior.
>
>> Sent: Friday, January 26, 2018 at 2:20 AM
>> From: "monty via Glass" <glass at lists.gemtalksystems.com>
>> To: glass at lists.gemtalksystems.com
>> Subject: [Glass] Possible Bug: String>>#= treats nulls as a terminator
>>
>> Is this correct?
>>
>> (String with: 12 asCharacter with: 0 asCharacter) =
>>      (String with: 12 asCharacter with: 0 asCharacter with: 32 asCharacter)
>>
>> Other string methods, like #copyAfter:, don't treat null the same way.
>> _______________________________________________
>> Glass mailing list
>> Glass at lists.gemtalksystems.com
>> http://lists.gemtalksystems.com/mailman/listinfo/glass
>>


