[Glass] Possible Bug: String>>#= treats nulls as a terminator

Dale Henrichs via Glass glass at lists.gemtalksystems.com
Mon Jan 29 08:44:05 PST 2018



On 01/29/2018 01:16 AM, monty via Glass wrote:
> I was writing tests for stream converter classes that do encoding/decoding from various encodings. But any use of Strings to store binary data is a use case. ByteArray is more appropriate, but GsFile is still byte-character based by default, even when you open files in binary mode (which I assume just disables line ending normalization on Windows).
This seems like a GemStone bug at the end of the day ... ByteArray and 
Utf8 are the two classes that _should_ be used, but if GsFile is not 
handling them well, then that is an issue for us ... I will check this 
out ...

Thanks,

Dale

>
>> Sent: Saturday, January 27, 2018 at 12:18 PM
>> From: "Dale Henrichs via Glass" <glass at lists.gemtalksystems.com>
>> To: glass at lists.gemtalksystems.com
>> Subject: Re: [Glass] Possible Bug: String>>#= treats nulls as a terminator
>>
>> Monty,
>>
>> Good points ... this "unexpected" behavior of Unicode strings with
>> respect to control characters has been hard for us to grapple with
>> internally as well, but this is Unicode being Unicode. I did notice that
>> with the exception of code point 173, all of the code points you list
>> are indeed control characters according to the Unicode character table[1].
>>
>> Code point 173 is a "Soft Hyphen"[2] and doesn't really seem to fit the
>> description of a control character, so I'm now curious if we might have
>> a bug here, either in our implementation, the implementation of libICU
>> or my understanding :)
>>
>> I'm curious how you ran across this behavior? The control characters
>> wouldn't seem to be a normal part of strings intended for display ...
>>
>> I'm asking because if there is a use case for providing the old literal
>> byte comparison operators we can make them available.
>>
>> Dale
>>
>> [1] https://unicode-table.com/en/#control-character
>> [2] https://unicode-table.com/en/00AD/
>>
>> On 01/27/2018 01:57 AM, monty via Glass wrote:
>>> My example and thread title were wrong. It skips null *and* various control chars entirely when comparing:
>>> (0 to: 255) select: [:each |
>>> 	(String with: $a with: $b) =
>>> 		(String with: $a with: each asCharacter with: $b)]
>>>
>>> which yields:
>>> anArray( 0, 1, 2, 3, 4, 5, 6, 7, 8, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 127, 128, 129, 130, 131, 132, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 173)
>>>
>>> The GS Prog Guide (p. 77) says the ICU lib handles string comparisons internally, and it seems to ignore these characters for the sake of normalization.
>>>
>>> But that means it's possible for two Strings to be #= while having different #sizes and indexable characters, and that comparisons between Strings containing binary data aren't reliable, and that other String methods aren't consistent with #=:
>>> | one two |
>>> one := String with: $a with: 0 asCharacter with: $b.
>>> two := String with: $a with: $b.
>>> one = two
>>> 	and: [(one at: 1 equals: two) not
>>> 		and: [(two at: 1 equals: one) not]]
>>>
>>> And since GsFile #next and #contents are character based:
>>> (GsFile open: 'bin.one' mode: 'wb' onClient: false)
>>> 	nextPutAll: #[100 25 200];
>>> 	close.
>>> (GsFile open: 'bin.two' mode: 'wb' onClient: false)
>>> 	nextPutAll: #[100 200];
>>> 	close.
>>> (GsFile open: 'bin.one' mode: 'rb' onClient: false) contents =
>>> 	(GsFile open: 'bin.two' mode: 'rb' onClient: false) contents.
>>>
>>> Consider this more as a "heads-up" for users than a bug report, since this is apparently the intended, documented behavior.
>>>
>>>> Sent: Friday, January 26, 2018 at 2:20 AM
>>>> From: "monty via Glass" <glass at lists.gemtalksystems.com>
>>>> To: glass at lists.gemtalksystems.com
>>>> Subject: [Glass] Possible Bug: String>>#= treats nulls as a terminator
>>>>
>>>> Is this correct?
>>>>
>>>> (String with: 12 asCharacter with: 0 asCharacter) =
>>>>       (String with: 12 asCharacter with: 0 asCharacter with: 32 asCharacter)
>>>>
>>>> Other string methods, like #copyAfter:, don't treat null the same way.
>>>> _______________________________________________
>>>> Glass mailing list
>>>> Glass at lists.gemtalksystems.com
>>>> http://lists.gemtalksystems.com/mailman/listinfo/glass
>>>>
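
For anyone who wants to check the code points from monty's list against the Unicode tables, here is a minimal sketch. It is written in Python rather than GemStone Smalltalk and uses only the standard unicodedata module. It groups the reported code points by Unicode general category. The variable names are illustrative and the list is copied from the message quoted above. The sketch only inspects Unicode properties; it makes no claim about what ICU or GemStone does internally.

```python
import unicodedata

# Code points monty reported as skipped by String>>#= (copied from the thread).
skipped = (
    list(range(0, 9)) + list(range(14, 32))
    + list(range(127, 133)) + list(range(134, 160)) + [173]
)

# Group the reported code points by Unicode general category
# ("Cc" = control, "Cf" = format).
by_category = {}
for cp in skipped:
    by_category.setdefault(unicodedata.category(chr(cp)), []).append(cp)

# Control characters in 0..255 that were NOT reported as skipped.
not_skipped_controls = [
    cp for cp in range(256)
    if unicodedata.category(chr(cp)) == "Cc" and cp not in skipped
]

for category, cps in sorted(by_category.items()):
    print(category, len(cps), "code points")
print("Cc but not skipped:", not_skipped_controls)
```

In this grouping, 173 (SOFT HYPHEN) is the only reported code point with category Cf (format); every other entry is Cc (control). The Cc code points in 0..255 that are absent from the list are 9 through 13 and 133, that is, tab, line feed, vertical tab, form feed, carriage return and NEL.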


