[Glass] Files and UTF8 but not using String #encodeAsUTF8

Mariano Martinez Peck via Glass glass at lists.gemtalksystems.com
Mon Mar 2 07:45:37 PST 2015


On Mon, Mar 2, 2015 at 11:58 AM, Richard Sargent <
richard.sargent at gemtalksystems.com> wrote:

> Mariano,
> UTF-8 *is* a binary encoding. Yes, it is 8-bit, but binary. (It uses byte
> values in the 128-255 range to encode the non-ASCII characters.)
>
Indeed, you are right. But even so, GsFile does not work while
FileStream does. So I guess my problem is that I have no way to force
GsFile to open in binary mode. I tried 'r+b' and 'w+b' as follows:

(GsFile open: '/Users/mariano/test.txt' mode: 'r+b' onClient: false).

But GsFile #next will always answer a character. By contrast,
(FileStream oldFileNamed: aFilename) binary does answer a number when
sent #next. At least for FileStream I can force binary mode with #binary,
something that I cannot do (or that doesn't work) with GsFile.
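
A minimal sketch of the difference described above, assuming the test
file written earlier in this thread exists and holds at least two bytes.
It uses only #next and #nextByte on GsFile, as discussed in the thread.
The variable names and the error text are illustrative, and the answered
classes are as reported here, not verified against a particular GemStone
version.

```smalltalk
| file char byte |
"Open the server-side file used elsewhere in this thread (assumed to exist)."
file := GsFile open: '/Users/mariano/test.txt' mode: 'r' onClient: false.
file isNil ifTrue: [ self error: 'could not open the file' ].
"#next answers a Character, as reported in this thread."
char := file next.
"#nextByte answers the following byte as an Integer."
byte := file nextByte.
file close.
"Answer the two classes so the difference is visible when inspected."
Array with: char class with: byte class
```

One possible workaround, untested here, would be an adaptor that
forwards #next to #nextByte, so that a codec stream expecting numbers
from #next gets them.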

Best,

> On Mar 2, 2015 6:20 AM, "Mariano Martinez Peck via Glass" <
> glass at lists.gemtalksystems.com> wrote:
>
>>
>>
>> On Mon, Mar 2, 2015 at 11:08 AM, Mariano Martinez Peck <
>> marianopeck at gmail.com> wrote:
>>
>>>
>>>
>>> On Thu, Feb 26, 2015 at 5:38 PM, Johan Brichau <johan at yesplan.be> wrote:
>>>
>>>> Mariano,
>>>>
>>>> Does this help you?
>>>>
>>>> |codec stream|
>>>> codec := GRCodec forEncoding: 'utf8'.
>>>> stream := codec encoderFor: (GsFile open: 'bla.xml' mode: 'r' onClient:
>>>> false).
>>>> stream binary.
>>>> stream contents
>>>>
>>>> I guess you can convert this to writing easily.
>>>>
>>>>
>>> Hi Johan,
>>>
>>> Yes, it did help. However, I am still getting more errors. Your code
>>> above only works correctly for me if I send #contents. If I send #next,
>>> for example, it fails. In my case, I cannot send #contents; instead I pass
>>> the stream to SIXX, and SIXX reads the stream and materializes. So SIXX,
>>> for example, sends #next:. The problem is that the UTF8 magritte readers
>>> expect the streams to answer a number (ascii value) to #next. However,
>>> GsFile answers an instance of Character. See the attached screenshot.
>>>
>>> I know GsFile understands #nextByte which indeed answers the ascii
>>> value, but as you can see in the stack I don't have control over which
>>> messages are sent, so #next: is sent.
>>>
>>> Reproducing the error is very easy:
>>>
>>> | stream codec  |
>>> codec := GRCodec forEncoding: 'utf8'.
>>> stream := codec encoderFor: (GsFile open: '/Users/mariano/test.txt'
>>> mode: 'w' onClient: false).
>>> stream text.
>>> stream nextPutAll: '<mariano>'.
>>> stream flush.
>>>
>>> codec := GRCodec forEncoding: 'utf8'.
>>> stream := codec decoderFor: (GsFile open: '/Users/mariano/test.txt'
>>> mode: 'r' onClient: false).
>>> stream next
>>>
>>> There you will get the DNU.
>>>
>>> Any ideas how to workaround this?
>>>
>>>
>> I forgot to say: I tried sending 'r+b' or 'w+b' as the open mode on my
>> OS X machine and it didn't work.
>>
>> If I replace the reading with this:
>>
>> stream := codec decoderFor: (FileStream oldFileNamed: aFilename) binary.
>> it works correctly, because FileStream #next answers the asciiValue (if I
>> send #binary).
>>
>> What a mess... so does this mean that GRUtf8CodecStream expects me to
>> define the streams as binary even if they are text?
>>
>> Thanks in advance for any clarification!
>>
>> Best,
>>
>>
>>
>>>
>>>
>>>> Johan
>>>>
>>>> On 26 Feb 2015, at 21:22, Johan Brichau <johan at yesplan.be> wrote:
>>>>
>>>> For UTF8, I think you had better use Grease's GRUtf8CodecStream.
>>>> Let me see if I can extract something from what we did because we do
>>>> read files in several encodings in Gemstone.
>>>> There’s just stream wrappers all over the place… I’m looking into it
>>>> right now.
>>>>
>>>> Johan
>>>>
>>>> On 26 Feb 2015, at 20:56, Mariano Martinez Peck <marianopeck at gmail.com>
>>>> wrote:
>>>>
>>>> Hi Johan,
>>>>
>>>> But do you have any example of using a UTF8TextConverter with files?
>>>> Because I found no way :(
>>>>
>>>> Thanks!
>>>>
>>>> On Thu, Feb 26, 2015 at 4:45 PM, Johan Brichau <johan at yesplan.be>
>>>> wrote:
>>>>
>>>>> It’s in the PharoCompatibility project on github.
>>>>>
>>>>> Though, I must say it merits some love.
>>>>> It works (using it for years already) but we have sometimes done some
>>>>> hacks to make it work with the Stream classes of GS. Or at least that’s
>>>>> what I remember right off the top of my head.
>>>>>
>>>>> But I have not used this with SIXX. My last experience with SIXX was
>>>>> when we moved the Vooruit database from Pharo+GOODS to Gemstone and I used
>>>>> the commitOnAlmostOutOfMemory trick.
>>>>>
>>>>> Johan
>>>>>
>>>>> On 26 Feb 2015, at 18:46, Dale Henrichs <
>>>>> dale.henrichs at gemtalksystems.com> wrote:
>>>>>
>>>>>
>>>>>  What package is that class located in?
>>>>>
>>>>> Dale
>>>>>
>>>>> On 2/26/15 9:42 AM, Mariano Martinez Peck wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Feb 26, 2015 at 2:31 PM, Dale Henrichs via Glass <
>>>>> glass at lists.gemtalksystems.com> wrote:
>>>>>
>>>>>>  Mariano,
>>>>>>
>>>>>>
>>>>>> Hmmm, this is a bit of a sticky wicket ....
>>>>>>
>>>>>> I'm afraid that the best way to solve this one is to make a major change
>>>>>> to SIXX and force all output and input to be utf8 ... but if you are using
>>>>>> SIXX to move data between pharo and gemstone, then you'll have to make sure
>>>>>> that SIXX on the pharo side will properly decode utf8 ...
>>>>>>
>>>>>> Alternatively, we could try porting the whole TextConverter
>>>>>> scheme to GemStone ...
>>>>>>
>>>>>>
>>>>>  Hi Dale,
>>>>>
>>>>>  This is already ported; Johan did it. I just don't know how to use
>>>>> the MultiByteBinaryOrTextStream (on which I can set a #converter:) together
>>>>> with a GsFile backend. But I guess Johan did something, because I cannot
>>>>> imagine he did everything in memory, right?
>>>>>
>>>>>
>>>>>
>>>>>>  I suppose it's about time we did something in this area ...
>>>>>>
>>>>>> Dale
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2/26/15 7:03 AM, Mariano Martinez Peck via Glass wrote:
>>>>>>
>>>>>>  Hi guys,
>>>>>>
>>>>>>  I am trying to implement the solution provided by Dale for
>>>>>> exporting and importing large objects with SIXX:
>>>>>> https://github.com/glassdb/SIXX?files=1
>>>>>>
>>>>>>  In his example he does:
>>>>>>
>>>>>>   strm := WriteStream on: String new.
>>>>>>  #( 1 2 3) sixxOn: strm persistentRoot: (UserGlobals at:
>>>>>> #'MY_SIXX_ROOT_ARRAY')
>>>>>>
>>>>>>  That stream 'strm' is in memory. I need files. And I want those
>>>>>> files to be encoded with UTF8. In addition, in my experience, I have been
>>>>>> trying to use GsFile as much as possible since it was way faster than other
>>>>>> classes when I tested it. So...so far I was using the following approach to
>>>>>> write a UTF8 file:
>>>>>>
>>>>>>   file := GsFile openWrite: aFilename.
>>>>>>  file nextPutAll: aString encodeAsUTF8.
>>>>>>
>>>>>>  However, I cannot use that approach in the SIXX scenario. Why?
>>>>>> Because I cannot easily hook in the parts where sixx gets the string of an
>>>>>> object and writes it to the stream. So I kind of need to create the File
>>>>>> stream with UTF8 from the beginning.
>>>>>>
>>>>>>  I do have UTF8TextConverter, but GsFile does not understand #converter:. I tried:
>>>>>>
>>>>>>  | stream |
>>>>>>  stream := MultiByteBinaryOrTextStream on: (GsFile openWrite:
>>>>>> aFilename).
>>>>>>  stream converter: UTF8TextConverter new.
>>>>>>  stream text.
>>>>>>   MCPlatformSupport commitOnAlmostOutOfMemoryDuring: [
>>>>>>     UserGlobals at: #'MY_SIXX_ROOT_ARRAY' put: Array new.
>>>>>>         #( 1 2 3) sixxOn: stream persistentRoot: (UserGlobals at:
>>>>>> #'MY_SIXX_ROOT_ARRAY')
>>>>>>   ].
>>>>>>   stream close.
>>>>>>
>>>>>>  But it doesn't work. OK, I did see GsFile >> contentsAsUtf8, so I
>>>>>> could write the whole file first, then grab the contents as UTF8 and then do
>>>>>> what I did above (a new file doing #nextPutAll: of the UTF8). But since I
>>>>>> am writing all this code because the object graph I am trying to serialize is
>>>>>> big, I am afraid I will run out of memory while trying to hold all the
>>>>>> contents as UTF8. So I would really like the "streaming" possibility.
>>>>>>
>>>>>>  Any ideas how can I do that?
>>>>>>
>>>>>>  Thanks,
>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>>> Mariano
>>>>>> http://marianopeck.wordpress.com
>>>>>>
>>>>>>
>>>>>>  _______________________________________________
>>>>>> Glass mailing list
>>>>>> Glass at lists.gemtalksystems.com
>>>>>> http://lists.gemtalksystems.com/mailman/listinfo/glass
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>> Mariano
>>>>> http://marianopeck.wordpress.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Mariano
>>>> http://marianopeck.wordpress.com
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Mariano
>>> http://marianopeck.wordpress.com
>>>
>>
>>
>>
>> --
>> Mariano
>> http://marianopeck.wordpress.com
>>
>>
>>


-- 
Mariano
http://marianopeck.wordpress.com