[Glass] Files and UTF8 but not using String #encodeAsUTF8
Johan Brichau via Glass
glass at lists.gemtalksystems.com
Mon Mar 2 09:41:11 PST 2015
Mariano,
I think you can wrap the stream and convert messages, no?
That's what we do all over the place in this area, but I need to check whether we cover your use case.
I did realize that I should publish these wrappers together with the TextConverters code, because they are needed. I am looking into that too, but it will take some more time.
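As a rough, untested sketch of the wrapping idea: a small object sits between the decoder and the GsFile and answers bytes (Integers) instead of Characters. The class name ByteReadingFile and its selectors are made up for illustration and are not part of Grease or GemStone; the sketch only assumes that GsFile answers #nextByte (nil at end of file or on error), #atEnd and #close. The decoder may send further messages (#peek, #skip:, ...) that would need forwarding too.

```smalltalk
"Hypothetical wrapper class with one instance variable, 'file'.
 Create the class with your usual tools; methods are shown in
 'Class >> selector' notation."

ByteReadingFile class >> on: aGsFile
    "Answer a wrapper around an already opened GsFile."
    ^ self new setFile: aGsFile

ByteReadingFile >> setFile: aGsFile
    file := aGsFile

ByteReadingFile >> next
    "Answer the next byte as an Integer instead of a Character."
    ^ file nextByte

ByteReadingFile >> next: anInteger
    "Answer up to anInteger bytes as a ByteArray; stop early when
     the file has no more bytes to give."
    | bytes byte |
    bytes := WriteStream on: ByteArray new.
    anInteger timesRepeat: [
        byte := file nextByte.
        byte isNil ifTrue: [ ^ bytes contents ].
        bytes nextPut: byte ].
    ^ bytes contents

ByteReadingFile >> atEnd
    ^ file atEnd

ByteReadingFile >> close
    ^ file close

"Usage: the failing read from Mariano's example, with the wrapper in between."
| codec stream |
codec := GRCodec forEncoding: 'utf8'.
stream := codec decoderFor: (ByteReadingFile on:
    (GsFile open: '/Users/mariano/test.txt' mode: 'r' onClient: false)).
stream next
```

The point of the design is that SIXX keeps sending #next and #next: to the decoder as before; only the object underneath the decoder changes.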
Johan
> On 02 Mar 2015, at 15:08, Mariano Martinez Peck <marianopeck at gmail.com> wrote:
>
>
>
>> On Thu, Feb 26, 2015 at 5:38 PM, Johan Brichau <johan at yesplan.be> wrote:
>> Mariano,
>>
>> Does this help you?
>>
>> |codec stream|
>> codec := GRCodec forEncoding: 'utf8'.
>> stream := codec encoderFor: (GsFile open: 'bla.xml' mode: 'r' onClient: false).
>> stream binary.
>> stream contents
>>
>> I guess you can convert this to writing easily.
>
> Hi Johan,
>
> Yes, it did help. However, I am still getting errors. Your code above only works correctly for me if I send #contents. If I send #next, for example, it fails. In my case, I cannot send #contents; instead I pass the stream to SIXX, which reads the stream and materializes the objects. SIXX, for example, sends #next:. The problem is that the UTF8 Magritte readers expect the stream to answer a number (ASCII value) to #next. However, GsFile answers an instance of Character. See the attached screenshot.
>
> I know GsFile understands #nextByte, which does answer the ASCII value, but as you can see in the stack, I don't control which messages are sent, so #next: is sent.
>
> Reproducing the error is very easy:
>
> | stream codec |
> codec := GRCodec forEncoding: 'utf8'.
> stream := codec encoderFor: (GsFile open: '/Users/mariano/test.txt' mode: 'w' onClient: false).
> stream text.
> stream nextPutAll: '<mariano>'.
> stream flush.
>
> codec := GRCodec forEncoding: 'utf8'.
> stream := codec decoderFor: (GsFile open: '/Users/mariano/test.txt' mode: 'r' onClient: false).
> stream next
>
> There you will get the DNU.
>
> Any ideas how to work around this?
>
> Thanks in advance,
>
>
>
>
>> Johan
>>
>>> On 26 Feb 2015, at 21:22, Johan Brichau <johan at yesplan.be> wrote:
>>>
>>> For UTF8, I think you are better off using Grease's GRUtf8CodecStream.
>>> Let me see if I can extract something from what we did because we do read files in several encodings in Gemstone.
>>> There’s just stream wrappers all over the place… I’m looking into it right now.
>>>
>>> Johan
>>>
>>>> On 26 Feb 2015, at 20:56, Mariano Martinez Peck <marianopeck at gmail.com> wrote:
>>>>
>>>> Hi Johan,
>>>>
>>>> But do you have any example of using a UTF8TextConverter with files? I found no way to do it :(
>>>>
>>>> Thanks!
>>>>
>>>>> On Thu, Feb 26, 2015 at 4:45 PM, Johan Brichau <johan at yesplan.be> wrote:
>>>>> It’s in the PharoCompatibility project on github.
>>>>>
>>>>> Though, I must say it merits some love.
>>>>> It works (we have been using it for years), but we have sometimes resorted to hacks to make it work with the Stream classes of GS. At least that’s what I remember off the top of my head.
>>>>>
>>>>> But I have not used this with SIXX. My last experience with SIXX was when we moved the Vooruit database from Pharo+GOODS to Gemstone and I used the commitOnAlmostOutOfMemory trick.
>>>>>
>>>>> Johan
>>>>>
>>>>>> On 26 Feb 2015, at 18:46, Dale Henrichs <dale.henrichs at gemtalksystems.com> wrote:
>>>>>>
>>>>>> What package is that class located in?
>>>>>>
>>>>>> Dale
>>>>>>
>>>>>>> On 2/26/15 9:42 AM, Mariano Martinez Peck wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On Thu, Feb 26, 2015 at 2:31 PM, Dale Henrichs via Glass <glass at lists.gemtalksystems.com> wrote:
>>>>>>>> Mariano,
>>>>>>>>
>>>>>>>>
>>>>>>>> Hmmm, this is a bit of a sticky wicket ....
>>>>>>>>
>>>>>>>> I'm afraid that the best way to solve this one is to make a major change to SIXX and force all output and input to be UTF8 ... but if you are using SIXX to move data between Pharo and GemStone, then you'll have to make sure that SIXX on the Pharo side will properly decode UTF8 ...
>>>>>>>>
>>>>>>>> Alternatively, we could try porting the whole TextConverter scheme to GemStone ...
>>>>>>>
>>>>>>> Hi Dale,
>>>>>>>
>>>>>>> This is already ported; Johan did it. I just don't know how to use MultiByteBinaryOrTextStream (on which I can set a #converter:) together with a GsFile backend. But I guess Johan did something, because I cannot imagine he did everything in memory, right?
>>>>>>>
>>>>>>>
>>>>>>>> I suppose it's about time we did something in this area ...
>>>>>>>>
>>>>>>>> Dale
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2/26/15 7:03 AM, Mariano Martinez Peck via Glass wrote:
>>>>>>>>> Hi guys,
>>>>>>>>>
>>>>>>>>> I am trying to implement the solution provided by Dale for exporting and importing large objects with SIXX: https://github.com/glassdb/SIXX?files=1
>>>>>>>>>
>>>>>>>>> In his example he does:
>>>>>>>>>
>>>>>>>>> strm := WriteStream on: String new.
>>>>>>>>> #( 1 2 3) sixxOn: strm persistentRoot: (UserGlobals at: #'MY_SIXX_ROOT_ARRAY')
>>>>>>>>>
>>>>>>>>> That stream 'strm' is in memory. I need files. And I want those files to be encoded with UTF8. In addition, in my experience, I have been trying to use GsFile as much as possible since it was way faster than other classes when I tested it. So...so far I was using the following approach to write a UTF8 file:
>>>>>>>>>
>>>>>>>>> file := GsFile openWrite: aFilename.
>>>>>>>>> file nextPutAll: aString encodeAsUTF8.
>>>>>>>>>
>>>>>>>>> However, I cannot use that approach in the SIXX scenario. Why? Because I cannot easily hook in the parts where sixx gets the string of an object and writes it to the stream. So I kind of need to create the File stream with UTF8 from the beginning.
>>>>>>>>>
>>>>>>>>> I do have UTF8TextConverter, but GsFile does not understand #converter:. I tried:
>>>>>>>>>
>>>>>>>>> | stream |
>>>>>>>>> stream := MultiByteBinaryOrTextStream on: (GsFile openWrite: aFilename).
>>>>>>>>> stream converter: UTF8TextConverter new.
>>>>>>>>> stream text.
>>>>>>>>> MCPlatformSupport commitOnAlmostOutOfMemoryDuring: [
>>>>>>>>>     UserGlobals at: #'MY_SIXX_ROOT_ARRAY' put: Array new.
>>>>>>>>>     #( 1 2 3) sixxOn: stream persistentRoot: (UserGlobals at: #'MY_SIXX_ROOT_ARRAY')
>>>>>>>>> ].
>>>>>>>>> stream close.
>>>>>>>>>
>>>>>>>>> But it doesn't work. OK, I did see GsFile >> contentsAsUtf8, so I could write the whole file first, then grab the contents as UTF8 and then do what I did above (a new file doing #nextPutAll: of the UTF8). But since I am doing all this because the object graph I am trying to serialize is big, I am afraid I will run out of memory while trying to hold all the contents as UTF8. So I would really like the "streaming" possibility.
>>>>>>>>>
>>>>>>>>> Any ideas how I can do that?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Mariano
>>>>>>>>> http://marianopeck.wordpress.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Glass mailing list
>>>>>>>>> Glass at lists.gemtalksystems.com
>>>>>>>>> http://lists.gemtalksystems.com/mailman/listinfo/glass
>>>>>>>>
>>>>>>>>
> <Screen Shot 2015-03-02 at 10.56.45 AM.png>