[Glass] Files and UTF8 but not using String #encodeAsUTF8

Mariano Martinez Peck via Glass glass at lists.gemtalksystems.com
Mon Mar 2 06:20:29 PST 2015


On Mon, Mar 2, 2015 at 11:08 AM, Mariano Martinez Peck <
marianopeck at gmail.com> wrote:

>
>
> On Thu, Feb 26, 2015 at 5:38 PM, Johan Brichau <johan at yesplan.be> wrote:
>
>> Mariano,
>>
>> Does this help you?
>>
>> |codec stream|
>> codec := GRCodec forEncoding: 'utf8'.
>> stream := codec encoderFor: (GsFile open: 'bla.xml' mode: 'r' onClient:
>> false).
>> stream binary.
>> stream contents
>>
>> I guess you can convert this to writing easily.
>>
>>
> Hi Johan,
>
> Yes, it did help. However, I am still getting errors. Your code above
> works correctly for me only if I send #contents. If I send #next, for example,
> it fails. In my case I cannot send #contents; instead I pass the stream
> to SIXX, which reads the stream and materializes the objects. SIXX, for
> example, sends #next:. The problem is that the UTF8 magritte reader expects
> the stream to answer a number (the ASCII value) to #next. However, GsFile
> answers an instance of Character. See the attached screenshot.
>
> I know GsFile understands #nextByte which indeed answers the ascii value,
> but as you can see in the stack I don't have control over which messages
> are sent, so #next: is sent.
>
> Reproducing the error is very easy:
>
> | stream codec  |
> codec := GRCodec forEncoding: 'utf8'.
> stream := codec encoderFor: (GsFile open: '/Users/mariano/test.txt' mode:
> 'w' onClient: false).
> stream text.
> stream nextPutAll: '<mariano>'.
> stream flush.
>
> codec := GRCodec forEncoding: 'utf8'.
> stream := codec decoderFor: (GsFile open: '/Users/mariano/test.txt' mode:
> 'r' onClient: false).
> stream next
>
> There you will get the DNU.
>
> Any ideas how to work around this?
>
>
I forgot to say: I tried passing r+b or w+b as the open mode on OS X and
it didn't work.

If I replace the reading with this:

stream := codec decoderFor: (FileStream oldFileNamed: aFilename) binary.

it works correctly, because FileStream #next answers the ASCII value (if I send
#binary).

What a mess... So does this mean that GRUtf8CodecStream expects the
underlying stream to be binary even when its content is text?
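
If the decoder really does need byte values, one possible workaround is a thin
wrapper around GsFile that answers #next and #next: in terms of #nextByte. The
following is only a sketch under assumptions: the class name GsFileByteStream
and its selectors are hypothetical (they are not part of GemStone or Grease),
the methods are shown in the usual Class >> selector browser notation, and the
decoder may send other messages (for example #peek) that would also need
forwarding.

```smalltalk
"Hypothetical adapter class; the name and selectors are assumptions."
Object subclass: 'GsFileByteStream'
    instVarNames: #('file')
    classVars: #()
    classInstVars: #()
    poolDictionaries: #()
    inDictionary: UserGlobals.

GsFileByteStream class >> on: aGsFile
    "Answer a wrapper reading from an already opened GsFile."
    ^ self new setFile: aGsFile

GsFileByteStream >> setFile: aGsFile
    file := aGsFile

GsFileByteStream >> next
    "Answer the next byte as an Integer instead of a Character."
    ^ file nextByte

GsFileByteStream >> next: count
    "Answer up to count bytes as a ByteArray; stop early at end of file."
    | out byte |
    out := WriteStream on: ByteArray new.
    count timesRepeat: [
        byte := file nextByte.
        byte isNil ifTrue: [^ out contents].
        out nextPut: byte].
    ^ out contents

GsFileByteStream >> atEnd
    ^ file atEnd

GsFileByteStream >> close
    ^ file close
```

With such a wrapper, the failing read from the example above would become:

```smalltalk
| codec stream |
codec := GRCodec forEncoding: 'utf8'.
stream := codec decoderFor: (GsFileByteStream on:
    (GsFile open: '/Users/mariano/test.txt' mode: 'r' onClient: false)).
stream next
```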

Thanks in advance for any clarification!

Best,



>
>
>> Johan
>>
>> On 26 Feb 2015, at 21:22, Johan Brichau <johan at yesplan.be> wrote:
>>
>> For UTF8, I think you can better use Grease GRUtf8CodecStream
>> Let me see if I can extract something from what we did because we do read
>> files in several encodings in Gemstone.
>> There’s just stream wrappers all over the place… I’m looking into it
>> right now.
>>
>> Johan
>>
>> On 26 Feb 2015, at 20:56, Mariano Martinez Peck <marianopeck at gmail.com>
>> wrote:
>>
>> Hi Johan,
>>
>> But do you have any example of using a UTF8TextConverter with files?
>> I found no way :(
>>
>> Thanks!
>>
>> On Thu, Feb 26, 2015 at 4:45 PM, Johan Brichau <johan at yesplan.be> wrote:
>>
>>> It’s in the PharoCompatibility project on github.
>>>
>>> Though, I must say it merits some love.
>>> It works (using it for years already) but we have sometimes done some
>>> hacks to make it work with the Stream classes of GS. Or at least that’s
>>> what I remember right off the top of my head.
>>>
>>> But I have not used this with SIXX. My last experience with SIXX was
>>> when we moved the Vooruit database from Pharo+GOODS to Gemstone and I used
>>> the commitOnAlmostOutOfMemory trick.
>>>
>>> Johan
>>>
>>> On 26 Feb 2015, at 18:46, Dale Henrichs <
>>> dale.henrichs at gemtalksystems.com> wrote:
>>>
>>>
>>>  What package is that class located in?
>>>
>>> Dale
>>>
>>> On 2/26/15 9:42 AM, Mariano Martinez Peck wrote:
>>>
>>>
>>>
>>> On Thu, Feb 26, 2015 at 2:31 PM, Dale Henrichs via Glass <
>>> glass at lists.gemtalksystems.com> wrote:
>>>
>>>>  Mariano,
>>>>
>>>>
>>>> Hmmm, this is a bit of a sticky wicket ....
>>>>
>>>> I'm afraid that the best way to solve this one is to make a major change to
>>>> SIXX and force all output and input to be utf8 ... but if you are using
>>>> SIXX to move data between Pharo and GemStone, then you'll have to make sure
>>>> that SIXX on the Pharo side will properly decode utf8 ...
>>>>
>>>> Alternatively, we could try porting the whole TextConverter scheme
>>>> to GemStone ...
>>>>
>>>>
>>>  Hi Dale,
>>>
>>>  This is already ported, Johan did it. I just don't know how to use the
>>> MultiByteBinaryOrTextStream (to which I can set a #converter:) together
>>> with a GsFile backend. But I guess Johan did something, because I cannot
>>> imagine he did everything in memory, right?
>>>
>>>
>>>
>>>>  I suppose it's about time we did something in this area ...
>>>>
>>>> Dale
>>>>
>>>>
>>>>
>>>> On 2/26/15 7:03 AM, Mariano Martinez Peck via Glass wrote:
>>>>
>>>>  Hi guys,
>>>>
>>>>  I am trying to implement the solution provided by Dale for exporting
>>>> and importing large objects with SIXX:
>>>> https://github.com/glassdb/SIXX?files=1
>>>>
>>>>  In his example he does:
>>>>
>>>>   strm := WriteStream on: String new.
>>>>  #( 1 2 3) sixxOn: strm persistentRoot: (UserGlobals at:
>>>> #'MY_SIXX_ROOT_ARRAY')
>>>>
>>>>  That stream 'strm' is in memory. I need files. And I want those files
>>>> to be encoded with UTF8. In addition, in my experience, I have been trying
>>>> to use GsFile as much as possible since it was way faster than other
>>>> classes when I tested it. So...so far I was using the following approach to
>>>> write a UTF8 file:
>>>>
>>>>   file := GsFile openWrite: aFilename.
>>>>  file nextPutAll: aString encodeAsUTF8.
>>>>
>>>>  However, I cannot use that approach in the SIXX scenario. Why?
>>>> Because I cannot easily hook in the parts where sixx gets the string of an
>>>> object and writes it to the stream. So I kind of need to create the File
>>>> stream with UTF8 from the beginning.
>>>>
>>>>  I do have UTF8TextConverter, but GsFile does not understand #converter:. I tried:
>>>>
>>>>  | stream |
>>>>  stream := MultiByteBinaryOrTextStream on: (GsFile openWrite:
>>>> aFilename).
>>>>  stream converter: UTF8TextConverter new.
>>>>  stream text.
>>>>   MCPlatformSupport commitOnAlmostOutOfMemoryDuring: [
>>>>     UserGlobals at: #'MY_SIXX_ROOT_ARRAY' put: Array new.
>>>>         #( 1 2 3) sixxOn: stream persistentRoot: (UserGlobals at:
>>>> #'MY_SIXX_ROOT_ARRAY')
>>>>   ].
>>>>   stream close.
>>>>
>>>>  But it doesn't work. I did see GsFile >> contentsAsUtf8, so I
>>>> could write the whole file first, then grab its contents as UTF8 and then do
>>>> what I did above (a new file doing #nextPutAll: of the UTF8). But since I
>>>> am writing all this code because the object graph I am trying to serialize is
>>>> big, I am afraid I will run out of memory while trying to hold all the
>>>> contents as UTF8. So I would really like the "streaming" possibility.
>>>>
>>>>  Any ideas how I can do that?
>>>>
>>>>  Thanks,
>>>>
>>>>
>>>>
>>>>  --
>>>> Mariano
>>>> http://marianopeck.wordpress.com
>>>>
>>>>
>>>>  _______________________________________________
>>>> Glass mailing list
>>>> Glass at lists.gemtalksystems.com
>>>> http://lists.gemtalksystems.com/mailman/listinfo/glass



-- 
Mariano
http://marianopeck.wordpress.com