Internal string storage and Encoding::Converter#convpath [Ruby]

From: Patrick Thomson on 21 May 2010 18:02

Hi, everyone:

In #rubyspec we were discussing whether the specifications are correct for Encoding::Converter's convpath method. Since MRI uses UTF-8 internally, the #convpath method shows that it converts to UTF-8 for an intermediate step:

Encoding::Converter.new('ascii','Big5').convpath
=> [[Encoding::US_ASCII, Encoding::UTF_8], [Encoding::UTF_8, Encoding::Big5]]

Is the fact that MRI uses UTF-8 for its intermediate steps between incompatible encodings an implementation detail, or is it desired Ruby behavior?

Thanks very much,
-- Patrick Thomson

From: Tanaka Akira on 21 May 2010 21:38

2010/5/22 Patrick Thomson <pthomson(a)apple.com>:
>
> In #rubyspec we were discussing whether the specifications are correct for Encoding::Converter's convpath method. Since MRI uses UTF-8 internally, the #convpath method shows that it converts to UTF-8 for an intermediate step:

UTF-8 is not required:

% ruby -e 'p Encoding::Converter.new("euc-jp", "shift_jis").convpath'
[[#<Encoding:EUC-JP>, #<Encoding:Shift_JIS>]]
--
Tanaka Akira
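
A small sketch extending the one-liner above, to compare a direct conversion with one that goes through an intermediate encoding. It uses Encoding::Converter.search_convpath, which returns the same list of [from, to] steps without building a converter. The helper name intermediate_encodings and the ISO-8859-1 to EUC-JP pair are only illustrative choices, and the path reported may differ between Ruby implementations, which is the question being asked in this thread.

```ruby
# Hypothetical helper: list the encodings a conversion passes through
# between the source and the destination.
def intermediate_encodings(src, dst)
  # search_convpath returns an array of [from, to] steps.
  path = Encoding::Converter.search_convpath(src, dst)
  # The "to" side of every step except the last one is an intermediate.
  path[0...-1].map { |_from, to| to }
end

[["EUC-JP", "Shift_JIS"], ["ISO-8859-1", "EUC-JP"]].each do |src, dst|
  via = intermediate_encodings(src, dst)
  route = via.empty? ? "direct" : "via #{via.map(&:name).join(', ')}"
  puts "#{src} -> #{dst}: #{route}"
end
```

If no conversion path exists at all, search_convpath raises Encoding::ConverterNotFoundError rather than returning an empty list.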

From: Patrick Thomson on 24 May 2010 12:59

For encodings that can be converted directly (like EUC-JP to SJIS), I understand that no UTF-8 internal storage is required. However, what about encodings that do require an intermediate step? Is the choice of UTF-8 as an intermediate representation an implementation detail?

Thanks,
-- Patrick Thomson

