From: Robert Klemme on
On 27.07.2010 18:47, Mike Pe wrote:
> Robert Klemme wrote:
>> 2010/7/24 Mike Pe<mikep123(a)gmail.com>:
>>>>> puts doc.root.attributes["test"] --> nil
>>>> Can you show what exactly you did?
>>> The issue is that the first line of my input file:
>>>
>>> <?xml version="1.0" encoding="UTF-16"?>
>>>
>>> Causes the file to be read as an "xml application". Basically, I just
>>> want to be able to use REXML to parse out this xml file, but it does not
>>> parse properly with this line in the beginning of my input file.
>>> (otherwise it works fine).
>>
>> Please provide the code you are using so others can try this out
>> themselves. I asked for this already (see above).
>>
>>> I tried converting the files using the iconv commands from your link, but
>>> with both UTF-16 and UTF-8 the same error occurs, regardless of format.
>>>
>>> Why is this line interfering with the parser and how would I fix it?
>>> Thank you for your help.
>>
>> It seems there is no UTF-16 support:
>>
>> irb(main):009:0> f=File.open "x", "r:UTF-16"
>> (irb):9: warning: Unsupported encoding UTF-16 ignored
>> => #<File:x>
>>
>> So there is no point in trying to import a UTF-16 encoded file in Ruby.

> As for the code that I am using, I simplified the code in my original
> post. The first line:
>
> doc = REXML::Document.new error

What is "error"? How do you obtain it?

> Should parse in the XML document and recognize all of the roots,
> elements, attributes, etc. from the input document.
>
> i.e.:
> puts doc.root.attributes["test"]
>
> Should return "yes" because the attribute in the error xml file (see
> above) is "yes". With the extra line, it puts "nil" (because the parser
> did not do its job).
>
> I tried converting all of the files to UTF-8 and they still did not
> work. (If you remove the extra line, it does work.) I do not think the
> problem is in the Unicode encoding.

Hmm...

robert


--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

From: Mike Pe on
Robert Klemme wrote:
> On 27.07.2010 18:47, Mike Pe wrote:
>>>> parse properly with this line in the beginning of my input file.
>>>
>>> It seems there is no UTF-16 support:
>>>
>>> irb(main):009:0> f=File.open "x", "r:UTF-16"
>>> (irb):9: warning: Unsupported encoding UTF-16 ignored
>>> => #<File:x>
>>>
>>> So there is no point in trying to import a UTF-16 encoded file in Ruby.
>
>> As for the code that I am using, I simplified the code in my original
>> post. The first line:
>>
>> doc = REXML::Document.new error
>
> What is "error"? How do you obtain it?


By "error", I meant my file called error from my first post:

error = <<EOF
<?xml version="1.0" encoding="UTF-16"?>
<document test="yes">
</document>
EOF
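
For anyone who wants to try this out, here is a minimal sketch that parses the same input with and without the declaration line. The helper name root_test_attribute is made up for illustration, and what the first call prints is likely to depend on the Ruby and REXML versions in use, so no output is shown:

```ruby
require "rexml/document"

# Hypothetical helper: returns the root element's "test" attribute,
# or a short error description if REXML cannot parse the input.
def root_test_attribute(xml)
  doc = REXML::Document.new(xml)
  root = doc.root
  root ? root.attributes["test"] : nil
rescue StandardError => e
  "#{e.class}: #{e.message.lines.first.to_s.strip}"
end

with_declaration = <<EOF
<?xml version="1.0" encoding="UTF-16"?>
<document test="yes">
</document>
EOF

# Same document with the XML declaration line removed.
without_declaration = with_declaration.lines.drop(1).join

puts "with declaration:    #{root_test_attribute(with_declaration).inspect}"
puts "without declaration: #{root_test_attribute(without_declaration).inspect}"
```

Comparing the two lines of output shows whether the declaration alone changes the result on a given setup.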

>
>> I tried converting all of the files to UTF-8 and they still did not
>> work. (If you remove the extra line, it does work.) I do not think the
>> problem is in the Unicode encoding.
>
> Hmm...
>
> robert


--
Posted via http://www.ruby-forum.com/.

From: brabuhr on
>>>>> Can you show what exactly you did?
>>>
>>> Please provide the code you are using so others can try this out
>>> themselves.  I asked for this already (see above).

Could you provide a link to a zip file containing an original input
file that fails, a re-encoded input file that fails, an input file that
does not fail, and a script that loads them?

Or, provide a more detailed step-by-step of what you did, e.g.:

# poke at the original file to see what it looks like
ls -l orig-utf16.xml
file orig-utf16.xml
wc -c orig-utf16.xml
enca orig-utf16.xml
head orig-utf16.xml

# convert the file
iconv -f UTF-16 -t UTF-8 < orig-utf16.xml > new-utf8.xml

# poke at the new file to see what it looks like
ls -l new-utf8.xml
file new-utf8.xml
wc -c new-utf8.xml
enca new-utf8.xml
head new-utf8.xml

# load the files in the script
cat rexmltest.rb
ruby rexmltest.rb orig-utf16.xml
ruby rexmltest.rb new-utf8.xml
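
If iconv is not at hand, the conversion step can be approximated in Ruby itself. This is only a sketch: utf16_to_utf8 and declare_utf8 are hypothetical helper names, and input without a byte order mark is assumed to be little-endian. Note that converting the bytes (with iconv or with this code) does not change the encoding="UTF-16" text inside the declaration; whether that leftover declaration is what trips up the parser is a guess, not something confirmed in this thread.

```ruby
# Hypothetical helper: convert UTF-16 bytes (with or without a byte order
# mark) to a UTF-8 string.
# Assumption: input without a BOM is treated as little-endian.
def utf16_to_utf8(bytes)
  data = bytes.dup.force_encoding(Encoding::BINARY)
  if data.start_with?("\xFF\xFE".b)
    data = data.byteslice(2..-1)
    source = Encoding::UTF_16LE
  elsif data.start_with?("\xFE\xFF".b)
    data = data.byteslice(2..-1)
    source = Encoding::UTF_16BE
  else
    source = Encoding::UTF_16LE
  end
  data.force_encoding(source).encode(Encoding::UTF_8)
end

# Hypothetical helper: rewrite the encoding named in the XML declaration
# so that it matches the converted content.
def declare_utf8(xml)
  xml.sub(/\A(<\?xml[^>]*?encoding=)(["'])[^"']*\2/) { "#{$1}#{$2}UTF-8#{$2}" }
end

# In-memory demonstration; for a real file, File.binread(path) would
# supply the bytes.
sample = "\uFEFF<?xml version=\"1.0\" encoding=\"UTF-16\"?>\n<document test=\"yes\"/>\n"
bytes = sample.encode(Encoding::UTF_16LE).b
puts declare_utf8(utf16_to_utf8(bytes))
```

Running the result through rexmltest.rb alongside the iconv output would show whether the declaration text makes a difference.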

Thanks.