From: Peter Otten on
C. Benson Manica wrote:

> On Apr 21, 2:25 pm, Peter Otten <__pete...(a)web.de> wrote:
>
>> Are you sure that your script has
>>
>> str = u"..."
>>
>> like in your post and not just
>>
>> str = "..."
>
> No :-)
>
> str=u"<?xml version=\"1.0\" encoding=\"utf-8\"?><elements><elem attrib=\"ó\"/></elements>"
> doc=xml.dom.minidom.parseString( str.encode("utf-8") )
> xml=doc.toxml( encoding="utf-8")
> file=codecs.open( "foo.xml", "w", "utf-8" )
> file.write( xml )
> file.close()
>
> fails:
>
>   File "./demo.py", line 12, in <module>
>     file.write( xml )
>   File "/usr/lib/python2.5/codecs.py", line 638, in write
>     return self.writer.write(data)
>   File "/usr/lib/python2.5/codecs.py", line 303, in write
>     data, consumed = self.encode(object, self.errors)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 62: ordinal not in range(128)

But that's a different error (codecs.open().write()) on a different line.
What you said was failing (xml.dom.minidom.parseString()) worked.
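As far as the traceback shows, the failure comes from mixing bytes and text: in Python 2.5, doc.toxml(encoding="utf-8") returns an already-encoded byte string, and the codecs writer hands whatever it receives to the UTF-8 encoder, which first decodes a byte string implicitly with the ASCII codec. The byte 0xc3 at position 62 is the first byte of the UTF-8 encoding of "ó". The sketch below is Python 3, not the 2.5 used in this thread; since Python 3 no longer performs that implicit decode, the sketch does the ASCII decode explicitly to reproduce the same byte and position.

```python
import xml.dom.minidom

# Same document as in the thread; "\u00f3" is "ó", escaped so the source stays ASCII.
source = '<?xml version="1.0" encoding="utf-8"?><elements><elem attrib="\u00f3"/></elements>'
doc = xml.dom.minidom.parseString(source.encode("utf-8"))

# With an encoding argument, toxml() returns encoded bytes, not text.
data = doc.toxml(encoding="utf-8")

# Python 2's codecs writer implicitly decoded a byte string as ASCII before
# re-encoding it; do that step explicitly to see where it breaks.
position = None
bad_byte = None
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    position = exc.start
    bad_byte = data[position]

print(f"byte {bad_byte:#x} at position {position}")
```

The position matches the original traceback because the serialized XML declaration and the markup before the attribute value are 62 bytes long.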

> but dropping the encoding argument to doc.toxml() seems to finally
> work. I'd be curious to know why the code you posted (that worked for
> you) didn't for me, but at this point I'm just happy with something
> functional. Thank you very kindly!

The following worked for me and should work for you, too:

$ cat tmp.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import xml.dom.minidom

str = u"<?xml version=\"1.0\" encoding=\"utf-8\"?><elements><elem attrib=\"ó\"/></elements>"
doc = xml.dom.minidom.parseString(str.encode("utf-8"))

xml = doc.toxml(encoding="utf-8")

file = open("foo.xml", "w")
file.write( xml )
file.close()
$ python2.5 tmp.py
$ cat foo.xml
<?xml version="1.0" encoding="utf-8"?><elements><elem attrib="ó"/></elements>$
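For readers on Python 3 (an assumption; the script above is Python 2.5), the same two working approaches can be sketched as follows: write the bytes from toxml(encoding="utf-8") to a file opened in binary mode, as the script above does, or write the text from toxml() through a file opened with an encoding, which mirrors the fix of dropping the encoding argument while keeping codecs.open(). The scratch directory and file names are placeholders.

```python
import os
import tempfile
import xml.dom.minidom

source = '<?xml version="1.0" encoding="utf-8"?><elements><elem attrib="\u00f3"/></elements>'
doc = xml.dom.minidom.parseString(source.encode("utf-8"))
raw = doc.toxml(encoding="utf-8")  # bytes, including an encoding declaration

with tempfile.TemporaryDirectory() as outdir:
    # Option 1: already-encoded bytes go to a file opened in binary mode.
    bytes_path = os.path.join(outdir, "foo_bytes.xml")
    with open(bytes_path, "wb") as f:
        f.write(raw)

    # Option 2: toxml() without an encoding returns str; the file object
    # encodes it. The declaration then has no encoding attribute, so a
    # parser falls back to the XML default, UTF-8.
    text_path = os.path.join(outdir, "foo_text.xml")
    with open(text_path, "w", encoding="utf-8") as f:
        f.write(doc.toxml())

    # Parse both files back and collect the attribute value from each.
    values = [
        xml.dom.minidom.parse(path).documentElement.firstChild.getAttribute("attrib")
        for path in (bytes_path, text_path)
    ]
```

The rule in both cases is to encode exactly once: either let toxml() produce bytes and write them unchanged, or keep text until the file object encodes it.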

Btw., str is a bad variable name because it shadows the builtin str type.
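A minimal Python 3 illustration of what the shadowing costs; the function name shadowing_demo is hypothetical. Once str is rebound to a string, any later attempt to use the builtin in that scope calls the string instead.

```python
def shadowing_demo():
    # Rebinding the name str hides the builtin type for the rest of this scope.
    str = "<elements/>"
    try:
        return str(42)  # meant to call the builtin, but str is now a string
    except TypeError as exc:
        return f"TypeError: {exc}"


message = shadowing_demo()
print(message)  # reports that a 'str' object is not callable
```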

Peter