From: Antoine Pitrou on

Hello,

> I have to read the contents of a binary file (a PNG file exactly), and
> dump it into an RTF file.
>
> The RTF-file has been opened with codecs.open in utf-8 mode.

You should use the built-in open() function. codecs.open() is outdated in
Python 3.

> As I expected, the utf-8 decoder chokes on some combinations of bits;
> how can I tell python to dump the bytes as they are, without
> interpreting them?

Well, the one thing you have to be careful about is flushing the text layer
before writing binary data to the underlying buffer. For example:

>>> f = open("TEST", "w", encoding='utf8')
>>> f.write("héhé")
4
>>> f.flush()
>>> f.buffer.write(b"\xff\x00")
2
>>> f.close()

gives you:

$ hexdump -C TEST
00000000 68 c3 a9 68 c3 a9 ff 00 |h..h....|

(UTF-8-encoded text followed by two raw bytes that are not valid UTF-8)
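Applied to the original question, the same flush-then-write-to-.buffer trick might look roughly like this. It is only a sketch: the helper name `write_text_then_binary` and its parameters are hypothetical, and UTF-8 is assumed as the text encoding.

```python
import shutil

def write_text_then_binary(path, text, payload_path, encoding="utf-8"):
    """Hypothetical helper: write encoded text, then another file's raw bytes.

    Sketch of the flush-then-write-to-.buffer technique; UTF-8 is assumed.
    """
    with open(path, "w", encoding=encoding) as f:
        f.write(text)
        # Push any text still pending in the TextIOWrapper down into the
        # underlying binary buffer, so text and bytes land in the right order.
        f.flush()
        with open(payload_path, "rb") as src:
            # Copy the payload (e.g. a PNG) byte for byte, bypassing the encoder.
            shutil.copyfileobj(src, f.buffer)
```

Closing the text file also closes and flushes `f.buffer`, so nothing extra is needed at the end. If you write more text through `f` after the binary part, it goes through the same buffer and stays in order.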

Another possibility is to open the file in binary mode and do the
encoding yourself when writing text. This might actually be a better
solution, since I'm not sure RTF uses utf-8 by default.
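A minimal sketch of the binary-mode alternative, assuming the text parts are plain `str` values and that you pass in whatever encoding your RTF reader actually expects. The helper name and parameters are hypothetical, and `"utf-8"` is only a placeholder default:

```python
def write_text_and_image(path, text_before, png_path, text_after,
                         encoding="utf-8"):
    """Hypothetical helper: one binary file object, text encoded by hand.

    The encoding is an assumption; pick whatever the RTF reader expects.
    """
    with open(path, "wb") as f:
        # Text is turned into bytes explicitly; no text layer is involved.
        f.write(text_before.encode(encoding))
        with open(png_path, "rb") as src:
            # Raw image bytes are written unchanged.
            f.write(src.read())
        f.write(text_after.encode(encoding))
```

Note that in binary mode no newline translation happens, so a `"\n"` in the text is written exactly as a single `0x0a` byte on every platform.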

Regards

Antoine.


From: Stefan Behnel on
Antoine Pitrou, 25.04.2010 02:16:
> Another possibility is to open the file in binary mode and do the
> encoding yourself when writing text. This might actually be a better
> solution, since I'm not sure RTF uses utf-8 by default.

That's a lot cleaner as it doesn't use two interfaces to write to the same
file, and doesn't rely on any specific coordination between those two
interfaces.

Stefan