Unicode blues in Python3 [Python]

From: Antoine Pitrou on 24 Mar 2010 09:41

On Tue, 23 Mar 2010 10:33:33 -0700, nn wrote:

> I know that unicode is the way to go in Python 3.1, but it is getting in
> my way right now in my Unix scripts. How do I write a chr(253) to a
> file?
>
> #nntst2.py
> import sys,codecs
> mychar=chr(253)
> print(sys.stdout.encoding)
> print(mychar)

print() writes to the text (unicode) layer of sys.stdout.
If you want to access the binary (bytes) layer, you must use
sys.stdout.buffer. So:

sys.stdout.buffer.write(chr(253).encode('latin1'))

or:

sys.stdout.buffer.write(bytes([253]))

See http://docs.python.org/py3k/library/io.html#io.TextIOBase.buffer
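
A minimal sketch of the two layers described above, for anyone who wants to try it without touching the real terminal. The in-memory stream `fake_stdout` and the helper `write_raw` are illustrative names, not part of the standard library; the UTF-8 encoding on the stand-in stream is an assumption, chosen to show how the text layer can turn chr(253) into two bytes.

```python
import io

def write_raw(stream, data: bytes) -> int:
    """Write raw bytes to the binary layer underneath a text stream.

    `stream` is any text stream with a .buffer attribute, such as sys.stdout.
    """
    stream.flush()                 # push pending text down first, to keep ordering
    written = stream.buffer.write(data)
    stream.buffer.flush()
    return written

# Stand-in for sys.stdout: a text wrapper over an in-memory byte buffer.
# The utf-8 encoding here is an assumption for the demonstration.
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="utf-8")

fake_stdout.write("text ")             # goes through the text (unicode) layer
write_raw(fake_stdout, bytes([253]))   # goes straight to the bytes layer
result = raw.getvalue()

# The same character sent through the text layer would be encoded first:
via_text_layer = chr(253).encode("utf-8")   # two bytes, not the single byte 253
via_latin1 = chr(253).encode("latin1")      # exactly one byte, value 253
```

With the real stream the call would be write_raw(sys.stdout, bytes([253])). The flush before the binary write matters because the text layer keeps its own buffer; without it, text and raw bytes may come out in a different order than they were written.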

From: Michael Torrie on 24 Mar 2010 13:33

Steven D'Aprano wrote:
> I think your question is malformed. You need to work out what behaviour
> you actually want, before you can ask for help on how to get it.

It may or may not be malformed, but I understand the question. So let
me translate for you: how can he write arbitrary bytes (0x00 through
0xff) to stdout without having them mangled by encodings? It's a very
simple question, really. Looks like Antoine Pitrou has answered this
question quite nicely as well.

From: nn on 24 Mar 2010 13:34

Antoine Pitrou wrote:
> On Tue, 23 Mar 2010 10:33:33 -0700, nn wrote:
>
> > I know that unicode is the way to go in Python 3.1, but it is getting in
> > my way right now in my Unix scripts. How do I write a chr(253) to a
> > file?
> >
> > #nntst2.py
> > import sys,codecs
> > mychar=chr(253)
> > print(sys.stdout.encoding)
> > print(mychar)
>
> print() writes to the text (unicode) layer of sys.stdout.
> If you want to access the binary (bytes) layer, you must use
> sys.stdout.buffer. So:
>
> sys.stdout.buffer.write(chr(253).encode('latin1'))
>
> or:
>
> sys.stdout.buffer.write(bytes([253]))
>
> See http://docs.python.org/py3k/library/io.html#io.TextIOBase.buffer

Just what I needed! Now I have full control of the output.

Thanks Antoine. The new io stack is still a bit of a mystery to me.

Thanks everybody else, and sorry for confusing the issue. Latin1 just
happens to be very convenient for manipulating bytes, and it is what I
thought of initially to handle my mix of textual and non-textual data.
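
For readers wondering why Latin1 is convenient here: it maps code points 0 through 255 one-to-one onto byte values 0 through 255, so any byte string survives a decode/encode round trip unchanged. A small sketch of that property (the variable names are only for illustration):

```python
# Every possible byte value, 0x00 through 0xff.
all_bytes = bytes(range(256))

# Decoding as latin-1 never fails: byte N becomes the character chr(N).
as_text = all_bytes.decode("latin-1")

# Encoding back gives the original bytes, unchanged.
round_trip = as_text.encode("latin-1")

# The mapping only covers code points up to 255; anything above is rejected.
try:
    "\u20ac".encode("latin-1")      # EURO SIGN, code point 0x20AC
    euro_ok = True
except UnicodeEncodeError:
    euro_ok = False
```

The catch is that this treats bytes as if they were characters, which only works as long as nothing else in the program re-encodes the string with a different codec on the way out.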

From: John Nagle on 24 Mar 2010 15:35

nn wrote:

> To be more informative, I am writing both text and binary data
> together. That is, I am embedding text from another source into a stream
> that uses non-ascii characters as "control" characters. In Python2 I
> was processing it mostly as text containing a few "funny" characters.

OK. Then you need to be writing arrays of bytes, not strings.
Encoding is your problem. This has nothing to do with Unicode.

John Nagle
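
One way to read the advice above, as a rough sketch only: encode each piece of text explicitly and assemble the stream as bytes, adding the control bytes as raw values. The separator value 0xFD, the helper name `build_record`, and the choice of UTF-8 for the text are all assumptions made for the example; the original poster's actual stream format is not given in the thread.

```python
SEP = 0xFD  # hypothetical "control" byte; the real format is not specified

def build_record(fields, sep=SEP, encoding="utf-8") -> bytes:
    """Join text fields into one byte string, separated by a raw control byte."""
    out = bytearray()
    for i, field in enumerate(fields):
        if i:
            out.append(sep)             # raw byte, never passed through a codec
        out += field.encode(encoding)   # text is encoded explicitly, once
    return bytes(out)

record = build_record(["name", "café"])

# To emit it unmodified, write to the binary layer:
#     sys.stdout.buffer.write(record)
```

With UTF-8 as the text encoding, the byte 0xFD can never occur inside an encoded field, so it cannot be confused with the separator. With a single-byte encoding such as Latin1 that guarantee is lost, and the text would need to be checked or escaped.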
