From: RG on
I thought it was hard-coded into the Python executable at compile time,
but that is apparently not the case:

[ron(a)mickey:~]$ python
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys;print sys.stdin.encoding
UTF-8
>>> ^D
[ron(a)mickey:~]$ echo 'import sys;print sys.stdin.encoding' | python
None
[ron(a)mickey:~]$

And indeed, trying to pipe unicode into Python doesn't work, even though
it works fine when Python runs interactively. So how can I make this
work?

Thanks,
rg
From: Benjamin Kaplan on
On Wed, Aug 11, 2010 at 6:21 PM, RG <rNOSPAMon(a)flownet.com> wrote:
> I thought it was hard-coded into the Python executable at compile time,
> but that is apparently not the case:
>
> [ron(a)mickey:~]$ python
> Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
> [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import sys;print sys.stdin.encoding
> UTF-8
>>>> ^D
> [ron(a)mickey:~]$ echo 'import sys;print sys.stdin.encoding' | python
> None
> [ron(a)mickey:~]$
>
> And indeed, trying to pipe unicode into Python doesn't work, even though
> it works fine when Python runs interactively.  So how can I make this
> work?
>

sys.stdin and sys.stdout are files, just like any other. There's nothing
special about them at compile time. When the interpreter starts, it
checks to see if they are ttys. If they are, then it tries to figure
out the terminal's encoding based on the environment. The code for
this is in pythonrun.c if you want to see exactly what it's doing. If
stdout and stdin aren't ttys, then their encoding stays as None and
the interpreter will use sys.getdefaultencoding() if you try printing
Unicode strings.
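
A minimal sketch of how to observe this from inside a script. The helper
name describe_stream is made up for illustration; it is not part of Python.
On the Python 2.6 discussed here, a piped stream reports None, while
Python 3 assigns an encoding even to pipes, so the output differs there.

```python
import sys


def describe_stream(stream):
    """Return (is_tty, encoding) for a file-like object.

    Hypothetical helper: it only reads attributes the stream already has.
    """
    is_tty = bool(hasattr(stream, "isatty") and stream.isatty())
    encoding = getattr(stream, "encoding", None)
    return is_tty, encoding


if __name__ == "__main__":
    # Report on stderr so the result stays visible when stdout is piped.
    for name in ("stdin", "stdout"):
        is_tty, encoding = describe_stream(getattr(sys, name))
        sys.stderr.write("%s: tty=%s encoding=%s\n" % (name, is_tty, encoding))
    sys.stderr.write("default encoding: %s\n" % sys.getdefaultencoding())
```

Running it once directly and once as "echo | python script.py" should show
the tty flag change for stdin.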

By the way, there is no such thing as piping Unicode into Python.
Unicode is an abstract concept where each character maps to a
codepoint. Pipes can only deal with bytes. You may be using an encoding
capable of holding the entire range of Unicode characters (UTF-8,
UTF-16 LE, UTF-16 BE, UTF-32 LE, or UTF-32 BE, for example), but that's
not the same thing as Unicode. You really have to watch your encodings
when you pass data around between programs. There's no way to avoid
it.
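
To make the bytes-versus-Unicode point concrete, here is a small sketch.
The function name encoded_sizes is an assumption for illustration only. It
encodes the same text several ways, checks that each encoding round-trips,
and reports how many bytes each one would put on the pipe.

```python
def encoded_sizes(text, encodings):
    """Map each encoding name to the byte length of text in that encoding."""
    sizes = {}
    for name in encodings:
        data = text.encode(name)           # Unicode -> bytes (what a pipe carries)
        assert data.decode(name) == text   # bytes -> Unicode, same codec
        sizes[name] = len(data)
    return sizes


if __name__ == "__main__":
    text = u"caf\u00e9"  # four code points, the last one non-ASCII
    names = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"]
    for name, size in sorted(encoded_sizes(text, names).items()):
        print("%-9s %d bytes" % (name, size))
```

The same four code points become byte strings of different lengths, and
decoding with a codec other than the one used to encode gives different
text or an error, which is why both ends of a pipe have to agree.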
From: RG on
In article <mailman.1988.1281579897.1673.python-list(a)python.org>,
Benjamin Kaplan <benjamin.kaplan(a)case.edu> wrote:

> On Wed, Aug 11, 2010 at 6:21 PM, RG <rNOSPAMon(a)flownet.com> wrote:
> > I thought it was hard-coded into the Python executable at compile time,
> > but that is apparently not the case:
> >
> > [ron(a)mickey:~]$ python
> > Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
> > [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
> > Type "help", "copyright", "credits" or "license" for more information.
> >>>> import sys;print sys.stdin.encoding
> > UTF-8
> >>>> ^D
> > [ron(a)mickey:~]$ echo 'import sys;print sys.stdin.encoding' | python
> > None
> > [ron(a)mickey:~]$
> >
> > And indeed, trying to pipe unicode into Python doesn't work, even though
> > it works fine when Python runs interactively.  So how can I make this
> > work?
> >
>
> Sys.stdin and stdout are files, just like any other. There's nothing
> special about them at compile time. When the interpreter starts, it
> checks to see if they are ttys. If they are, then it tries to figure
> out the terminal's encoding based on the environment. The code for
> this is in pythonrun.c if you want to see exactly what it's doing.

Thanks. Looks like the magic incantation is:

export PYTHONIOENCODING='utf-8'
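
For anyone who wants to check the effect without touching their shell
profile, a rough sketch follows. The helper name piped_stdin_encoding is
invented for this example. It starts a child interpreter with stdin on a
pipe, optionally sets PYTHONIOENCODING for that child only, and returns
what the child reports as sys.stdin.encoding.

```python
import os
import subprocess
import sys


def piped_stdin_encoding(io_encoding=None):
    """Return sys.stdin.encoding as seen by a child whose stdin is a pipe."""
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)      # start from a known state
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import sys; sys.stdout.write(str(sys.stdin.encoding))"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, env=env)
    out, _ = child.communicate(b"")        # empty input, then close the pipe
    return out.decode("ascii").strip()


if __name__ == "__main__":
    print("without the variable: %s" % piped_stdin_encoding())
    print("with utf-8:           %s" % piped_stdin_encoding("utf-8"))
```

What the first line prints depends on the Python version and platform;
the second should name UTF-8 wherever the variable is honoured.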

> By the way, there is no such thing as piping Unicode into Python.

Yeah, I know. I should have said "piping UTF-8 encoded unicode" or
something like that.

> You really have to watch your encodings
> when you pass data around between programs. There's no way to avoid
> it.

Yeah, I keep re-learning that lesson again and again.

rg
From: Anssi Saari on
Benjamin Kaplan <benjamin.kaplan(a)case.edu> writes:

> Sys.stdin and stdout are files, just like any other. There's nothing
> special about them at compile time. When the interpreter starts, it
> checks to see if they are ttys. If they are, then it tries to figure
> out the terminal's encoding based on the environment.

Just a related question, is looking at sys.stdin.encoding the proper
way of doing things? I've been working on a script to display some
email headers, some of which are encoded in MIME to various charsets.

Until now I have used whatever locale.getdefaultlocale() returns as
the target encoding, since "it seemed to work". On one computer, though,
the call returns ISO-8859-15, and I don't quite understand why.
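
One possible approach, offered as a sketch rather than a definitive answer:
for output, ask the output stream first and fall back to the locale only
when the stream has no encoding (as with a pipe on Python 2). The names
pick_output_encoding and to_output_bytes, and the "utf-8" last resort, are
assumptions made for this example.

```python
import locale
import sys


def pick_output_encoding(stream=None, fallback="utf-8"):
    """Choose an encoding for text written to stream (default: sys.stdout)."""
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None)
    if not encoding:
        # No encoding on the stream, so ask the locale instead.
        encoding = locale.getpreferredencoding()
    return encoding or fallback


def to_output_bytes(text, stream=None):
    """Encode text for stream, replacing characters it cannot represent."""
    return text.encode(pick_output_encoding(stream), "replace")
```

With this, a decoded header would be converted with to_output_bytes(header)
just before writing, so characters missing from the target charset become
"?" instead of raising UnicodeEncodeError.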