From: Matthias Kievernagel on
Hello,

I stumbled upon this one while porting some of my programs
to Python 3.1. The program receives messages from a socket
and displays them in a tkinter Text widget. It works fine in
Python 2 and Python 3.1. The problems arose when I wanted to
know the details...

First surprise: Text.insert accepts not only str
but also bytes.

So I looked into the sources to see how it is done.
I found no magic in 'tkinter/__init__.py'. All Python
objects seem to be passed unchanged to _tkinter.c.
There they are turned into Tcl objects using Tcl_NewUnicodeObj
(for str) and Tcl_NewStringObj (for bytes).
The man page for Tcl_NewStringObj says that it creates
a Tcl string from UTF-8 encoded bytes.
So I continued to test...
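Taken strictly, the man page suggests the two conversions behave like the Python model below. It is only a sketch: predicted_tcl_string is a hypothetical name, and _tkinter's real conversion code is C and handles more types than these two.

```python
def predicted_tcl_string(value):
    """Model of the two conversions described above; not _tkinter's real code.

    str   -> Tcl_NewUnicodeObj: the text is used as-is.
    bytes -> Tcl_NewStringObj:  the bytes are expected to be UTF-8.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        # Strict reading of the man page: bytes that are not UTF-8 are an error.
        return value.decode("utf-8")
    raise TypeError(f"unsupported type: {type(value).__name__}")


if __name__ == "__main__":
    print(predicted_tcl_string("café"))                  # café
    print(predicted_tcl_string("café".encode("utf-8")))  # café
    try:
        predicted_tcl_string("café".encode("latin-1"))   # b'caf\xe9'
    except UnicodeDecodeError:
        print("latin-1 bytes rejected")
```

This strict model predicts that latin-1 bytes should fail, which is exactly what the second surprise contradicts.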

Second surprise: Text.insert also works for latin-1 encoded bytes.
It even works with mixed utf-8 and latin-1 encoded bytes.
At least it works for me.
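One way to reproduce this observation outside Tk is a lenient UTF-8 decoder: valid UTF-8 sequences decode normally, and any byte that cannot start a valid sequence is taken as a single Latin-1 character. This appears to match what the Tcl 8.x manual page for Tcl_UtfToUniChar describes for malformed input (the first byte is stored as a character between 0x00 and 0xff), which would explain the behaviour on any platform; treat that link as an assumption. The error-handler name "latin1_fallback" and the function lenient_utf8_decode are hypothetical, and Python's decoder rejects a few sequences (overlong forms, encoded surrogates) that Tcl may accept, so this is only an approximation.

```python
import codecs


def _latin1_fallback(exc):
    """Codec error handler: replace one offending byte by its Latin-1 character."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad_byte = exc.object[exc.start]  # an int in range 0..255
    # Consume exactly one byte and resume UTF-8 decoding right after it.
    return chr(bad_byte), exc.start + 1


codecs.register_error("latin1_fallback", _latin1_fallback)


def lenient_utf8_decode(data: bytes) -> str:
    """Decode UTF-8; bytes that are not valid UTF-8 become Latin-1 characters."""
    return data.decode("utf-8", errors="latin1_fallback")


if __name__ == "__main__":
    utf8_part = "café ".encode("utf-8")        # b'caf\xc3\xa9 '
    latin1_part = "déjà vu".encode("latin-1")  # b'd\xe9j\xe0 vu'
    print(lenient_utf8_decode(utf8_part + latin1_part))  # café déjà vu
```

Mixed input decodes "correctly" only because typical latin-1 accented letters are rarely followed by bytes that happen to form a valid UTF-8 sequence; when they are, the result silently changes.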

Can anyone enlighten me as to where this magic happens?
Is it tcl magic or did I miss something in the python sources?
Is this documented anywhere?

Thanks for any hints,
Matthias Kievernagel

From: eb303 on
On Apr 23, 2:00 pm, Matthias Kievernagel <mkie...(a)Pirx.sirius.org>
wrote:
> [...]
> Second surprise: Text.insert also works for latin-1 encoded bytes.
> It even works with mixed utf-8 and latin-1 encoded bytes.
> At least it works for me.
> [...]
> Is it tcl magic or did I miss something in the python sources?

Let me guess: you're on Windows? ;-)

There is nothing in the Python sources that can help you here.
Everything is handled by the underlying Tcl/Tk interpreter. The
default encoding for strings in Tcl happens to be UTF-8, so putting
UTF-8 encoded bytestrings into a Text widget just works. For latin-1
strings there is some magic going on, but apparently it happens only
on Windows (hence my guess above…), which seems to recognize its
default encoding somehow. My advice is: don't
count on it. It won't work on any other platform, and it might even
stop working on Windows one day.
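Following that advice, a safer pattern is to decode the bytes in Python, using the encoding the sender actually uses, and give Text.insert only str objects. A minimal sketch, assuming the sender uses UTF-8; insert_message is a hypothetical helper, and text_widget is meant to be a tkinter Text (the stand-in class only lets the demo run without a display):

```python
def insert_message(text_widget, data: bytes, encoding: str = "utf-8") -> None:
    """Decode raw socket bytes explicitly, then insert the resulting str.

    `encoding` is an assumption about the sender. Decoding is strict, so bytes
    in a different encoding raise UnicodeDecodeError here instead of relying
    on however Tcl happens to interpret them.
    """
    message = data.decode(encoding)  # errors="strict" is the default
    text_widget.insert("end", message)  # "end" is the same index as tkinter.END


if __name__ == "__main__":
    class _PrintingText:
        """Stand-in for tkinter.Text so the sketch runs without a display."""

        def insert(self, index, chars):
            print(f"insert({index!r}, {chars!r})")

    insert_message(_PrintingText(), "Grüße\n".encode("utf-8"))
```

With a real widget, text = tkinter.Text(root) followed by insert_message(text, data) does the same thing. If some peers really do send latin-1, pass encoding="latin-1" for them explicitly rather than relying on the interpreter to guess.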

HTH
- Eric -
From: Matthias Kievernagel on
eb303 <eric.brunel.pragmadev(a)gmail.com> wrote:
> Let me guess: you're on Windows? ;-)
> [...]
> My advice is: don't count on it. It won't work on any other platform,
> and it might even stop working on Windows one day.

Thanks for the info, Eric.
Funny that it works for me, then, because I'm on Linux.
So I'll take a look at the Tcl/Tk sources (8.4, btw).
I don't like this magic at all: it's a run-time error waiting
for you at the most inconvenient moment.

Best regards,
Matthias Kievernagel.