From: gb345 on



I'm getting a UnicodeEncodeError during a call to repr:

Traceback (most recent call last):
  File "bug.py", line 142, in <module>
    element = parser.parse(INPUT)
  File "bug.py", line 136, in parse
    ps = Parser.Parse(open(filename,'r').read(), 1)
  File "bug.py", line 97, in end_item
    r = repr(CURRENT_ENTRY)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u3003' in position 0: ordinal not in range(128)

This is what CURRENT_ENTRY.__repr__ looks like:

    def __repr__(self):
        k = SEP.join(self.k)
        r = SEP.join(self.r)
        s = SEP.join(self.s)
        ret = u'\t'.join((k, r, s))
        print type(ret)  # prints "<type 'unicode'>", as expected
        return ret

If I "inline" this CURRENT_ENTRY.__repr__ code so that the call to
repr(CURRENT_ENTRY) can be bypassed altogether, then the error
disappears.

So the problem, whatever it is, occurs inside the repr() built-in
*after* it gets the value returned by CURRENT_ENTRY.__repr__. It is
also clear that repr is trying to encode something using the ascii
codec, but I don't understand why it needs to encode anything.

Do I need to do something special to get repr to work strictly
with unicode?

Or should __repr__ *always* return bytes rather than unicode? What
about __str__ ? If both of these are supposed to return bytes,
then what method should I use to define the unicode representation
for instances of a class?

Thanks!

Gabe
From: Martin v. Loewis on
> Do I need to do something special to get repr to work strictly
> with unicode?

Yes, you need to switch to Python 3 :-)

> Or should __repr__ *always* return bytes rather than unicode?

In Python 2.x: yes.

> What about __str__ ?

Likewise.

> If both of these are supposed to return bytes,
> then what method should I use to define the unicode representation
> for instances of a class?

__unicode__.
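
To make that concrete, here is a minimal sketch of how the three methods could be split up. The class name Entry, the constructor, and the value of SEP are assumptions for illustration (the original post shows neither the class nor SEP), and the choice of UTF-8 for __str__ is just one reasonable option. The version check only exists so the same sketch also runs unchanged on Python 3.

```python
import sys

PY2 = sys.version_info[0] == 2
SEP = u','  # assumed separator; the original post does not show its value


class Entry(object):
    """Hypothetical stand-in for the class behind CURRENT_ENTRY."""

    def __init__(self, k, r, s):
        self.k, self.r, self.s = k, r, s

    def __unicode__(self):
        # The unicode representation: what the original __repr__ computed.
        return u'\t'.join(SEP.join(part) for part in (self.k, self.r, self.s))

    def __str__(self):
        # Python 2 expects a byte string here, so encode explicitly.
        text = self.__unicode__()
        return text.encode('utf-8') if PY2 else text

    def __repr__(self):
        # ASCII-only result, so no implicit encoding step can fail.
        escaped = self.__unicode__().encode('unicode_escape')
        return escaped if PY2 else escaped.decode('ascii')


entry = Entry([u'\u3003'], [u'a', u'b'], [u'c'])
print(repr(entry))
```

On Python 2, unicode(entry) then gives the unicode text, while repr(entry) and str(entry) both return byte strings.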

HTH,
Martin
From: gb345 on
Thanks!

From: Dave Angel on
More precisely, __str__() and __repr__() must return a str. In
Python 2.x a str is a string of 8-bit bytes; in 3.x it is Unicode
text. If you need a Unicode representation on 2.x, define
__unicode__().
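
As far as I understand it, that is also where the traceback comes from: when __repr__ hands back a unicode object, Python 2's repr() converts it to a byte string with the default encoding, which is ascii unless it has been changed. A small sketch of that encoding step, done by hand (the sample text is made up; only the character u'\u3003' comes from the traceback):

```python
# Sample text; only u'\u3003' is taken from the original traceback.
text = u'\u3003\tk\tr'

# Roughly what the implicit conversion attempts, and why it fails.
try:
    text.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Explicit encodings that do produce byte strings.
utf8_bytes = text.encode('utf-8')
escaped_bytes = text.encode('unicode_escape')

print(error)
print(repr(utf8_bytes))
print(repr(escaped_bytes))
```

Either explicit encoding gives __repr__ or __str__ a byte string to return on 2.x, so the implicit ascii step never runs.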

DaveA