From: dirknbr on
I am having some problems with unicode from json.

This is the error I get

UnicodeEncodeError: 'ascii' codec can't encode character u'\x93' in
position 61: ordinal not in range(128)

I have kind of developed this but obviously it's not nice, any better
ideas?

try:
    text=texts[i]
    text=text.encode('latin-1')
    text=text.encode('utf-8')
except:
    text=' '

Dirk
From: Steven D'Aprano on
On Fri, 23 Jul 2010 03:14:11 -0700, dirknbr wrote:

> I am having some problems with unicode from json.
>
> This is the error I get
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\x93' in
> position 61: ordinal not in range(128)
>
> I have kind of developed this but obviously it's not nice, any better
> ideas?
>
> try:
>     text=texts[i]
>     text=text.encode('latin-1')
>     text=text.encode('utf-8')
> except:
>     text=' '

Don't write bare excepts, always catch the error you want and nothing
else. As you've written it, the result of encoding with latin-1 is thrown
away, even if it succeeds.


text = texts[i]  # Don't hide errors here.
try:
    text = text.encode('latin-1')
except UnicodeEncodeError:
    try:
        text = text.encode('utf-8')
    except UnicodeEncodeError:
        text = ' '
do_something_with(text)


Another thing you might consider is setting the error handler:

text = text.encode('utf-8', errors='ignore')

Other error handlers are 'strict' (the default), 'replace' and
'xmlcharrefreplace'.
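
To make the difference between the handlers concrete, here is a minimal sketch. The sample string is made up for illustration, and the target codec is ASCII rather than UTF-8 on purpose: UTF-8 can represent U+0093, so with UTF-8 the handlers would have nothing to do for this character.

```python
# Hypothetical sample text containing U+0093, the character from the
# error message in the original post.
text = u'say \x93hi'

# 'strict' (the default) raises when the codec cannot represent a character.
try:
    text.encode('ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# The other handlers trade the exception for a lossy or escaped result.
ignored = text.encode('ascii', 'ignore')              # drops the character
replaced = text.encode('ascii', 'replace')            # substitutes '?'
xml_refs = text.encode('ascii', 'xmlcharrefreplace')  # numeric XML reference

print(strict_failed)
print(ignored)
print(replaced)
print(xml_refs)
```

Note that 'ignore' and 'replace' silently lose data, so they are only a reasonable choice when the dropped characters really do not matter.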


--
Steven
From: Chris Rebert on
On Fri, Jul 23, 2010 at 3:14 AM, dirknbr <dirknbr(a)gmail.com> wrote:
> I am having some problems with unicode from json.
>
> This is the error I get
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\x93' in
> position 61: ordinal not in range(128)

Please include the full Traceback and the actual code that's causing
the error! We aren't mind readers.

This error basically indicates that you're incorrectly mixing byte
strings and Unicode strings somewhere.
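
As a rough illustration (not the poster's actual code, which we haven't seen): when a Unicode string reaches a byte-oriented API in Python 2, it is encoded implicitly with the 'ascii' codec. The sketch below performs that encoding step explicitly on a made-up sample string, which produces the same kind of error.

```python
# Hypothetical sample: seven ASCII characters followed by U+0093.
text = u'quote: \x93'

message = None
position = None
try:
    # Explicit version of the ASCII encoding that Python 2 attempts
    # implicitly when a unicode string is handed to a byte-oriented API.
    text.encode('ascii')
except UnicodeEncodeError as exc:
    message = str(exc)
    position = exc.start  # index of the first character that could not be encoded

print(message)
print(position)
```

The position reported in the exception points at the offending character, which helps when tracking down where the non-ASCII data entered.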

Cheers,
Chris
--
http://blog.rebertia.com
From: dirknbr on
To give a bit of context. I am using twython which is a wrapper for
the JSON API


search=twitter.searchTwitter(s,rpp=100,page=str(it),result_type='recent',lang='en')
for u in search[u'results']:
    ids.append(u[u'id'])
    texts.append(u[u'text'])

This is where texts comes from.

When I then want to write texts to a file I get the unicode error.

Dirk
From: Thomas Jollans on
On 07/23/2010 12:56 PM, dirknbr wrote:
> To give a bit of context. I am using twython which is a wrapper for
> the JSON API
>
>
> search=twitter.searchTwitter(s,rpp=100,page=str(it),result_type='recent',lang='en')
> for u in search[u'results']:
>     ids.append(u[u'id'])
>     texts.append(u[u'text'])
>
> This is where texts comes from.
>
> When I then want to write texts to a file I get the unicode error.

So your data is unicode? Good.

Well, files are just streams of bytes, so to write unicode data to one
you have to encode it. Since Python can't know which encoding you want
to use (utf-8, by the way, if you ask me), you have to do it manually.

Something like:

outfile.write(text.encode('utf-8'))
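
An alternative sketch, assuming you would rather not call encode() on every write: io.open (available since Python 2.6, and the same function as the built-in open in Python 3) takes an encoding argument and encodes unicode strings as they are written. The texts list and the output path below are placeholders, not data from the thread.

```python
import io
import os
import tempfile

# Stand-in for the list of unicode strings collected from the search results.
texts = [u'plain ascii', u'curly \x93quote\x94']

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), 'tweets.txt')

# The file object encodes each unicode string to UTF-8 on write.
with io.open(path, 'w', encoding='utf-8') as outfile:
    for text in texts:
        outfile.write(text + u'\n')

# Reading back with the same encoding yields unicode strings again.
with io.open(path, 'r', encoding='utf-8') as infile:
    round_tripped = [line.rstrip(u'\n') for line in infile]

print(round_tripped == texts)
```

With this approach the encoding is chosen once, where the file is opened, instead of at every write call.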