Ascii to Unicode. [Python]


From: John Machin on 28 Jul 2010 17:39

On Jul 29, 4:32 am, "Joe Goldthwaite" <j...(a)goldthwaites.com> wrote:
> Hi,
>
> I've got an Ascii file with some latin characters. Specifically \xe1 and
> \xfc. I'm trying to import it into a Postgresql database that's running in
> Unicode mode. The Unicode converter chokes on those two characters.
>
> I could just manually replace those two characters with something valid but
> if any other invalid characters show up in later versions of the file, I'd
> like to handle them correctly.
>
> I've been playing with the Unicode stuff and I found out that I could
> convert both those characters correctly using the latin1 encoder like this;
>
> import unicodedata
>
> s = '\xe1\xfc'
> print unicode(s,'latin1')
>
> The above works. When I try to convert my file however, I still get an
> error;
>
> import unicodedata
>
> input = file('ascii.csv', 'r')
> output = file('unicode.csv','w')
>
> for line in input.xreadlines():
>     output.write(unicode(line,'latin1'))
>
> input.close()
> output.close()
>
> Traceback (most recent call last):
>   File "C:\Users\jgold\CloudmartFiles\UnicodeTest.py", line 10, in __main__
>     output.write(unicode(line,'latin1'))
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 295: ordinal not in range(128)
>
> I'm stuck using Python 2.4.4 which may be handling the strings differently
> depending on if they're in the program or coming from the file. I just
> haven't been able to figure out how to get the Unicode conversion working
> from the file data.
>
> Can anyone explain what is going on?

Hello hello ... you are running on Windows; the likelihood that you
actually have data encoded in latin1 is very very small. Follow MRAB's
answer but replace "latin1" by "cp1252".
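
For reference, the traceback above is an *encode* error, not a decode error: the unicode(line, 'latin1') call succeeds, but writing the resulting unicode object to a plain byte file makes Python 2 encode it implicitly with the 'ascii' codec. A minimal sketch of the decode-then-encode round trip follows. It assumes the target encoding is UTF-8 (the post only says the database runs in "Unicode mode"), and convert_file is a hypothetical helper name, not something from the thread. Both files are opened in binary mode so the same code behaves identically on Python 2 and 3 and line endings are left untouched.

```python
def convert_file(src_path, dst_path, src_encoding="cp1252", dst_encoding="utf-8"):
    """Re-encode a text file line by line.

    convert_file is a hypothetical helper; the utf-8 default is an
    assumption about what the database expects.
    """
    src = open(src_path, "rb")
    try:
        dst = open(dst_path, "wb")
        try:
            for raw_line in src:
                # Step 1: bytes -> unicode text, using the source encoding.
                text = raw_line.decode(src_encoding)
                # Step 2: unicode text -> bytes, encoded explicitly so that
                # no implicit 'ascii' encode happens on write.
                dst.write(text.encode(dst_encoding))
        finally:
            dst.close()
    finally:
        src.close()
```

It would be called as convert_file('ascii.csv', 'unicode.csv'). Note that Python's cp1252 codec leaves a few byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so a file containing them would raise UnicodeDecodeError rather than convert silently.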
