write xml to txt encoding [Python]

Prev: measuring a function time
Next: solving Tix problem in ubuntu jaunty

From: Stefan Behnel on 29 Jul 2010 08:54

William Johnston, 29.07.2010 14:12:
> I have a Python app that parses XML files and then writes to text files.

XML or HTML?

> However, the output text file is "sometimes" encoded in some Asian language.
>
> Here is my code:
>
>
> encoding = "iso-8859-1"
>
> clean_sent = nltk.clean_html(sent.text)
>
> clean_sent = clean_sent.encode(encoding, "ignore");
>
>
> I also tried "UTF-8" encoding, but received the same results.

What result?

Maybe the NLTK cannot determine the encoding of the HTML file (because the
file is broken and/or doesn't correctly specify its own encoding) and thus
fails to decode it?

Stefan

|
Pages: 1
Prev: measuring a function time
Next: solving Tix problem in ubuntu jaunty