From: John Machin on 5 Jun 2010 23:05

On Jun 6, 12:14 pm, MRAB <pyt...(a)mrabarnett.plus.com> wrote:
> Paulo da Silva wrote:
> > On 06-06-2010 00:41, Chris Rebert wrote:
> >> On Sat, Jun 5, 2010 at 4:03 PM, Paulo da Silva
> >> <psdasilva.nos...(a)netcabonospam.pt> wrote:
> > ...
> >
> >> Specify the encoding of the text when opening the file using the
> >> `encoding` parameter. For Windows-1252 for example:
> >
> >> your_file = open("path/to/file.ext", 'r', encoding='cp1252')
> >
> > OK! This fixes my current problem. I used encoding="iso-8859-15". This
> > is how my text files are encoded.
> > But what about a more general case where the encoding of the text file
> > is unknown? Is there anything like "autodetect"?
> >
>
> An encoding like 'cp1252' uses 1 byte/character, but so does 'cp1250'.
> How could you tell which was the correct encoding?
>
> Well, if the file contained words in a certain language and some of the
> characters were wrong, then you'd know that the encoding was wrong. This
> does imply, though, that you'd need to know what the language should
> look like!
>
> You could try different encodings, and for each one try to identify what
> could be words, then look them up in dictionaries for various languages
> to see whether they are real words...

This has been automated (semi-successfully, with caveats) by the chardet
package ... see http://chardet.feedparser.org/
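
For illustration, here is a rough sketch of the "try different encodings and look the words up" idea described above, using only the standard library. The helper names, the candidate list, and the four-word "dictionary" are all made up for the example; this is not how chardet works internally, and a real word list would need to be far larger.

```python
# Illustrative sketch only: helper names and the tiny word list are
# hypothetical, not part of any library mentioned in the thread.

CANDIDATES = ["utf-8", "iso-8859-15", "cp1252", "cp1250"]


def decodable_encodings(data, candidates=CANDIDATES):
    """Return the candidates that decode `data` without raising."""
    usable = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        usable.append(name)
    return usable


def word_score(text, known_words):
    """Fraction of whitespace-separated tokens found in `known_words`."""
    tokens = [t.strip(".,;:!?") for t in text.lower().split()]
    if not tokens:
        return 0.0
    return sum(t in known_words for t in tokens) / len(tokens)


def guess_encoding(data, known_words, candidates=CANDIDATES):
    """Pick the decodable candidate whose text matches the most known words.

    Ties go to the earliest candidate; returns None if nothing decodes.
    """
    return max(
        decodable_encodings(data, candidates),
        key=lambda name: word_score(data.decode(name), known_words),
        default=None,
    )


# A Portuguese sample, encoded the way the original poster's files are.
sample = "o coração não mente".encode("iso-8859-15")
known = {"o", "coração", "não", "mente"}

print(decodable_encodings(sample))
print(guess_encoding(sample, known))
print(sample.decode("cp1250"))
```

On this sample, utf-8 is rejected outright because the bytes are not valid UTF-8, and cp1250 decodes without error but garbles the accented words. iso-8859-15 and cp1252 produce identical text here, so the choice between them is arbitrary, which is exactly the ambiguity raised above.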

From: Paulo da Silva on 5 Jun 2010 23:27

On 06-06-2010 04:05, John Machin wrote:
> On Jun 6, 12:14 pm, MRAB <pyt...(a)mrabarnett.plus.com> wrote:
>> Paulo da Silva wrote:
....
>>> OK! This fixes my current problem. I used encoding="iso-8859-15". This
>>> is how my text files are encoded.
>>> But what about a more general case where the encoding of the text file
>>> is unknown? Is there anything like "autodetect"?
>>
....
>
> This has been automated (semi-successfully, with caveats) by the
> chardet package ... see http://chardet.feedparser.org/

This seems nice! Thanks