From: Terry Reedy on 13 Aug 2010 18:25
A short background to MRAB's answer which I will try to get right.
The byte-order-mark was invented for UTF-16 encodings so the reader
could determine whether the pairs of bytes are in little or big endiean
order, depending on whether the first two bute are fe and ff or ff and
fe (or maybe vice versa, does not matter here). The concept is
meaningless for utf-8 which consists only of bytes in a defined order.
This is part of the Unicode standard.
However, Microsoft (or whoever) re-purposed (hijacked) that pair of
bytes to serve as a non-standard indicator of utf-8 versus any
non-unicode encoding. The result is a corrupted utf-8 stream that python
accommodates with the utf-8-sig(nature) codec (versus the standard utf-8
Terry Jan Reedy
First | Prev |
Pages: 1 2
Prev: How do I get number of files in a particular directory.
Next: Deditor -- pythonic text-editor