From: John Machin on
On Oct 28, 2:51 am, Ethan Furman <et...(a)stoneleaf.us> wrote:
> John Machin wrote:
> > On Oct 27, 7:15 am, Ethan Furman <et...(a)stoneleaf.us> wrote:
>
> >>Let me rephrase -- say I get a dbf file with an LDID of \x0f that maps
> >>to a cp437, and the file came from a german oem machine... could that
> >>file have upper-ascii codes that will not map to anything reasonable on
> >>my \x01 cp437 machine?  If so, is there anything I can do about it?
>
> > ASCII is defined over the first 128 codepoints; "upper-ascii codes" is
> > meaningless. As for the rest of your question, if the file's encoded
> > in cpXXX, it's encoded in cpXXX. If either the creator or the reader
> > or both are lying, then all bets are off.
>
> My confusion is this -- is there a difference between any of the various
> cp437s?

What various cp437s???

>  Going down the list at ESRI: 0x01, 0x09, 0x0b, 0x0d, 0x0f,
> 0x11, 0x15, 0x18, 0x19, and 0x1b all map to cp437,

Yes, this is called a "many-to-*one*" relationship.
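
A minimal sketch of that relationship in Python, using only the LDID
values you listed from the ESRI table. The names LDID_TO_CODEC and
codec_for_ldid are made up for illustration:

```python
# LDID values quoted above from the ESRI list; every one maps to cp437.
# LDID_TO_CODEC and codec_for_ldid are hypothetical names.
CP437_LDIDS = (0x01, 0x09, 0x0b, 0x0d, 0x0f, 0x11, 0x15, 0x18, 0x19, 0x1b)

LDID_TO_CODEC = {ldid: "cp437" for ldid in CP437_LDIDS}


def codec_for_ldid(ldid):
    """Return the Python codec name for an LDID byte, or None if unknown."""
    return LDID_TO_CODEC.get(ldid)


# Many keys, one value: the \x0f and \x01 drivers share a codepage.
print(codec_for_ldid(0x0f) == codec_for_ldid(0x01))
```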

> and they have names

"they" being the Language Drivers, not the codepages.

> such as US, Dutch, Finnish, French, German, Italian, Swedish, Spanish,
> English (Britain & US)... are these all the same?

When you read the Wikipedia page on cp437, did you see any reference
to different versions for French, German, Finnish, etc? I saw only one
mapping table; how many did you see? If there are multiple language
versions of a codepage, how do you expect to handle this given Python
has only one codec per codepage?
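
You can check this for yourself. A quick sketch (the sample bytes are
my own, picked to cover a few accented letters):

```python
import codecs

# Python resolves the name to a single codec; there is no per-language variant.
info = codecs.lookup("cp437")
print(info.name)

# Sample bytes chosen for illustration: 0x81, 0x84, 0x94, 0xe1.
raw = b"\x81\x84\x94\xe1"
text = raw.decode("cp437")
print(text)

# The mapping round-trips, whichever Language Driver wrote the file.
assert text.encode("cp437") == raw
```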

Trying again: *ONE* attribute of a Language Driver ID (LDID) is the
character set (codepage) that it uses. Other attributes may be things
like the collating (sorting) sequence, whether they use a dot or a
comma as the decimal point, etc. Many different languages in Western
Europe can use the same codepage. Initially the common one was cp437,
then cp850, then cp1252.
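
The codepage is the attribute that matters for decoding, because the
same byte means different things under each of those three. A small
sketch, with a byte value picked only for illustration:

```python
# One byte, three codepages; the byte value is chosen only for illustration.
raw = b"\x9b"

decoded = {name: raw.decode(name) for name in ("cp437", "cp850", "cp1252")}
for name, char in decoded.items():
    print(name, repr(char))
```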

There may possibly be different interpretations of a codepage out there
somewhere, but they are all *intended* to be the same, and I advise
you to cross the different-cp437s bridge *if* it exists and you ever
come to it.

Have you got access to files with LDID not in (0, 1) that you can try
out?
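
If any turn up, something like the sketch below would tell you which
LDID they carry. It assumes the usual dBASE III+/FoxPro header layout,
where the language driver byte sits at offset 29 of the 32-byte file
header; read_ldid is a made-up name:

```python
def read_ldid(path):
    """Return the language driver ID byte from a dbf file's header.

    Assumes the common layout in which the LDID is byte 29 (zero-based)
    of the 32-byte file header.
    """
    with open(path, "rb") as f:
        header = f.read(32)
    if len(header) < 32:
        raise ValueError("file too short to hold a dbf header")
    return header[29]
```

A file is worth a closer look when read_ldid(path) is not in (0, 1).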

Cheers,
John
From: Ethan Furman on
John Machin wrote:
> There may possibly be different interpretations of a codepage out there
> somewhere, but they are all *intended* to be the same, and I advise
> you to cross the different-cp437s bridge *if* it exists and you ever
> come to it.
>
> Have you got access to files with LDID not in (0, 1) that you can try
> out?

Alas, I do not. And I probably never will, making the whole thing academic.

Speaking of tables I do not have access to, and documentation for that
matter, I would love to get information on db4, 5, 7, etc.

Many thanks for your time and knowledge, and my apologies for seeming so
dense. :)

Cheers!

~Ethan~