From: Ethan Furman on
Greetings, all!

I would like to add unicode support to my dbf project. The dbf header
has a one-byte field to hold the encoding of the file. For example,
\x03 is code-page 437 MS-DOS.
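
For illustration, a minimal sketch of reading that byte. It assumes the byte sits at offset 29 of the 32-byte file header, which is where the dBASE/VFP format descriptions I know of place the language driver ID, and it assumes Python 3 bytes objects. The names `LDID_OFFSET`, `read_ldid`, and `read_ldid_from_file` are hypothetical, not part of any library.

```python
LDID_OFFSET = 29   # assumed offset of the code-page byte in the header
HEADER_SIZE = 32   # assumed size of the fixed part of the header


def read_ldid(header):
    """Return the language driver ID (an int 0-255) from raw header bytes."""
    if len(header) < HEADER_SIZE:
        raise ValueError("expected at least %d header bytes" % HEADER_SIZE)
    # Indexing a bytes object in Python 3 yields an int.
    return header[LDID_OFFSET]


def read_ldid_from_file(path):
    """Open a dbf file in binary mode and return its language driver ID."""
    with open(path, "rb") as f:
        return read_ldid(f.read(HEADER_SIZE))
```

A value of 0 would normally mean that no code page was recorded, so a caller probably needs a default for that case.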

My google-fu is apparently not up to the task of locating a complete
resource that has a list of the 256 possible values and their
corresponding code pages.

So far I have found this, plus variations:
http://support.microsoft.com/kb/129631

Does anyone know of anything more complete?

~Ethan~
From: John Machin on
On Oct 23, 7:28 am, Ethan Furman <et...(a)stoneleaf.us> wrote:
> Greetings, all!
>
> I would like to add unicode support to my dbf project.  The dbf header
> has a one-byte field to hold the encoding of the file.  For example,
> \x03 is code-page 437 MS-DOS.
>
> My google-fu is apparently not up to the task of locating a complete
> resource that has a list of the 256 possible values and their
> corresponding code pages.

What makes you imagine that all 256 possible values are mapped to code
pages?

> So far I have found this, plus variations: http://support.microsoft.com/kb/129631
>
> Does anyone know of anything more complete?

That is for VFP3. Try the VFP9 equivalent.

dBase 5,5,6,7 use others which are not defined in publicly available
dBase docs AFAICT. Look for "language driver ID" and "LDID". Secondary
source: ESRI support site.
From: Ethan Furman on
John Machin wrote:
> On Oct 23, 7:28 am, Ethan Furman <et...(a)stoneleaf.us> wrote:
>
>>Greetings, all!
>>
>>I would like to add unicode support to my dbf project. The dbf header
>>has a one-byte field to hold the encoding of the file. For example,
>>\x03 is code-page 437 MS-DOS.
>>
>>My google-fu is apparently not up to the task of locating a complete
>>resource that has a list of the 256 possible values and their
>>corresponding code pages.
>
> What makes you imagine that all 256 possible values are mapped to code
> pages?

I'm just wanting to make sure I have whatever is available, and
preferably standard. :D


>>So far I have found this, plus variations: http://support.microsoft.com/kb/129631
>>
>>Does anyone know of anything more complete?
>
> That is for VFP3. Try the VFP9 equivalent.
>
> dBase 5,5,6,7 use others which are not defined in publicly available
> dBase docs AFAICT. Look for "language driver ID" and "LDID". Secondary
> source: ESRI support site.

Well, a couple hours later and still not more than I started with.
Thanks for trying, though!

~Ethan~
From: John Machin on
On Oct 23, 3:03 pm, Ethan Furman <et...(a)stoneleaf.us> wrote:
> John Machin wrote:
> > On Oct 23, 7:28 am, Ethan Furman <et...(a)stoneleaf.us> wrote:
>
> >>Greetings, all!
>
> >>I would like to add unicode support to my dbf project.  The dbf header
> >>has a one-byte field to hold the encoding of the file.  For example,
> >>\x03 is code-page 437 MS-DOS.
>
> >>My google-fu is apparently not up to the task of locating a complete
> >>resource that has a list of the 256 possible values and their
> >>corresponding code pages.
>
> > What makes you imagine that all 256 possible values are mapped to code
> > pages?
>
> I'm just wanting to make sure I have whatever is available, and
> preferably standard.  :D
>
> >>So far I have found this, plus variations: http://support.microsoft.com/kb/129631
>
> >>Does anyone know of anything more complete?
>
> > That is for VFP3. Try the VFP9 equivalent.
>
> > dBase 5,5,6,7 use others which are not defined in publicly available
> > dBase docs AFAICT. Look for "language driver ID" and "LDID". Secondary
> > source: ESRI support site.
>
> Well, a couple hours later and still not more than I started with.
> Thanks for trying, though!

Huh? You got tips to (1) the VFP9 docs (2) the ESRI site (3) search
keywords and you couldn't come up with anything??
From: Ethan Furman on
John Machin wrote:
> On Oct 23, 3:03 pm, Ethan Furman <et...(a)stoneleaf.us> wrote:
>
>>John Machin wrote:
>>
>>>On Oct 23, 7:28 am, Ethan Furman <et...(a)stoneleaf.us> wrote:
>>
>>>>Greetings, all!
>>
>>>>I would like to add unicode support to my dbf project. The dbf header
>>>>has a one-byte field to hold the encoding of the file. For example,
>>>>\x03 is code-page 437 MS-DOS.
>>
>>>>My google-fu is apparently not up to the task of locating a complete
>>>>resource that has a list of the 256 possible values and their
>>>>corresponding code pages.
>>
>>>What makes you imagine that all 256 possible values are mapped to code
>>>pages?
>>
>>I'm just wanting to make sure I have whatever is available, and
>>preferably standard. :D
>>
>>
>>>>So far I have found this, plus variations: http://support.microsoft.com/kb/129631
>>
>>>>Does anyone know of anything more complete?
>>
>>>That is for VFP3. Try the VFP9 equivalent.
>>
>>>dBase 5,5,6,7 use others which are not defined in publicly available
>>>dBase docs AFAICT. Look for "language driver ID" and "LDID". Secondary
>>>source: ESRI support site.
>>
>>Well, a couple hours later and still not more than I started with.
>>Thanks for trying, though!
>
>
> Huh? You got tips to (1) the VFP9 docs (2) the ESRI site (3) search
> keywords and you couldn't come up with anything??

Perhaps "nothing new" would have been a better description. I'd already
seen the clicketyclick site (good info there), and all I found at ESRI
were folks trying to figure it out, plus one link to a list that was no
different from the VFP3 list (or was it that the list did not give the
hex values? Either way, of no use to me.)

I looked at dbase.com, but came up empty-handed there (not surprising,
since they are a commercial company).

I searched some more on Microsoft's site in the VFP9 section, and was
able to find the code page section this time. Sadly, it only added
about seven codes.

At any rate, here is what I have come up with so far. Any corrections
and/or additions greatly appreciated.

# Maps the dbf header's code-page byte to (Python codec name, description).
code_pages = {
    '\x01' : ('cp437', 'U.S. MS-DOS'),
    '\x02' : ('cp850', 'International MS-DOS'),
    '\x03' : ('cp1252', 'Windows ANSI'),
    '\x04' : ('mac_roman', 'Standard Macintosh'),
    '\x64' : ('cp852', 'Eastern European MS-DOS'),
    '\x65' : ('cp866', 'Russian MS-DOS'),
    '\x66' : ('cp865', 'Nordic MS-DOS'),
    '\x67' : ('cp861', 'Icelandic MS-DOS'),
    '\x68' : ('cp895', 'Kamenicky (Czech) MS-DOS'), # iffy
    '\x69' : ('cp852', 'Mazovia (Polish) MS-DOS'), # iffy
    '\x6a' : ('cp737', 'Greek MS-DOS (437G)'),
    '\x6b' : ('cp857', 'Turkish MS-DOS'),

    '\x78' : ('cp950', 'Traditional Chinese (Hong Kong SAR, Taiwan) Windows'),
    '\x79' : ('cp949', 'Korean Windows'),
    '\x7a' : ('cp936', 'Chinese Simplified (PRC, Singapore) Windows'),
    '\x7b' : ('cp932', 'Japanese Windows'),
    '\x7c' : ('cp874', 'Thai Windows'),
    '\x7d' : ('cp1255', 'Hebrew Windows'),
    '\x7e' : ('cp1256', 'Arabic Windows'),
    '\xc8' : ('cp1250', 'Eastern European Windows'),
    '\xc9' : ('cp1251', 'Russian Windows'),
    '\xca' : ('cp1254', 'Turkish Windows'),
    '\xcb' : ('cp1253', 'Greek Windows'),
    '\x96' : ('mac_cyrillic', 'Russian Macintosh'),
    '\x97' : ('mac_latin2', 'Macintosh EE'),
    '\x98' : ('mac_greek', 'Greek Macintosh') }
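
A rough sketch of how a table like this might be used, assuming Python 3 and integer keys rather than one-character strings. `SAMPLE_CODE_PAGES` is only an illustrative subset, and `usable_codec` and `decode_field` are hypothetical helper names. The `codecs.lookup` check is there because not every name in such a table necessarily exists as a codec in a given Python installation.

```python
import codecs

# Illustrative subset only, keyed by the integer value of the header byte.
SAMPLE_CODE_PAGES = {
    0x02: ("cp850", "International MS-DOS"),
    0x03: ("cp1252", "Windows ANSI"),
    0x65: ("cp866", "Russian MS-DOS"),
    0x68: ("cp895", "Kamenicky (Czech) MS-DOS"),
}


def usable_codec(name):
    """Return True if this Python installation has a codec called `name`."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def decode_field(raw, ldid, table=SAMPLE_CODE_PAGES, fallback="ascii"):
    """Decode raw field bytes using the codec registered for `ldid`.

    Unknown IDs and unavailable codecs fall back to `fallback`; bytes that
    cannot be decoded are replaced instead of raising an error.
    """
    codec, _description = table.get(ldid, (fallback, "unknown"))
    if not usable_codec(codec):
        codec = fallback
    return raw.decode(codec, errors="replace")
```

Whether to fall back silently or raise on an unknown ID is a design choice; raising is probably safer when writing data back to a file.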

~Ethan~