From: Mihai N. on
I think my answer to this was lost.
Trying to re-post a short version here:

> So what? Where we intuitively think that the stated limit of MAX_PATH
> characters means MAX_PATH chars in ANSI, Microsoft informed me that the
> limit really is MAX_PATH characters even if it takes twice that many bytes.
Then probably our intuition is wrong :-)


> You asked for examples of cases where we had been wrong in nearly always
> assuming that MSDN's statements about characters meant TCHARs, and this
> is a big example.
No. I asked for examples because I know <<There are in fact very few APIs
that deal with the "user character">> (I did not say "no APIs"),
and because I could not think of one off the top of my head, and it never
hurts to learn something.
It was not "I don't believe it, prove it." Sorry, tone is sometimes difficult
to convey by email.


> You suspect that Microsoft's e-mail to me was accurate, and as mentioned, I
> have the same impression. Though they send a lot of unbelievable e-mails,
> they send some believable e-mails too and this was one.
Yes, I think the e-mail was accurate, and you are right, this is a valid
example of an API saying "characters" and meaning WCHARs even in an ANSI
context.
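
For readers following along, here is a minimal sketch (not from the thread) of the byte-versus-character distinction conceded above. It does not call CreateFileA; it only counts characters in a code page 932 string by hand, assuming the usual CP932 lead-byte ranges 0x81-0x9F and 0xE0-0xFC and well-formed input. SKETCH_MAX_PATH and the function names are hypothetical, chosen for illustration; Windows itself defines MAX_PATH as 260, including the terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the Windows MAX_PATH value (260), so the sketch is portable. */
#define SKETCH_MAX_PATH 260

/* Nonzero if b starts a two-byte character in code page 932 (assumed ranges). */
int cp932_is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Counts characters, not bytes, in a NUL-terminated CP932 string.
   A dangling lead byte at the very end is counted as one character. */
size_t cp932_char_count(const char *s)
{
    size_t count = 0;
    assert(s != NULL);
    while (*s != '\0') {
        if (cp932_is_lead_byte((unsigned char)*s) && s[1] != '\0')
            s += 2;   /* lead byte plus trail byte form one character */
        else
            s += 1;   /* single-byte character */
        count++;
    }
    return count;
}

/* Returns 1 when the name is too long if MAX_PATH is read as bytes,
   but still fits if MAX_PATH is read as characters. */
int exceeds_in_bytes_only(const char *s)
{
    return strlen(s) + 1 > SKETCH_MAX_PATH
        && cp932_char_count(s) + 1 <= SKETCH_MAX_PATH;
}
```

A name made of 250 two-byte Kanji is 500 bytes long but only 250 characters, so it falls in exactly the gap described in this thread. Whether CreateFileA accepts such a name on a given Windows version is what the weekend experiment would have to show.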

--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
From: Norman Diamond on
"Since you can't use any multibyte encoding in CreateFile,"

Dr. Newcomer, I am amazed to see that from you.

CreateFile has worked with Japanese filenames since Windows 95 and NT 3.1
(though not in some foreign language versions of those operating systems).

OpenFile worked with Japanese filenames before that (though not in some
foreign versions of Windows 3.1 or MS-DOS 5 or whatever).

On FAT12 and FAT16 and NTFS.


"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message
news:nnidd2dckt42edmn498eemd3idlg53cpjb(a)4ax.com...
> But the limit is MAX_PATH characters. That's what we've been discussing.
> In Unicode mode, the limit is MAX_PATH characters, which would occupy
> 2*MAX_PATH bytes. That is, MAX_PATH TCHARs, and therefore their comment is
> completely CONSISTENT with the fact that a 'character' is a 'TCHAR'. Since
> you can't use any multibyte encoding in CreateFile, I don't see where there
> is any problem here. 'character', in nearly every context we've discussed,
> means 'TCHAR'.
>
> This also means that if you are using Unicode to represent Kanji, then you
> should be able to use MAX_PATH Kanji characters to name a file.
> joe
>
> On Thu, 3 Aug 2006 17:13:52 +0900, "Norman Diamond"
> <ndiamond(a)community.nospam> wrote:
>
>>"Mihai N." <nmihai_year_2000(a)yahoo.com> wrote in message
>>news:Xns9813EB11E9700MihaiN(a)207.46.248.16...
>>>> The one for which Microsoft sent personal e-mail was CreateFile.
>>>> Microsoft assured me that even the ANSI version (CreateFileA) uses
>>>> Unicode internally and MAX_PATH is the limit on the number of
>>>> characters internally, so if an ANSI application needs more than
>>>> MAX_PATH bytes to specify a usable filename then it can indeed do so.
>>>> I've been a bit negligent in not writing a test program to test this
>>>> answer yet.
>>>
>>> But CreateFile does not take a number of chars as parameter.
>>
>>So what? Where we intuitively think that the stated limit of MAX_PATH
>>characters means MAX_PATH chars in ANSI, Microsoft informed me that the
>>limit really is MAX_PATH characters even if it takes twice that many bytes.
>>You asked for examples of cases where we had been wrong in nearly always
>>assuming that MSDN's statements about characters meant TCHARs, and this is
>>a big example.
>>
>>> What I suspect is happening is that the MAX_PATH is the limit if you
>>> don't use "\\?\" and is there both in the W and A versions.
>>> And since the A version does a conversion to Unicode and calls the W
>>> one, the limit is probably there and expressed in utf16 code units,
>>> indeed.
>>
>>You suspect that Microsoft's e-mail to me was accurate, and as mentioned,
>>I have the same impression. Though they send a lot of unbelievable
>>e-mails, they send some believable e-mails too and this was one.
>>
>>> Interesting for some week-end experiments :-)
>>
>>Yup. By the way, considering that VFAT can store a filename consisting of
>>around 250 Kanji, one weekend experiment would be to try opening the file
>>under Windows 98 (Japanese version of course). But really I'll consider it
>>close enough if it works under Windows 2000, XP, 2003, and Vista beta. I
>>haven't had time to test it and I do believe that mail.
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm

From: Norman Diamond on
"I think my answer to this was lost."

I don't think your posting was lost. I saw both copies and agree 100%.


"Mihai N." <nmihai_year_2000(a)yahoo.com> wrote in message
news:Xns981849F4A860MihaiN(a)207.46.248.16...
> I think my answer to this was lost.
> Trying to re-post a short version here:
>
>> So what? Where we intuitively think that the stated limit of MAX_PATH
>> characters means MAX_PATH chars in ANSI, Microsoft informed me that the
>> limit really is MAX_PATH characters even if it takes twice that many
>> bytes.
> Then probably our intuition is wrong :-)
>
>
>> You asked for examples of cases where we had been wrong in nearly always
>> assuming that MSDN's statements about characters meant TCHARs, and this
>> is a big example.
> No. I asked for examples because I know <<There are in fact very few APIs
> that deal with the "user character">> (I did not say "no APIs"),
> and because I could not think of one off the top of my head, and it never
> hurts to learn something.
> It was not "I don't believe it, prove it." Sorry, tone is sometimes
> difficult to convey by email.
>
>
>> You suspect that Microsoft's e-mail to me was accurate, and as mentioned,
>> I have the same impression. Though they send a lot of unbelievable
>> e-mails, they send some believable e-mails too and this was one.
> Yes, I think the e-mail was accurate, and you are right, this is a valid
> example of an API saying "characters" and meaning WCHARs even in an ANSI
> context.
>
> --
> Mihai Nita [Microsoft MVP, Windows - SDK]
> http://www.mihai-nita.net
> ------------------------------------------
> Replace _year_ with _ to get the real email

From: Norman Diamond on
"READING THE CODE"

I don't have the source code to StringCchPrintf. (Source code to some of
Microsoft's versions of ISO printf and stuff like that yes, this one no.)

"PERFORMING THE EXPERIMENT"

And getting a result which works today in one version of Windows XP with one
version of MS Office and one version of Internet Explorer and four versions
of Visual Studio to muddy the waters. It won't work tomorrow. For some of
the incorrect statements in MSDN this kind of experiment is useful in
proving them incorrect, but it's not a reliable way to make reliable code
myself.

"Now, if you have a working system with code page 932 in place,"

give or take an order of magnitude (in the quantity of such systems)...
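
For reference, the kind of experiment being argued about can be sketched in portable C. This is only an illustration under stated assumptions: it uses the standard snprintf rather than StringCchPrintf (which needs the Windows strsafe.h header), the function names are hypothetical, and the bytes 0x93 0xFA are assumed to be one two-byte character in code page 932. It shows what the C standard specifies for %c and %s; it says nothing about whether any particular Windows build behaves the same way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Formats value with "%c" and returns the number of bytes snprintf produced.
   The standard converts the int argument to unsigned char first. */
int format_percent_c(char *out, size_t out_size, int value)
{
    assert(out != NULL && out_size > 0);
    return snprintf(out, out_size, "%c", value);
}

/* Formats a narrow string with "%s" and returns the number of bytes produced.
   The bytes are copied through unchanged, so a two-byte character stays
   two bytes long. */
int format_percent_s(char *out, size_t out_size, const char *text)
{
    assert(out != NULL && out_size > 0 && text != NULL);
    return snprintf(out, out_size, "%s", text);
}
```

Calling format_percent_c with 0x93FA yields a single byte, 0xFA, because only the low eight bits survive the conversion to unsigned char. Calling format_percent_s with the two-byte sequence yields two bytes. Both posters' claims are consistent with this: %c emits one byte, and %s carries multibyte characters through.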


"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message
news:5thdd297fn5d56bd53jnfl78pql63nu1gm(a)4ax.com...
> I think the confusion here is that you are interpreting "character" in one
> context as "a sequence of bytes representing a glyph", and StringCchPrintf,
> as I said, when %c is used, does NOT interpret the word 'character' this
> way. So you can interpret it any way you want, but the only interpretation
> that matters is the interpretation given by StringCchPrintf, and you can
> see that easily, as I said, by READING THE CODE and PERFORMING THE
> EXPERIMENT. Now, if you have a working system with code page 932 in place,
> try the experiments I did, and tell us what you get. Try %c, in an ANSI
> code page, using any bit value of your choice for the character value, and
> tell us what StringCchPrintf does with respect to %c. I was not discussing
> %s, but %c, which you insist won't work. So if you're convinced it produces
> more than one 8-bit character or 16-bit character of output, please
> demonstrate this. Note that %lc and %C *do* expand wide character codes to
> multibyte representations, but that was not what we were discussing.
> joe
>
> On Thu, 3 Aug 2006 10:34:15 +0900, "Norman Diamond"
> <ndiamond(a)community.nospam> wrote:
>
>>> Multibyte Character Set is an *encoding* of a character set.
>>
>>Yes, ANSI code page 932 is an encoding just like other ANSI code pages
>>such as (I might not be remembering these numbers correctly) 1252 and 850.
>>
>>> however, StringCchPrintf, sprintf, etc. do only convert characters using
>>> code pages in special cases, e.g., %lc or %C format.
>>
>>And %s and stuff like that. (If you're compiling in an ANSI environment
>>then simply use %s, but if you're compiling in a Unicode environment and
>>want to produce an ANSI encoded string then use %S.)
>>
>>> For ANSI mode, this means that 'character' is 'byte'. In ANSI mode, one
>>> character is one byte.
>>
>>For some reason I thought that you had sometimes written code targeting
>>ANSI code pages in which you knew that these statements are not true. It
>>looks like I misremembered. OK, then it seems that this is your
>>introduction to such code pages. In ANSI mode, one character is one or
>>more bytes. In the ANSI code pages that Microsoft implemented, one
>>character is one or two bytes, no more than two.
>>
>>I haven't been using Japanese Microsoft systems for nearly 20 years, I've
>>only been using them for half that length of time and occasionally seen
>>them in use the other half of that time while I was using Japanese Unix and
>>Japanese VMS systems. I've used %s format in printf in Japanese Unix and
>>VMS and Windows systems. This is one kind of experiment that you don't
>>need to tell me to do.
>>
>>I will continue to respect your expertise on matters other than character
>>encodings.
>>
>>
>>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message
>>news:b9i1d2p7ca3n59258h63bc1mavfgjngicd(a)4ax.com...
>>> Multibyte Character Set is an *encoding* of a character set. In ANSI
>>> mode, MBCS can be used to encode 'characters' in an extended set;
>>> however, StringCchPrintf, sprintf, etc. do only convert characters using
>>> code pages in special cases, e.g., %lc or %C format. The formal
>>> definition for %c, the formatting code being discussed in this example,
>>> is that the int argument is converted to 'unsigned char' and formatted as
>>> a character. For ANSI mode, this means that 'character' is 'byte'. In
>>> ANSI mode, one character is one byte.
>>>
>>> In a multibyte character set, a glyph might be represented by one to four
>>> successive 8-bit bytes. Note that using %c would be erroneous for
>>> formatting an integer value, if the intent was to produce a multibyte
>>> sequence representing a single logical character.
>>>
>>> This can easily be seen by looking at the %c formatting code in output.c
>>> in the CRT source. %c formats exactly one byte in ANSI mode. So arguing
>>> that %c requires two bytes for a character is not correct.
>>>
>>> The exact code executed for %c formatting is
>>>     unsigned short temp;
>>>     temp = (unsigned short) get_int_arg(&argptr);
>>>     {
>>>         buffer.sz[0] = (char) temp;
>>>         textlen = 1;
>>>     }
>>>
>>> I see nothing here that can generate more than one byte of output. Note
>>> that the %C and %lc formats, which take wide character values and format
>>> them in accordance with the code page, *can* generate more than one byte
>>> of character, which does satisfy the objection raised. But the format
>>> here is clearly %c, and %c is clearly defined, and the implementation
>>> reflects that definition. So I'm not sure what the issue is here.
>>>
>>> StringCchPrintf is defined in terms of 8-bit characters and 16-bit
>>> characters, not in terms of logical characters encoded in an MBCS. MBCS
>>> does not enter the discussion; if you format using %lc or %C it will
>>> actually truncate the multibyte string to fit in the buffer. Thus, it
>>> obeys its requirement of not allowing a buffer overrun.
>>>
>>> This can be seen trivially simply by--get this--DOING THE EXPERIMENT!!!!!
>>> So while you can contend until the cows come home that you think that you
>>> know how to read the documentation, it is a matter of a couple minutes to
>>> actually do the experiment. I found that even when th
From: Norman Diamond on
I think my reply got lost or maybe accidentally sent personally to Dr.
Newcomer, sorry.

"Since you can't use any multibyte encoding in CreateFile,"

I am astounded to see that assertion.

CreateFile has created Japanese filenames since Windows 95 and NT 3.1
(though not in some foreign versions of those operating systems).

OpenFile created Japanese filenames in Windows 3.1, MS-DOS, etc. (though not
in some foreign versions of those operating systems).

On FAT12, FAT16, and NTFS.
