From: Joseph M. Newcomer on
CharNext, CharPrev, and that set of "APIs" that deal with multibyte representations think
of 'characters' as multibyte sequences. This class of calls is the only exception I
know of to the rule that 'character' == TCHAR.
joe

On Fri, 04 Aug 2006 00:46:08 -0700, "Mihai N." <nmihai_year_2000(a)yahoo.com> wrote:

>> So what? Where we intuitively think that the stated limit of MAX_PATH
>> characters means MAX_PATH chars in ANSI, Microsoft informed me that the
>> limit really is MAX_PATH characters even if it takes twice that many bytes.
>This means our intuition is wrong :-)
>It is an internal limitation, so we should think about how Windows works
>internally. And that is Unicode.
>I bet in Windows 9x the limit is MAX_PATH chars (the 1-byte programming char,
>not the user "character").
>
>> You asked for examples of cases where we had been wrong in nearly always
>> assuming that MSDN's statements about characters meant TCHARs, and this is
>> a big example.
>True, the example is good; the doc is not clear.
>
>
>> You suspect that Microsoft's e-mail to me was accurate, and as mentioned, I
>> have the same impression. Though they send a lot of unbelievable e-mails,
>> they send some believable e-mails too and this was one.
>Yes, I think the email is accurate, and you are right, the doc is not clear.
>Just noting that here the limit is "in the belly", so it might be a bit
>different from something you pass as a parameter.
>For instance the internal implementation of some ANSI API might be:
>
>int BlaBlaA( char * buffer, int nBufLen ) { // here nBufLen is char count
>    WCHAR * wideBuff = new WCHAR [nBufLen]; // temporary wide buffer
>    // convert the ANSI input to UTF-16 using the current ANSI code page
>    MultiByteToWideChar( GetACP(), flags, buffer, nBufLen, wideBuff, nBufLen );
>    int nRez = BlaBlaW( wideBuff, nBufLen ); // here nBufLen is WCHAR count
>    delete [] wideBuff;
>    return nRez;
>}
>
>Ok, I guess the whole thing has some error checking and does some kind of
>memory reuse, not new/delete for each API :-) but this is the idea.
>So for APIs that take the length as a parameter, the limit tends to really
>be in chars in the ANSI API.
>
>
>> Yup. By the way, considering that VFAT can store a filename consisting of
>> around 250 Kanji, one weekend experiment would be to try opening the file
>> under Windows 98 (Japanese version of course).
>I am quite sure the limit is in chars there.
>
>> But really I'll consider it
>> close enough if it works under Windows 2000, XP, 2003, and Vista beta. I
>> haven't had time to test it and I do believe that mail.
>I also believe the email :-)
>
>
>Ok, this is getting fuzzy.
>So, in the end, I am not arguing with you.
>My initial affirmation ("fact very few APIs") means I know there are some
>APIs, just that I could not think of one off the top of my head.
>And I have asked you for examples to learn something.
>And yes, you are also right that for the example the doc is unclear.
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Mihai N. on
> CharNext, CharPrev, and that set of "APIs" that deal with multibyte
> representations think of 'characters' as multibyte sequences.
> This class of calls are the only exceptions I
> know to the rule that 'character' == TCHAR.

And the ANSI versions of these 2 APIs are broken in XP. Oops! :-)

--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
From: Mihai N. on
> CharNext, CharPrev, and that set of "APIs" that deal with multibyte
> representations think
> of 'characters' as multibyte sequences. This class of calls are the only
> exceptions I know to the rule that 'character' == TCHAR.

A bunch of CRT APIs that depend on _MBCS being defined:
_mbsinc (the CRT brother of CharNext), _mbslen, _mbsnbcnt, _mbsnccnt

And (of course), the MSDN doc is not always clear whether we are talking about
"programmer characters" (char) or "user characters" (sometimes meaning two
bytes).
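
To make the distinction concrete, here is a minimal portable sketch of what a CharNext/_mbsinc-style step and an _mbslen-style count do on a double-byte string. It is not the Windows or CRT implementation. The names sjis_is_lead_byte, sjis_next and sjis_length are hypothetical, and the lead-byte ranges assume Shift-JIS (code page 932); on Windows one would ask IsDBCSLeadByte for the current ANSI code page instead.

```cpp
#include <cstddef>
#include <string>

// Assumption: Shift-JIS (code page 932) lead bytes are 0x81-0x9F and 0xE0-0xFC.
inline bool sjis_is_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Hypothetical stand-in for CharNextA/_mbsinc: returns the offset of the next
// "user character", never stepping past the end of the string.
inline std::size_t sjis_next(const std::string& s, std::size_t pos) {
    if (pos >= s.size()) return s.size();
    std::size_t step =
        sjis_is_lead_byte(static_cast<unsigned char>(s[pos])) ? 2 : 1;
    return (pos + step > s.size()) ? s.size() : pos + step;
}

// Hypothetical stand-in for _mbslen: counts "user characters", not chars.
inline std::size_t sjis_length(const std::string& s) {
    std::size_t count = 0;
    for (std::size_t pos = 0; pos < s.size(); pos = sjis_next(s, pos)) ++count;
    return count;
}
```

For the four-byte string "a", 0x93 0xFA, "b" (the middle pair is one Kanji in Shift-JIS), sjis_length reports 3 while strlen-style counting reports 4. Note that the trail byte 0xFA also falls in the lead-byte range, which is why such a string can only be walked safely from its start.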


From: Mihai N. on
> But the limit is MAX_PATH characters. That's what we've been discussing.
> In Unicode mode, the limit is MAX_PATH characters, which would occupy
> 2*MAX_PATH bytes. That is, MAX_PATH TCHARs, and therefore their comment
> is completely CONSISTENT with the fact that a 'character' is a 'TCHAR'.
> Since you can't use any multibyte encoding in CreateFile, I
> don't see where there is any problem here. 'character', in nearly every
> context we've discussed, means 'TCHAR'.

Nope. Norman has a point here.

Let's say I have an ANSI application. And it calls CreateFileA.

If I pass MAX_PATH low-ascii characters (let's say "aaaa...aaa"), all is nice
and dandy.

If I pass MAX_PATH Kanji characters, they get converted to Unicode, I get
MAX_PATH Unicode code points, and all is well again. But MAX_PATH Kanji means
2 x MAX_PATH chars in ANSI, meaning 2 x MAX_PATH TCHARs.
So in this case the MAX_PATH characters limit really means "user
characters", not TCHARs.

> Since you can't use any multibyte encoding in CreateFile,
You can if you are on an MBCS system (i.e. Japanese), because you pass ANSI
strings, which are MBCS.
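
The arithmetic can be sketched in portable code. This is only an illustration under stated assumptions: the path is Shift-JIS, every double-byte character converts to exactly one UTF-16 code unit, and the limit is checked after conversion. PathSize, measure_sjis_path and kMaxPath are hypothetical names; 260 is the value of MAX_PATH in the Windows headers, and the terminating NUL is ignored here.

```cpp
#include <cstddef>
#include <string>

const std::size_t kMaxPath = 260; // value of MAX_PATH in the Windows headers

// Assumption: Shift-JIS (code page 932) lead bytes are 0x81-0x9F and 0xE0-0xFC.
inline bool is_sjis_lead(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

struct PathSize {
    std::size_t bytes;      // what an ANSI caller measures in chars
    std::size_t characters; // what remains after conversion to Unicode
};

// Measures one ANSI path both ways so the two can be compared to the limit.
inline PathSize measure_sjis_path(const std::string& path) {
    PathSize r{path.size(), 0};
    for (std::size_t i = 0; i < path.size(); ++r.characters) {
        bool two_bytes = is_sjis_lead(static_cast<unsigned char>(path[i])) &&
                         i + 1 < path.size();
        i += two_bytes ? 2 : 1;
    }
    return r;
}
```

A name made of 200 Kanji measures 400 chars, which is over the limit if the limit is counted in chars, but only 200 characters once converted, which is under it.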


From: Mihai N. on
> All A-suffix APIs use Unicode internally. The entire kernel is written in
> terms of Unicode, so all A-suffix APIs first convert the ANSI text to
> Unicode and then call the actual internal implementation of the API.
I know :-) http://www.mihai-nita.net/20050306b.shtml


> This means that if you pass in a UTF8 string,
> it isn't seen as UTF8, it's seen as 8-bit ANSI bytes, and will be converted
> to 16-bit characters as if it were a sequence of 8-bit characters, which leads
> to the comment that "UTF-8 is not supported".
So it's true :-)
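
A small sketch of the effect, assuming a single-byte ANSI code page such as Windows-1252. The helper widen_as_single_byte is hypothetical and only approximates the real conversion: it widens each byte one-to-one, which matches Windows-1252 for bytes 0x00-0x7F and 0xA0-0xFF but not for 0x80-0x9F.

```cpp
#include <string>

// Hypothetical stand-in for the ANSI-to-Unicode step of an A-suffix API on a
// single-byte code page: every input byte becomes one UTF-16 code unit.
// Assumption: one-to-one widening, valid for Windows-1252 outside 0x80-0x9F.
inline std::u16string widen_as_single_byte(const std::string& bytes) {
    std::u16string out;
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char16_t>(b));
    }
    return out;
}
```

The UTF-8 encoding of U+00E9 (e with acute accent) is the two bytes 0xC3 0xA9. Passed through this conversion it becomes the two code units U+00C3 and U+00A9 rather than the single unit U+00E9, so the name that reaches the wide API is not the one the caller intended.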

> It *is* supported, but not at the kernel API interface level.
Which means that the level of support for UTF-8 is lower than the level of
support for Shift-JIS (which is the ANSI code page for Japanese).
This is unlike the Unix/Linux world, where if I set the locale to
ja_JP.euc_JP or ja_JP.UTF-8, everything works the same, all APIs are ok.
I can do strupr, fopen a file with a Japanese name, and so on.
The two charsets are equally supported.

We can say "in Win there is partial support for UTF-8" and call it a day :-)
We both understand what it means :-)

