From: Mihai N. on
> This is still just guesswork on my part, but I would rate it high in the
> list of probable causes.

Right on.

The encoding is UTF-8 (RFC 2640), and the buffer sizes in APIs
are usually expressed in code units (in this case meaning bytes).


The Japanese Kanji normally take 3 bytes to encode in UTF-8.
Stuff beyond the BMP (which needs surrogates in UTF-16) takes 4 bytes,
but there is not much there that is used for Japanese (OK, there are
some characters in the 2004 JIS standard, but I have big doubts
you will see them in "real life" :-)
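
The byte counts above can be checked with a short script. This is only a
sketch in Python (not part of the original exchange); the helper names
utf8_len and utf16_units are made up for illustration, and U+20B9F is just
one example of a CJK character outside the BMP.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units in UTF-16 (2 means a surrogate pair)."""
    return len(ch.encode("utf-16-le")) // 2


# Hypothetical sample characters, chosen only to cover the three cases.
samples = {
    "A": "ASCII letter",
    "\u65e5": "kanji inside the BMP (U+65E5)",
    "\U00020B9F": "CJK character beyond the BMP (U+20B9F)",
}

for ch, label in samples.items():
    print(f"{label}: {utf8_len(ch)} UTF-8 bytes, {utf16_units(ch)} UTF-16 units")
```

A character that reports 2 UTF-16 units is one that needs a surrogate pair,
which is the case discussed for characters beyond the BMP.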




--
Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

From: Joseph M. Newcomer on
Check the WideCharToMultiByte option. Click the arrow that points down to the MultiByte
window; the hex for the text in the lower window appears below that window.

CP_UTF8 should be the last code page in the "Code Page" list.
joe
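
If the GUI is awkward, roughly the same experiment can be scripted. The
sketch below is Python, not Locale Explorer or WideCharToMultiByte itself;
it uses Python's own UTF-8 encoder as a stand-in. The file name is invented,
and the 260-byte limit is only the hypothesis from this thread, not a
confirmed WinInet behavior.

```python
def utf8_hex(name: str) -> str:
    """UTF-8 bytes of `name` as space-separated hex, like a hex dump."""
    return name.encode("utf-8").hex(" ")


def chars_that_fit(name: str, limit: int) -> int:
    """How many leading characters of `name` fit in `limit` UTF-8 bytes."""
    used = 0
    for count, ch in enumerate(name):
        used += len(ch.encode("utf-8"))
        if used > limit:
            return count
    return len(name)


# Hypothetical file name: 150 kanji plus a 4-character extension.
name = "\u65e5\u672c\u8a9e" * 50 + ".txt"

print(utf8_hex(name[:3]))  # hex dump of the first three characters
print(len(name), "characters,", len(name.encode("utf-8")), "UTF-8 bytes")
print(chars_that_fit(name, 260), "characters fit in a 260-byte buffer")
```

Comparing the last number with the limit you actually observe is a quick way
to test whether a byte-sized buffer explains the cutoff.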
On Tue, 29 Sep 2009 15:07:47 -0700 (PDT), ksr <sujatha.kokkirala(a)gmail.com> wrote:

>On Sep 29, 12:27 pm, Joseph M. Newcomer <newco...(a)flounder.com> wrote:
>> It looks like you might have hit some buffer limitation on the other side of the
>> connection.  It has occurred to me that the buffer might be MAX_PATH but the ftp system
>> might choose to use UTF-8 encoding.  Therefore, a 260-character Japanese name would need to
>> be encoded using 520 characters in UTF-8, and this might be where the problem is.
>>
>> You can check for surrogates (although I suspect this is now NOT the problem!) by getting
>> the file name as a string and printing out the bytes of the string.  Then look to see if
>> any of the bytes are in the surrogate range.  But I suspect that a UTF-8 encoding might be
>> the culprit.
>>
>> Try encoding the filename in UTF-8.  Note that you can do this by using my Locale
>> Explorer, choosing the MultiByte tab, pasting the Japanese filename in the top window, and
>> using WideCharToMultiByte with the UTF-8 locale selected.  Then look at the result and see
>> what would happen if you accepted, say, the first 260 UTF-8 characters.  Since some UTF-8
>> encodings take more than 2 characters, there is a possibility that this is what you are
>> seeing that creates the even smaller limit (128 vs. 130)
>>
>> This is still just guesswork on my part, but I would rate it high in the list of probable
>> causes.
>>                                         joe
>>
>> On Tue, 29 Sep 2009 09:20:23 -0700 (PDT), ksr <sujatha.kokkir...(a)gmail.com> wrote:
>> >On Sep 29, 1:38 am, "Mihai N." <nmihai_year_2...(a)yahoo.com> wrote:
>> >> > I am using the WinInet API FtpFindFirstFile to enumerate files and folders
>> >> > on an FTP server. It works fine for filenames that have English
>> >> > characters and file paths up to 260 characters. But for filenames that
>> >> > have Japanese characters it fails.
>> >> > For Japanese filenames it works fine up to 128 characters, but fails on
>> >> > longer filenames. It is a Unicode-compiled project; my question is,
>> >> > why is it failing to read up to 260 characters for Japanese filenames?
>> >> > I tried explicitly using FtpFindFirstFileW, but it does not work.
>> >> > Please help.
>>
>> >> I would try to connect with telnet to the FTP server and see if
>> >> it supports RFC 2640 ("Internationalization of the File Transfer Protocol").
>> >> Most servers don't.
>>
>> >> If it is supported, then I would do some digging to see if FtpFindFirstFile
>> >> understands it. It is possible that it does not.
>>
>> >> If it works for short Japanese file names, but not for longer ones,
>> >> I would suspect some buffer length parameter is wrong.
>>
>> >> --
>> >> Mihai Nita [Microsoft MVP, Visual C++] http://www.mihai-nita.net
>> >> ------------------------------------------
>> >> Replace _year_ with _ to get the real email
>>
>> >Thank you for your responses.
>>
>> >Joe, to your questions:
>>
>> >On a non-Japanese Windows, using characters A-Z0-9, I can read
>> >filenames up to 260 characters (path+filename+ext); Explorer limits the
>> >length to 260 characters and FTP can read this path.
>> >On a Japanese Windows, using characters A-Z0-9, again I can read
>> >filenames up to 260 characters (path+filename+ext).
>>
>> >However, on a Japanese Windows, using Japanese characters, I can
>> >consistently see that it can read filenames up to 128 characters
>> >(excluding extension). This is irrespective of path length (i.e., the path
>> >could contain Japanese or A-Z0-9 characters). The number of characters
>> >in the filename is 128 but the byte count is 256; if you include the file
>> >extension, the number of characters is 132 and the byte count is 260.
>>
>> >It looks like there is a bug.
>>
>> >Can you explain how I can check this?
>> >"Also, check whether or not your Japanese characters require Unicode
>> >surrogates for UTF-16 encoding."
>>
>> >Let me know.
>>
>> >Thanks,
>> >ksr
>>
>> Joseph M. Newcomer [MVP]
>> email: newco...(a)flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>
>In locale explorer, should I select UTF-8 under CodePage? I don't see
>UTF-8 in the locale list. I pasted the 260 character Japanese filename
>using WideCharToMultiByte. The hex values are showing in the window
>below. Where will the result in UTF-8 be displayed?
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: ksr on
On Sep 30, 4:47 am, Joseph M. Newcomer <newco...(a)flounder.com> wrote:
> Check the WideCharToMultiByte option. Click the arrow that points down to the MultiByte
> window; the hex for the text in the lower window appears below that window.
>

When I pasted the Japanese filename and clicked the arrow to MultiByte,
the text that appears in the top window is not readable. The
hex values appear in the lower window, but I don't know how to convert
them to readable UTF-8 characters. Maybe I am missing something here?
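
One way to turn a hex dump back into readable text, independent of any font
setting, is to decode the bytes as UTF-8. This is a small Python sketch; the
function name hex_to_text and the sample hex string are illustrative only,
not output copied from Locale Explorer.

```python
def hex_to_text(hex_dump: str) -> str:
    """Decode a hex dump of UTF-8 bytes back into readable text."""
    # bytes.fromhex skips the whitespace between byte pairs.
    return bytes.fromhex(hex_dump).decode("utf-8")


# Hypothetical dump: the UTF-8 bytes of three kanji.
print(hex_to_text("E6 97 A5 E6 9C AC E8 AA 9E"))
```

If the decode raises UnicodeDecodeError, the hex values are probably not
UTF-8 (for example, they may come from a different code page).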

The other test suggested by Mihai to connect to ftp server using
telnet gave this result:

211-FEAT
SIZE
MDTM
211 END

So it looks like my ftp server does not support internationalization.
How can I make it support internationalization? Is there anything I
can install/download to enable this support?
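
For reference, RFC 2640 expects a supporting server to list UTF8 in its FEAT
reply, so the check can be scripted. The sketch below is Python and only
parses reply text; the function name feat_features is made up, and the
second reply in the usage note is an invented example of what a supporting
server might send.

```python
def feat_features(reply: str) -> set:
    """Return the feature names listed in a multi-line FEAT reply."""
    features = set()
    for line in reply.splitlines():
        # Status lines start with the 211 reply code; feature lines do not.
        if line.startswith("211") or not line.strip():
            continue
        features.add(line.split()[0].upper())
    return features


reply = "211-FEAT\nSIZE\nMDTM\n211 END"  # the reply from my server
features = feat_features(reply)
print("features:", sorted(features))
print("UTF8 advertised:", "UTF8" in features)
```

A reply such as "211-Features:\n UTF8\n SIZE\n211 End" would report UTF8 as
advertised. To get the reply from a live server, Python's
ftplib.FTP.sendcmd("FEAT") should return the same kind of text.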

Thanks,
ksr
From: Mihai N. on

> So it looks like my ftp server does not support internationalization.
> How can I make it support internationalization? Is there anything I
> can install/download to enable this support?

Depends on the server software.
It is the same as with HTTP servers: there is no such thing as "the HTTP
server". You have IIS, Apache, etc.
Same here.

So you have to figure out which FTP server is used and which version,
then dig into that specific server's documentation.



From: Joseph M. Newcomer on
If the text does not appear readable, you will need to select an appropriate font. Go to
the setup menu and set either the default font or create an entry in the font map for a
font that contains the characters you need. I use Arial Unicode MS if it is installed,
but if that font is not rich enough, you need to supply your own.

When you click WideCharToMultiByte, you should see "readable" characters for the UTF-8
encoding.  They will look weird because you will get characters in the 128-255 range.  Not
all of these will have printable representations in whatever the current font is (see
above comment about setting fonts).  But in either case, you will also have the hex values
visible.
joe
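
To see why the UTF-8 form looks weird, here is a small Python sketch. It
treats each UTF-8 byte as one Latin-1 character, which is only an assumed
stand-in for whatever single-byte code page and font the display uses; the
function name as_single_byte and the sample text are illustrative.

```python
def as_single_byte(text: str) -> str:
    """Show the UTF-8 bytes of `text` as if each byte were one Latin-1 character."""
    return text.encode("utf-8").decode("latin-1")


shown = as_single_byte("\u65e5\u672c\u8a9e")  # three kanji, nine UTF-8 bytes
print(repr(shown))
print([hex(ord(c)) for c in shown])
print("unprintable:", [hex(ord(c)) for c in shown if not c.isprintable()])
```

Every byte of an encoded kanji lands in the 128-255 range, and some of those
values have no printable glyph, so the hex values are the reliable thing to
read.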

