From: ksr on
On Oct 1, 7:55 am, Joseph M. Newcomer <newco...(a)flounder.com> wrote:
> If the text does not appear readable, you will need to select an appropriate font.  Go to
> the setup menu and set either the default font or create an entry in the font map for a
> font that contains the characters you need.  I use Arial Unicode MS if it is installed,
> but if that font is not rich enough, you need to supply your own.
>
> When you click WideCharToMultiByte, you should see "readable" characters for the UTF-8
> encoding.  They will look weird because you will get characters in the 128-255 range.  Not
> all of these will have printable representations in whatever the current font is (see
> the comment above about setting fonts).  But in either case, you will also have the hex values
> visible.
>                                         joe
>
> On Wed, 30 Sep 2009 16:20:28 -0700 (PDT), ksr <sujatha.kokkir...(a)gmail.com> wrote:
> >On Sep 30, 4:47 am, Joseph M. Newcomer <newco...(a)flounder.com> wrote:
> >> Check the WideCharToMultiByte option. Click the arrow that points down to the MultiByte
> >> window; the hex for the text in the lower window appears below that window.
>
> >> CP_UTF8 should be the last code page in the "Code Page" list.
> >>                                 joe
>
> >> On Tue, 29 Sep 2009 15:07:47 -0700 (PDT), ksr <sujatha.kokkir...(a)gmail.com> wrote:
> >> >On Sep 29, 12:27 pm, Joseph M. Newcomer <newco...(a)flounder.com> wrote:
> >> >> It looks like you might have hit some buffer limitation on the other side of the
> >> >> connection.  It has occurred to me that the buffer might be MAX_PATH but the ftp system
> >> >> might choose to use UTF-8 encoding.  Therefore, a 260-character Japanese name would need
> >> >> at least 520 bytes in UTF-8 (up to 780, since most Japanese characters take 3 bytes
> >> >> each), and this might be where the problem is.
>
> >> >> You can check for surrogates (although I suspect this is now NOT the problem!) by getting
> >> >> the file name as a string and printing out the bytes of the string.  Then look to see if
> >> >> any of the 16-bit values are in the surrogate range (0xD800-0xDFFF).  But I suspect that
> >> >> a UTF-8 encoding might be the culprit.
>
> >> >> Try encoding the filename in UTF-8.  Note that you can do this by using my Locale
> >> >> Explorer, choosing the MultiByte tab, pasting the Japanese filename in the top window, and
> >> >> using WideCharToMultiByte with the UTF-8 locale selected.  Then look at the result and see
> >> >> what would happen if you accepted, say, the first 260 UTF-8 characters.  Since some UTF-8
> >> >> encodings take more than 2 characters, there is a possibility that this is what you are
> >> >> seeing that creates the even smaller limit (128 vs. 130)
>
> >> >> This is still just guesswork on my part, but I would rate it high in the list of probable
> >> >> causes.
> >> >>                                         joe
>
> >> >> On Tue, 29 Sep 2009 09:20:23 -0700 (PDT), ksr <sujatha.kokkir...(a)gmail.com> wrote:
> >> >> >On Sep 29, 1:38 am, "Mihai N." <nmihai_year_2...(a)yahoo.com> wrote:
> >> >> >> > I am using the WinInet API FtpFindFirstFile to enumerate files and folders
> >> >> >> > on an FTP server. It works fine for filenames that have English
> >> >> >> > characters and file paths up to 260 characters. But for filenames that
> >> >> >> > have Japanese characters it fails.
> >> >> >> > For Japanese filenames it works fine up to 128 characters, but fails on
> >> >> >> > longer filenames. It is a Unicode-compiled project. My question is:
> >> >> >> > why does it fail to read up to 260 characters for Japanese filenames?
> >> >> >> > I tried explicitly using FtpFindFirstFileW, but it does not work.
> >> >> >> > Please help.
>
> >> >> >> I would try to connect with a telnet to the ftp server and see if
> >> >> >> it supports RFC 2640 ("Internationalization of the File Transfer Protocol")
> >> >> >> Most servers don't.
>
> >> >> >> If it is supported, then I would do some digging to see if FtpFindFirstFile
> >> >> >> understands it. It is possible that it does not.
>
> >> >> >> If it works for short Japanese file names, but not for longer ones,
> >> >> >> I would suspect some buffer length parameter is wrong.
>
> >> >> >> --
> >> >> >> Mihai Nita [Microsoft MVP, Visual C++] http://www.mihai-nita.net
> >> >> >> ------------------------------------------
> >> >> >> Replace _year_ with _ to get the real email
>
> >> >> >Thank you for your responses.
>
> >> >> >Joe, to your questions:
>
> >> >> >On a non-Japanese Windows, using characters A-Z0-9, I can read
> >> >> >filenames up to 260 characters (path+filename+ext). Explorer limits the
> >> >> >length to 260 characters and FTP can read this path.
> >> >> >On a Japanese Windows, using characters A-Z0-9, again I can read
> >> >> >filenames up to 260 characters (path+filename+ext).
>
> >> >> >However, on a Japanese Windows, using Japanese characters, I
> >> >> >consistently see that it can read filenames only up to 128 characters
> >> >> >(excluding extension). This is irrespective of path length (i.e., the path
> >> >> >could contain Japanese or A-Z0-9 characters). The number of characters
> >> >> >in the filename is 128 but the byte count is 256; if you include the file
> >> >> >extension, the number of characters is 132 and the byte count is 260.
>
> >> >> >It looks like there is a bug.
>
> >> >> >Can you explain how I can check this?
> >> >> >"Also, check whether or not your  Japanese characters require Unicode
> >> >> >surrogates for UTF-16 encoding.   "
>
> >> >> >Let me know.
>
> >> >> >Thanks,
> >> >> >ksr
>
> >> >> Joseph M. Newcomer [MVP]
> >> >> email: newco...(a)flounder.com
> >> >> Web:http://www.flounder.com
> >> >> MVP Tips: http://www.flounder.com/mvp_tips.htm
>
> >> >In locale explorer, should I select UTF-8 under CodePage? I don't see
> >> >UTF-8 in the locale list. I pasted the 260 character Japanese filename
> >> >using WideCharToMultiByte. The hex values are showing in the window
> >> >below. Where will the result in UTF-8 be displayed?
>
> >Try encoding the filename in UTF-8.  Note that you can do this by
> >using my Locale
> >Explorer, choosing the MultiByte tab, pasting the Japanese filename in
> >the top window, and
> >using WideCharToMultiByte with the UTF-8 locale selected.  Then look
> >at the result and see
> >what would happen if you accepted, say, the first 260 UTF-8
> >characters.  Since some UTF-8
> >encodings take more than 2 characters, there is a possibility that
> >this is what you are
> >seeing that creates the even smaller limit (128 vs. 130)
>
> >> Check the WideCharToMultiByte option. Click the arrow that points down to the MultiByte
> >> window; the hex for the text in the lower window appears below that window.
>
> >When I pasted the Japanese filename and clicked the arrow to MultiByte,
> >the text that appears in the top window is not readable. The
> >hex values appear in the lower window, but I don't know how to convert
> >them to readable UTF-8 characters. Maybe I am missing something here?
>
> >The other test suggested by Mihai to connect to ftp server using
> >telnet gave this result:
>
> >211-FEAT
> >    SIZE
> >    MDTM
> >211 END
>
> >So it looks like my FTP server does not support internationalization.
> >How can I make it support internationalization? Is there anything I
> >can install/download to enable this support?
>
> >Thanks,
> >ksr
>


> Then look at the result and see what would happen if you accepted, say, the
> first 260 UTF-8 characters. Since some UTF-8 encodings take more than 2
> characters, there is a possibility that this is what you are seeing that
> creates the even smaller limit (128 vs. 130)

When I set the default font to MS Mincho/PMincho, which is the font
we've used for Japanese characters, I still get unreadable characters
like the ones below:
カキクケコサシスセソ.doc
I see hex values in the window below.
My test is very simple: I have some files with Japanese filenames in a
folder that is my FTP site, and I am trying to enumerate the files in
this folder using the FTP API. With either FtpFindFirstFileA or
FtpFindFirstFileW, it reads filenames only up to 128 characters.
I am not clear on how to accept 260 characters. Where do I specify the
UTF-8 encoding?

I am working on Win 2003 with the IIS 6.0 FTP server. It looks like there is no
UTF-8/Unicode support.

Thanks,
ksr
From: Joseph M. Newcomer on
See below...
On Thu, 1 Oct 2009 13:01:45 -0700 (PDT), ksr <sujatha.kokkirala(a)gmail.com> wrote:

>
>Then look at the result and see
>what would happen if you accepted, say, the first 260 UTF-8
>characters. Since some UTF-8
>encodings take more than 2 characters, there is a possibility that
>this is what you are
>seeing that creates the even smaller limit (128 vs. 130)
>
>When I set the default font to MS Mincho/Pincho, which is the font
>we've used for Japanese characters, I still get unreadable characters
>like below:
>カキクケコサシスセソ.doc
>I see hex values in the window below.
>In my test, very simply I have some files with Japanese filenames in a
>folder which is my FTP site, I am trying to enumerate files in this
>folder using FTP API. Using FTPFindFirstFileA or FTPFindFirstFileW it
>is reading files upto 128 characters.
>I am not clear how to accept 260 characters, where do I specify the
>UTF-8 encoding?
****
You don't. This probably happens By Serious Magic under the floor. But it suggests that
the limit is because the fixed-size buffer is filled with UTF-8 characters, and when it is
full, that's all that can be sent. So this sounds like a design error.
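
To put numbers on the fixed-size-buffer idea, here is a small Python sketch. It is only arithmetic: the 260-byte budget is MAX_PATH taken as a byte count, `chars_that_fit` is a made-up helper, and nothing here shows what WinInet or the IIS FTP service actually does internally. It counts how many Japanese characters plus ".doc" fit in 260 bytes under a few encodings.

```python
MAX_PATH = 260  # assumed buffer size in bytes (the Windows MAX_PATH value)

def chars_that_fit(ch: str, ext: str, encoding: str, budget: int = MAX_PATH) -> int:
    """Hypothetical helper: how many copies of `ch` fit before `ext`
    when the whole name must encode to at most `budget` bytes."""
    room = budget - len(ext.encode(encoding))
    return room // len(ch.encode(encoding))

for enc in ("cp932", "utf-8", "utf-16-le"):
    n = chars_that_fit("あ", ".doc", enc)
    print(f"{enc}: {n} characters + .doc")
```

With a double-byte code page such as CP932 (Shift-JIS) this gives 128 characters plus the 4-character extension, that is, 132 characters in 260 bytes, which matches the counts ksr reported earlier. With UTF-8 the limit would be 85 characters. That hints the full buffer may hold code-page text rather than UTF-8, but it is an inference from the arithmetic only.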

You'd have to post this in a NG where IIS 6.0 ftp server experts hang out to get a more
detailed answer.
joe
****
>
>I am working on Win 2003, IIS 6.0 ftp server. It looks there is no
>UTF-8/unicode support.
>
>Thanks,
>ksr
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Mihai N. on

> I am working on Win 2003, IIS 6.0 ftp server. It looks there is no
> UTF-8/unicode support.

About IIS 7.0
"One of the most significant features in the new FTP server is support for
FTP over SSL. The new FTP server also supports other Internet improvements
such as UTF8 and IPv6."
http://www.petri.co.il/ftp-publishing-service-iis7-windows-server-2008.htm


So it looks like there is no UTF-8 support in the ftp server of IIS 6.0.
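
If you want to check a server from code rather than by telnet, a rough Python sketch follows. RFC 2640 asks servers that implement it to list UTF8 in their FEAT reply. `supports_utf8` is a hypothetical helper, the first sample reply is the one posted earlier in this thread, and the second is made up for illustration.

```python
def supports_utf8(feat_reply: str) -> bool:
    """Hypothetical helper: True if a FEAT reply lists the UTF8 feature."""
    for line in feat_reply.splitlines():
        words = line.split()
        # Skip blank lines and the "211-..." / "211 ..." status lines.
        if not words or words[0].startswith("211"):
            continue
        if words[0].upper() == "UTF8":
            return True
    return False

# Reply reported for the IIS 6.0 server in this thread.
iis6_reply = "211-FEAT\n    SIZE\n    MDTM\n211 END"
# Made-up reply from a server that does advertise UTF8.
other_reply = "211-Features:\n SIZE\n MDTM\n UTF8\n211 End"

print(supports_utf8(iis6_reply))
print(supports_utf8(other_reply))
```

Against a live server, the reply text could come from `ftplib.FTP.sendcmd("FEAT")`, though a server that does not implement FEAT at all answers with an error reply instead, which `ftplib` raises as an exception.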



--
Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

From: ksr on
On Oct 2, 2:18 am, "Mihai N." <nmihai_year_2...(a)yahoo.com> wrote:
> > I am working on Win 2003, IIS 6.0 ftp server. It looks there is no
> > UTF-8/unicode support.
>
> About IIS 7.0
>   "One of the most significant features in the new FTP server is support for
>   FTP over SSL. The new FTP server also supports other Internet improvements
>   such as UTF8 and IPv6."
> http://www.petri.co.il/ftp-publishing-service-iis7-windows-server-200...
>
> So it looks like there is no UTF-8 support in the ftp server of IIS 6.0.
>
> --
> Mihai Nita [Microsoft MVP, Visual C++]http://www.mihai-nita.net
> ------------------------------------------
> Replace _year_ with _ to get the real email

Thank you, Joe and Mihai, for your help.
I'll try to set up IIS 7.0 and see if that helps.

ksr