From: ksr on
On Sep 29, 1:38 am, "Mihai N." <nmihai_year_2...(a)yahoo.com> wrote:
> > I am using the WinInet API FtpFindFirstFile to enumerate files and folders
> > on an FTP server. It works fine for filenames that have English
> > characters and file paths up to 260 characters. But for filenames that
> > have Japanese characters it fails.
> > For Japanese filenames it works fine up to 128 characters, but fails on
> > longer filenames. It is a Unicode-compiled project. My question is:
> > why does it fail to read up to 260 characters for Japanese filenames?
> > I tried explicitly using FtpFindFirstFileW, but it does not work.
> > Please help.
>
> I would try to connect to the FTP server with telnet and see if
> it supports RFC 2640 ("Internationalization of the File Transfer Protocol").
> Most servers don't.
>
> If it is supported, then I would do some digging to see if FtpFindFirstFile
> understands it. It is possible that it does not.
>
> If it works for short Japanese file names, but not for longer ones,
> I would suspect some buffer length parameter is wrong.
>
> --
> Mihai Nita [Microsoft MVP, Visual C++] http://www.mihai-nita.net
> ------------------------------------------
> Replace _year_ with _ to get the real email


Thank you for your responses.

Joe, to your questions:

On a non-Japanese Windows, using the characters A-Z0-9, I can read
filenames up to 260 characters (path+filename+ext); Explorer limits the
length to 260 characters and FTP can read this path.
On a Japanese Windows, using the characters A-Z0-9, again I can read
filenames up to 260 characters (path+filename+ext).

However, on a Japanese Windows, using Japanese characters, I can
consistently see that it reads filenames only up to 128 characters
(excluding the extension). This is irrespective of path length (i.e., the path
could contain Japanese or A-Z0-9 characters). The number of characters
in the filename is 128 but the byte count is 256; if you include the file
extension, the number of characters is 132 and the byte count is 260.
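To see which encoding those byte counts correspond to, it can help to count the bytes explicitly. A minimal sketch in portable C++, assuming std::u16string stands in for the WCHAR filename (cFileName) that FtpFindFirstFileW fills in; utf16_bytes and utf8_bytes are made-up names for illustration, not WinInet functions:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes the text occupies in UTF-16: 2 bytes per code unit.
std::size_t utf16_bytes(const std::u16string& s) {
    return s.size() * 2;
}

// Bytes the same text occupies once converted to UTF-8.
// Assumes well-formed UTF-16: a surrogate pair (two code units)
// becomes a single 4-byte UTF-8 sequence.
std::size_t utf8_bytes(const std::u16string& s) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t c = s[i];
        if (c < 0x80) {
            total += 1;  // ASCII, e.g. A-Z0-9 and ".txt"
        } else if (c < 0x800) {
            total += 2;
        } else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size() &&
                   s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            total += 4;  // surrogate pair = one supplementary code point
            ++i;
        } else {
            total += 3;  // rest of the BMP, including kana and kanji
        }
    }
    return total;
}
```

For a 128-character hiragana name plus ".txt" (132 characters), this gives 264 bytes in UTF-16 and 388 in UTF-8. The 256-byte figure for the bare name matches UTF-16, but 260 for 132 characters matches neither; it would fit a double-byte code page such as Shift-JIS (code page 932), where each Japanese character takes 2 bytes and ASCII takes 1. That last point is an assumption about how the bytes were counted.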

It looks like there is a bug.

Can you explain how I can check this?
"Also, check whether or not your Japanese characters require Unicode
surrogates for UTF-16 encoding. "

Let me know.

Thanks,
ksr


From: Joseph M. Newcomer on
It looks like you might have hit some buffer limitation on the other side of the
connection. It has occurred to me that the buffer might be MAX_PATH but the ftp system
might choose to use UTF-8 encoding. Therefore, a 260-character Japanese name would need at
least 520 bytes in UTF-8 (kana and kanji actually take three bytes each), and this might be
where the problem is.

You can check for surrogates (although I suspect this is now NOT the problem!) by getting
the file name as a string and printing out its UTF-16 code units (the WCHAR values). Then look
to see if any of them are in the surrogate range (0xD800 through 0xDFFF). But I suspect that a
UTF-8 encoding might be the culprit.
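A minimal sketch of that surrogate check, in portable C++ with std::u16string standing in for a WCHAR buffer such as cFileName in WIN32_FIND_DATAW (an assumption; is_surrogate and dump_and_check_surrogates are hypothetical helper names):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// True if the code unit is a UTF-16 surrogate
// (high: 0xD800-0xDBFF, low: 0xDC00-0xDFFF).
bool is_surrogate(char16_t c) {
    return c >= 0xD800 && c <= 0xDFFF;
}

// Prints every UTF-16 code unit as 4-digit hex and reports whether
// any of them falls in the surrogate range.
bool dump_and_check_surrogates(const std::u16string& name) {
    bool found = false;
    for (char16_t c : name) {
        std::printf("%04X ", static_cast<unsigned>(c));
        if (is_surrogate(c)) {
            found = true;
        }
    }
    std::printf("\n");
    return found;
}
```

Kana and the common kanji are in the Basic Multilingual Plane and never produce surrogates; only supplementary-plane characters such as U+20B9F do, so a name like u"\U00020B9F.txt" would return true while u"\u65E5\u672C\u8A9E.txt" would not.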

Try encoding the filename in UTF-8. Note that you can do this by using my Locale
Explorer, choosing the MultiByte tab, pasting the Japanese filename in the top window, and
using WideCharToMultiByte with the UTF-8 code page (CP_UTF8) selected. Then look at the result
and see what would happen if you accepted, say, only the first 260 bytes of the UTF-8 encoding.
Since some UTF-8 encodings take more than 2 bytes per character, there is a possibility that
this is what creates the even smaller limit you are seeing (128 vs. 130).
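The "first 260 bytes" experiment can also be done in code. This sketch assumes the filename is already UTF-8 encoded in a std::string and that the limit is a 260-byte buffer (MAX_PATH treated as bytes, which is only the hypothesis above); chars_fitting is a hypothetical name. It counts how many whole characters fit without splitting a multi-byte sequence:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns how many complete UTF-8 characters fit in the first `budget`
// bytes of `utf8`, never counting a sequence that would be cut in half.
// Assumes well-formed UTF-8 input.
std::size_t chars_fitting(const std::string& utf8, std::size_t budget) {
    std::size_t pos = 0, count = 0;
    while (pos < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[pos]);
        std::size_t len = lead < 0x80          ? 1   // 0xxxxxxx
                        : (lead >> 5) == 0x6   ? 2   // 110xxxxx
                        : (lead >> 4) == 0xE   ? 3   // 1110xxxx
                                               : 4;  // 11110xxx
        if (pos + len > budget) {
            break;  // next character would not fit completely
        }
        pos += len;
        ++count;
    }
    return count;
}
```

With a name made only of three-byte characters (kana or kanji), chars_fitting(name, 260) gives 86, since 86 * 3 = 258; an all-ASCII name gives the full 260. So a plain 260-byte UTF-8 buffer would predict a lower limit than the 128 characters reported, which is worth keeping in mind when testing the theory.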

This is still just guesswork on my part, but I would rate it high in the list of probable
causes.
joe
On Tue, 29 Sep 2009 09:20:23 -0700 (PDT), ksr <sujatha.kokkirala(a)gmail.com> wrote:

>On Sep 29, 1:38�am, "Mihai N." <nmihai_year_2...(a)yahoo.com> wrote:
>> > I am using WinInet API FtpFindFirstFile to enumerate files and folders
>> > on FTP server. It works fine for filenames that have english
>> > characters and filepath upto 260 characters. But for filenames that
>> > have Japanese characters it fails.
>> > For Japanese filenames it works fine upto 128 characters, but fails on
>> > longer filenames. It is a unicode compiled project, my question is,
>> > why is it failing to read upto 260 characters for japanese filenames.
>> > I tried by explicitly using FtpFindFirstFileW, but it does not work.
>> > Please help.
>>
>> I would try to connect with a telnet to the ftp server and see if
>> it supports RFC 2640 ("Internationalization of the File Transfer Protocol")
>> Most servers don't.
>>
>> If it is supported, then I would do some digging to see if FtpFindFirstFile
>> understands it. It is possible that it is not.
>>
>> If it works for short Japanese file names, but not for longer ones,
>> I would suspect some buffer lenght parameter is wrong.
>>
>> --
>> Mihai Nita [Microsoft MVP, Visual C++]http://www.mihai-nita.net
>> ------------------------------------------
>> Replace _year_ with _ to get the real email
>
>
>Thank you for your responses.
>
>Joe, to your questions:
>
>On a non-Japanese windows, using characters A-Z0-9, I can read
>filenames upto 260 characters (path+filename+ext), explorer limits the
>length to 260 characters and FTP can read this path.
>On a Japanese windows, using characters A-Z0-9, again I can read
>filenames upto 260 characters (path+filename+ext)
>
>However on a Japanese windows, using Japanese characters I can
>consistently see that it can read filenames upto 128 characters
>(excluding extension). This is irrespective of path length (ie, path
>could contain Japanese or A-Z0-9 characters). The number of characters
>in the filename is 128 but byte count is 256, if you include file
>extension, number of characters is 132 and byte count is 260.
>
>It looks like there is a bug.
>
>Can you explain how I can check this?
>"Also, check whether or not your Japanese characters require Unicode
>surrogates for UTF-16 encoding. "
>
>Let me know.
>
>Thanks,
>ksr
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: ksr on
On Sep 29, 12:27 pm, Joseph M. Newcomer <newco...(a)flounder.com> wrote:

In Locale Explorer, should I select UTF-8 under CodePage? I don't see
UTF-8 in the locale list. I pasted the 260-character Japanese filename
and used WideCharToMultiByte. The hex values are shown in the window
below. Where will the UTF-8 result be displayed?
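Outside Locale Explorer, the conversion is WideCharToMultiByte with the CodePage parameter set to CP_UTF8 (65001); UTF-8 is a code page, not a locale. Because that call is Windows-only, here is a portable sketch that does the same UTF-16 to UTF-8 encoding by hand and returns the bytes as hex text for inspection. to_utf8_hex is a hypothetical name, and the input is assumed to be well-formed UTF-16:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Encodes UTF-16 as UTF-8 and returns the bytes as space-separated
// uppercase hex, e.g. "E3 81 82".
std::string to_utf8_hex(const std::u16string& s) {
    std::string out;
    auto emit = [&out](unsigned long b) {
        char buf[4];  // two hex digits, a space, and the terminator
        std::snprintf(buf, sizeof buf, "%02X ", static_cast<unsigned>(b & 0xFF));
        out += buf;
    };
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned long cp = s[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            // Combine a high/low surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[++i] - 0xDC00);
        }
        if (cp < 0x80) {
            emit(cp);
        } else if (cp < 0x800) {
            emit(0xC0 | (cp >> 6));
            emit(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            emit(0xE0 | (cp >> 12));
            emit(0x80 | ((cp >> 6) & 0x3F));
            emit(0x80 | (cp & 0x3F));
        } else {
            emit(0xF0 | (cp >> 18));
            emit(0x80 | ((cp >> 12) & 0x3F));
            emit(0x80 | ((cp >> 6) & 0x3F));
            emit(0x80 | (cp & 0x3F));
        }
    }
    if (!out.empty()) {
        out.pop_back();  // drop the trailing space
    }
    return out;
}
```

For example, to_utf8_hex(u"\u3042") returns "E3 81 82": each kana or kanji comes out as three bytes, which is what to look for when checking how long the UTF-8 form of the filename gets.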
From: Scot T Brennecke on
Joseph M. Newcomer wrote:

> Sadly, someone believed the completely silly nonsense about having only one hyperlink per
> page for a hyperlinked term, and therefore it is hard to find the hyperlink that actually
> leads you to the WIN32_FIND_DATA structure (it is at the top, and if you scroll it off,
> the next several instances are not hyperlinked). I believe this idiocy is due to
> unemployable English majors who feel they have to impose nonsensical standards unsuitable
> for hypertext documents because someone once told them this in a class.
> joe

Yeah, I agree. My guess is that the rule was created a long time ago, when help page editing wasn't as easily automated, and it was
deemed safer to have only one hyperlink that needed to be updated (if ever). Having two or more might mean they'd get out of sync.
This, of course, is obsolete thinking, and it might not even be the real reason. Just my guess; I can't think of a
better one.
From: Mihai N. on

> I would try to connect with a telnet to the ftp server and see if
> it supports RFC 2640 ("Internationalization of the File Transfer Protocol")
> Most servers don't.

------------
Question by email:

>> Could you please explain how I can try this? Can you give me the steps to
>> try it?

It is "good form" to follow up on public posts in the same public place
where the initial thread happened.
There is no shame in asking something, and asking in the public place helps
others learn (and this is how we all learned a lot of stuff).

------------

How to connect to ftp using telnet:

From the Windows command line, run
telnet <ftpServerName> 21
(21 is the FTP port)

Once the connection is established, type "FEAT" (no quotes).
Example (when connecting to ftp.microsoft.com 21):
(the lines tagged with => come from the server; the => itself is not there)
FEAT and QUIT are what you type.

=> 220 Microsoft FTP Service
FEAT
=> 211-Extended features supported:
=> LANG EN*
=> UTF8
=> AUTH TLS;TLS-C;SSL;TLS-P;
=> PBSZ
=> PROT C;P;
=> CCC
=> HOST
=> SIZE
=> MDTM
=> REST STREAM
=> 211 END
QUIT
=> 221 Thank you for using Microsoft products.
=>
=>
=> Connection to host lost.

In the answer above you will note UTF8, which means the server supports RFC 2640.

ftp.unicode.org will answer with
=> MDTM
=> REST STREAM
=> SIZE
So there is no internationalization support on that FTP server.
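The same check can be scripted once the FEAT reply has been captured. This sketch assumes the multi-line reply (from "211-" through "211 END") was already read from the control connection into a std::string; supports_utf8 is a hypothetical helper, and matching the feature name case-insensitively is a deliberately lenient choice:

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Returns true if a FEAT reply lists the UTF8 feature (RFC 2640).
// Looks at the first word of each line; surrounding whitespace,
// including '\r', is skipped by operator>>.
bool supports_utf8(const std::string& reply) {
    std::istringstream lines(reply);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream words(line);
        std::string first;
        if (!(words >> first)) {
            continue;  // blank line
        }
        std::string upper;
        for (char c : first) {
            upper += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        }
        if (upper == "UTF8") {
            return true;
        }
    }
    return false;
}
```

Fed the ftp.microsoft.com reply above, it returns true; fed a reply listing only MDTM, REST STREAM and SIZE, like the ftp.unicode.org one, it returns false.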

------------------------

It is often handy to be able to "talk" to a server (FTP, web, and so on) using telnet.
Just specify the port, and read the relevant RFC first.
