From: Jeff Hobbs on
On Aug 2, 3:01 pm, eugene <eugene.mind...(a)gmail.com> wrote:
> On Aug 2, 6:56 pm, George Petasis <petas...(a)yahoo.gr> wrote:
>
> > Can you please test with the tkdnd demo (basic.tcl)?
>
> > http://tkdnd.svn.sourceforge.net/viewvc/tkdnd/trunk/demos/
>
> > Try dropping the file on the demo, and see what filename is dropped.
> > Is there a problem only with files? If you drag from the demo into Word,
> > for example, the dropped text should contain the euro sign (a Unicode
> > character). Do you see it when dragging text?
>
> Well, I see very strange behavior. I've created two empty text files
> for test purposes with the following names:
>
> 1. "english, русский, العربية, Ελληνικά.txt" (that's English, Russian,
> Arabic and Greek words separated by commas)
> 2. "currency (¢£¥€).txt" (just four currency symbols)
>
> I've tried to drop them on the demo application I got from your link.
> For the first file it shows only the English and Russian words
> correctly; all Arabic and Greek characters are replaced with question
> marks.
> For the second file it shows only the euro symbol correctly, replacing
> cent, pound and yen with question marks as well.
> These results are for Windows XP SP3, encoding system cp1251, ActiveTcl
> 8.5.8, tkdnd 2.1.
>
> When I drag the text from the demo app into Word, everything seems to
> be OK: I can see the euro sign.
>
> > What is also interesting, is whether this happens only under XP, or it
> > affects also vista/7. Have you tried on more recent versions?
>
> Yes, I managed to get hold of a box with Windows 7 Professional and
> installed ActiveTcl 8.5.8 with tkdnd 2.1 to see if things there are
> different compared to XP. No such luck. The result is exactly the same
> as described above.
>
> However, the behavior changes when I select another system locale. For
> example, when I changed it to Greek (encoding system cp1253), the demo
> app started showing the Greek characters in the first file name
> correctly, while losing all the Russian and Arabic ones (replaced with
> question marks). For the second file name it shows all the currency
> symbols correctly except for the cent sign.
>
> Meanwhile, glob *.txt in the wish console shows the correct characters
> for both files regardless of the system locale settings...

This might indicate that tkdnd is correctly using locale-specific
handling when passing data into Tcl, where fully Unicode-aware APIs
would be preferred. Looking briefly at the code, it does have correct
Unicode handling, but only for some drop types. CF_UNICODETEXT is
handled correctly, but data received as text/file (Xdnd) or CF_HDROP
would go through locale conversion.

Jeff
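
To illustrate the locale conversion Jeff describes, here is a minimal,
portable C++ sketch. It is not tkdnd code. The helper
narrow_to_cp1251_subset is hypothetical and maps only ASCII, the Cyrillic
range U+0410..U+044F and the euro sign, which is a small part of cp1251;
a real conversion would go through the whole code page. It shows why
characters with no slot in the system code page come out as question
marks, which is consistent with what eugene saw under cp1251:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: narrows UTF-16 text to a small subset of cp1251.
// Anything outside the subset becomes '?', as a lossy conversion would do.
std::string narrow_to_cp1251_subset(const std::u16string& text) {
    std::string out;
    for (char16_t unit : text) {
        if (unit < 0x80) {
            out += static_cast<char>(unit);                  // ASCII is unchanged
        } else if (unit >= 0x0410 && unit <= 0x044F) {
            out += static_cast<char>(unit - 0x0410 + 0xC0);  // Cyrillic -> 0xC0..0xFF
        } else if (unit == 0x20AC) {
            out += static_cast<char>(0x88);                  // euro sign
        } else {
            out += '?';                                      // no slot in the code page
        }
    }
    return out;
}

// Usage: the four currency symbols from the second test file name.
void check_currency_example() {
    std::string expected(3, '?');         // cent, pound and yen are lost
    expected += static_cast<char>(0x88);  // only the euro sign survives
    assert(narrow_to_cp1251_subset(u"\u00A2\u00A3\u00A5\u20AC") == expected);
}
```

Once the name has been narrowed this way, the lost characters cannot be
recovered, so the conversion to UTF-8 for Tcl would have to start from
the UTF-16 form instead.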
From: Georgios Petasis on
On 3/8/2010 01:52, Jeff Hobbs wrote:
> This might indicate that tkdnd is using correct locale-specific
> handling of data into Tcl, where full unicode-aware APIs might be
> preferred. Looking briefly at the code, it does have correct unicode
> handling - for the right drop types. CF_UNICODETEXT is handled
> correctly, but if you received it as text/file (Xdnd) or CF_HDROP, it
> would do locale conversion.
>
> Jeff

I don't think I do any conversion when receiving data as CF_HDROP. The
code is in GetData_CF_HDROP, in the file OleDND.h:

    for( UINT count = 0; count < cFiles; count++ ) {
        ::DragQueryFile(hdrop, count, szFile, sizeof(szFile));
        /* Convert to forward slashes for easier access in scripts... */
        for (p=szFile; *p!='\0'; p=(char *) CharNext(p)) {
            if (*p == '\\') *p = '/';
        }
        Tcl_ListObjAppendElement(NULL, result, Tcl_NewStringObj(szFile,-1));
    }

I assume that DragQueryFile returns UTF-8.
I will look into this, but I am not sure I can fix it. I suspect that
the behaviour depends on how the code gets compiled.

George
From: Georgios Petasis on
On 3/8/2010 10:32, Georgios Petasis wrote:
> I assume that DragQueryFile returns utf-8.
> I will look into this, but I am not sure if I can correct this. I
> suspect that behaviour depends on how the code gets compiled.
>
> George

Hm, I have found a page that states that the DROPFILES structure
will never contain data in utf-8 format:

http://www.eggheadcafe.com/software/aspnet/33812038/-copy-paste-with-cfhdrop-format-getdata-invoked-multiple-times.aspx

It states: "Note that file names in DROPFILES structure are never in
UTF-8. They are either in UTF-16, or in system default ANSI code page."

So my assumption that defining _MBCS would give me UTF-8 is not valid:
Windows uses the default ANSI code page. I will define _UNICODE and
handle the data as a Unicode string.

George
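
As a side note on the statement quoted above, the Win32 DROPFILES header
carries an fWide flag that says which of the two encodings the name list
uses. The following portable C++ sketch walks such a block by hand. It is
only an illustration: tkdnd goes through DragQueryFile rather than
parsing the block itself, the helper names are hypothetical, and the
offsets assume the usual 20-byte little-endian layout (DWORD pFiles,
POINT pt, BOOL fNC, BOOL fWide):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed byte offsets inside the DROPFILES header.
constexpr std::size_t kOffsetPFiles = 0;   // offset of the name list
constexpr std::size_t kOffsetFWide  = 16;  // nonzero when names are UTF-16
constexpr std::size_t kHeaderSize   = 20;

// Reads a little-endian 32-bit value from raw bytes.
static std::uint32_t read_u32(const std::vector<std::uint8_t>& b, std::size_t at) {
    return b[at] | (b[at + 1] << 8) | (b[at + 2] << 16)
         | (static_cast<std::uint32_t>(b[at + 3]) << 24);
}

// Hypothetical helper: true when the block declares UTF-16 names.
bool names_are_wide(const std::vector<std::uint8_t>& block) {
    assert(block.size() >= kHeaderSize);
    return read_u32(block, kOffsetFWide) != 0;
}

// Hypothetical helper: extracts the names from a wide block. The list is a
// run of NUL-terminated UTF-16 strings closed by one extra NUL.
std::vector<std::u16string> wide_names(const std::vector<std::uint8_t>& block) {
    assert(names_are_wide(block));
    std::vector<std::u16string> names;
    std::u16string current;
    for (std::size_t i = read_u32(block, kOffsetPFiles); i + 1 < block.size(); i += 2) {
        char16_t unit = static_cast<char16_t>(block[i] | (block[i + 1] << 8));
        if (unit != 0) { current += unit; continue; }
        if (current.empty()) break;  // an empty string marks the end of the list
        names.push_back(current);
        current.clear();
    }
    return names;
}
```

When fWide is zero the same list holds single-byte strings in the system
ANSI code page, which is the case where characters outside that code
page are already lost.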
From: eugene on
On Aug 3, 12:03 pm, Georgios Petasis <peta...(a)iit.demokritos.gr> wrote:

> So, my assumption that defining _MBCS will have them in utf-8 is not
> valid. Windows use the default ANSI page. I will define _UNICODE and
> handle it as a unicode string.
>
> George

So then can we expect a patch any time soon? :)
From: Georgios Petasis on
On 3/8/2010 13:19, eugene wrote:
> So then can we expect a patch any time soon? :)

I am not sure I will manage to fix it. I compiled tkdnd with UNICODE and
_UNICODE defined instead of _MBCS, and tried treating the data both as
Unicode (using Tcl_UniCharToUtfDString) and as UTF-16 (using
WideCharToMultiByte). The result was the same in both cases, and wrong.
Dropping the filename "english, русский, العربية, Ελληνικά.txt" results
in "english, @CAA:89, 'D91(J), •»»·½ΉΊ¬.txt".

I am out of ideas...

George
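
One possible reading of the garbled name above, offered as a guess rather
than a diagnosis: it looks like each UTF-16 code unit with its high byte
dropped. "русский" is U+0440 U+0443 U+0441 U+0441 U+043A U+0438 U+0439,
and the low bytes 40 43 41 41 3A 38 39 are the ASCII text "@CAA:89". The
Arabic word gives "'D91(J)" in the same way, and the low bytes of the
Greek word match the reported characters when read as cp1253. That would
suggest the data is narrowed to 8 bits per character somewhere between
DragQueryFile and the Tcl object, for instance through a char-sized
buffer or a cast. The portable C++ sketch below reproduces the effect and
shows a plain UTF-16 to UTF-8 conversion for comparison; both function
names are hypothetical and nothing here is taken from tkdnd:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Keeps only the low byte of every UTF-16 code unit (the suspected mistake).
std::string low_bytes_only(const std::u16string& text) {
    std::string out;
    for (char16_t unit : text) out += static_cast<char>(unit & 0xFF);
    return out;
}

// Converts UTF-16 to UTF-8. Surrogate pairs are joined; a lone surrogate
// is replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char32_t cp = text[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < text.size()
            && text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (text[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Usage: the Russian and Arabic words from the test file name.
void check_reported_output() {
    assert(low_bytes_only(u"\u0440\u0443\u0441\u0441\u043A\u0438\u0439") == "@CAA:89");
    assert(low_bytes_only(u"\u0627\u0644\u0639\u0631\u0628\u064A\u0629") == "'D91(J)");
}
```

If this reading is right, the wide name itself would be intact at the
point where DragQueryFile returns, and the thing to check would be the
type and size of the buffer and the arguments passed to the conversion
call.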