From: Ilya Zakharevich on 11 Apr 2006 17:56

[A complimentary Cc of this posting was sent to
Dr.Ruud
<rvtol+news(a)isolution.nl>], who wrote in article <e1gu0l.1m4.1(a)news.isolution.nl>:
> > The original code contained something like
> >
> >   perl5.8.7 -wle "$_ = qq(abcd\x{e155}efg);
> >     tr/\x{e100}-\x{e1ff}\x00-\x{1FFFFF}/\x00-\xFF_/; print"
> >   Unicode character 0x1fffff is illegal at -e line 1.
> >   ________
> >
> > That spurious warning can be worked around,
>
> Is it a "spurious warning"?

Looks so. What makes you doubt it? I'm working with Perl characters,
not Unicode characters; and IIRC, even Unicode goes up to 0x1fffff...
Or is it 0x10ffff?

> perl -MO=Deparse -e 'tr/\x{d7ff}\x{d800}//'

What is your point? I do not see which output makes you think this is
relevant... Did you try

  perl -MO=Deparse -e 'tr/\x{7ff}\x{800}//'

Thanks,
Ilya

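The tr/// above is meant to map U+E100..U+E1FF onto \x00..\xFF and every other character onto "_". A minimal sketch of the same mapping done with s///e instead, so that no tr/// range has to reach up to \x{1FFFFF}; it assumes that mapping is the whole intent, and the helper name map_e1xx_block is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: U+E100..U+E1FF -> chr(0x00)..chr(0xFF),
# any other character -> '_'.  /s lets '.' match newlines too.
sub map_e1xx_block {
    my ($s) = @_;
    $s =~ s{(.)}{
        my $cp = ord $1;
        ($cp >= 0xE100 && $cp <= 0xE1FF) ? chr($cp - 0xE100) : '_';
    }gse;
    return $s;
}

print map_e1xx_block("abcd\x{e155}efg"), "\n";    # prints "____U___"
```

The per-character callback is slower than tr///, but it sidesteps whatever the compiler does with very wide ranges.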
From: Ilya Zakharevich on 11 Apr 2006 18:04

[A complimentary Cc of this posting was sent to
Dr.Ruud
<rvtol+news(a)isolution.nl>], who wrote in article <e1gu0l.1m4.1(a)news.isolution.nl>:
> Is it a "spurious warning"?

> perl -MO=Deparse -e 'tr/\x{d7ff}\x{d800}//'

Oops, ignore my preceding message; I was using the wrong quotes... So
now I see where the Perl bug is:

  >perl -MO=Deparse -e "tr/\x{0000}-\x{ffff}//"
  Malformed UTF-8 character (character 0xffff) at -e line 1.
  Malformed UTF-8 character (character 0xffff) at -e line 1.
  use utf8 ();
  tr/\000//;
  -e syntax OK

  >perl -MO=Deparse -e "tr/\x{0000}-\x{fff0}//"
  use utf8 ();
  tr/\000-\x{fff0}//;
  -e syntax OK

So some Perl developer thought that Perl characters == Unicode
characters, and tr/// mangles the pattern without reporting errors...

A lot of thanks,
Ilya

From: Ilya Zakharevich on 11 Apr 2006 18:11

[A complimentary Cc of this posting was sent to
thundergnat
<thundergnat(a)hotmail.com>], who wrote in article <g8idnSdjQ4e2lKHZRVn-iw(a)rcn.net>:
> It /does/ appear to be a bug in tr. Not in that it has a problem with
> characters in the range D800-DFFF, that doesn't surprise me much. Those
> /aren't/ legal utf-8 character codes.

Let me disagree. First, I know of no such thing as utf-8. Second, if
you mean utf8, legal codes are 0..MAX_UV (since the size of UV is
specific to the Perl build, this depends on the build of the Perl
executable). Some codes would not appear in Unicode strings; but one
should be able to treat "binary" data freely (including the 0..31 and
0x80..0x9F ranges, and other characters which have no
Unicode-consortium-assigned cultural information).

Thanks,
Ilya

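The point that Perl characters are not limited to Unicode can be checked directly: a Perl string is a sequence of integer code points, and chr/ord round-trip surrogates and values above U+10FFFF (the highest Unicode code point) even though those are not Unicode scalar values. A small sketch; the list of test values is arbitrary:

```perl
use strict;
use warnings;
no warnings 'utf8';    # quiet surrogate/non-Unicode warnings on newer perls

# Surrogates, U+FFFF and values past U+10FFFF are still single Perl
# characters whose ord() gives back the original number.
for my $cp (0x41, 0xD800, 0xFFFF, 0x10FFFF, 0x1FFFFF) {
    my $ch = chr $cp;
    printf "U+%04X  length %d  round-trip %s\n",
        $cp, length($ch), (ord($ch) == $cp ? 'ok' : 'FAILED');
}
```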
From: zbrg on 12 Apr 2006 02:44

Ilya Zakharevich wrote on Tue, 11 Apr 2006 16:17:49 +0000 (UTC):
> Since it does not apply to the
> situation I discuss, I can hardly find your finding this message in
> the list of warnings relevant.
>
> Second, what I was discussing was not the warning, but the ACTION. Do
> you think the RESULT ('abcdefg') is "correct"?

The warning seems relevant, as avoiding the 0xD800-0xDFFF range seems
to give a good result:

  $ perl -wle '$_ = q(abcdefg); tr/\x{d7ff}-\x{e0ff}/ /c; print'

From: Ben Bacarisse on 13 Apr 2006 08:57

On Tue, 11 Apr 2006 22:11:32 +0000, Ilya Zakharevich wrote:
> Let me disagree. First, I know of no such thing as utf-8. Second, if
> you mean utf8

The proper form is UTF-8 (i.e. with caps), so your correction (further
from the accepted form) seems rather harsh!

Refs:
http://www.unicode.org/versions/Unicode3.0.html
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8

--
Ben.

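Perl's Encode module does treat the two spellings as different encodings: "UTF-8" is the strict one that only accepts Unicode scalar values, while "utf8" is Perl's lax, extended form. A sketch using a lone surrogate as the test input; FB_CROAK makes the strict encoder die instead of substituting, and LEAVE_SRC keeps the source string untouched:

```perl
use strict;
use warnings;
use Encode ();

my $str = "\x{D800}";    # a lone surrogate code point

# Lax "utf8": the internal octets are passed through as-is.
my $lax = Encode::encode('utf8', $str);
print 'utf8  -> ', uc(unpack('H*', $lax)), "\n";    # ED A0 80 as hex

# Strict "UTF-8": surrogates are not encodable, so this dies.
my $strict_ok = eval {
    Encode::encode('UTF-8', $str, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
};
print 'UTF-8 -> ', ($strict_ok ? 'accepted' : 'rejected'), "\n";
```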