From: sln on 9 Jan 2010 08:41

On Sat, 9 Jan 2010 00:50:51 -0800 (PST), Bart Van der Donck <bart(a)nijlen.com> wrote:

>Peter J. Holzer wrote:
>
>> On 2010-01-08 09:58, Bart Van der Donck <b...(a)nijlen.com> wrote:
>>
>>>   use Encode;
>>>   open(my $in, '<:raw', $mypath) || die "Couldn't open file: $!";
>>>   my $txt = do { local $/; <$in> };
>>>   close $in;
>>>   my @lines = split /\n/, decode('UTF-16LE', $txt);
>>
>> Shorter:
>>
>>     open(my $in, '<:encoding(UTF-16LE)', $mypath) || die "Couldn't open file: $!";
>>     my @lines = <$in>;
>>     chomp @lines;
>
>For my particular situation, it appears that I need the raw method
>anyhow. When I read directly with '<:encoding(UTF-16LE)', it says:
>
>  "UTF-16LE:Unicode character fffe is illegal at script.pl line 32."
>
>(32 is the line with the 'open'-call)

Try:
  open(my $in, '<:encoding(UTF-16)', $mypath) || die "Couldn't open file: $!";
                           ^^^^^^
                           UTF-16

An fffe BOM means UTF-16LE, and the file should have opened ok.
However, when you read the first time without seeking past the
BOM offset (2), fffe is read and is an illegal UTF-16 char.

When you open with UTF-16 instead, the layer expects a BOM and
automatically moves the file position past it for the first read.
It's called the BOM bug !!!

Of course if you don't have a BOM, using UTF-16 will die with
"no BOM". Another bug !!!

I posted code before that auto navigates these waters, if you
bothered to look.

-sln
From: sln on 9 Jan 2010 08:47

On Sat, 09 Jan 2010 05:41:49 -0800, sln(a)netherlands.com wrote:

[snip]

>Try:
>  open(my $in, '<:encoding(UTF-16)', $mypath) || die "Couldn't open file: $!";
>                           ^^^^^^
>                           UTF-16
>
>An fffe BOM means UTF-16LE, and the file should have opened ok.
>However, when you read the first time without seeking past the
>BOM offset (2), fffe is read and is an illegal UTF-16 char.
>
>When you open with UTF-16 instead, the layer expects a BOM and
>automatically moves the file position past it for the first read.
>It's called the BOM bug !!!

The bug is that seeks are dead: you have to keep track of the BOM
offset yourself (if there is a BOM), and this should be transparent
with :encoding(UTF-16).

>Of course if you don't have a BOM, using UTF-16 will die with
>"no BOM". Another bug !!!
>
>I posted code before that auto navigates these waters, if you
>bothered to look.
>
>-sln
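
For illustration, keeping track of the BOM offset by hand could look roughly like the sketch below. It is not the code sln refers to having posted earlier; open_utf16 is a hypothetical helper name, and falling back to UTF-16LE when no BOM is found is an assumption chosen to match the files discussed in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): open a UTF-16 file for
# reading and use the BOM, if there is one, to pick the byte order.
sub open_utf16 {
    my ($path, $default) = @_;
    $default ||= 'UTF-16LE';    # assumed fallback when no BOM is present

    open(my $in, '<:raw', $path) or die "Couldn't open $path: $!";

    # Peek at the first two bytes of the file.
    my $got = read($in, my $bom, 2);
    die "Couldn't read $path: $!" unless defined $got;

    my $enc = $bom eq "\xFF\xFE" ? 'UTF-16LE'
            : $bom eq "\xFE\xFF" ? 'UTF-16BE'
            :                      undef;

    if (!defined $enc) {
        # No BOM: rewind so that the first character is not lost.
        seek($in, 0, 0) or die "Couldn't rewind $path: $!";
        $enc = $default;
    }

    # The handle now sits just past the BOM (or at offset 0), so the
    # encoding layer never sees the BOM bytes.
    binmode($in, ":encoding($enc)") or die "Couldn't set layer on $path: $!";
    return $in;
}
```

With that in place, `my $in = open_utf16($mypath); my @lines = <$in>; chomp @lines;` reads the lines as in the shorter version quoted above. Any later seek on the handle still has to allow for the two BOM bytes.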
From: Ben Morrow on 9 Jan 2010 17:04
Quoth Jürgen Exner <jurgenex(a)hotmail.com>:
> Bart Van der Donck <bart(a)nijlen.com> wrote:
> >For my particular situation, it appears that I need the raw method
> >anyhow. When I read directly with '<:encoding(UTF-16LE)', it says:
> >
> >  "UTF-16LE:Unicode character fffe is illegal at script.pl line 32."
>
> The only place where 0xFFFE could possibly show up is the byte order
> mark (BOM) and I would be very surprised if Perl couldn't handle the
> BOM.

IIRC the Perl UTF-16 layers are a little too picky. If you ask for
UTF-16LE, it will complain if there is a BOM. If, OTOH, you ask for
UTF-16, it will correctly detect the BOM and set the byte order from it.

> I would suggest to check the file with a hex editor to make sure it does
> not contain an additional rouge BOM somewhere in the middle of the file.

I wasn't aware BOMs came in different colours :).

Ben
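
The difference between the two encoding names can be checked on byte strings with Encode's decode(). This is only a small sketch with made-up sample data, going by the Encode::Unicode documentation: plain UTF-16 takes the byte order from the BOM and drops it, while an explicit UTF-16LE treats the bytes FF FE as an ordinary character (U+FEFF). The case of a file with no BOM at all is left out here.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Sample data (made up for this sketch): the text "hi" in both byte
# orders, each preceded by the matching BOM.
my $le = "\xFF\xFE" . "h\x00i\x00";
my $be = "\xFE\xFF" . "\x00h\x00i";

# Plain UTF-16: the BOM selects the byte order and is not returned.
my $from_le = decode('UTF-16', $le);    # "hi"
my $from_be = decode('UTF-16', $be);    # "hi"

# Explicit UTF-16LE: the BOM bytes are decoded like any other character,
# so the result starts with U+FEFF.
my $kept = decode('UTF-16LE', $le);     # "\x{FEFF}hi"

printf "UTF-16 on LE data:   %d chars\n", length $from_le;
printf "UTF-16 on BE data:   %d chars\n", length $from_be;
printf "UTF-16LE on LE data: %d chars, first is U+%04X\n",
    length $kept, ord $kept;
```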