From: miletwo on 25 May 2006 15:00

I'm trying to read an XML file and rewrite it as RSS using the following file. Problem is, it is not forcing UTF-8 no matter what I do. Any help appreciated.

***********************
#!/bin/perl -w
#use strict;
use XML::Twig;
use utf8;

use open OUT => ":utf8";
use open IN => ":utf8";

my $shownum = 10;
my $thisyear = '2006';
my $field= 'releasedate';
my $twig= new XML::Twig( keep_encoding=> 1);

open(INFILE, "directorylist.xml");
$twig->parse(\*INFILE);

my $root= $twig->root;
my @releases= $root->children;

my $output = "";

$output .= '<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">' . "\n";
$output .= '<channel>' . "\n\n";
$output .= <<EOT;
<title>scrubbed Incorporated - Recent News</title>
<link>http://www.scrubbed.com/press/</link>
<description>Visit the scrubbed Press Center where you will find many resources, including press releases, corporate information, technology overviews, executive bios and photos, the scrubbed logo and more.<br />If you are a member of the media and are not able to find what you are looking for in the Press Center, please send an email to corpcomm\@scrubbed.com.</description>
<language>en-us</language>

EOT

for(my $i=0; $i < $shownum; $i++){
    $output .= "\t" . '<item>' . "\n";
    $output .= "\t\t" . '<title>' . $releases[$i]->first_child('headline')->text . '</title>' . "\n";
    $output .= "\t\t" . '<link>http://www.scrubbed.com/press/releases/' . $thisyear . '/' . $releases[$i]->att('name') . '.html</link>' . "\n";
    $output .= "\t\t" . '<description>' . $releases[$i]->first_child('subheader')->text . '</description>' . "\n";
    $output .= "\t\t" . '<dc:date>' . $releases[$i]->first_child('releasedate')->text . '</dc:date>' . "\n";
    $output .= "\t" . '</item>';
    $output .= "\n\n";
}

$output .= "</channel>\n</rss>";
Encode::_utf8_on($output);

open(FILEWRITE,">:utf8", "press.rss");
binmode FILEWRITE, ":utf8";
print FILEWRITE $output;
From: Peter J. Holzer on 25 May 2006 16:37

miletwo(a)gmail.com wrote:
> I'm trying to read an XML file and rewrite it as RSS using the following file.
> Problem is, it is not forcing UTF-8 no matter what I do. Any help
> appreciated.

Your script works for me. Please provide a complete example that demonstrates the error.

Your script tries to read a file named directorylist.xml, but you didn't provide that file. I had to read your script to find out what that file should contain, and write one myself. Maybe there is an error in your input file.

Also, you didn't provide any information about the system you are using. I tested it with Debian Sarge (perl 5.8.4, XML::Twig 3.17).

hp

--
Peter J. Holzer | Sysadmin WSR/LUGA | hjp(a)hjp.at | http://www.hjp.at/
"You could also skip [the discussion] if you simply skipped it."
-- Ralph Angenendt in dang 2006-04-15
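One way to test the "error in your input file" hypothesis, and to check the generated press.rss afterwards, is to read each file as raw bytes and try to decode it strictly as UTF-8. The sketch below uses only the core Encode module; the helper names is_valid_utf8 and slurp_raw are hypothetical, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: true if $bytes is well-formed UTF-8, false otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # decode() with a CHECK argument may consume its input, so work on a copy.
    my $copy = $bytes;
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

# Hypothetical helper: read a whole file as undecoded bytes.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;    # slurp mode: read the whole file at once
    my $data = <$fh>;
    close $fh;
    return defined $data ? $data : '';
}

# Check each file named on the command line, e.g. directorylist.xml press.rss
for my $path (@ARGV) {
    my $ok = is_valid_utf8(slurp_raw($path));
    print "$path: ", ($ok ? "valid UTF-8" : "NOT valid UTF-8"), "\n";
}
```

Note that a file can be valid UTF-8 and still look wrong in a viewer that assumes another encoding, so this only rules out malformed bytes.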
From: Michel Rodriguez on 26 May 2006 05:09

miletwo(a)gmail.com wrote:
> I'm trying to read an XML file and rewrite it as RSS using the following file.
> Problem is, it is not forcing UTF-8 no matter what I do. Any help
> appreciated.
>
> [...]
> use utf8;
>
> use open OUT => ":utf8";
> use open IN => ":utf8";
> [...]
> my $twig= new XML::Twig( keep_encoding=> 1);
> [...]
> Encode::_utf8_on($output);
>
> open(FILEWRITE,">:utf8", "press.rss");
> binmode FILEWRITE, ":utf8";
> print FILEWRITE $output;

Whoa! You sure want to make sure you get UTF-8 on output!

Except, of course, that the keep_encoding option tells XML::Twig to output the same encoding as you got in the input (which you did not show us, as mentioned by the previous poster).

If you want to output UTF-8, the best way is NOT to do anything: by default the parser will convert everything into UTF-8, and the output will be in that encoding.

Did you try your code without the various utf8-related instructions peppered through it? What was the result?

--
mirod
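Michel's "decode once, encode once" point can be shown without XML::Twig. The sketch below (core Encode only; the write_utf8 helper and the sample string are made up for illustration) writes the same headline fragment through a UTF-8 output layer twice: once as raw UTF-8 bytes, roughly what keep_encoding leaves you with (text in its original, undecoded encoding), and once after decoding, which is what the parser gives you by default.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Raw bytes as they sit in a UTF-8 file: "OmniTRACS" followed by U+00AE (®),
# which UTF-8 stores as the two bytes C2 AE.
my $bytes = "OmniTRACS\xC2\xAE";

# Hypothetical helper: print a string through a UTF-8 output layer into an
# in-memory buffer and return the bytes that were written.
sub write_utf8 {
    my ($text) = @_;
    open my $fh, '>:encoding(UTF-8)', \my $buf or die "open: $!";
    print {$fh} $text;
    close $fh;
    return $buf;
}

# Wrong: the undecoded bytes are treated as two Latin-1 characters (Â and ®)
# and each one is encoded again, giving four bytes where two were expected.
my $double = write_utf8($bytes);

# Right: decode once on input, encode once on output.
my $chars  = decode('UTF-8', $bytes);
my $single = write_utf8($chars);

printf "double-encoded: %vX\n", $double;
printf "encoded once:   %vX\n", $single;
```

Encode::_utf8_on, as used in the original script, only flips Perl's internal flag and does not check that the bytes are well-formed; decode() performs the same conversion with validation.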
From: miletwo on 26 May 2006 16:10

Here's directorylist.xml. I'm on MacOSX but also tried running this on my Solaris box and it does the same thing. I've also tried it with and without keep_encoding, so don't "think" that's it. Thanks for replies.

<?xml version="1.0" encoding="UTF-8"?>
<directory>
  <file name="060525_brings_custom_user">
    <releasedate>05-25-2006</releasedate>
    <releasetime>04:30 AM</releasetime>
    <timezone>America/Los_Angeles</timezone>
    <headline><![CDATA[XXSCRUBBEDXX Brings Custom User-Interface Capabilities to U.S. Cellular's easyedgeSM with the uiOne Solution]]></headline>
    <subheader><![CDATA[]]></subheader>
    <division>Corp, QIS</division>
    <categories></categories>
    <document></document>
    <exclude></exclude>
  </file>
  <file name="060524_initiates_patent_infringement">
    <releasedate>05-24-2006</releasedate>
    <releasetime>04:30 AM</releasetime>
    <timezone>America/Los_Angeles</timezone>
    <headline><![CDATA[XXSCRUBBEDXX Initiates Patent Infringement Proceedings in the UK against Nokia]]></headline>
    <subheader><![CDATA[]]></subheader>
    <division>Corp</division>
    <categories></categories>
    <document></document>
    <exclude></exclude>
  </file>
  <file name="060518_takes_XXSCRUBBEDXX_2006">
    <releasedate>05-18-2006</releasedate>
    <releasetime>04:30 AM</releasetime>
    <timezone>America/Los_Angeles</timezone>
    <headline><![CDATA[XXSCRUBBEDXX Takes XXSCRUBBEDXX 2006 to the Next Level with Addition of Telecom Italia and XXSCRUBBEDXX to an Already Impressive XXSCRUBBEDXX 2006 Conference Agenda]]></headline>
    <subheader><![CDATA[Premiere Players in the Industry Showcase Advanced Data Capabilities at XXSCRUBBEDXX 2006 Conference in San Diego May 31-June 2]]></subheader>
    <division>Corp, QIS</division>
    <categories></categories>
    <document></document>
    <exclude></exclude>
  </file>
  <file name="060518_averitt_selects_omnitracs">
    <releasedate>05-18-2006</releasedate>
    <releasetime>04:30 AM</releasetime>
    <timezone>America/Los_Angeles</timezone>
    <headline><![CDATA[AVERITT Selects XXSCRUBBEDXX's OmniTRACS® and OmniExpress® Mobile Communication Systems for Entire Fleet and Service Centers]]></headline>
    <subheader><![CDATA[Leading Freight and Supply Chain Management Provider with International Reach One of First to Implement End-to-End Solution for Improved Fleet Communications]]></subheader>
    <division>Corp, QWBS</division>
    <categories></categories>
    <document></document>
    <exclude></exclude>
  </file>
  <file name="060517_clears_up_misunderstandings">
    <releasedate>05-17-2006</releasedate>
    <releasetime>12:36 PM</releasetime>
    <timezone>America/Los_Angeles</timezone>
    <headline><![CDATA[XXSCRUBBEDXX Clears Up Misunderstandings Regarding the ITC Staff Attorney Briefing]]></headline>
    <subheader><![CDATA[]]></subheader>
    <division>Corp</division>
    <categories></categories>
    <document></document>
    <exclude></exclude>
  </file>
  <file name="060512_hospital_democratic_republic">
    <releasedate>05-12-2006</releasedate>
    <releasetime>04:30 AM</releasetime>
    <timezone>America/Los_Angeles</timezone>
    <headline><![CDATA[Hospital in the Democratic Republic of Congo to Be Outfitted with CDMA2000 1xEV-DO to Help Improve Healthcare in Africa]]></headline>
    <subheader><![CDATA[XXSCRUBBEDXX Pledges Donation and Technology to the Dikembe Mutombo Foundation, First Hospital Built in the Congo in Nearly 40 Years]]></subheader>
    <division>Corp</division>
    <categories></categories>
    <document></document>
    <exclude></exclude>
  </file>
  <file name="060509_british_sky_broadcasting">
    <releasedate>05-09-2006</releasedate>
    <releasetime>04:30 AM</releasetime>
    <timezone>America/Los_Angeles</timezone>
    <headline><![CDATA[XXSCRUBBEDXX and British Sky Broadcasting Announce Intent to Conduct XXSCRUBBEDXX™ Technology Trial in United Kingdom]]></headline>
    <subheader><![CDATA[Joint Exercise Expected to be Europe's First Technical Trial of Open, Network-Agnostic FLO Technology]]></subheader>
    <division>Corp</division>
    <categories></categories>
    <document></document>
    <exclude></exclude>
  </file>
  <file name="060509_application_downloads_XXSCRUBBEDXX">
    <releasedate>05-09-2006</releasedate>
    <releasetime>04:30 AM</releasetime>
    <timezone>America/Los_Angeles</timezone>
    <headline><![CDATA[Application Downloads with XXSCRUBBEDXX's XXSCRUBBEDXX® Solution Surpass Three Million in Thailand on Hutch's Advanced CDMA2000 1X Network]]></headline>
    <subheader><![CDATA[Active Hutchison CAT Customers Have Downloaded an Average of 10 Applications Each Since XXSCRUBBEDXX Launched, Numbers Continue to Grow]]></subheader>
    <division>Corp, QIS</division>
    <categories></categories>
    <document></document>
    <exclude></exclude>
  </file>
</directory>
From: Peter J. Holzer on 26 May 2006 16:36

miletwo(a)gmail.com wrote:
> Here's directorylist.xml. I'm on MacOSX but also tried running this on
> my Solaris box and it does the same thing. I've also tried it with and
> without keep_encoding, so don't "think" that's it.

This file contains only 8 <file/> elements. Your script crashes with

Can't call method "first_child" on an undefined value at ./miletwo line 40.

if there are fewer than 10 children of the root element, before it even opens the output file. So with this file, your script doesn't write anything. How do you determine whether a non-existent file is UTF-8 or not?

hp
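Peter's diagnosis points at the loop: it always runs $shownum times, however many releases the file contains, so $releases[8] is undef and the method call on it dies. A minimal sketch of one fix is to cap the count at the number of elements actually present. Here a plain list of strings stands in for $root->children (hypothetical data) so the example runs without XML::Twig; in the real script each element would be an XML::Twig element.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min);

# Stand-in for $root->children: eight releases, like the posted
# directorylist.xml (hypothetical strings, not XML::Twig elements).
my @releases = map { "release $_" } 1 .. 8;
my $shownum  = 10;

# Show at most $shownum items, but never index past the end of @releases.
my $count = min($shownum, scalar @releases);

my @items;
for my $release (@releases[0 .. $count - 1]) {
    # In the real script, build one <item> from $release here.
    push @items, "<item>$release</item>";
}

print scalar(@items), " of ", scalar(@releases), " releases written\n";
```

If there are more releases than $shownum, the same slice simply takes the first $shownum; if there are none, the range 0 .. -1 is empty and the loop body never runs.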