From: Abigail on 29 Jan 2007 16:04

Kio (ysmay13@_NO_SPAM_poczta.fm) wrote on MMMMDCCCXCIX September MCMXCIII in <URL:news:eplfgo$rah$1(a)news.onet.pl>:
...  hi,
...
...  I've got a perl script which is parsing a HTML file and printing a src
...  text from a <img> tag on a screen.
...
...  I don't know how to write a sub which can take a html $file and change
...  all the src contents from the img tags to something else. I don't want
...  to use a regexp.

That's silly. If you want to code with artificial restrictions (in this
case "don't use regular expressions"), you're better off asking your question
in a more appropriate forum.

Abigail
--
perl -we '$@="\145\143\150\157\040\042\112\165\163\164\040\141\156\157\164".
             "\150\145\162\040\120\145\162\154\040\110\141\143\153\145\162".
             "\042\040\076\040\057\144\145\166\057\164\164\171";`$@`'
From: Kio on 29 Jan 2007 16:45

Abigail wrote:
> That's silly.

No, it's not.

> If you want to code with artificial restrictions (in this
> case "don't use regular expressions"), you're better off asking your question
> in a more appropriate forum.

I can use a regexp for something like:

<img src="whatever.jpg">

and then change it... but what if I had something like:

<img alt="aa" src="whatever.jpg">

What then? A regexp is not gonna work on that :/ And what about links inside JS? A regexp is not gonna work there either :/

Damian
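A sub of the kind asked for at the start of the thread (one that takes a file name) could be sketched with HTML::TreeBuilder, the module Michele suggests further on. Because the document is actually parsed, attribute order as in <img alt="aa" src="whatever.jpg"> does not matter. This is a minimal sketch, not code from the thread: the sub name rewrite_img_src, its arguments and the script name in the usage comment are hypothetical. It only touches <img> elements that already have a src, and it returns the new HTML instead of writing it back to the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not from the thread): parse the HTML file $file,
# set the src attribute of every <img> that has one to $new_src, and
# return the resulting HTML as a string.
sub rewrite_img_src {
    my ($file, $new_src) = @_;

    my $tree = HTML::TreeBuilder->new_from_file($file)
        or die "Cannot parse $file: $!";

    # look_down with a code-ref criterion: <img> elements that already have a src
    for my $img ($tree->look_down(_tag => 'img', sub { defined $_[0]->attr('src') })) {
        $img->attr(src => $new_src);
    }

    my $html = $tree->as_HTML;
    $tree->delete;    # break the tree's circular references
    return $html;
}

# Usage (hypothetical script name): perl rewrite.pl page.html
print rewrite_img_src($ARGV[0], '/e/.jpg') if @ARGV;
```

Note that as_HTML re-serializes the whole tree (implied <html>, <head> and <body> elements are added), so apart from the src values the output will not be byte-for-byte identical to the input.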
From: Kio on 29 Jan 2007 17:46

Michele Dondi wrote:
> Earlier today someone posted a question about HTML::TokeParser, and
> people answered giving some explicit example. You may enjoy reading
> that thread. Your case is actually easier than what was being asked
> there...

I wrote something like this:

    use HTML::TokeParser;
    my $p = HTML::TokeParser->new(\$html_file);
    while (my $token = $p->get_tag('img'))
    {
        #how can i write sth to the src attr ?
        #like: new attr="/e/.jpg"
    }

Damian
From: Michele Dondi on 31 Jan 2007 09:41

On Mon, 29 Jan 2007 23:46:17 +0100, Kio <ysmay13@_NO_SPAM_poczta.fm> wrote:

>> Earlier today someone posted a question about HTML::TokeParser, and
>> people answered giving some explicit example. You may enjoy reading
>> that thread. Your case is actually easier than what was being asked
>> there...
>
>I wrote something like this:
>
>    use HTML::TokeParser;
>    my $p = HTML::TokeParser->new(\$html_file);
>    while (my $token = $p->get_tag('img'))
>    {
>        #how can i write sth to the src attr ?
>        #like: new attr="/e/.jpg"
>    }

Well, in that case you should probably get all tokens and simply print() those you're not interested in, and manipulate those that are (i) start tags and (ii) correspond to img elements. Each $token contains everything you need, as described in the docs.

Admittedly, this is perhaps a little convoluted. Thinking about it more, maybe H::TP is not the best tool for this. It is really meant for *parsing*, i.e. for extracting information. Of course you can use that information to rebuild HTML code, but other modules may be better suited to the task.

For example, HTML::TreeBuilder *does* use a parser, but it uses it to build a network of nodes which, taken as a whole, yields a representation of the HTML with easy access to its parts: each node is either a plain string or an object, depending on its nature. Tags, for example, are objects. Access to the structured data is provided through methods on these objects.
Since I didn't really know H::TB myself, I took a quick peek at the docs and found it's really easy to get it to do the job:

#!/usr/bin/perl

use strict;
use warnings;

use HTML::TreeBuilder;

# parse everything after __DATA__ into a tree of HTML::Element nodes
my $tree=HTML::TreeBuilder->new_from_content(<DATA>);
# set the src attribute of every <img> element
$_->attr(src => '/e/.jpg') for $tree->find('img');
print $tree->as_HTML;
$tree->delete;    # free the tree's memory

__DATA__
<html>
 <head>
  <title>The Real Foo Homepage</title>
 </head>
 <body>
  <h1>The Real Foo</h1>
  <p>Foo <img alt="foo" src="whatever">,
   bar <img src="whatever" alt="bar">,
   baz <img src="whatever" alt="bar">.
  </p>
 </body>
</html>

Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB=' ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_, 256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
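For comparison, the first approach Michele describes (fetch every token, copy through the ones you don't care about, and rewrite the img start tags) might look roughly like this with HTML::TokeParser's get_token. This is a minimal sketch, not code from the thread: the sample $html string and the $new_src value are assumptions, and because each <img> tag is rebuilt from its attribute hash, its original quoting style and internal whitespace are not preserved (everything else is copied through verbatim).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;
use HTML::Entities qw(encode_entities);

# Sample input (an assumption for illustration); note the differing attribute order.
my $html    = '<p>Foo <img alt="foo" src="whatever">, bar <img src="whatever" alt="bar"></p>';
my $new_src = '/e/.jpg';

my $p = HTML::TokeParser->new(\$html) or die "Cannot create parser: $!";

# Position of the original source text in each token type:
#   [S, tag, attr, attrseq, text]  [E, tag, text]  [T, text, is_data]
#   [C, text]  [D, text]  [PI, token0, text]
my %text_idx = (S => 4, E => 2, T => 1, C => 1, D => 1, PI => 2);

my $out = '';
while (my $token = $p->get_token) {
    my $type = $token->[0];
    if ($type eq 'S' && $token->[1] eq 'img' && exists $token->[2]{src}) {
        # Rebuild the <img> start tag with the new src, keeping attribute order.
        my ($tag, $attr, $attrseq) = @{$token}[1 .. 3];
        $attr->{src} = $new_src;
        $out .= "<$tag"
              . join('', map { qq{ $_="} . encode_entities($attr->{$_}, '<>&"') . '"' } @$attrseq)
              . '>';
    }
    else {
        # Everything else is copied through unchanged.
        $out .= $token->[ $text_idx{$type} ];
    }
}

print $out, "\n";
```

Compared with the HTML::TreeBuilder version, this keeps the rest of the document exactly as written, at the price of having to know the layout of each token type.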