From: Abigail on
Kio (ysmay13@_NO_SPAM_poczta.fm) wrote on MMMMDCCCXCIX September MCMXCIII
in <URL:news:eplfgo$rah$1(a)news.onet.pl>:
... hi,
...
... I've got a Perl script which parses an HTML file and prints the src
... text from each <img> tag to the screen.
...
... I don't know how to write a sub which can take an HTML $file and change
... all the src contents of the img tags to something else. I don't want to
... use a regexp.


That's silly. If you want to code with artificial restrictions (in this
case "don't use regular expressions"), you're better off asking your question
in a more appropriate forum.



Abigail
--
perl -we '$@="\145\143\150\157\040\042\112\165\163\164\040\141\156\157\164".
"\150\145\162\040\120\145\162\154\040\110\141\143\153\145\162".
"\042\040\076\040\057\144\145\166\057\164\164\171";`$@`'
From: Kio on
Abigail wrote:

> That's silly.

No it's not.

> If you want to code with artificial restrictions (in this
> case "don't use regular expressions"), you're better off asking your question
> in a more appropriate forum.

I can use regexp for those:

<img src="whatever.jpg">

and then change it ...

but what if I have something like:

<img alt="aa" src="whatever.jpg">

What then? A regexp is not going to work on it :/

And what about links located in JS? A regexp is not going to work there
either :/


Damian
From: Kio on
Michele Dondi wrote:

> Earlier today someone posted a question about HTML::TokeParser, and
> people answered giving some explicit example. You may enjoy reading
> that thread. Your case is actually easier than what was being asked
> there...

I wrote sth like this:

use HTML::TokeParser;
my $p = HTML::TokeParser->new(\$html_file);
while (my $token = $p->get_tag('img'))
{
#how can i write sth to the src attr ?
#like: new attr="/e/.jpg"
}

Damian
From: Michele Dondi on
On Mon, 29 Jan 2007 23:46:17 +0100, Kio <ysmay13@_NO_SPAM_poczta.fm>
wrote:

>> Earlier today someone posted a question about HTML::TokeParser, and
>> people answered giving some explicit example. You may enjoy reading
>> that thread. Your case is actually easier than what was being asked
>> there...
>
>I wrote sth like this:
>
> use HTML::TokeParser;
> my $p = HTML::TokeParser->new(\$html_file);
> while (my $token = $p->get_tag('img'))
> {
> #how can i write sth to the src attr ?
> #like: new attr="/e/.jpg"
> }

Well, in that case you should probably get all tokens and simply
print() those you're not interested in, and manipulate those that are
(i) tags and (ii) correspond to img elements. Each $token contains all
that is necessary, as described in the docs. Admittedly, this is
perhaps a little bit convoluted.
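
A minimal sketch of that token-by-token approach, not tested against real
pages: it copies every token through unchanged except img start tags, which
it rebuilds from the parsed attributes. The sub name rewrite_img_src and the
sample markup are made up for illustration, and HTML::Entities is assumed to
be installed alongside HTML::TokeParser.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;
use HTML::Entities qw(encode_entities);

# Hypothetical helper: returns $html with the src of every <img> set to $new_src.
sub rewrite_img_src {
    my ($html, $new_src) = @_;

    # A scalar reference makes the parser read from the string itself.
    my $p   = HTML::TokeParser->new(\$html) or die "cannot set up parser";
    my $out = '';

    while (my $token = $p->get_token) {
        my $type = $token->[0];

        if ($type eq 'S' and $token->[1] eq 'img') {
            # Start tag token: ["S", $tag, \%attr, \@attrseq, $text]
            my ($attr, $attrseq) = @{$token}[2, 3];
            push @$attrseq, 'src' unless exists $attr->{src};
            $attr->{src} = $new_src;

            # Attribute values arrive entity-decoded, so encode them again.
            # A trailing "/" (as in <img ... />) shows up as an attribute
            # named "/" and is skipped here.
            my @pairs = map { sprintf ' %s="%s"', $_, encode_entities($attr->{$_}, '<>&"') }
                        grep { $_ ne '/' } @$attrseq;
            $out .= '<img' . join('', @pairs) . '>';
        }
        elsif ($type eq 'T') {
            # Text token: ["T", $text, $is_data]
            $out .= $token->[1];
        }
        else {
            # Every other token type carries its original source text last.
            $out .= $token->[-1];
        }
    }
    return $out;
}

print rewrite_img_src('<p>Foo <img alt="foo" src="whatever">, bar</p>', '/e/.jpg'), "\n";
```

Rebuilding the tag from the attribute list keeps the other attributes and
their order, at the cost of normalising the quoting.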

On second thought, maybe H::TP is not the best tool for this. It is really
for *parsing*, i.e. for extracting information. Of course you can use that
information to rebuild the HTML, but other modules may be better suited to
the task. For example, HTML::TreeBuilder *does* use a parser, but it uses
it to build a tree of nodes which, taken as a whole, represents the HTML
and gives easy access to its parts: each node is either a plain string or
an object, depending on its nature. Tags, for example, are objects, and
you get at the structured data through methods on these objects. Since I
did not really know H::TB myself, I took a quick peek at the docs and
found that it's really easy to get it to do the job:

#!/usr/bin/perl

use strict;
use warnings;
use HTML::TreeBuilder;

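# Parse the sample document that follows __DATA__ into a tree.
my $tree = HTML::TreeBuilder->new_from_content(<DATA>);
# Point the src attribute of every img element at the new value.
$_->attr(src => '/e/.jpg') for $tree->find('img');
print $tree->as_HTML;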

__DATA__
<html>
<head>
<title>The Real Foo Homepage</title>
</head>
<body>
<h1>The Real Foo</h1>
<p>Foo <img alt="foo" src="whatever">,
bar <img src="whatever" alt="bar">,
baz <img src="whatever" alt="bar">.
</p>
</body>
</html>


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,