From: Steve on
I started a little project where I need to search web pages for their
text and return the links of those pages to me. I am using
LWP::Simple, HTML::LinkExtor, and Data::Dumper. Basically all I have
done so far is a list of URLs from my search query of a website, but
I want to be able to filter this content based on the pages' contents.
How can I do this? How can I get the content of a web page, and not
just the URL?
From: Kyle T. Jones on
Steve wrote:
> I started a little project where I need to search web pages for their
> text and return the links of those pages to me. I am using
> LWP::Simple, HTML::LinkExtor, and Data::Dumper. Basically all I have
> done so far is a list of URLs from my search query of a website, but
> I want to be able to filter this content based on the pages' contents.
> How can I do this? How can I get the content of a web page, and not
> just the URL?

    my $pagecontents = get($url);  # get() is from LWP::Simple; undef means the fetch failed

Then you'll have to parse it yourself to pull out whatever stuff you're
interested in...
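
As a rough sketch of that fetch-then-filter idea: the helper name urls_matching, the pluggable $fetch code ref, and the example.com pages below are all made up for illustration and are not taken from Steve's code. The demo runs against in-memory pages so it works offline; against live pages you would pass \&get from LWP::Simple as the fetcher.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Return the URLs whose page content matches $pattern.
# $fetch is a code ref that takes a URL and returns the page body, or undef.
sub urls_matching {
    my ($pattern, $fetch, @urls) = @_;
    my @hits;
    for my $url (@urls) {
        my $content = $fetch->($url);
        next unless defined $content;    # LWP::Simple's get() returns undef on failure
        push @hits, $url if $content =~ $pattern;
    }
    return @hits;
}

# Offline demonstration: hypothetical pages standing in for real fetches.
my %fake_pages = (
    'http://example.com/a' => '<html><body>Hello world</body></html>',
    'http://example.com/b' => '<html><body>Nothing here</body></html>',
);
my @hits = urls_matching(qr/hello/i, sub { $fake_pages{ $_[0] } },
                         sort keys %fake_pages);
print "$_\n" for @hits;

# Against live pages (needs network access):
#   my @live = urls_matching(qr/hello/i, \&get, @urls);
```

Note that the pattern runs over the raw HTML, tags and attributes included, so for anything beyond a quick check a real HTML parser is the safer route.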

Cheers.

From: Jürgen Exner on
Steve <steve(a)staticg.com> wrote:
>I started a little project where I need to search web pages for their
>text and return the links of those pages to me. I am using
>LWP::Simple, HTML::LinkExtor, and Data::Dumper. Basically all I have
>done so far is a list of URLs from my search query of a website, but
>I want to be able to filter this content based on the pages' contents.
>How can I do this? How can I get the content of a web page, and not
>just the URL?

???

I don't understand.

        use LWP::Simple;
        $content = get("http://www.whateverURL");

will get you exactly the content of that web page and assign it to
$content and apparently you are doing that already.

So what is your problem?

jue
From: Steve on
On Mar 19, 11:01 am, Jürgen Exner <jurge...(a)hotmail.com> wrote:
> Steve <st...(a)staticg.com> wrote:
> >I started a little project where I need to search web pages for their
> >text and return the links of those pages to me.  I am using
> >LWP::Simple, HTML::LinkExtor, and Data::Dumper.  Basically all I have
> >done so far is a list of URLs from my search query of a website, but
> >I want to be able to filter this content based on the pages' contents.
> >How can I do this? How can I get the content of a web page, and not
> >just the URL?
>
> ???
>
> I don't understand.
>
>         use LWP::Simple;
>         $content = get("http://www.whateverURL");
>
> will get you exactly the content of that web page and assign it to
> $content and apparently you are doing that already.
>
> So what is your problem?
>
> jue

Sorry, I am a little overwhelmed with the coding so far (I'm not very
good at Perl). I have what you have posted, but my problem is that I
would like to filter that content. Let's say I searched a site
that had 15 news links and 3 of them said "Hello" in the title. I
would want to extract only the links that said "Hello" in the title.
From: J. Gleixner on
Steve wrote:
> On Mar 19, 11:01 am, Jürgen Exner <jurge...(a)hotmail.com> wrote:
>> Steve <st...(a)staticg.com> wrote:
>>> I started a little project where I need to search web pages for their
>>> text and return the links of those pages to me. I am using
>>> LWP::Simple, HTML::LinkExtor, and Data::Dumper. Basically all I have
>>> done so far is a list of URLs from my search query of a website, but
>>> I want to be able to filter this content based on the pages' contents.
>>> How can I do this? How can I get the content of a web page, and not
>>> just the URL?
>> ???
>>
>> I don't understand.
>>
>> use LWP::Simple;
>> $content = get("http://www.whateverURL");
>>
>> will get you exactly the content of that web page and assign it to
>> $content and apparently you are doing that already.
>>
>> So what is your problem?
>>
>> jue
>
> Sorry, I am a little overwhelmed with the coding so far (I'm not very
> good at Perl). I have what you have posted, but my problem is that I
> would like to filter that content. Let's say I searched a site
> that had 15 news links and 3 of them said "Hello" in the title. I
> would want to extract only the links that said "Hello" in the title.


"Hello" in the title? Do you mean the title element of the HTML, or an
'a' element whose text contains 'Hello', e.g. <a href="...">Hello Kitty</a>?
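
The two readings lead to different code. Here is a minimal sketch of both, assuming HTML::TokeParser (shipped in the same HTML-Parser distribution as HTML::LinkExtor) is acceptable. The function names page_title and links_with_text and the sample HTML are invented for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Reading 1: the text of the page's <title> element (undef if there is none).
sub page_title {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot parse HTML";
    return $p->get_tag('title') ? $p->get_trimmed_text('/title') : undef;
}

# Reading 2: the href of every <a> element whose visible text matches $pattern.
sub links_with_text {
    my ($html, $pattern) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot parse HTML";
    my @links;
    while (my $tag = $p->get_tag('a')) {
        my $href = $tag->[1]{href};          # $tag->[1] is the attribute hash
        next unless defined $href;           # skip anchors that have no href
        my $text = $p->get_trimmed_text('/a');
        push @links, $href if $text =~ $pattern;
    }
    return @links;
}

# Made-up sample page.
my $html = <<'HTML';
<html><head><title>Hello news</title></head><body>
<a href="/news/1">Hello Kitty</a>
<a href="/news/2">Weather</a>
<a name="top">Hello anchor without href</a>
</body></html>
HTML

print "title: ", page_title($html), "\n";
print "link:  $_\n" for links_with_text($html, qr/hello/i);
```

The hrefs come back exactly as written in the page, so relative links would still need resolving against the page's URL (URI->new_abs is one way) before fetching them.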

How are you using HTML::LinkExtor?

That seems like the right choice.

Why are you using Data::Dumper?

That's helpful when debugging, or logging, so how are you using it?
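
For reference, a minimal sketch of how the two modules typically fit together: HTML::LinkExtor collects the link attributes, and Data::Dumper prints the structure it hands back so you can see what you have. The extract_links name and the sample HTML are made up here.

```perl
use strict;
use warnings;
use HTML::LinkExtor;
use Data::Dumper;

# Collect every link HTML::LinkExtor finds in an HTML string.
# With no callback given, the parser accumulates links for ->links to return.
sub extract_links {
    my ($html) = @_;
    my $extor = HTML::LinkExtor->new;
    $extor->parse($html);
    $extor->eof;
    return $extor->links;    # list of [ $tag, $attr_name => $url, ... ]
}

my $html  = '<a href="/news/1">Hello Kitty</a> <img src="/logo.png">';
my @links = extract_links($html);

# Debugging aid: dump the raw structure to see its shape.
$Data::Dumper::Sortkeys = 1;
print Dumper(\@links);

# Keep only the href values of <a> elements.
my @hrefs = map {
    my ($tag, %attr) = @$_;
    $tag eq 'a' && defined $attr{href} ? $attr{href} : ();
} @links;
print "$_\n" for @hrefs;
```

HTML::LinkExtor reports tags and their link attributes only, not the text between <a> and </a>, so filtering on the link text needs a parser that also sees text, such as HTML::TokeParser or HTML::Parser itself.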

Post your very short example, because there's something you're
missing and no one can tell what that is based on your description.