From: slugger3113 on
Hi, I'm trying to get full/absolute URLs from relative links in HTML
documents. I've been trying to fudge this using File::Basename,
WWW::Mechanize, etc. but was wondering if there's a more ready-made
way to do this.

For example, if my main doc is:

http://www.abc.com/x/y/z/mydoc.html

and it contains a relative link to:

../../otherdir/yourdoc.html

how do I get the absolute URL to "yourdoc.html"? Using the above
modules I've been able to get:

http://www.abc.com/x/y/z/../../otherdir/yourdoc.html

when what I want is:

http://www.abc.com/x/otherdir/yourdoc.html

Of course I could try and parse all of the possible variations for
relative paths, but it's making my head hurt and I was wondering if
there's a module that could help with this. Any thoughts would be
appreciated.

thanks
Scott
From: Jürgen Exner on
slugger3113 <sstark(a)hi-beam.net> wrote:
>http://www.abc.com/x/y/z/../../otherdir/yourdoc.html
>
>when what I want is:
>
>http://www.abc.com/x/otherdir/yourdoc.html

For file names there is a module that will compute the canonical path,
but I can't remember the name right now. And I don't know if it will
work with URLs, either.

jue
From: Steve C on
slugger3113 wrote:
> Hi, I'm trying to get full/absolute URLs from relative links in HTML
> documents. I've been trying to fudge this using File::Basename,
> WWW::Mechanize, etc. but was wondering if there's a more ready-made
> way to do this.
>
> For example, if my main doc is:
>
> http://www.abc.com/x/y/z/mydoc.html
>
> and it contains a relative link to:
>
> ../../otherdir/yourdoc.html
>
> how do I get the absolute URL to "yourdoc.html"? Using the above
> modules I've been able to get:
>
> http://www.abc.com/x/y/z/../../otherdir/yourdoc.html
>
> when what I want is:
>
> http://www.abc.com/x/otherdir/yourdoc.html
>
> Of course I could try and parse all of the possible variations for
> relative paths, but it's making my head hurt and I was wondering if
> there's a module that could help with this. Any thoughts would be
> appreciated.

You also need to know if there is a base tag in the head section
since that changes the meaning of a relative link.
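A minimal sketch of how a base tag could be taken into account, assuming the URI module is installed and that the href of the base tag has already been pulled out of the head section by whatever HTML parser you use. The $base_href value below is made up for illustration; it is not from the original document.

```perl
use strict;
use warnings;
use URI;

# URL the document was fetched from (taken from the question).
my $doc_url = 'http://www.abc.com/x/y/z/mydoc.html';

# Hypothetical href of a <base> tag found in the head section;
# leave it undef when the document has no base tag.
my $base_href = 'http://www.abc.com/a/b/';

# The base href may itself be relative, so resolve it against the
# document URL first; fall back to the document URL if there is none.
my $base = defined $base_href
    ? URI->new_abs($base_href, $doc_url)
    : URI->new($doc_url);

# Relative links are then resolved against the effective base.
my $abs = URI->new_abs('../../otherdir/yourdoc.html', $base);

print "$abs\n";   # http://www.abc.com/otherdir/yourdoc.html
```

With $base_href left undefined, the same link resolves against the document URL instead and gives http://www.abc.com/x/otherdir/yourdoc.html.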
From: C.DeRykus on
On Apr 19, 8:51 am, slugger3113 <sst...(a)hi-beam.net> wrote:
> Hi, I'm trying to get full/absolute URLs from relative links in HTML
> documents. I've been trying to fudge this using File::Basename,
> WWW::Mechanize, etc. but was wondering if there's a more ready-made
> way to do this.
>
> For example, if my main doc is:
>
> http://www.abc.com/x/y/z/mydoc.html
>
> and it contains a relative link to:
>
> ../../otherdir/yourdoc.html
>
> how do I get the absolute URL to "yourdoc.html"? Using the above
> modules I've been able to get:
>
> http://www.abc.com/x/y/z/../../otherdir/yourdoc.html
>
> when what I want is:
>
> http://www.abc.com/x/otherdir/yourdoc.html
>
> Of course I could try and parse all of the possible variations for
> relative paths, but it's making my head hurt and I was wondering if
> there's a module that could help with this. Any thoughts would be
> appreciated.
>


See: perldoc URI

e.g.,

  use URI;
  print URI->new_abs('../../otherdir/yourdoc.html',
                     'http://www.abc.com/x/y/z/'), "\n";
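A slightly fuller sketch of the same idea, assuming the URI module is installed. It uses the full document URL from the question as the base and runs a few kinds of links through new_abs. The link list and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use URI;

# Base: the URL of the document that contains the links.
my $doc_url = 'http://www.abc.com/x/y/z/mydoc.html';

# Illustrative links as they might appear in href attributes.
my @links = (
    '../../otherdir/yourdoc.html',   # climbs two directories
    'sibling.html',                  # same directory as the document
    '/top.html',                     # relative to the site root
    'http://example.org/a',          # already absolute, left unchanged
);

# new_abs returns a URI object; as_string turns it into a plain string.
my @abs = map { URI->new_abs($_, $doc_url)->as_string } @links;

print "$_\n" for @abs;
# http://www.abc.com/x/otherdir/yourdoc.html
# http://www.abc.com/x/y/z/sibling.html
# http://www.abc.com/top.html
# http://example.org/a
```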

--
Charles DeRykus

From: slugger3113 on
On Apr 19, 11:07 am, Jürgen Exner <jurge...(a)hotmail.com> wrote:
> For file names there is a module that will compute the canonical path,
> but I can't remember the name right now. And I don't know if it will
> work with URLs, either.
>
> jue

Hm, it looks like File::Spec will do what I want:

use File::Spec;

my $dpath = "/one/two/../three/four";

my $cpath = File::Spec->canonpath( $dpath );

print $cpath, "\n";

result: /one/three/four
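One caveat worth checking on your own platform: the File::Spec documentation says that the Unix flavour of canonpath deliberately does not collapse x/../y, because a symlink in the path could make the collapsed form point somewhere else. So the result above may depend on the operating system. The sketch below calls File::Spec::Unix directly to show the Unix behaviour regardless of where it is run; the variable names are only illustrative.

```perl
use strict;
use warnings;
use File::Spec::Unix;

my $dpath = "/one/two/../three/four";

# Ask for the Unix rules explicitly. canonpath is a purely logical
# cleanup (duplicate slashes, "/." segments, trailing slash); under
# the Unix rules the ".." segment is left in place.
my $unix_path = File::Spec::Unix->canonpath($dpath);

print $unix_path, "\n";   # /one/two/../three/four
```

For URLs, resolving the relative link with the URI module avoids this platform dependence.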

thanks for the tip on "canonical" (whatever that means)!

Scott