From: James Harris on
On 4 June, 23:47, James Harris <james.harri...(a)googlemail.com> wrote:

> I'm trying to use wget -r to back up
>
>  http://sundry.wikispaces.com/
>
> but it fails to back up more than the home page. The same command
> works fine elsewhere and I've tried various options for the above web
> site to no avail. The site seems to use a session id - if that's
> important - but the home page as downloaded clearly has the <a href
> links to further pages so I'm not sure why wget fails to follow them.
>
> Any ideas?

No response from comp.unix.admin. Trying comp.unix.shell. Maybe
someone there has an idea to fix the wget problem...?
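
For reference, the basic command is just a recursive fetch of the site
root; the extra options I tried on top of it varied, so treat this as
the bare-bones version:

  wget -r http://sundry.wikispaces.com/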

James
From: Tony on
On 07/06/2010 22:10, James Harris wrote:
> On 4 June, 23:47, James Harris<james.harri...(a)googlemail.com> wrote:
>
>> I'm trying to use wget -r to back up
>>
>> http://sundry.wikispaces.com/

>> Any ideas?
>
> No response from comp.unix.admin. Trying comp.unix.shell. Maybe
> someone there has an idea to fix the wget problem...?

Does the site's robots.txt file preclude the links you're trying to
spider? wget plays nice by default.
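
You can check quickly from the shell (these are ordinary wget
invocations, nothing wikispaces-specific; adjust the URL as needed):

  # fetch robots.txt and inspect it
  wget -q -O - http://sundry.wikispaces.com/robots.txt

  # if robots.txt turns out to be the blocker, wget can be told to
  # ignore it (use with care and respect the site's wishes)
  wget -r -e robots=off http://sundry.wikispaces.com/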


--
Tony Evans
Saving trees and wasting electrons since 1993
blog -> http://perceptionistruth.com/
books -> http://www.bookthing.co.uk
[ anything below this line wasn't written by me ]
From: Bob Melson on
On Tuesday 08 June 2010 15:19, Tony (tony(a)darkstorm.invalid) opined:

> On 07/06/2010 22:10, James Harris wrote:
>> On 4 June, 23:47, James Harris<james.harri...(a)googlemail.com> wrote:
>>
>>> I'm trying to use wget -r to back up
>>>
>>> http://sundry.wikispaces.com/
>
>>> Any ideas?
>>
>> No response from comp.unix.admin. Trying comp.unix.shell. Maybe
>> someone there has an idea to fix the wget problem...?
>
> Does the site's robots.txt file preclude the links you're trying to
> spider? wget plays nice by default.

Another thing to consider is that many folks killfile all gmail, googlemail
and googlegroups addresses because of the huge amount of spam originating
from them and Google's refusal to do anything about it. Many of us don't see
those original posts, just the rare responses.


--
Robert G. Melson | Rio Grande MicroSolutions | El Paso, Texas
-----
Nothing astonishes men so much as common sense and plain dealing.
Ralph Waldo Emerson

From: Christian on
>"James Harris" <james.harris.1(a)googlemail.com> wrote in message
>news: daed461b-a37a-445a-8c7d-4791875fc4fe(a)t10g2000yqg.googlegroups.com...
>On 4 June, 23:47, James Harris <james.harri...(a)googlemail.com> wrote:

>> I'm trying to use wget -r to back up
>>
>> http://sundry.wikispaces.com/
>>
>> but it fails to back up more than the home page. The same command
>> works fine elsewhere and I've tried various options for the above web
>> site to no avail. The site seems to use a session id - if that's
>> important - but the home page as downloaded clearly has the <a href
>> links to further pages so I'm not sure why wget fails to follow them.
>>
>> Any ideas?

>No response from comp.unix.admin. Trying comp.unix.shell. Maybe
>someone there has an idea to fix the wget problem...?

>James

Try with a 'standard' user-agent: wget --user-agent="Mozilla/4.0
(compatible; MSIE 7.0; Windows NT 6.0; GTB6.4; SLCC1; .NET CLR 2.0.50727;
Media Center PC 5.0; Tablet PC 2.0; .NET CLR 3.5.21022; .NET CLR 3.5.30729;
.NET CLR 3.0.30729)" ...
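
Combined with the recursive fetch it would look something like this (the
agent string is shortened here just for readability; any browser-like
string should do):

  wget -r --user-agent="Mozilla/5.0 (Windows NT 6.0)" \
       http://sundry.wikispaces.com/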

Christian


From: James Harris (es) on

"Tony" <tony(a)darkstorm.invalid> wrote in message
news:humc4g$5p7$1(a)matrix.darkstorm.co.uk...
> On 07/06/2010 22:10, James Harris wrote:
>> On 4 June, 23:47, James Harris<james.harri...(a)googlemail.com> wrote:
>>
>>> I'm trying to use wget -r to back up
>>>
>>> http://sundry.wikispaces.com/
>
>>> Any ideas?
>>
>> No response from comp.unix.admin. Trying comp.unix.shell. Maybe
>> someone there has an idea to fix the wget problem...?
>
> Does the site's robots.txt file preclude the links you're trying to
> spider? wget plays nice by default.

Good idea. I've been checking it and it doesn't seem to be the problem. It
has lines such as

User-agent: *
Disallow: /file/rename
Disallow: /file/delete

but these don't disallow the data pages that I want to back up. There is
also a sitemap.xml. To my untutored eye it looks fine too.
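
In case it helps anyone spot the problem, the next things I plan to try
are running the fetch with debug output to see what wget decides about
each link, and explicitly allowing it to follow links to other hosts in
the same domain (the domain list is just a guess at what the wiki uses):

  wget -r -l 1 --debug http://sundry.wikispaces.com/ 2>&1 | less

  wget -r -H -D wikispaces.com http://sundry.wikispaces.com/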

James