From: James Harris (es)

"Bob Melson" <amia9018(a)mypacks.net> wrote in message
news:D4OdnbLEnJAPUpPRnZ2dnUVZ_sCdnZ2d(a)earthlink.com...

....

> Another thing to consider is that many folks killfile all gmail,
> googlemail and googlegroups addresses because of the huge amount of
> spam originating on them and google's refusal to do anything about it.
> Many of us don't see those original posts, just the rare responses.

Understood. Google's lack of policing, or even of an adequate response
to spam reports, is very bad. The trouble is it's just too useful. The
Usenet service providers I use seem to filter spam - including that from
Google - but keep legitimate posts.

To anyone who didn't see the original query, I'm trying to use wget -r to
back up

http://sundry.wikispaces.com/

but despite what I try I only ever get the home page.

Any ideas why wget is not recursing to linked pages on the same site?
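
For reference, the command is essentially just the following, with
variations in the options (the -l inf variant is illustrative of the
kind of thing I've tried - it raises wget's default recursion depth
limit of five - not a known fix):

  wget -r http://sundry.wikispaces.com/
  wget -r -l inf http://sundry.wikispaces.com/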

James


From: James Harris (es)

"Christian" <cgregoir99(a)yahoo.com> wrote in message
news:hutero$hsp$1(a)writer.imaginet.fr...
> >"James Harris" <james.harris.1(a)googlemail.com> a �crit dans le message de
> >news:
> >daed461b-a37a-445a-8c7d-4791875fc4fe(a)t10g2000yqg.googlegroups.com...
>>On 4 June, 23:47, James Harris <james.harri...(a)googlemail.com> wrote:
>
>>> I'm trying to use wget -r to back up
>>>
>>> http://sundry.wikispaces.com/
>>>
>>> but it fails to back up more than the home page. The same command
>>> works fine elsewhere and I've tried various options for the above web
>>> site to no avail. The site seems to use a session id - if that's
>>> important - but the home page as downloaded clearly has the <a href
>>> links to further pages so I'm not sure why wget fails to follow them.
>>>
>>> Any ideas?
>
>>No response from comp.unix.admin. Trying comp.unix.shell. Maybe
>>someone there has an idea to fix the wget problem...?
>
>>James
>
> Try with a 'standard' user-agent : wget --user-agent="Mozilla/4.0
> (compatible; MSIE 7.0; Windows NT 6.0; GTB6.4; SLCC1; .NET CLR 2.0.50727;
> Media Center PC 5.0; Tablet PC 2.0; .NET CLR 3.5.21022; .NET CLR
> 3.5.30729; .NET CLR 3.0.30729)" ...

Also a good idea. I've just tried a couple of user-agent strings but it
still doesn't work. I don't think the user-agent can be the problem, as
wget loads the specified page successfully and that page looks fine: it
contains embedded <a href=...> links. Unfortunately wget -r fails to
follow them.
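
One avenue I haven't ruled out (a guess on my part, not a confirmed
cause): wget honours robots.txt during recursive retrieval, and a
restrictive robots.txt gives exactly this symptom of stopping after the
first page. Something like the following would test that, and given the
session id the cookie options might be worth a try too:

  wget -r -e robots=off http://sundry.wikispaces.com/
  wget -r --keep-session-cookies --save-cookies cookies.txt http://sundry.wikispaces.com/

(-e robots=off, --keep-session-cookies and --save-cookies are all
standard wget options; whether either actually fixes this particular
site is an open question.)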

James


From: Chris Nehren
["Followup-To:" header set to comp.unix.admin.]
On 2010-06-11, Christian scribbled these
curious markings:
>>"James Harris" <james.harris.1(a)googlemail.com> a écrit dans le message de
>>news: daed461b-a37a-445a-8c7d-4791875fc4fe(a)t10g2000yqg.googlegroups.com...
>>On 4 June, 23:47, James Harris <james.harri...(a)googlemail.com> wrote:
>
>>> I'm trying to use wget -r to back up
>>>
>>> http://sundry.wikispaces.com/
>>>
>>> but it fails to back up more than the home page. The same command
>>> works fine elsewhere and I've tried various options for the above web
>>> site to no avail. The site seems to use a session id - if that's
>>> important - but the home page as downloaded clearly has the <a href
>>> links to further pages so I'm not sure why wget fails to follow them.
>>>
>>> Any ideas?
>
>>No response from comp.unix.admin. Trying comp.unix.shell. Maybe
>>someone there has an idea to fix the wget problem...?
>
>>James
>
> Try with a 'standard' user-agent : wget --user-agent="Mozilla/4.0
> (compatible; MSIE 7.0; Windows NT 6.0; GTB6.4; SLCC1; .NET CLR 2.0.50727;
> Media Center PC 5.0; Tablet PC 2.0; .NET CLR 3.5.21022; .NET CLR 3.5.30729;
> .NET CLR 3.0.30729)" ...

In addition: have you turned on debugging yet? Have you asked wget to
print the HTTP headers of the requests and responses yet? The server is
giving wget information that it is using to decide not to go any
further. Ask it for this information and you should be able to discern
why it's behaving the way it is. Otherwise you're just guessing in an
engineering discipline.
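
For example, something along these lines (-d turns on wget's debug
output, -S prints the server's response headers, and -o sends it all to
a log file; all three are standard wget options):

  wget -d -S -o wget.log -r http://sundry.wikispaces.com/

Then check wget.log for what comes back with the home page and whether
wget even attempts a second request.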

--
Thanks and best regards,
Chris Nehren
From: Use-Author-Supplied-Address-Header
James Harris <james.harris.1(a)googlemail.com> wrote:
: On 4 June, 23:47, James Harris <james.harri...(a)googlemail.com> wrote:
[cut]
: No response from comp.unix.admin. Trying comp.unix.shell. Maybe
: someone there has an idea to fix the wget problem...?

The best place to deal with this is the wget mailing list; see
http://lists.gnu.org/mailman/listinfo/bug-wget. For an NNTP 'mirror',
see also the newsgroup gmane.comp.web.wget.general on the gmane server
at news.gmane.org.

HTH
Tom.

PS. The email address in the header is just a spam-trap.
--
Tom Crane, Dept. Physics, Royal Holloway, University of London, Egham Hill,
Egham, Surrey, TW20 0EX, England.
Email: T.Crane at rhul dot ac dot uk
Fax: +44 (0) 1784 472794