From: Ashley Sheridan
On Sun, 2010-04-25 at 22:17 +0900, ioannes(a)btinternet.com wrote:

> I can return a target page once, but on refresh within a few hours
> curl_error reports that it cannot connect to the host and the return
> is empty. The target URL is an IP address, not a hostname, so maybe
> it has something to do with DNS. I am on a shared server. Any ideas
> on why this happens?
>
> John
>


No, DNS is the Domain Name System, which is used to turn a domain name
into an IP address. As you say you're using an IP address directly, the
request won't go near DNS.
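
If you want to confirm that from the script itself, something like this
rough sketch would do. The function name host_is_ip_literal() is made up
for illustration; it only checks whether the host part of a URL is
already an IP address, in which case no name lookup is needed.

```php
<?php
// Hypothetical helper (not from the original script): returns true when the
// URL's host is an IP literal, meaning no DNS lookup is needed to connect.
function host_is_ip_literal(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // malformed URL or no host component
    }
    // Strip any square brackets around an IPv6 literal before validating.
    $host = trim($host, '[]');
    return filter_var($host, FILTER_VALIDATE_IP) !== false;
}
```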

Are there any messages in the logs that would give more specific
information?
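
If nothing is being logged yet, a minimal sketch along these lines would
capture what cURL itself reports. fetch_with_diagnostics() and the
10 second timeout are assumptions for illustration, not anything from
your script.

```php
<?php
// Hypothetical helper: fetches a URL and returns the body together with
// cURL's own error number, error message and the HTTP status code.
function fetch_with_diagnostics(string $url, int $timeout = 10): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $timeout, // seconds allowed for the connect phase
        CURLOPT_TIMEOUT        => $timeout, // seconds allowed for the whole request
    ]);

    $body = curl_exec($ch);
    $result = [
        'ok'        => $body !== false,
        'body'      => $body === false ? '' : $body,
        'errno'     => curl_errno($ch), // e.g. 7 means "couldn't connect"
        'error'     => curl_error($ch),
        'http_code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
    ];
    // On PHP 8+ the handle is freed when $ch goes out of scope;
    // on older versions call curl_close($ch) here.

    if (!$result['ok']) {
        error_log(sprintf('cURL failed for %s: [%d] %s', $url, $result['errno'], $result['error']));
    }
    return $result;
}
```

The error number is usually more telling than the message: a refused
connection, a timeout and a blocked outbound port all surface differently.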

Thanks,
Ash
http://www.ashleysheridan.co.uk


From: Ashley Sheridan
On Mon, 2010-04-26 at 12:05 +0900, ioannes(a)btinternet.com wrote:

> >>
> >> Just to eliminate all possibilities, are you able to open the same URL/URI
> >> in the web browser repeatedly? Also, what happens when you fake the user agent in
> >> the web browser? The target site may have some anti bot mechanism in
> >> place to reduce stress/load on the server(s).
> >>
> >> Regards,
> >> Tommy
> >
> > One more thing, check it with cookies enabled/disabled in the web browser
> > too.
> >
> >
>
> Having deleted cookies on the browser and disabled them, it still does
> not like various user agents:
>
> // Candidate browser names and operating systems to pick from.
> $useragent = array('Mozilla', 'Opera', 'Microsoft Internet Explorer', 'ia_archiver');
> $os = array('Windows', 'Windows XP', 'Linux', 'Windows NT', 'Windows 2000', 'OSX');
> // Build a random user agent string from one entry of each array.
> $agent = $useragent[array_rand($useragent)] . '/' . rand(1, 8) . '.' . rand(0, 9)
>     . ' (' . $os[array_rand($os)] . ' ' . rand(1, 7) . '.' . rand(0, 9) . '; en-US;)';
> // Gives something like: Mozilla/3.5 (Windows 5.4; en-US;)
>
> -- OR --
>
> //$useragent='Google Image - Googlebot-Image/1.0 ( http://www.googlebot.com/bot.html)';
> //$useragent="MSN Live - msnbot-Products/1.0 (+http://search.msn.com/msnbot.htm)";
>
> -- OR --
> //$agent = "DocZilla/1.0 (Windows; U; WinNT4.0; en-US; rv:1.0.0) Gecko/20020804";
>
> I am just calling the page manually, one request at a time. There is
> probably some anti-bot measure in place. The page would probably not
> want to be indexed, as it provides ever-changing content. How can I
> use it at a normal level of use, as a real user would, just from a
> different site?
>
> John
>


How frequently do you request the page? Maybe playing about with that
would resolve it? Is it possible to randomise the request frequency a
bit?
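
As a rough sketch of what I mean by randomising, assuming a made-up
helper called jittered_delay() and intervals picked purely for
illustration:

```php
<?php
// Hypothetical helper: returns a wait time in seconds that varies randomly
// around a base interval, so requests do not arrive on a fixed schedule.
function jittered_delay(int $baseSeconds, int $jitterSeconds): int
{
    // random_int() picks an offset in [-jitter, +jitter]; never go below zero.
    return max(0, $baseSeconds + random_int(-$jitterSeconds, $jitterSeconds));
}
```

Calling sleep(jittered_delay(300, 60)) before each request would space
them roughly five minutes apart, give or take a minute. Whether that is
slow enough for the target site is something you'd have to find by trial.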

Thanks,
Ash
http://www.ashleysheridan.co.uk