From: Arne Vajhøj on
On 20-05-2010 08:23, Bob wrote:
> On Tue, 18 May 2010 21:50:52 -0400, Arne Vajhøj<arne(a)vajhoej.dk>
> wrote:
>> On 18-05-2010 20:40, Bob wrote:
>>> On Tue, 18 May 2010 19:55:42 -0400, Arne Vajhøj<arne(a)vajhoej.dk>
>>> wrote:
>>>> On 18-05-2010 07:29, Bob wrote:
>>>>> I need to scan a large number of web-resident files, primarily to get
>>>>> file size. IOW, a simple operation. Can anyone provide the benefit of
>>>>> their intuition on how to set the timeout, and how many retries to
>>>>> attempt?
>>>>>
>>>>> Currently I have the WebRequest timeout set for 2 seconds, and if the
>>>>> request times out, I loop back and try again. So just 2 tries. Not
>>>>> sure if that's optimal.
>
>>> I've been using a 2 second timeout, then retrying once if it fails. Is
>>> that what you meant by 'small'?
>>
>> 2 seconds is a pretty huge timeout for HTTP.
>
> Hi again, Arne. I've run some tests (time consuming) on the file info
> retrieval function. Reliability actually does stay pretty consistent
> when the timeout is dropped from 2 seconds to 1 second as long as I do
> at least one retry on failure. At 1/2 sec, I get a few errors, but at
> 1/4 sec, the error rate goes up.

Must be a slow connection.
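
The "short timeout plus one retry" pattern measured above can be sketched roughly as follows. This is only an illustration in Python rather than .NET; the names head_content_length and size_with_retry are made up for the example, and the defaults (1 second, one retry) are simply the values from the tests quoted above, not a recommendation.

```python
import urllib.request


def head_content_length(url, timeout):
    """Send a HEAD request and return Content-Length as an int (None if absent)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        value = response.headers.get("Content-Length")
        return int(value) if value is not None else None


def size_with_retry(url, timeout=1.0, retries=1, fetch=head_content_length):
    """Call fetch(url, timeout) up to 1 + retries times; re-raise the last failure."""
    last_error = None
    for _ in range(1 + retries):
        try:
            return fetch(url, timeout)
        except OSError as error:
            # URLError (including HTTP error responses) and socket timeouts
            # are all OSError subclasses, so every one of them is retried here.
            last_error = error
    raise last_error
```

The fetch parameter is there only so the retry loop can be exercised without a network; a real scan would probably want to retry timeouts but not, say, a 404.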

> Doing at least one retry seems important. Otherwise, even with a 4
> second timeout, I get a considerable number of errors.
>
> When I say "errors" above, I mean that the WebRequest times out. IOW,
> setting the WebRequest timeout to 4 seconds with no retry does not
> work as well as 1 sec with a single retry.
>
> Interesting how that works, but it took a long while to do those
> tests.
>
>> I think doing many in parallel would speed things up a lot.
>>
>> And you can still use the progress bar.
>
> Now that you mention it, is there an easy way to determine the number
> of 'channels' that would be optimal? There's got to be a logical limit
> on connections.

I would go for 25 threads per core or something like that for
this purpose.
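
As a rough sketch of what that could look like (in Python rather than .NET, just to show the shape): a thread pool sized at 25 workers per core, with a callback after each finished request so a progress bar can still be driven. The names scan_sizes, fetch and on_progress are hypothetical, and 25 is only the ballpark figure above, not a measured optimum.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def scan_sizes(urls, fetch, threads_per_core=25, on_progress=None):
    """Run fetch(url) for every URL on a thread pool and collect the results.

    Returns a dict mapping each URL to the fetched value, or to the
    exception raised for that URL.
    """
    workers = max(1, threads_per_core * (os.cpu_count() or 1))
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for done, future in enumerate(as_completed(futures), start=1):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as error:
                # Keep the failure so one bad URL does not abort the scan.
                results[url] = error
            if on_progress is not None:
                on_progress(done, len(futures))  # e.g. update the progress bar
    return results
```

Since the work is almost entirely waiting on the network, the thread count can be far higher than the core count; the practical limit is more likely to be the remote server or the connection than the CPU.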

>>>>> Another thing: I've often got WebResponse file sizes that are one byte
>>>>> different from the actual size of the file. Any idea what's up there?
>>>>
>>>> Difficult to say without an example URL.
>>>
>>> I'll try to look for a few examples. I thought maybe that was a very
>>> common thing, given that it seems to be just one byte much of the
>>> time...just seemed 'coincidental'.
>>
>> OK.
>
> Of course I haven't been able to get that to happen again since my
> last post.

:-)

Arne