From: Bob on
I need to scan a large number of web-resident files, primarily to get
file size. IOW, a simple operation. Can anyone provide the benefit of
their intuition on how to set the timeout, and how many retries to
attempt?

Currently I have the WebRequest timeout set for 2 seconds, and if the
request times out, I loop back and try again. So just 2 tries. Not
sure if that's optimal.

I realize that this is arbitrary, but the files reside on various
places on the net, so it's impossible to profile in advance. I'm sure
that someone else has done something similar though, and may have a
good feel for median values.

Another thing: I've often got WebResponse file sizes that are one byte
different from the actual size of the file. Any idea what's up there?
From: Arne Vajhøj on
On 18-05-2010 07:29, Bob wrote:
> I need to scan a large number of web-resident files, primarily to get
> file size. IOW, a simple operation. Can anyone provide the benefit of
> their intuition on how to set the timeout, and how many retries to
> attempt?
>
> Currently I have the WebRequest timeout set for 2 seconds, and if the
> request times out, I loop back and try again. So just 2 tries. Not
> sure if that's optimal.
>
> I realize that this is arbitrary, but the files reside on various
> places on the net, so it's impossible to profile in advance. I'm sure
> that someone else has done something similar though, and may have a
> good feel for median values.

I assume that you send HEAD and not GET !?

A reasonably small timeout should be sufficient.

You should do it thread based - possibly queuing work to the
ThreadPool to maximize throughput.
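
Something along these lines (untested sketch - GetRemoteSize is just
a name I made up, and error handling is omitted):

// needs System.Net
static long GetRemoteSize(string url, int timeoutMs)
{
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    req.Method = "HEAD";            // headers only, no body transferred
    req.Timeout = timeoutMs;        // milliseconds
    using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
    {
        return resp.ContentLength;  // -1 if the server sent no Content-Length
    }
}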

> Another thing: I've often got WebResponse file sizes that are one byte
> different from the actual size of the file. Any idea what's up there?

Difficult to say without an example URL.

Arne

From: Bob on
On Tue, 18 May 2010 19:55:42 -0400, Arne Vajhøj <arne(a)vajhoej.dk>
wrote:

>On 18-05-2010 07:29, Bob wrote:
>> I need to scan a large number of web-resident files, primarily to get
>> file size. IOW, a simple operation. Can anyone provide the benefit of
>> their intuition on how to set the timeout, and how many retries to
>> attempt?
>>
>> Currently I have the WebRequest timeout set for 2 seconds, and if the
>> request times out, I loop back and try again. So just 2 tries. Not
>> sure if that's optimal.
>>
>> I realize that this is arbitrary, but the files reside on various
>> places on the net, so it's impossible to profile in advance. I'm sure
>> that someone else has done something similar though, and may have a
>> good feel for median values.
>
>I assume that you send HEAD and not GET !?

Yes.

>A reasonably small timeout should be sufficient.

I've been using a 2 second timeout, then retrying once if it fails. Is
that what you meant by 'small'?

The way I arrived at that: I noticed that a large timeout didn't
succeed much more than a shorter one; if it was going to fail, it
would just fail. But a -very- small timeout was under the response
time of many servers. There is a sort of median range that is
probably optimal. I just don't have a good feel for what the median
value might be.

Retries: I also found that some failures that would stall forever on
the first try, would succeed on the second try. Something about just
reinitiating the request. Not sure if it's worth doing a third or not.

So the 2 second timeout was arbitrary, and I haven't had time to test
over a huge number of servers. That's the main thing that I'd like to
find out from you web guys. (Hey, I'm a desktop programmer... I don't
do this stuff often).
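
FWIW, the retry part is roughly this (simplified from the real code;
url is the file being checked):

long size = -1;
bool got = false;
for (int attempt = 0; attempt < 2 && !got; attempt++)
{
    try
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        req.Method = "HEAD";
        req.Timeout = 2000;            // the arbitrary 2 second value
        using (WebResponse resp = req.GetResponse())
        {
            size = resp.ContentLength; // -1 if no Content-Length header
            got = true;
        }
    }
    catch (WebException)
    {
        // timed out (or otherwise failed) - loop back and try once more
    }
}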

>You should do it thread based - possibly queuing work to the
>ThreadPool to maximize throughput.

I usually use a BackgroundWorker with a dialog with progress bar.
Watching the little bar move seems to provide some comfort during
those delays. (-:>
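
The skeleton of that, in case it matters (progressBar1 is just my
form's progress bar, urlArray is my list of URLs, and GetRemoteSize
is the HEAD helper I mentioned):

// needs System.ComponentModel
BackgroundWorker worker = new BackgroundWorker();
worker.WorkerReportsProgress = true;
worker.DoWork += (s, e) =>
{
    string[] urls = (string[])e.Argument;
    for (int i = 0; i < urls.Length; i++)
    {
        long size = GetRemoteSize(urls[i], 2000);  // store it with the rest of the results
        worker.ReportProgress((i + 1) * 100 / urls.Length);
    }
};
worker.ProgressChanged += (s, e) => progressBar1.Value = e.ProgressPercentage;
worker.RunWorkerAsync(urlArray);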

>> Another thing: I've often got WebResponse file sizes that are one byte
>> different from the actual size of the file. Any idea what's up there?
>
>Difficult to say without an example URL.
>
>Arne

I'll try to look for a few examples. I thought maybe that was a very
common thing, given that it seems to be just one byte much of the
time...just seemed 'coincidental'.

Thanks, Arne.
From: Arne Vajhøj on
On 18-05-2010 20:40, Bob wrote:
> On Tue, 18 May 2010 19:55:42 -0400, Arne Vajhøj<arne(a)vajhoej.dk>
> wrote:
>
>> On 18-05-2010 07:29, Bob wrote:
>>> I need to scan a large number of web-resident files, primarily to get
>>> file size. IOW, a simple operation. Can anyone provide the benefit of
>>> their intuition on how to set the timeout, and how many retries to
>>> attempt?
>>>
>>> Currently I have the WebRequest timeout set for 2 seconds, and if the
>>> request times out, I loop back and try again. So just 2 tries. Not
>>> sure if that's optimal.
>>>
>>> I realize that this is arbitrary, but the files reside on various
>>> places on the net, so it's impossible to profile in advance. I'm sure
>>> that someone else has done something similar though, and may have a
>>> good feel for median values.
>>
>> I assume that you send HEAD and not GET !?
>
> Yes.
>
>> A reasonably small timeout should be sufficient.
>
> I've been using a 2 second timeout, then retrying once if it fails. Is
> that what you meant by 'small'?
>
> The way I arrived at that: I noticed that a large timeout didn't
> succeed much more than a shorter one; if it was going to fail, it
> would just fail. But a -very- small timeout was under the response
> time of many servers. There is a sort of median range that is
> probably optimal. I just don't have a good feel for what the median
> value might be.
>
> Retries: I also found that some failures that would stall forever on
> the first try, would succeed on the second try. Something about just
> reinitiating the request. Not sure if it's worth doing a third or not.
>
> So the 2 second timeout was arbitrary, and I haven't had time to test
> over a huge number of servers. That's the main thing that I'd like to
> find out from you web guys. (Hey, I'm a desktop programmer... I don't
> do this stuff often).

2 seconds is a pretty huge timeout for HTTP.

>> You should do it thread based - possibly queuing work to the
>> ThreadPool to maximize throughput.
>
> I usually use a BackgroundWorker with a dialog with progress bar.
> Watching the little bar move seems to provide some comfort during
> those delays. (-:>

I think doing many in parallel would speed things up a lot.

And you can still use the progress bar.
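
Roughly like this (untested; GetRemoteSize is your HEAD helper and
worker is your BackgroundWorker, kept as a field - call this from
DoWork):

// needs System.Threading
void CheckAllParallel(string[] urls)
{
    int total = urls.Length;          // assumed > 0
    int done = 0;
    ManualResetEvent allDone = new ManualResetEvent(false);
    foreach (string url in urls)
    {
        string u = url;               // capture for the closure
        ThreadPool.QueueUserWorkItem(_ =>
        {
            // real code: wrap in try/finally so a failure still counts as done
            long size = GetRemoteSize(u, 1000);   // store somewhere thread safe
            int finished = Interlocked.Increment(ref done);
            worker.ReportProgress(finished * 100 / total);  // still drives the bar
            if (finished == total)
                allDone.Set();
        });
    }
    allDone.WaitOne();                // blocks the worker thread, not the UI thread
}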

>>> Another thing: I've often got WebResponse file sizes that are one byte
>>> different from the actual size of the file. Any idea what's up there?
>>
>> Difficult to say without an example URL.
>
> I'll try to look for a few examples. I thought maybe that was a very
> common thing, given that it seems to be just one byte much of the
> time...just seemed 'coincidental'.

OK.

Arne
From: Bob on
On Tue, 18 May 2010 21:50:52 -0400, Arne Vajhøj <arne(a)vajhoej.dk>
wrote:

>On 18-05-2010 20:40, Bob wrote:
>> On Tue, 18 May 2010 19:55:42 -0400, Arne Vajhøj<arne(a)vajhoej.dk>
>> wrote:
>>
>>> On 18-05-2010 07:29, Bob wrote:
>>>> I need to scan a large number of web-resident files, primarily to get
>>>> file size. IOW, a simple operation. Can anyone provide the benefit of
>>>> their intuition on how to set the timeout, and how many retries to
>>>> attempt?
>>>>
>>>> Currently I have the WebRequest timeout set for 2 seconds, and if the
>>>> request times out, I loop back and try again. So just 2 tries. Not
>>>> sure if that's optimal.

>> I've been using a 2 second timeout, then retrying once if it fails. Is
>> that what you meant by 'small'?
>
>2 seconds is a pretty huge timeout for HTTP.

Hi again, Arne. I've run some tests (time consuming) on the file info
retrieval function. Reliability actually stays pretty consistent when
the timeout is dropped from 2 seconds to 1 second, as long as I do at
least one retry on failure. At 1/2 sec I get a few errors, but at 1/4
sec the error rate goes up.

Doing at least one retry seems important. Otherwise, even with a 4
second timeout, I get a considerable number of errors.

When I say "errors" above, I mean that the WebRequest times out. IOW,
setting the WebRequest timeout to 4 seconds does not work as well as
1 second with a single retry.

Interesting how that works, but it took a long while to do those
tests.

>I think doing many in parallel would speed things up a lot.
>
>And you can still use the progress bar.

Now that you mention it, is there an easy way to determine the number
of 'channels' that would be optimal? There's got to be a logical limit
on connections.
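
The only knob I've spotted so far is ServicePointManager's connection
limit - I gather the default is only 2 per host - so I'm guessing
something like this is needed before queuing parallel requests
(correct me if I'm off base):

// in System.Net; raises the per-host connection cap
ServicePointManager.DefaultConnectionLimit = 10;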

>>>> Another thing: I've often got WebResponse file sizes that are one byte
>>>> different from the actual size of the file. Any idea what's up there?
>>>
>>> Difficult to say without an example URL.
>>
>> I'll try to look for a few examples. I thought maybe that was a very
>> common thing, given that it seems to be just one byte much of the
>> time...just seemed 'coincidental'.
>
>OK.

Of course I haven't been able to get that to happen again since my
last post.