how to handle socket timeout? [Unix Programming]


From: Rick Jones on 15 Jan 2008 14:17

Arkadiy <vertleyb(a)gmail.com> wrote:
> On Jan 14, 7:10 pm, Rick Jones <rick.jon...(a)hp.com> wrote:
> > The server's host rebooting will cause an RST to come back to the
> > client end in response to the first segment the client sends to
> > the server after it reboots because the server's host TCP will
> > have no knowledge of the connection.

> This will take a really long time.

Yes. Perhaps even longer than TCP might wait before giving up on
retransmitting unACKnowledged data, perhaps not.

> Also, what if the server host crashed and never got rebooted? Or
> got disconnected?

If you have no unACKnowledged data outstanding you need an
application-level keepalive (which effectively creates unACKed data),
or limp along with SO_KEEPALIVE.
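For the SO_KEEPALIVE route, a minimal C sketch might look like the following. The helper name `enable_keepalive` is hypothetical. `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are not POSIX, so the sketch only uses them where the platform defines them (Linux does). Elsewhere the system-wide keepalive timers apply, and those commonly default to two hours of idle time.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Turn on TCP keepalive probes for fd.
 * idle_s:  seconds of idleness before the first probe
 * intvl_s: seconds between unanswered probes
 * cnt:     unanswered probes before the connection is declared dead
 * The three tuning knobs are only applied where the platform defines
 * them (assumption: non-POSIX, e.g. Linux); otherwise system defaults apply.
 * Returns 0 on success, -1 on failure (errno set by setsockopt). */
static int enable_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) == -1)
        return -1;
#ifdef TCP_KEEPIDLE
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) == -1)
        return -1;
#else
    (void)idle_s;
#endif
#ifdef TCP_KEEPINTVL
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) == -1)
        return -1;
#else
    (void)intvl_s;
#endif
#ifdef TCP_KEEPCNT
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt) == -1)
        return -1;
#else
    (void)cnt;
#endif
    return 0;
}
```

Even with short timers, a dead peer is only noticed after roughly idle + interval x count seconds. An application-level keepalive gives you direct control over that timing. It also checks that the server application is answering, not just its TCP stack.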

rick jones
--
Wisdom Teeth are impacted, people are affected by the effects of events.
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

From: David Schwartz on 15 Jan 2008 15:56

On Jan 15, 10:15 am, Arkadiy <vertl...(a)gmail.com> wrote:
> On Jan 15, 12:10 pm, David Schwartz <dav...(a)webmaster.com> wrote:

> > You really want to backoff and retry though. And I'm not sure you want
> > to tear down the connection at the first sign of trouble.

> My problem is I can't understand the purpose of this "retry".

If you don't retry, then one failure would be the end of the world. By
retry, I don't mean the same operation, I mean retry getting the
server to work.

> If my
> timeout is 1 sec, and the first request timed out, and I retry it, why
> not set the timeout to 2 sec in the first place?

Because if that happens, it will take you 2 seconds to detect a server
failure rather than 1. The first operation will time out in a second,
and then your connect will fail. You won't have to wait another
second.
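One way to get the 1-second per-request timeout discussed here is `SO_RCVTIMEO`. The sketch below assumes a blocking, connected socket. The helper names and the `-2` "timed out" return value are hypothetical conventions, not part of any API.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Bound every blocking recv() on fd to timeout_ms milliseconds.
 * Returns 0 on success, -1 on failure (errno set by setsockopt). */
static int set_recv_timeout(int fd, long timeout_ms)
{
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Wait for (part of) a reply.
 * Returns bytes read, 0 on orderly close, -2 if the timeout expired,
 * -1 on any other error (errno set by recv). */
static ssize_t recv_reply(int fd, void *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2; /* a sign of trouble, not proof that the server is gone */
    return n;
}
```

A `-2` here is the cue to start the "try again" logic described below, rather than to resend the same request.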

Note that "retry it" doesn't mean you send the same request or even
any request. It simply means that you try again. That might entail
making a connection, sending a 'VERSION' as a probe, or sending a
different request. But you try (to reach the server) again.

What you don't do is try a thousand concurrent requests just because
you got a thousand concurrent requests when you *know* the server is
likely overloaded. Because if you do that, the server will *never*
catch back up because it will be too busy handling the backlog of
queries you've already decided to ignore.
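The backoff-and-probe policy described above could be sketched as a small state machine. Everything in this sketch is an illustrative assumption:

- the struct and function names;
- the 1 s base delay and the 60 s cap;
- the caller supplying a millisecond clock.

While the server is marked down, new requests fail locally. At most one probe (for example a 'VERSION' request) is outstanding at a time.

```c
#include <stdbool.h>

enum send_decision { SEND_REQUEST, SEND_PROBE, FAIL_FAST };

/* Hypothetical per-server state; times are caller-supplied milliseconds. */
struct server_health {
    bool down;                 /* set after a timeout, cleared on any reply */
    bool probe_in_flight;      /* at most one probe while the server is down */
    unsigned failures;         /* consecutive timeouts */
    unsigned long retry_at_ms; /* earliest time the next probe may be sent */
};

/* Exponential backoff: 1 s after the first failure, doubling, capped at 60 s. */
static unsigned long backoff_delay_ms(unsigned failures)
{
    unsigned long delay = 1000;      /* assumed base delay */
    const unsigned long cap = 60000; /* assumed cap */
    for (unsigned i = 1; i < failures && delay < cap; i++)
        delay *= 2;
    return delay < cap ? delay : cap;
}

/* Decide what to do with a new request arriving at now_ms. */
static enum send_decision decide(struct server_health *h, unsigned long now_ms)
{
    if (!h->down)
        return SEND_REQUEST;
    if (h->probe_in_flight || now_ms < h->retry_at_ms)
        return FAIL_FAST;      /* don't pile more work onto a struggling server */
    h->probe_in_flight = true;
    return SEND_PROBE;         /* e.g. a cheap 'VERSION' request */
}

/* A request or probe timed out at now_ms. */
static void on_timeout(struct server_health *h, unsigned long now_ms)
{
    h->down = true;
    h->probe_in_flight = false;
    h->failures++;
    h->retry_at_ms = now_ms + backoff_delay_ms(h->failures);
}

/* Any reply at all means the server is working again. */
static void on_reply(struct server_health *h)
{
    h->down = false;
    h->probe_in_flight = false;
    h->failures = 0;
}
```

Failing fast locally is what keeps the client from adding to the backlog the server is already struggling with.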

DS

From: Arkadiy on 15 Jan 2008 17:00

On Jan 15, 3:56 pm, David Schwartz <dav...(a)webmaster.com> wrote:
> On Jan 15, 10:15 am, Arkadiy <vertl...(a)gmail.com> wrote:
>
> > On Jan 15, 12:10 pm, David Schwartz <dav...(a)webmaster.com> wrote:
> > > You really want to backoff and retry though. And I'm not sure you want
> > > to tear down the connection at the first sign of trouble.
> > My problem is I can't understand the purpose of this "retry".
>
> If you don't retry, then one failure would be the end of the world. By
> retry, I don't mean the same operation, I mean retry getting the
> server to work.

OK, I think I understand now.

> > If my
> > timeout is 1 sec, and the first request timed out, and I retry it, why
> > not set the timeout to 2 sec in the first place?
>
> Because if that happens, it will take you 2 seconds to detect a server
> failure rather than 1. The first operation will timeout in a second,
> and then your connect will fail. You won't have to wait another
> second.

Do you mean that, although the request times out, the connect fails
right away?

If the server is congested, wouldn't it fail to accept the connection
in a reasonable amount of time? It seems to me that it's impossible to
tell a congested server from an unreachable one by using connect, just
as it's impossible to do so by using a timeout on a regular request...
Am I wrong?

Regards,
Arkadiy

From: Rick Jones on 15 Jan 2008 19:24

Arkadiy <vertleyb(a)gmail.com> wrote:
> If the server is congested, wouldn't it fail to accept the connection
> in a reasonable amount of time?

Depending on one's definition of reasonable, not necessarily. What
happens depends on the TCP stack on the server. If we are talking about
Unix/Unix-like stacks, then once the server application's listen queue
fills, the TCP SYNchronize segments of subsequent connection attempts
are silently dropped. What happens next is then up to the client TCP
stack's handling of the connect() call, and/or to whether the client
_application_ has done a non-blocking connect() and set its own timeout.
(Or, I suppose, arranged for a signal to be delivered to get it out of
the blocking connect() call...)
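Here is a minimal sketch of a non-blocking connect() with an application-chosen timeout, assuming POSIX `poll()`. The function name is hypothetical. The sketch also shows why the two cases discussed in this thread look different to the client. An RST surfaces as `ECONNREFUSED`. A silently dropped SYN only shows up as our own timer expiring, which this sketch chooses to report as `ETIMEDOUT`.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP socket and connect it to addr, giving up after timeout_ms.
 * Returns a connected socket in blocking mode, or -1 with errno set:
 *   ETIMEDOUT    our own timer expired (SYN dropped? host gone? can't tell)
 *   ECONNREFUSED an RST came back
 *   other        whatever socket()/connect()/poll() reported */
static int connect_with_timeout(const struct sockaddr *addr, socklen_t addrlen,
                                int timeout_ms)
{
    int fd = socket(addr->sa_family, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        goto fail;

    if (connect(fd, addr, addrlen) == -1) {
        if (errno != EINPROGRESS)
            goto fail;
        /* The handshake is under way; writable means it finished. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int n = poll(&pfd, 1, timeout_ms);
        if (n == -1)
            goto fail;         /* includes EINTR, to keep the sketch short */
        if (n == 0) {
            errno = ETIMEDOUT;
            goto fail;
        }
        int err = 0;
        socklen_t len = sizeof err;
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
            goto fail;
        if (err != 0) {        /* e.g. ECONNREFUSED after an RST */
            errno = err;
            goto fail;
        }
    }
    if (fcntl(fd, F_SETFL, flags) == -1)  /* back to blocking mode */
        goto fail;
    return fd;

fail:
    {
        int saved = errno;
        close(fd);
        errno = saved;
    }
    return -1;
}
```

A real client would probably handle `EINTR` from `poll()` by recomputing the remaining time and polling again, instead of treating it as a failure.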

IIRC only the Windows TCP stack will "fail" a TCP SYN to a full listen
queue with an RST reply. And even then, there is no guarantee that
the RST will make it back to the client TCP stack, which brings us
right back to the Unix/Unixlike case...

> It seems to me that it's impossible to tell a congested server from
> an unreachable one by using connect, just as it's impossible to do so
> by using a timeout on a regular request... Am I wrong?

That depends on whether or not the server's TCP stack actively responds
to SYNs sent to a full listen queue with an RST, and on whether those
RSTs arrive at the client.

Also, the server may actually be "congested" before the listen queue
fills, although one could indeed consider a full listen queue a sign
of a congested server. I.e., a full queue is a sufficient sign of
congestion, but not a necessary one.

rick jones
--
firebug n, the idiot who tosses a lit cigarette out his car window
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

From: Rainer Weikusat on 16 Jan 2008 02:19

Arkadiy <vertleyb(a)gmail.com> writes:
> On Jan 15, 3:56 pm, David Schwartz <dav...(a)webmaster.com> wrote:

[...]

>> The first operation will time out in a second,
>> and then your connect will fail. You won't have to wait another
>> second.
>
> Do you mean that, although the request times out, the connect fails
> right away?

Usually not.

> If the server is congested, wouldn't it fail to accept the connection
> in a reasonable amount of time? It seems to me that it's impossible to
> tell a congested server from an unreachable one by using connect, just
> as it's impossible to do so by using a timeout on a regular request...

Connect asynchronously and wait 'some time'. If the connection isn't
available by then, do something else. At the next request, if the
connect is still in progress, do the same, and so forth until it
either completes or fails. If it fails, retry connecting for the next
request. If you just drop the connection at the first request timeout,
subsequent replies from the server should elicit an RST, which should
help to get rid of any backlog. Preferably, don't do any of this unless
it is certain that you are working around an actual problem.
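The asynchronous approach above spreads the wait over successive requests instead of blocking in one place. Below is a rough sketch with hypothetical names. `start_connect()` only starts the handshake. `check_connect()` is called from each new request with a short `wait_ms` (or 0 just to look). It reports whether the connection is ready, still pending, or has failed.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

enum conn_status { CONN_READY, CONN_PENDING, CONN_FAILED };

/* Start a non-blocking connect and return the socket (or -1) at once;
 * the handshake continues in the background. */
static int start_connect(const struct sockaddr *addr, socklen_t addrlen)
{
    int fd = socket(addr->sa_family, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ||
        (connect(fd, addr, addrlen) == -1 && errno != EINPROGRESS)) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Called when the next request arrives: has the pending connect finished?
 * wait_ms is the 'some time' to wait; 0 means just look. */
static enum conn_status check_connect(int fd, int wait_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int n = poll(&pfd, 1, wait_ms);
    if (n == 0)
        return CONN_PENDING;   /* still in progress: do something else */
    if (n == -1)
        return errno == EINTR ? CONN_PENDING : CONN_FAILED;
    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1 || err != 0)
        return CONN_FAILED;    /* close it; retry connecting for the next request */
    return CONN_READY;
}
```

On `CONN_FAILED`, the caller would close the socket and call `start_connect()` again when the next request comes in. The socket is left in non-blocking mode, so later send()/recv() calls must be prepared for `EAGAIN`.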
