Socket performance [Python]

From: John Nagle on 25 Jul 2010 01:50

On 7/23/2010 5:06 PM, Navkirat Singh wrote:
> Hey Everyone,
>
> I had a question, programming sockets, what are the things that would
> degrade performance and what steps could help in a performance boost? I
> would also appreciate being pointed to some formal documentation or
> article.

1. When writing to a TCP socket, write everything you have to write
with one "send" or "write" operation if at all possible.
Don't write a little at a time. That results in sending small
packets, because sockets are "flushed" after each write.

2. Wait for input from multiple sources by using "select". (But
be aware that "select" doesn't work for Windows pipes.)
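
A minimal sketch of both points, not a definitive implementation. The helper names send_batched and wait_readable are made up for illustration, and socket.socketpair() stands in for real network connections. It joins the pieces before a single sendall() call, then uses select to find out which socket has input.

```python
import select
import socket


def send_batched(sock, parts):
    """Join the pieces first, then hand them to the kernel in one call."""
    payload = b"".join(parts)
    sock.sendall(payload)  # sendall() keeps going until every byte is accepted
    return len(payload)


def wait_readable(socks, timeout=1.0):
    """Return the sockets in `socks` that have data waiting to be read."""
    readable, _, _ = select.select(socks, [], [], timeout)
    return readable


if __name__ == "__main__":
    # Two connected pairs stand in for two independent connections.
    a, b = socket.socketpair()
    c, d = socket.socketpair()

    send_batched(a, [b"HELLO ", b"world", b"\r\n"])

    # Only the socket that actually has pending input is returned.
    for s in wait_readable([b, d]):
        print(s.recv(4096))

    for s in (a, b, c, d):
        s.close()
```

Joining first costs one extra copy in user space, which is usually cheap next to an extra system call and possibly an extra small packet for each piece.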

John Nagle

From: Roy Smith on 25 Jul 2010 08:22

In article <4c4bd0b1$0$1624$742ec2ed(a)news.sonic.net>,
John Nagle <nagle(a)animats.com> wrote:

> 1. When writing to a TCP socket, write everything you have to write
> with one "send" or "write" operation if at all possible.
> Don't write a little at a time. That results in sending small
> packets, because sockets are "flushed" after each write.

There's nothing that guarantees that a single write won't be split into
multiple packets, nor that multiple writes won't be coalesced into a
single packet. Or any combination of splitting and coalescing that the
kernel feels like.

That being said, for any sane implementation, what John says is true
most of the time, and is indeed a reasonable optimization. Just don't
depend on it being true all the time. The most common case where it
will not be true is if you're trying to send a large amount of data and
exceed the MTU of the network. Then the data is certain to be split
across multiple packets.

Depending on what you're doing, this can be a point of networking
trivia, or it can be the difference between your application working and
not working. If you're just streaming data from one place to another,
you don't have to worry about it. But, if you're doing some sort of
interactive protocol where you send a command, wait for a response, send
another command, etc., you really do need to be aware of how this works.

Let's say you're writing something like an HTTP client. You send a bunch
of headers, then expect to get back something like "200 OK\r\n", or "404
Not Found\r\n". You can't just do a read() on the socket and then
examine the string to see if the first three characters are "200" or
"404", because (regardless of how the server sent them), it is legal for
your read() to return just a single character (e.g. "2"), and then for
the next read() to get "00 OK\r\n". You need to do buffering inside
your application which keeps doing read() until you find the "\r\n" (and
stops there, even if the read() returned more data beyond that).
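
The buffering described above might look something like this. It is only a sketch: the LineReader class and its names are hypothetical, not part of any library. It keeps calling recv() until a "\r\n" shows up, returns the line without the terminator, and holds on to anything that arrived after it for the next call.

```python
import socket


class LineReader:
    """Buffers recv() data and hands back one CRLF-terminated line at a time."""

    def __init__(self, sock, bufsize=4096):
        self.sock = sock
        self.bufsize = bufsize
        self.buffer = b""  # bytes received but not yet returned to the caller

    def readline(self):
        # recv() may return any number of bytes, so keep reading until the
        # buffer holds at least one complete line.
        while b"\r\n" not in self.buffer:
            chunk = self.sock.recv(self.bufsize)
            if not chunk:
                raise ConnectionError("peer closed before end of line")
            self.buffer += chunk
        # Split off the first line; whatever follows stays buffered.
        line, _, self.buffer = self.buffer.partition(b"\r\n")
        return line
```

With this, checking the status becomes reader.readline().startswith(b"200"), and it gives the same answer however the bytes happened to be split up on the way.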

From: Navkirat Singh on 25 Jul 2010 08:50

Thanks John, Roy. I really appreciate your valuable input. I have made a note of what you have said and will keep it in mind when I implement this :)

Nav
