From: wkevin on
Hello,
I have a simple TCP client-server application, which I run between
two desktops on the same LAN.
The MTU of the NICs on both the server and the client is 1500 bytes.
The code for the client TCP socket is at the bottom of this message.
I send a large buffer of 2010 bytes (I tried sending it by calling
write(), as you can see below, and also by calling send()).
What I see in the sniffer is that only one data packet is sent, and the
size of its data is only 562 bytes! I expected it to be at least 1500 bytes!
Why is that? Is there a way to change it?
I printed the return value of write(), and it says
2010 bytes were sent. Printing the buffer size also shows 2010 bytes
(echolen=2010).

Here is the code of the client:


#include <sys/socket.h>
#include <arpa/inet.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <stdio.h>

void Die(const char *mess) { perror(mess); exit(1); }

int main(int argc, char *argv[]) {
    int sock;
    struct sockaddr_in echoserver;
    size_t echolen;
    ssize_t res;
    int i;
    char buf[3000] = "1234567890";

    if (argc != 3) {
        fprintf(stderr, "usage: %s <server_ip> <port>\n", argv[0]);
        exit(1);
    }

    /* Build a 2010-byte payload: 10 initial bytes plus 200 * 10 appended */
    for (i = 0; i < 200; i++)
        strcat(buf, "1234567890");

    if ((sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0) {
        Die("Failed to create socket");
    }

    memset(&echoserver, 0, sizeof(echoserver));       /* Clear struct */
    echoserver.sin_family = AF_INET;                  /* Internet/IP */
    echoserver.sin_addr.s_addr = inet_addr(argv[1]);  /* IP address */
    echoserver.sin_port = htons(atoi(argv[2]));       /* server port */
    if (connect(sock,
                (struct sockaddr *) &echoserver,
                sizeof(echoserver)) < 0) {
        Die("Failed to connect with server");
    }

    echolen = strlen(buf);
    printf("echolen=%zu\n", echolen);
    res = write(sock, buf, echolen);
    if (res < 0) {
        Die("Failed to write to socket");
    }
    printf("%zd bytes were sent\n", res);
    close(sock);
    exit(0);
}

Rgs,
Kevin
From: pk on
wkevin wrote:

> Hello,
> I have a simple TCP client-server application, which I run between
> two desktops on the same LAN.
> The MTU of the NICs on both the server and the client is 1500 bytes.
> The code for the client TCP socket is at the bottom of this message.
> I send a large buffer of 2010 bytes (I tried sending it by calling
> write(), as you can see below, and also by calling send()).
> What I see in the sniffer is that only one data packet is sent, and the
> size of its data is only 562 bytes! I expected it to be at least 1500
> bytes!

The IP packet size in TCP usually depends on the outgoing interface MTU, but
also on the MSS (maximum segment size) received from the peer during the
three-way handshake. The smaller of the two limits (the peer's MSS, and the
local MTU minus the IP and TCP headers) determines the maximum payload per
packet. Since you say both the client's and the server's MTU are 1500, they
should both advertise an MSS of 1460 (usually). But if for some reason that
is not happening, then it may be the cause of what you're seeing.
It may also be that the peer advertises a large MSS, but some device in the
path rewrites it to a lower value. That used to be (and to some extent still
is) a fairly common thing to do to work around broken PMTU discovery (you'll
find it described as "MSS clamping" or similar terms).
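
To make the rule above concrete, here is a minimal sketch in C. The helper
name effective_send_mss() is made up for illustration, and it assumes IPv4
with 20-byte IP and TCP headers and no options; a real stack takes more
factors into account.

```c
/* Hypothetical helper: largest TCP payload we would put in one segment,
 * given our interface MTU and the MSS the peer advertised in its SYN.
 * Assumes IPv4, a 20-byte IP header and a 20-byte TCP header, no options. */
unsigned effective_send_mss(unsigned local_mtu, unsigned peer_mss)
{
    /* What our own interface allows once both headers are subtracted */
    unsigned local_limit = local_mtu > 40 ? local_mtu - 40 : 0;

    /* The smaller of the two limits wins */
    return peer_mss < local_limit ? peer_mss : local_limit;
}
```

With an MTU of 1500 and a peer MSS of 1460 this gives 1460; if a device in
the path rewrote the peer's MSS to, say, 1400, the result would drop to 1400.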

From: wkevin on
Hi,
Thanks a lot for your answer!

Well!
I looked again at your answer.
Both MSS values in the TCP SYN packets are indeed 1460, and the
machines are connected via a hub.
So I looked more carefully at the capture, and what I found is that
there were in fact two packets, the first with 1448 bytes and the
second with 562. And since 1448 + 562 = 2010, it is OK.

What I don't understand here is:
1) Why is the MSS value that is exchanged 1460 and not 1500
(the MTU value)?

2) The first packet had "ACK" in the Wireshark info column, while
the second packet had "PSH". What is the meaning of this PSH?

3) I also tried to set TCP_MAXSEG with the little code below; it
did not give an error, but TCP_MAXSEG was still not changed. It was
536 and stayed 536 after calling setsockopt(); any ideas why?


--
/* TCP_MAXSEG needs #include <netinet/tcp.h> */
socklen_t sl;
unsigned int mss;
unsigned int maxseg;
....
/* Create the TCP socket */
if ((sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0) {
    Die("Failed to create socket");
}
mss = 0;
sl = sizeof(mss); /* value-result argument: must hold the buffer size before the call */
if (getsockopt(sock, IPPROTO_TCP, TCP_MAXSEG, &mss, &sl) == -1) {
    perror("getsockopt");
    close(sock);
    return -1;
}
printf("mss=%u\n", mss);

maxseg = 500;
sl = sizeof(maxseg);
if (setsockopt(sock, IPPROTO_TCP, TCP_MAXSEG, &maxseg, sl) == -1) {
    perror("setsockopt");
    exit(1);
}
mss = 0;
sl = sizeof(mss);
if (getsockopt(sock, IPPROTO_TCP, TCP_MAXSEG, &mss, &sl) == -1) {
    perror("getsockopt");
    close(sock);
    return -1;
}
printf("new mss=%u\n", mss);
Rgs,
Kevin



From: pk on
wkevin wrote:

> I looked again at your answer.
> Both MSS values in the TCP syn packets are indeed 1460, and the
> machines are connected via a hub.
> So I looked more carefully into the sniff, and what I found is that
> there were in fact two packets, the first with 1448 bytes and the
> second with 562. And since we have 1448 + 562 = 2010, then it is OK.

OK, so you missed a packet in your first analysis.
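
As a rough illustration of how one 2010-byte write ends up as two segments,
here is a small sketch. The function split_into_segments() is hypothetical
and only models the size arithmetic, not what the stack really does. The
per-segment payload of 1448 is taken from the capture; it would be
consistent with an MSS of 1460 minus 12 bytes of TCP options (for example
timestamps), but that part is my assumption.

```c
#include <stddef.h>

/* Hypothetical helper: split `total` bytes into segments of at most
 * `max_payload` bytes, storing each segment size in out[].
 * Returns the number of segments; returns 0 for an empty input,
 * a zero max_payload, or when out[] is too small. */
size_t split_into_segments(size_t total, size_t max_payload,
                           size_t *out, size_t out_cap)
{
    size_t n = 0;

    if (max_payload == 0)
        return 0;

    while (total > 0) {
        size_t chunk = total < max_payload ? total : max_payload;

        if (n == out_cap)
            return 0;           /* caller's array is too small */
        out[n++] = chunk;
        total -= chunk;
    }
    return n;
}
```

Calling it with total = 2010 and max_payload = 1448 yields two segments of
1448 and 562 bytes, matching what the sniffer showed.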

> What I don't understand here is:
> 1) Why is the MSS value that is exchanged 1460 and not 1500
> (the MTU value)?

Well, because MSS != MTU. The MTU is usually understood to mean "the maximum
size of an IP packet, including the IP header". So if the MTU is 1500, that
means that a 1500-byte IP packet can be sent out of the interface. Those 1500
bytes include 20 bytes of IP header (normally), any upper-layer header (e.g.
TCP, UDP, ICMP) and finally the real data.
When the packet is to be sent, the datalink layer adds layer 2 information
to the packet, usually in the form of a header and/or a trailer. For example,
Ethernet adds a 14-byte header and a 4-byte trailer, so what is sent out on
the wire is a 1518-byte frame (or less if the original packet was less than
1500 bytes, of course). If you are using VLAN tagging, the header is 18
bytes, so you can have frames up to 1522 bytes.
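
The frame sizes above are plain additions; a tiny sketch, with made-up
constant and function names, and ignoring the padding Ethernet applies to
very short frames:

```c
/* Ethernet framing overhead, as described above */
enum {
    ETH_HEADER_LEN  = 14, /* destination MAC + source MAC + EtherType */
    ETH_TRAILER_LEN = 4,  /* frame check sequence */
    VLAN_TAG_LEN    = 4   /* 802.1Q tag: grows the header from 14 to 18 bytes */
};

/* Size on the wire of a frame carrying an IP packet of ip_packet_len bytes.
 * Does not model the minimum-frame padding for very small packets. */
unsigned frame_size(unsigned ip_packet_len, int vlan_tagged)
{
    return ip_packet_len + ETH_HEADER_LEN + ETH_TRAILER_LEN
           + (vlan_tagged ? VLAN_TAG_LEN : 0);
}
```

frame_size(1500, 0) gives 1518 and frame_size(1500, 1) gives 1522.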

But back to the topic. If the biggest IP packet can be 1500 bytes, and that
includes headers and all upper-layer data, then of course the TCP segment,
which is contained in the IP packet, cannot be 1500 bytes. Since there are
at least 20 bytes of IP header, the TCP segment cannot be more than 1480
bytes. But TCP has its own header as well, which again is normally 20 bytes.
So the TCP net payload cannot be more than 1460 bytes, and that's what the
MSS indicates. So in a sense, TCP has to know what the underlying interface
MTU is to calculate the MSS to advertise. Usually (but do not assume that
blindly) the MSS advertised by TCP is MTU - 40 for IPv4, and MTU - 60 for
IPv6.

When an MSS of 1460 is used, a full TCP segment with header will be 1480
bytes, which in turn, after adding the IP header, will become a 1500-byte
IP packet. If TCP had advertised an MSS of 1500, that would lead to building
packets that are too big for the interface MTU.
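
The same arithmetic as a short C sketch. Both function names are invented
for this example, and the header sizes are the minimum ones (no IP or TCP
options), which is an assumption that does not always hold.

```c
/* Hypothetical helper: the MSS a host would typically advertise for an MTU.
 * Assumes minimum headers: 20 bytes IPv4 or 40 bytes IPv6, plus 20 bytes TCP. */
unsigned advertised_mss(unsigned mtu, int is_ipv6)
{
    unsigned overhead = (is_ipv6 ? 40u : 20u) + 20u;

    return mtu > overhead ? mtu - overhead : 0;
}

/* Size of the IPv4 packet that carries `payload` bytes of TCP data,
 * again with 20-byte IP and TCP headers and no options. */
unsigned ipv4_packet_size(unsigned payload)
{
    return payload + 20u + 20u;
}
```

advertised_mss(1500, 0) is 1460 and ipv4_packet_size(1460) is exactly 1500,
while ipv4_packet_size(1500) would be 1540, which no longer fits the MTU.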

> 2) The first packet had "ACK" in the Wireshark info column, while
> the second packet had "PSH". What is the meaning of this PSH?

It's a TCP flag that means "PUSH". It's usually used to tell the receiving
TCP that the data it is keeping in its internal buffer should be pushed
to the application (i.e. it becomes visible to the application using
the socket, so it can be read() etc.). It's usually set on the last data
segment of a write, as you see, and in some other special circumstances.
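
If you want to see where the flag lives, here is a small sketch that decodes
the flags byte of a TCP header into a Wireshark-like list. The bit values
are the standard ones from the TCP specification; the FLAG_* names and
describe_flags() are just names I picked for the example.

```c
#include <stddef.h>
#include <string.h>

/* TCP flag bits as laid out in the flags byte of the TCP header */
#define FLAG_FIN 0x01
#define FLAG_SYN 0x02
#define FLAG_RST 0x04
#define FLAG_PSH 0x08
#define FLAG_ACK 0x10
#define FLAG_URG 0x20

/* Write the names of the set flags, separated by ", ", into buf.
 * Stops early if the next name would not fit in len bytes. */
void describe_flags(unsigned char flags, char *buf, size_t len)
{
    static const struct { unsigned char bit; const char *name; } table[] = {
        { FLAG_FIN, "FIN" }, { FLAG_SYN, "SYN" }, { FLAG_RST, "RST" },
        { FLAG_PSH, "PSH" }, { FLAG_ACK, "ACK" }, { FLAG_URG, "URG" },
    };
    size_t used = 0;
    size_t i;

    if (len == 0)
        return;
    buf[0] = '\0';

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        const char *sep = used ? ", " : "";
        size_t need;

        if (!(flags & table[i].bit))
            continue;
        need = strlen(sep) + strlen(table[i].name);
        if (used + need + 1 > len)
            break;              /* no room for name plus terminator */
        strcpy(buf + used, sep);
        strcpy(buf + used + strlen(sep), table[i].name);
        used += need;
    }
}
```

A flags byte of 0x10 decodes to "ACK" and 0x18 to "PSH, ACK". In a capture,
a data segment marked PSH normally carries ACK as well.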

> 3) I also tried to set TCP_MAXSEG with the little code below; it
> did not give an error, but TCP_MAXSEG was still not changed. It was
> 536 and stayed 536 after calling setsockopt(); any ideas why?

This is indeed strange. Looking at the kernel source (a very quick look, so
I may be missing something here), it seems that when you call setsockopt() a
variable called rx_opt.user_mss is set, while when you call getsockopt() the
value is taken from a different variable called mss_cache, which is
initialized to 536 when the socket is created, so no wonder you see that
value.
However, it seems that when the socket is used (i.e. when you call connect())
the user value *is* used to clamp the MSS. I guess you should go
ahead and connect the socket, sniff the traffic and see what MSS is
advertised. It *should* be 500 (the value you set), but I may be wrong.
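
Something along these lines could be used for that experiment. It is only a
sketch: request_mss() and current_mss() are helper names I made up, and what
getsockopt() reports before and after connect() may differ between kernels,
so treat the sniffer as the reference.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Ask the stack to cap the MSS for this socket. For the value to be able
 * to influence the SYN, call this before connect().
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int request_mss(int sock, int mss)
{
    return setsockopt(sock, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss));
}

/* Read TCP_MAXSEG back. Returns the reported value, or -1 on error. */
int current_mss(int sock)
{
    int mss = 0;
    socklen_t len = sizeof(mss); /* must be set before the call */

    if (getsockopt(sock, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) == -1)
        return -1;
    return mss;
}
```

The idea would be to call request_mss(sock, 500) right after socket(), then
connect(), then print current_mss(sock) and compare it with the MSS option
in the captured SYN.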

From: Vishal Swarankar on
On Apr 3, 8:56 pm, pk <p...(a)pk.invalid> wrote:

---------------------------------
> 3) I also tried to set TCP_MAXSEG with the little code below; it
> did not give an error, but TCP_MAXSEG was still not changed. It was
> 536 and stayed 536 after calling setsockopt(); any ideas why?

>>This is indeed strange. Looking at the kernel source (a very quick look, so
>>I may be missing something here), it seems that when you call setsockopt() a
>>variable called rx_opt.user_mss is set, while when you call getsockopt() the
>>value is taken from a different variable called mss_cache, which is
>>initialized to 536 when the socket is created, so no wonder you see that
>>value.
>>However, it seems that when the socket is used (i.e. when you call connect())
>>the user value *is* used to clamp the MSS. I guess you should go
>>ahead and connect the socket, sniff the traffic and see what MSS is
>>advertised. It *should* be 500 (the value you set), but I may be wrong.

You must be using a kernel older than 2.6.27.26.
Below are the changes in 2.6.27.26 that make the kernel honour the TCP MSS
set by the user. Kernels before this version did not take the user-set MSS
into account. For more details, look at commit
fe05dfbd8f652ac0ef0d30d9498066bb9b8da0e0


---
net/ipv4/tcp_ipv4.c | 4 ++++
net/ipv4/tcp_output.c | 13 ++++++++++---
2 files changed, 14 insertions(+), 3 deletions(-)

--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1364,6 +1364,10 @@ struct sock *tcp_v4_syn_recv_sock(struct
 	tcp_mtup_init(newsk);
 	tcp_sync_mss(newsk, dst_mtu(dst));
 	newtp->advmss = dst_metric(dst, RTAX_ADVMSS);
+	if (tcp_sk(sk)->rx_opt.user_mss &&
+	    tcp_sk(sk)->rx_opt.user_mss < newtp->advmss)
+		newtp->advmss = tcp_sk(sk)->rx_opt.user_mss;
+
 	tcp_initialize_rcv_mss(newsk);

(The unchanged advmss assignment is the old behaviour: it does not
consider the user MSS.)
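
In plain C, the logic added by the hunk above boils down to the following.
This is a userspace model for illustration only, not kernel code, and
pick_advmss() is a name I invented.

```c
/* Userspace model of the patched logic; not kernel code.
 * route_advmss stands for dst_metric(dst, RTAX_ADVMSS),
 * user_mss for the value set via TCP_MAXSEG (0 means "not set"). */
unsigned pick_advmss(unsigned route_advmss, unsigned user_mss)
{
    /* The user value can only lower the advertised MSS, never raise it */
    if (user_mss && user_mss < route_advmss)
        return user_mss;
    return route_advmss;
}
```

So with a route value of 1460 and a user value of 500 the advertised MSS
becomes 500, while a user value of 0 or of 9000 leaves it at 1460.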



>>It was 536 and stayed 536 after calling setsockopt(); any ideas why?
This is again a bug in the stack. getsockopt() always gives the
default MSS, whatever the real MSS is, so don't rely
on getsockopt(). I have seen this up to 2.6.31.14.

Thanks