From: Martijn de Munnik on
On Wed, 20 Jan 2010 07:20:01 -0500 (EST), wietse(a)porcupine.org (Wietse
Venema) wrote:
> Martijn de Munnik:
>> Hi list,
>>
>> I have a problem with delivering mail to a host and get this error:
>>
>> host mx2.amsterdam.nl[145.222.14.10] said: 421 enepmx02.amsterdam.nl
>> Error: timeout exceeded (in reply to end of DATA command)
>>
>> This error only seems to occur with 'large' mails. Currently I have a
>> mail of ~600KB and ~8MB stuck in the queue. I don't think this is a
>> postfix issue on our site but an issue with the mailserver on the other
>> site. What can cause such issues?
>
> Record a tcpdump trace. The way the session fails will indicate
> the kind of problem (MTU, Window scaling, and so on).
>
> http://www.postfix.org/DEBUG_README.html
>
> Wietse

Ok, I tried that and I'm not really sure what to look for. I opened the
tcpdump file in Wireshark and there are a lot of warnings and notes in the
file.

--
Notes:
Duplicate ACK(#1) [145.222.14.10 -> 213.207.90.2]
Duplicate ACK(#2) [145.222.14.10 -> 213.207.90.2]
Duplicate ACK(#3) [145.222.14.10 -> 213.207.90.2]
Duplicate ACK(#4) [145.222.14.10 -> 213.207.90.2]
..
..
..
Duplicate ACK(#44) [145.222.14.10 -> 213.207.90.2]
Retransmission (suspected) [213.207.90.2 -> 145.222.14.10]

Warnings:
Fast retransmission (suspected) [213.207.90.2 -> 145.222.14.10]
Out-Of-Order segment [213.207.90.2 -> 145.222.14.10]
--

This is abracadabra for me ;)

Martijn

From: Wietse Venema on
Martijn de Munnik:
> Ok, I tried that and I'm not really sure what to look for. I opened the
> tcpdump file in Wireshark and there are a lot of warnings and notes in the
> file.
>
> --
> Notes:
> Duplicate ACK(#1) [145.222.14.10 -> 213.207.90.2]
> Duplicate ACK(#2) [145.222.14.10 -> 213.207.90.2]
> Duplicate ACK(#3) [145.222.14.10 -> 213.207.90.2]
> Duplicate ACK(#4) [145.222.14.10 -> 213.207.90.2]
> .
> .
> .
> Duplicate ACK(#44) [145.222.14.10 -> 213.207.90.2]
> Retransmission (suspected) [213.207.90.2 -> 145.222.14.10]
>
> Warnings:
> Fast retransmission (suspected) [213.207.90.2 -> 145.222.14.10]
> Out-Of-Order segment [213.207.90.2 -> 145.222.14.10]
> --
>
> This is abracadabra for me ;)

If you can make the "tcpdump -nr /file/name" output available then
people who understand TCP/IP can look at it.

Wietse

From: Wietse Venema on
Here's the TCP initial handshake:

17:30:44.951789 IP 213.207.90.2.48147 > 145.222.14.10.25: S 50514820:50514820(0) win 49640 <mss 1460,nop,wscale 0,nop,nop,sackOK>
17:30:44.954496 IP 145.222.14.10.25 > 213.207.90.2.48147: S 4148480248:4148480248(0) ack 50514821 win 5840 <mss 1380,nop,wscale 2>
17:30:44.954519 IP 213.207.90.2.48147 > 145.222.14.10.25: . ack 1 win 49680

Later, as the receiver processes the network packets, it acknowledges
the data received and sends its receive window size (how much more
it is willing to receive).

Above, with "wscale 2" the server at 145.222.14.10 announces that
its TCP receive window value needs to be multiplied by a factor of
4 (binary number shifted left by 2).
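
To make the "wscale 2" reading concrete, here is a minimal sketch that
pulls the TCP options out of the SYN+ACK line above and turns the shift
count into a multiplier. The function name parse_syn_options is made up
for illustration, and the parsing only assumes the old-style tcpdump
layout shown in this trace (options between angle brackets).

```python
import re

def parse_syn_options(line):
    """Return the TCP options from an old-style tcpdump line as a dict.

    Hypothetical helper: it only understands the "<opt,opt value,...>"
    layout that appears in the trace in this thread.
    """
    match = re.search(r"<([^>]*)>", line)
    if match is None:
        return {}
    options = {}
    for item in match.group(1).split(","):
        name, _, value = item.partition(" ")
        if name == "nop":
            continue  # padding only, carries no information
        options[name] = int(value) if value else True
    return options

# The server's SYN+ACK, copied from the handshake above.
syn_ack = ("17:30:44.954496 IP 145.222.14.10.25 > 213.207.90.2.48147: "
           "S 4148480248:4148480248(0) ack 50514821 win 5840 "
           "<mss 1380,nop,wscale 2>")

opts = parse_syn_options(syn_ack)
shift = opts.get("wscale", 0)   # shift count announced by the server
multiplier = 1 << shift         # window values are multiplied by this

print(opts)        # {'mss': 1380, 'wscale': 2}
print(multiplier)  # 4
```

A shift count of 0 gives a multiplier of 1, which is why the client's
"wscale 0" still counts as offering the option without enlarging its
own window.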

But there is a broken router in the path that does not understand
window scaling.

Here is an example of what gets f-ed up:

17:30:45.412222 IP 213.207.90.2.48147 > 145.222.14.10.25: . 20853:22233(1380) ack 137 win 49680
17:30:45.412230 IP 213.207.90.2.48147 > 145.222.14.10.25: . 22233:23613(1380) ack 137 win 49680
17:30:45.412249 IP 213.207.90.2.48147 > 145.222.14.10.25: P 23613:24993(1380) ack 137 win 49680
17:30:45.412747 IP 145.222.14.10.25 > 213.207.90.2.48147: P ack 8433 win 5800
17:30:45.412748 IP 145.222.14.10.25 > 213.207.90.2.48147: P ack 8433 win 5800
17:30:45.412749 IP 145.222.14.10.25 > 213.207.90.2.48147: P ack 8433 win 5800

The receiver says they can receive bytes 8433-31633, but the broken
router does not know that 5800 needs to be multiplied by 4, and it
thinks the receiver can receive only bytes 8433-14233.

The broken router then throws away the bytes with higher sequence
numbers than 14233.
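
The arithmetic behind those two ranges can be sketched as follows. The
function names are made up for illustration, and the "router" here only
models the behaviour described above (it ignores the scale factor); it
says nothing about how any particular device is implemented.

```python
def window_right_edge(ack, win, wscale=0):
    """Highest (relative) sequence number covered by the advertised window."""
    return ack + (win << wscale)

# Values from the "ack 8433 win 5800" lines in the trace.
ack, win = 8433, 5800

receiver_view = window_right_edge(ack, win, wscale=2)  # scale factor 4 applied
router_view = window_right_edge(ack, win)              # scale factor ignored

def segment_fate(start, end):
    """Classify a data segment by where its last byte falls.

    Simplification: a segment that straddles the router's edge is
    treated as dropped; the trace does not show what really happens
    to such a segment.
    """
    if end <= router_view:
        return "passes"
    if end <= receiver_view:
        return "valid for the receiver, dropped by the router"
    return "outside the receiver's window"

print(receiver_view)               # 31633
print(router_view)                 # 14233
print(segment_fate(20853, 22233))  # valid for the receiver, dropped by the router
```

All three data segments shown above (20853 through 24993) fall in the
gap between 14233 and 31633, so the sender is behaving correctly and
still never gets them through.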

Workaround: turn off window scaling support on the sender's kernel.

Wietse

From: Victor Duchovni on
On Wed, Jan 20, 2010 at 03:22:56PM -0500, Wietse Venema wrote:

> The broken router then throws away the bytes with higher sequence
> numbers than 14233.
>
> Workaround: turn off window scaling support on the sender's kernel.

This problem is sufficiently common that on Linux MTAs I always add:

net.ipv4.tcp_window_scaling = 0

to sysctl.conf. Adjust for other systems as necessary. This hurts
long-haul throughput, but email tolerates latency, provided most of your
outbound traffic is not a high-bandwidth channel to Mars (but then you
would not be using TCP anyway...)
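
For anyone who wants to check a box automatically, here is a small
sketch that parses sysctl.conf-style text and reports whether the
setting above is present. The helper name parse_sysctl and the sample
text are made up for illustration; the sketch assumes the usual
"key = value" layout with "#" or ";" comment lines.

```python
def parse_sysctl(text):
    """Parse sysctl.conf-style text into a dict of key -> value strings."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings

# Illustrative file contents, not taken from a real system.
sample = """\
# Work around devices that mishandle TCP window scaling
net.ipv4.tcp_window_scaling = 0
"""

settings = parse_sysctl(sample)
scaling_disabled = settings.get("net.ipv4.tcp_window_scaling") == "0"
print(scaling_disabled)  # True
```

This only inspects the file. On Linux the file is loaded with
"sysctl -p", and the value the running kernel actually uses can be read
from /proc/sys/net/ipv4/tcp_window_scaling.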

--
Viktor.

From: Wietse Venema on
Wietse Venema:
> You can do
>
> ndd /dev/tcp \?
>
> to find out what parameters are supported. On my Solaris9 and
> Solaris10 test boxes it is called tcp_wscale_always.
>
> According to Solaris10 documentation:
>
> When this parameter is enabled, which is the default setting
> [since Solaris10], TCP always sends a SYN segment with the
> window scale option, even if the window scale option value is
> 0.

With the default tcp_wscale_always setting, making a connection
from a Solaris 10 box to FreeBSD 8.0:

20:13:59.808828 IP 168.100.189.17.32799 > 168.100.189.10.25: Flags [S], seq 118377775, win 49640, options [mss 1460,nop,wscale 0,nop,nop,sackOK], length 0
20:13:59.808892 IP 168.100.189.10.25 > 168.100.189.17.32799: Flags [S.], seq 538094055, ack 118377776, win 65535, options [mss 1460,nop,wscale 3,sackOK,eol], length 0
20:13:59.809327 IP 168.100.189.17.32799 > 168.100.189.10.25: Flags [.], ack 1, win 49640, length 0

Same system with tcp_wscale_always set to zero:

20:14:52.736959 IP 168.100.189.17.32800 > 168.100.189.10.25: Flags [S], seq 131413865, win 49640, options [mss 1460,nop,nop,sackOK], length 0
20:14:52.737016 IP 168.100.189.10.25 > 168.100.189.17.32800: Flags [S.], seq 3072042607, ack 131413866, win 65535, options [mss 1460,sackOK,eol], length 0
20:14:52.737581 IP 168.100.189.17.32800 > 168.100.189.10.25: Flags [.], ack 1, win 49640, length 0

Thus, with tcp_wscale_always set to zero, Solaris 10 does not send
wscale, and neither should the remote server.
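
The rule at work is that window scaling is used on a connection only
when both the SYN and the SYN+ACK carry the wscale option. A minimal
sketch that checks this against the two handshakes above (has_wscale
and scaling_in_effect are made-up names, and the parsing assumes the
newer "options [...]" tcpdump layout shown in these traces):

```python
import re

def has_wscale(line):
    """True if the tcpdump line lists a wscale option (newer output format)."""
    match = re.search(r"options \[([^\]]*)\]", line)
    if match is None:
        return False
    return any(opt.strip().startswith("wscale")
               for opt in match.group(1).split(","))

def scaling_in_effect(syn, syn_ack):
    """Scaling is used only if both sides sent the option."""
    return has_wscale(syn) and has_wscale(syn_ack)

# Default tcp_wscale_always: both sides send wscale.
syn_default = ("IP 168.100.189.17.32799 > 168.100.189.10.25: Flags [S], "
               "seq 118377775, win 49640, "
               "options [mss 1460,nop,wscale 0,nop,nop,sackOK], length 0")
synack_default = ("IP 168.100.189.10.25 > 168.100.189.17.32799: Flags [S.], "
                  "seq 538094055, ack 118377776, win 65535, "
                  "options [mss 1460,nop,wscale 3,sackOK,eol], length 0")

# tcp_wscale_always set to zero: neither side sends wscale.
syn_off = ("IP 168.100.189.17.32800 > 168.100.189.10.25: Flags [S], "
           "seq 131413865, win 49640, "
           "options [mss 1460,nop,nop,sackOK], length 0")
synack_off = ("IP 168.100.189.10.25 > 168.100.189.17.32800: Flags [S.], "
              "seq 3072042607, ack 131413866, win 65535, "
              "options [mss 1460,sackOK,eol], length 0")

print(scaling_in_effect(syn_default, synack_default))  # True
print(scaling_in_effect(syn_off, synack_off))          # False
```

Running the same check on a fresh trace of the failing connection is a
quick way to confirm that the workaround actually took effect.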

If this does not make your mail move, then you need to collect
another tcpdump recording.

In that case, mail was not moving because of more than one problem.

Wietse