From: Wietse Venema on
Proniewski Patrick:
> Since the migration from Postfix 2.0.10 to Postfix 2.7.0, the smtp
> logs on the LB pool show a huge number of "No answer,timeout"
> messages. From about 0-30 per day, the timeout count has jumped to
> 1500-5500 per day.

That implies that the load balancer gets no response from Postfix.

> So it appears that the connection between MAILGW and LB is not
> always properly closed. Am I wrong?

There is no evidence to support that conclusion.

> Any help is greatly appreciated!

If Postfix does not accept connections, then it will log warnings.

You need to search the POSTFIX logfile as per instructions in
http://www.postfix.org/DEBUG_README.html#logging.

Wietse

From: Victor Duchovni on
On Wed, Jun 02, 2010 at 05:14:45PM +0200, Proniewski Patrick wrote:

> So it appears that the connection between MAILGW and LB is not always
> properly closed. Am I wrong?

http://www.postfix.org/postconf.5.html#smtp_connection_cache_on_demand
http://www.postfix.org/CONNECTION_CACHE_README.html
http://www.postfix.org/postconf.5.html#connection_cache_ttl_limit

> smtp_destination_concurrency_limit = 10

A bit low, especially when most traffic goes to the same place. I would
recommend the default of 20 or even 50 in some similar cases. Increasing
the concurrency limit will likely reduce the backlog and demand for
cached connections.

> smtp_destination_recipient_limit = 15

This too is likely counter-productive. Decreasing envelope splitting
will reduce the number of messages and thus demand for cached connections.
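
As a rough illustration of the two suggestions above, the corresponding
main.cf lines might look like the sketch below. The values shown are the
stock Postfix defaults (20 and 50); they are an assumption for
illustration, not figures tested against this site's load.

```
# main.cf sketch: return both limits to the Postfix defaults
# (deleting the two overrides from main.cf has the same effect)
smtp_destination_concurrency_limit = 20
smtp_destination_recipient_limit = 50
```

Run "postfix reload" after editing main.cf so the change takes effect.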

> tcpdump output:

> 15:51:22.010668
> 220 co7.domain.tld SMTP ready
> 15:51:22.010746
> EHLO mailgw.domain.tld
> 15:51:22.011256
> 250-co7.domain.tld
> 250-AUTH CRAM-MD5 DIGEST-MD5 GSSAPI LOGIN PLAIN
> 250-STARTTLS
> 250 8BITMIME
> 15:51:22.011337
> MAIL FROM:<sender@mail.domain.tld> BODY=7BIT
> 15:51:22.011736
> 250 OK
> 15:51:22.011803
> RCPT TO:<xxxxxxxx@mail.domain.tld>
> 15:51:22.021497
> 250 OK
> 15:51:22.021574
> DATA
> 15:51:22.022099
> 354 Start mail input; end with <CRLF>.<CRLF>
> 15:51:22.022211 . 120:1568(1448)
> 15:51:22.022224 . 1568:3016(1448)
> 15:51:22.022235 P 3016:4216(1200)
> 15:51:22.022786 P 5664:5721(57)
> ...
> .
> 15:51:22.062987 . ack 5721
> 15:51:22.275092 P 197:205(8)
> 250 OK
> 15:51:22.374644 . ack 205

Postfix caches the connection and does not send "QUIT".

> 15:51:24.276369 F 5721:5721(0)

Postfix closes the connection two seconds later.

> 15:51:24.276858 P 205:265(60)
> 221 co7.domain.tld service closing transmission channel

The LB or the server behind it violates SMTP by sending an out-of-turn
SMTP reply.

> 15:51:24.276910 R
> 15:51:24.276916 F 265:265(0)
> 15:51:24.276930 R 4094209693:4094209693(0) win 0

Postfix is already gone. This is harmless, but either the LB or the
custom mail server behind it mishandles EOF by sending an out-of-turn
reply. If you don't want connection caching, turn it off.
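
For reference, a minimal main.cf sketch for turning the SMTP client
connection cache off. It assumes no other part of the configuration
(for example a per-transport override in master.cf) re-enables caching.

```
# main.cf sketch: disable SMTP client connection caching
# on-demand caching is enabled by default
smtp_connection_cache_on_demand = no
# leave empty so that no destination is cached unconditionally
smtp_connection_cache_destinations =
```

The two-second gap in the trace is consistent with the default
smtp_connection_cache_time_limit of 2s, though the original poster's
actual setting is not shown in this thread.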

--
Viktor.

From: Wietse Venema on
Proniewski Patrick:
> Wietse,
>
> Thank you for your fast reply.
>
> On 2 juin 2010, at 17:28, Wietse Venema wrote:
>
> > If Postfix does not accept connections, then it will log warnings.
>
>
> No warning on postfix side, otherwise I would have posted a sample here of course.
> This is a big jump between 2.0.10 and 2.7. I was not afraid about

I suppose you missed the Postfix RELEASE_NOTES files. I spend a
great deal of time maintaining this document, in the hope that it
will save system administrators time.

Only the simplest setups can skip six years of Postfix development
and expect that configurations don't need updates.

Wietse