From: Gary Smith on
I've been getting a lot of "lost connection after DATA" errors over the last week. On our low-volume servers (which host some minor clients) we are receiving 800 a day. We switched over to ipvsadm about 3 weeks ago, and I thought maybe it was because of non-persistent connections. So I reconfigured ipvsadm to be persistent, so that each client sticks to a given server.

Anyway, we are still receiving them. The firewall allows port 25 incoming and everything outgoing, but there is also some NAT'ing going on because of the ipvsadm. Has anyone seen this type of issue with this type of config?

Mail is flowing pretty consistently, but today a user reported a bounce while trying to receive a rather large document by email. That's when I noticed that in most cases the log says "lost connection after DATA from ...", but in this user's case it says "lost connection after DATA (xxxxxx bytes) from ...".
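To see how the two variants are distributed, a small counter over the mail log can help. This is a minimal sketch that assumes the log lives at /var/log/maillog (pass another path as the first argument); the function name `count_lost_after_data` is hypothetical. Entries that carry "(N bytes)" show how much of the message had been received when the connection dropped, which is what sets the large-document case apart.

```sh
#!/bin/sh
# Hypothetical helper: count "lost connection after DATA" entries in a
# Postfix log, split by whether a "(N bytes)" count was logged.
# The default log path is an assumption; pass your own as $1.
count_lost_after_data() {
    log="${1:-/var/log/maillog}"
    # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps
    # the function safe under "set -e" without changing the printed count.
    with_bytes=$(grep -c 'lost connection after DATA (' "$log" || true)
    total=$(grep -c 'lost connection after DATA' "$log" || true)
    echo "with_bytes=$with_bytes without_bytes=$((total - with_bytes))"
}

# Usage: count_lost_after_data /var/log/maillog
```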

Gary-

From: Charles Marcus on
On 2010-05-13 9:59 PM, Gary Smith wrote:
> Anyway, we are still receiving them. The firewall allows port 25
> incoming, everything outgoing but there is also some nat'ing going on
> because of the ipvsadm. Anyone ever seen this type of issue with
> this type of config?

Per the welcome message you received when you joined the list:

TO REPORT A PROBLEM see:
http://www.postfix.org/DEBUG_README.html#mail

At a minimum, postfix version, output of postconf -n and unedited
NON-verbose logs exhibiting the problem should be provided...

--

Best regards,

Charles

From: Gary Smith on
> Per the welcome message you received when you joined the list:
>

That would be like 5+ years ago. I've slept since then.

> TO REPORT A PROBLEM see:
> http://www.postfix.org/DEBUG_README.html#mail
>
> At a minimum, postfix version, output of postconf -n and unedited
> NON-verbose logs exhibiting the problem should be provided...


The config is included below. That aside, things seem to have cleaned up somewhat after resetting all of the ipvsadm services for port 25 to be persistent. I'm sure the problem is at least partly the NAT dropping connections after x seconds. So, as per the original question: does anyone else run a similar configuration (postfix + ipvsadm + wlc NAT), and if so, how do they tackle these lost-connection problems?
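For reference, a persistent wlc NAT service for SMTP might look like the commands below. This is a sketch assuming a Linux IPVS director; the VIP 192.0.2.10 and the real servers 10.0.0.11 and 10.0.0.12 are hypothetical. Note that persistence (-p) only pins a client to the same real server across connections; it does not lengthen the idle timeout of an individual TCP connection. IPVS keeps its own connection timeouts, separate from the netfilter ones, so it may be worth checking those too.

```sh
# Hypothetical addresses: VIP 192.0.2.10, real servers 10.0.0.11/10.0.0.12
# Virtual SMTP service: weighted least-connection, 300 s persistence
ipvsadm -A -t 192.0.2.10:25 -s wlc -p 300
# Real servers reached via masquerading (NAT)
ipvsadm -a -t 192.0.2.10:25 -r 10.0.0.11:25 -m
ipvsadm -a -t 192.0.2.10:25 -r 10.0.0.12:25 -m
# Show the current IPVS timeouts (tcp, tcpfin, udp)
ipvsadm -L --timeout
# Raise the established-TCP timeout to 1 hour; 0 leaves a value unchanged
ipvsadm --set 3600 0 0
```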

$ postconf -n (same on all 6 nodes)
alias_database = hash:/etc/aliases
alias_maps = hash:/etc/aliases
command_directory = /usr/sbin
config_directory = /etc/postfix
daemon_directory = /usr/libexec/postfix
data_directory = /var/lib/postfix
debug_peer_level = 2
html_directory = no
inet_interfaces = all
inet_protocols = all
mail_owner = postfix
mailq_path = /usr/bin/mailq.postfix
manpage_directory = /usr/share/man
message_size_limit = 40960000
mydestination = $myhostname, localhost.$mydomain, localhost
newaliases_path = /usr/bin/newaliases.postfix
queue_directory = /var/spool/postfix
readme_directory = /usr/share/doc/postfix-2.7.0/README_FILES
relay_domains = hash:/etc/postfix/maps/relay_domain
relay_recipient_maps = hash:/etc/postfix/maps/relay_recipient
relay_transport = hash:/etc/postfix/maps/relay_transport
sample_directory = /usr/share/doc/postfix-2.7.0/samples
sendmail_path = /usr/sbin/sendmail.postfix
setgid_group = postdrop
smtp_tls_note_starttls_offer = yes
smtp_use_tls = yes
smtpd_recipient_restrictions = permit_mynetworks, reject_unknown_sender_domain, reject_unauth_destination, hash:/etc/postfix/custom/access, hash:/etc/postfix/custom/postmaster, reject_non_fqdn_recipient, reject_unlisted_recipient, reject_unknown_sender_domain, reject_invalid_hostname, reject_rbl_client zen.spamhaus.org, reject_rbl_client bl.spamcop.net, reject_rbl_client rhsbl.ahbl.org, check_policy_service inet:xxxxx:5847, reject_unauth_pipelining
smtpd_sender_restrictions = reject_unknown_sender_domain, hash:/etc/postfix/custom/sender_reject, permit
smtpd_tls_cert_file = /etc/ssl/certs/
smtpd_tls_key_file = /etc/ssl/private/
smtpd_tls_loglevel = 1
smtpd_tls_received_header = yes
smtpd_tls_session_cache_timeout = 3600s
smtpd_use_tls = yes
strict_rfc821_envelopes = yes
tls_random_source = dev:/dev/urandom
transport_maps = hash:/etc/postfix/maps/transport
unknown_local_recipient_reject_code = 550

Relevant log entries (the sender in this case is a large commercial entity):
May 13 18:48:33 host01 postfix/smtpd[18110]: connect from sender[senderip]
May 13 18:48:33 host01 postfix/smtpd[18110]: setting up TLS connection from sender[senderip]
May 13 18:48:33 host01 postfix/smtpd[18110]: Anonymous TLS connection established from sender[senderip]: TLSv1 with cipher RC4-SHA (128/128 bits)
May 13 18:48:37 host01 postfix/smtpd[18110]: B30AAAFE4F: sender[senderip]
May 13 18:48:42 host01 postfix/smtpd[18110]: lost connection after DATA (1723601 bytes) from sender[senderip]
May 13 18:48:42 host01 postfix/smtpd[18110]: disconnect from sender[senderip]

From: Gary Smith on
Wietse,

For some reason, some of your mails land in my inbox instead of my postfix list folder, rather than arriving via postfix-users(a)postfix.org like most others. Just an FYI.

> If the NAT assumes that everything is a web client and drops
> connections after a few seconds, then Postfix will report lost
> connections.
>
> If the NAT keeps connections open but it is a crappy box that can
> maintain state for only 100 connections, then it will be forced to
> drop connections, and Postfix will report lost connections.

I thought that at first too. The firewall has a long connection timeout, and we raised the number of connection-tracking buckets quite high, though still within its 4 GB of RAM. The reported case failed after receiving a few MB on the first attempt and only a couple hundred KB on the retries.
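To confirm that the connection-tracking table is not the bottleneck Wietse describes, one can compare its current fill level with its limit. This is a minimal sketch assuming the firewall is a Linux box using the nf_conntrack module, which exposes these values under /proc/sys/net/netfilter (older kernels use ip_conntrack_* names instead); the function name `conntrack_report` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: report how full the netfilter connection-tracking
# table is and the idle timeout applied to established TCP flows.
# Assumes nf_conntrack; pass another directory as $1 if needed.
conntrack_report() {
    dir="${1:-/proc/sys/net/netfilter}"
    count=$(cat "$dir/nf_conntrack_count")
    max=$(cat "$dir/nf_conntrack_max")
    est=$(cat "$dir/nf_conntrack_tcp_timeout_established")
    # Integer percentage of the table currently in use
    pct=$((count * 100 / max))
    echo "used=${count}/${max} (${pct}%) tcp_established_timeout=${est}s"
}

# Usage (on the firewall): conntrack_report
```

Sampling this during a burst of "lost connection" entries would show whether the table is near its limit at the moment connections drop.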

I'm sure it's not a problem with postfix; I'm just looking for postfix cases where people have overcome this type of config issue.

Gary-

From: Victor Duchovni on
On Fri, May 14, 2010 at 09:23:12AM -0700, Gary Smith wrote:

> I'm sure it's not a probable with postfix, I'm just looking for postfix
> cases where they have overcome this type of config issue.

Have you disabled window scaling on your Postfix server? Lost connections
are often the result of firewalls mangling "advanced" TCP features.

- Disable window scaling
- Disable ECN

--
Viktor.
