From: Nataraj on
Hi,

I would appreciate any suggestions anyone can offer on the following
problem that I'm having with postfix...

I'm running postfix+pgsql-2.3.3-2.1.el5_2 on a CentOS 5.4 server. I see
what looks likes a server in stress mode as described in
http://www.postfix.org/STRESS_README.html except the odd think about it
is that the server is not heavily loaded and I sure can't see where it's
exceeding any process limits. What's even odder is it doesn't appear
that the stress code is implemented in this version.

If I telnet to port 25 I get an immediate SMTP greeting followed in 10
seconds by
421 4.4.2 mymail.com Error: timeout exceeded
and the connection being closed.
The following maillog entry is logged:
May 3 16:44:06 mymail postfix/smtpd[22573]: timeout after CONNECT from
173-12-149-200.client.myisp.com[173.12.149.200]

This is like this constantly. I see 0-4 smtpd processes on the server
at any one time (I'm not sure if it's limited at 4, I just haven't seen
more). There are a similar number of policy daemons. There is a "-" in
the maxproc field for smtpd in master.cf. From everything I can tell
the default is a limit of 100. I do run a policy daemon
(vpostmaster). I've changed its maxproc field to 0 per the
recommendation in the STRESS_README (and restarted postfix). It's
master.cf entry looks like this...
vpm-pfpolicy unix - n n - 0 spawn
user=vpostmaster argv=/usr/lib/vpostmaster/postfix/vpm-pfpolicy

I also get lots of log entries like this for timeouts on the policy daemon:
May 2 05:36:20 aspen postfix/spawn[6003]: warning:
/usr/lib/vpostmaster/postfix/vpm-pfpolicy: process id 6004: command time
limit exceeded

and occasionally similar timeouts on the transport daemon (which is part
of vpostmaster as well).


My system load (which is running on a Vmware ESXi virtual machine) is:
16:59:36 up 2 days, 22:17, 3 users, load average: 0.08, 0.02, 0.01

Then, on outbound mail, I found this one site that adds delays of 18
seconds before the helo message to their SMTP server. My server cannot
get a message through to that server at all.

I tried adding -o stress= for the smtpd and nothing changed. The
system does not automatically add the stress parameter to smtpd if I
don't add it myself, so I'm not inclined to believe that Centos/Redhat
5.4 has the stress patch applied.

I do not see any slowness in the DNS servers. I have at most 2 RBL
lists that I check.

I did have limits set on the number of connections and timeouts etc, but
I've restored them all to the defaults for purposes of
debugging. I get these timeouts even when there's only one smtpd process.

Some of my config parameters right now are:
ipc_timeout = 3600s
command_time_limit = 1000s
smtpd_error_sleep_time = 1s
smtpd_policy_service_timeout = 100s
smtpd_proxy_timeout = 100s
smtpd_starttls_timeout = 300s
smtpd_timeout = 300s
smtpd_tls_session_cache_timeout = 3600s
smtpd_client_connection_count_limit = 50
smtpd_client_connection_rate_limit = 0
smtpd_client_message_rate_limit = 0
smtpd_client_new_tls_session_rate_limit = 0
smtpd_client_recipient_rate_limit = 0
smtpd_hard_error_limit = 20
smtpd_junk_command_limit = 100
smtpd_recipient_limit = 1000
smtpd_recipient_overshoot_limit = 1000
smtpd_soft_error_limit = 10


THank you,
Nataraj

From: Nataraj on
lst_hoe02(a)kwsoft.de wrote:
> Zitat von Nataraj <incoming-postfix(a)rjl.com>:
>
>> Hi,
>>
>> I would appreciate any suggestions anyone can offer on the following
>> problem that I'm having with postfix...
>>
>> I'm running postfix+pgsql-2.3.3-2.1.el5_2 on a CentOS 5.4 server. I
>> see what looks likes a server in stress mode as described in
>> http://www.postfix.org/STRESS_README.html except the odd think about
>> it is that the server is not heavily loaded and I sure can't see
>> where it's exceeding any process limits. What's even odder is it
>> doesn't appear that the stress code is implemented in this version.
>>
>> If I telnet to port 25 I get an immediate SMTP greeting followed in
>> 10 seconds by
>> 421 4.4.2 mymail.com Error: timeout exceeded
>> and the connection being closed.
>> The following maillog entry is logged:
>> May 3 16:44:06 mymail postfix/smtpd[22573]: timeout after CONNECT
>> from 173-12-149-200.client.myisp.com[173.12.149.200]
>
> Why do you think your Postfix server is "stressed"?? The automatic
> stress-dependant features are introduced in version 2.5 as far as i
> know so your Postfix does not support -o stress at all. If you only
> have 4 smtpd running and your server show greeting immediately when
> telneting to port 25 all should be fine.

I don't think it is a stress condition. The problem is that it is
timing everything out. Further checking shows, it times out inbound
SMTP connections in like 3-4 seconds and fails outbound deliveries to
slow servers. The transports also timeout and bounce messages.
>
> Regards
>
> Andreas
>

From: Nataraj on
lst_hoe02(a)kwsoft.de wrote:
> Zitat von Nataraj <incoming-postfix(a)rjl.com>:
>
>> lst_hoe02(a)kwsoft.de wrote:
>>> Zitat von Nataraj <incoming-postfix(a)rjl.com>:
>>>
>>>> Hi,
>>>>
>>>> I would appreciate any suggestions anyone can offer on the
>>>> following problem that I'm having with postfix...
>>>>
>>>> I'm running postfix+pgsql-2.3.3-2.1.el5_2 on a CentOS 5.4 server.
>>>> I see what looks likes a server in stress mode as described in
>>>> http://www.postfix.org/STRESS_README.html except the odd think
>>>> about it is that the server is not heavily loaded and I sure can't
>>>> see where it's exceeding any process limits. What's even odder is
>>>> it doesn't appear that the stress code is implemented in this version.
>>>>
>>>> If I telnet to port 25 I get an immediate SMTP greeting followed in
>>>> 10 seconds by
>>>> 421 4.4.2 mymail.com Error: timeout exceeded
>>>> and the connection being closed.
>>>> The following maillog entry is logged:
>>>> May 3 16:44:06 mymail postfix/smtpd[22573]: timeout after CONNECT
>>>> from 173-12-149-200.client.myisp.com[173.12.149.200]
>>>
>>> Why do you think your Postfix server is "stressed"?? The automatic
>>> stress-dependant features are introduced in version 2.5 as far as i
>>> know so your Postfix does not support -o stress at all. If you only
>>> have 4 smtpd running and your server show greeting immediately when
>>> telneting to port 25 all should be fine.
>>
>> I don't think it is a stress condition. The problem is that it is
>> timing everything out. Further checking shows, it times out inbound
>> SMTP connections in like 3-4 seconds and fails outbound deliveries to
>> slow servers. The transports also timeout and bounce messages.
>
> Sorry i still don't understand what the problem is...
> Are you concerned about "timeout after SOMETHING" messages in the logs?
> Do you have problems with mails not arriving?
> Do you have problems with mails not leaving?
>
> Please try to explain, maybe with log data for a problematic mail.
>
> Regards
>
> Andreas
>
"fails outbound deliveries to slow servers." means there are problems
with outbound mail deliveries. Yes there are complaints of people
sending mail from the outside whose delivery is either delayed or does
not get through. This means there are problems with inbound mail. An
SMTP server that times out after 4 seconds of inactivity is not
reasonable given the possibility of delays in others mail systems and on
the internet.

The question that I am asking is what else can cause these timeouts if
there is no stress code in my current version of postfix?

Nataraj

From: Charles Gregory on
On Tue, 4 May 2010, Nataraj wrote:
>> > > > If I telnet to port 25 I get an immediate SMTP greeting followed in
>> > > > 10 seconds by
>> > > > 421 4.4.2 mymail.com Error: timeout exceeded

Firstly, this is meaningless. All SMTP connections will time out
quickly if no initial commands are received. If you really want to test
your machine's timeout, enter MAI FROM and RCPT TO and DATA, send at least
one line, then wait....

> Yes there are complaints of people sending mail from the outside whose
> delivery is either delayed or does not get through. This means there are
> problems with inbound mail. An SMTP server that times out after 4
> seconds of inactivity is not reasonable given the possibility of delays
> in others mail systems and on the internet.

It is possible that the timeout is NOT in postfix. Please
post some logs shownig the actual 'timeout' error message.

- C

From: Nataraj on
lst_hoe02(a)kwsoft.de wrote:
> Zitat von Nataraj <incoming-postfix(a)rjl.com>:
>
>> lst_hoe02(a)kwsoft.de wrote:
>>> Zitat von Nataraj <incoming-postfix(a)rjl.com>:
>>>
>>>> lst_hoe02(a)kwsoft.de wrote:
>>>>> Zitat von Nataraj <incoming-postfix(a)rjl.com>:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I would appreciate any suggestions anyone can offer on the
>>>>>> following problem that I'm having with postfix...
>>>>>>
>>>>>> I'm running postfix+pgsql-2.3.3-2.1.el5_2 on a CentOS 5.4
>>>>>> server. I see what looks likes a server in stress mode as
>>>>>> described in http://www.postfix.org/STRESS_README.html except the
>>>>>> odd think about it is that the server is not heavily loaded and I
>>>>>> sure can't see where it's exceeding any process limits. What's
>>>>>> even odder is it doesn't appear that the stress code is
>>>>>> implemented in this version.
>>>>>>
>>>>>> If I telnet to port 25 I get an immediate SMTP greeting followed
>>>>>> in 10 seconds by
>>>>>> 421 4.4.2 mymail.com Error: timeout exceeded
>>>>>> and the connection being closed.
>>>>>> The following maillog entry is logged:
>>>>>> May 3 16:44:06 mymail postfix/smtpd[22573]: timeout after CONNECT
>>>>>> from 173-12-149-200.client.myisp.com[173.12.149.200]
>>>>>
>>>>> Why do you think your Postfix server is "stressed"?? The automatic
>>>>> stress-dependant features are introduced in version 2.5 as far as
>>>>> i know so your Postfix does not support -o stress at all. If you
>>>>> only have 4 smtpd running and your server show greeting
>>>>> immediately when telneting to port 25 all should be fine.
>>>>
>>>> I don't think it is a stress condition. The problem is that it is
>>>> timing everything out. Further checking shows, it times out
>>>> inbound SMTP connections in like 3-4 seconds and fails outbound
>>>> deliveries to slow servers. The transports also timeout and bounce
>>>> messages.
>>>
>>> Sorry i still don't understand what the problem is...
>>> Are you concerned about "timeout after SOMETHING" messages in the logs?
>>> Do you have problems with mails not arriving?
>>> Do you have problems with mails not leaving?
>>>
>>> Please try to explain, maybe with log data for a problematic mail.
>>>
>>> Regards
>>>
>>> Andreas
>>>
>> "fails outbound deliveries to slow servers." means there are problems
>> with outbound mail deliveries. Yes there are complaints of people
>> sending mail from the outside whose delivery is either delayed or
>> does not get through. This means there are problems with inbound
>> mail. An SMTP server that times out after 4 seconds of inactivity is
>> not reasonable given the possibility of delays in others mail systems
>> and on the internet.
>
> Please show the relevant log entries for a failed delivery attempt
> from connect until disconnect. Postfix has reasonable defaults and for
> sure no 4s timeout.
>
>> The question that I am asking is what else can cause these timeouts
>> if there is no stress code in my current version of postfix?
>
> The smtpd_<mumble>_timeout settings are relevant on postfix side. The
> default for smtpd is 300s. If your settings are at default or far from
> the values you are seeing you should check if some stateful firewall
> in between drops "inactive" connections earlier. If this does not help
> you should capture a tcpdump to see what is going on.
>
> Regards
>
> Andreas
>
Hi Andreas,

I did post my configuration parameters in my first message. I mentioned
that I had changed them, but had restored the defaults to see if that
would solve the problem. The smtpd_timeout was never changed from
300s. Here is a log of a failed outbound delivery. As I mentioned,
the sysadmin for this server told me they have an 18 second delay before
sending the greeting, which they've found effective at stopping
spambots. I've done crazier things myself, and I think that a properly
working mailserver should be able to deliver to them even if it is a bit
unfriendly. Certainly I was able to deliver to them before this problem
started.

May 2 05:22:32 aspen postfix/smtp[6575]: 4CF471D0200:
to=<xxxxxx(a)cs.rutgers.edu>, relay=mail.cs.rutgers.edu[128.6.4.3]:25,
delay=124076, delays=124072/0.05/4\
..3/0, dsn=4.4.2, status=deferred (conversation with
mail.cs.rutgers.edu[128.6.4.3] timed out while receiving the initial
server greeting)

It most definitely is valid to telnet to an smtp server and type
commands to test it. I've been doing this for years. It was only until
this problem came up that I am no longer able to do it because I can't
type fast enough before it times me out (unless I enable pipelining and
paste the commands).

I maintain the firewall and there are no logs of problems there. I'm
fairly certain this is the SMTP server. If a firewall was timing out
the connection postfix would not know that it was a timeout and would
not be able to log a timeout error.

penguin> telnet aspen 25
Trying 66.35.48.14...
Connected to aspen.
Escape character is '^]'.
220 aspen.rjl.com ESMTP vPostMaster
421 4.4.2 aspen.rjl.com Error: timeout exceeded
Connection closed by foreign host.

From this log of the telnet session, the 421 error comes from the smtp
server and running tcpdump is useless because the problem is at the SMTP
level and not at the TCP or IP level.

Thank you. I appreciate your efforts to help. I guess I will have to
dig into the code and see if I can figure out what is going on, but I
was hoping somebody else might have seen a problem like this that would
save me the efforts of debugging the code. I may try upgrading to a
more recent release of postfix first.

Nataraj