performance tuning - relay [Postfix]

From: Christian Purnomo on 25 Jun 2010 00:33

Hi

We have 2 Postfix servers.
1. One is our mail gateway, which is also the primary MX for our domains;
all inbound and outbound email passes through it (let's call this server1).
2. The other server is a standalone Postfix with tons of disk (let's
call this server2).
Server2 doesn't relay email; it only accepts the always_bcc copies from
server1.

The mail gateway (server1) is set to use always_bcc = emailaddress(a)server2

This configuration works perfectly fine. However, one of our web server
applications often sends out a mass mailing (about 20,000-30,000 messages)
via server1, and when this happens, it takes 1-2 days for the Postfix queue
on server1 to clear.

I checked the logs and noticed the following error:
(delivery temporarily suspended: connect to 10.0.2.73[10.0.2.73]: read
timeout)

I'm quite confident this is a performance tuning related issue.

Server1 has the following configurations:

/etc/postfix/main.cf:
always_bcc = email(a)server2.com

/etc/postfix/transport:
server2.com: relay:[10.0.2.73]

/etc/postfix/master.cf:
relay unix - - n - 200 smtp
  -o smtp_helo_timeout=3s
  -o smtp_connect_timeout=3s
  -o disable_dns_lookups=yes
  -o fallback_relay=

Server 2 has the following configurations:

/etc/postfix/master.cf:
smtp inet n - - - 200 smtpd

Could you please tell me what I'm missing here? I would like to improve
the rate that Server1 can relay messages to Server2.

Thanks

From: Stan Hoeppner on 25 Jun 2010 02:53

Christian Purnomo put forth on 6/24/2010 11:33 PM:

> /etc/postfix/transport:
> server2.com: relay:[10.0.2.73]
>
> /etc/postfix/master.cf:
> relay unix - - n - 200 smtp
> -o smtp_helo_timeout=3s
> -o smtp_connect_timeout=3s
> -o disable_dns_lookups=yes
> -o fallback_relay=

This was answered by Wietse 4 years ago on this list. Took me ten seconds to
find it via Google. Read the entire thread on Neohapsis carefully and you'll
find your answer, which is to remove all this custom stuff and go back to the
defaults. The first 2 of 4 above are the cause of your immediate problem, as
they are waaaay too low. The other two are just unnecessary. And change
max_proc back to 100. You're probably not getting close to 100 processes
running anyway.

http://archives.neohapsis.com/archives/postfix/2006-01/thread.html#1866

> Server 2 has the following configurations:
>
> /etc/postfix/master.cf:
> smtp inet n - - - 200 smtpd

Change the max process limit back to 100. If everything else is configured
correctly, you can drain an unbelievable amount of mail with less than 100
smtp/smtpd processes.

> Could you please tell me what I'm missing here? I would like to improve
> the rate that Server1 can relay messages to Server2.

If I may be frank, you missed the fact that you shouldn't mess with the
default settings unless you really know what you're doing. Custom settings
here would require an extreme scenario. I don't believe your scenario is
extreme, but rather common. I'm not pretending to be an expert on this, or to
create the image that _I_ know how/when to customize these settings. I simply
know when _not_ to.

--
Stan

From: Christian Purnomo on 25 Jun 2010 09:01

Hi Stan,

Thanks for your feedback.

I did try Google for about an hour before turning to this list, and I also
read http://postfix.nctu.edu.tw/TUNING_README.html several times. It
all started making some sense after reading it a couple of times today.

This is what I have done so far which works:

Server1 (MX host)

/etc/postfix/transport:
server2.com: relayhigh:[10.0.2.73]

/etc/postfix/main.cf:
relayhigh_destination_concurrency_limit = 150

/etc/postfix/master.cf:
relayhigh unix - - n - 200 smtp
  -o smtp_connect_timeout=1s
  -o fallback_relay=

I tried putting the settings back to the defaults as per your
suggestion, but the mail count in the queue kept hovering around the 9800
mark for about 15 minutes, going down at a rate of only 10-15 per minute,
which was unsustainable.

With the settings above, the queue is now down to 2442 within 20
minutes. It was at the 21,000 mark when I sent my first email
(nearly 12 hours ago), so progress had been minimal until the
change above. The bottleneck has now moved from the server1 queue to
the server2 queue, as server2 uses maildrop for local delivery.
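
To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the queue stood at about 9800 messages when the new settings went in (the exact starting count is not stated) and that delivery rates stay constant, which a real queue will not quite do. The function names are purely illustrative.

```python
# Rough drain-time arithmetic for the queue figures quoted above.
# Assumption: the queue held about 9800 messages when the new settings
# were applied, and delivery proceeds at a constant rate.

def minutes_to_drain(queue_size, msgs_per_minute):
    """Return the minutes needed to empty a queue at a constant rate."""
    return queue_size / msgs_per_minute

def observed_rate(start, end, minutes):
    """Return the average number of messages delivered per minute."""
    return (start - end) / minutes

# Old settings: 10-15 messages per minute with about 9800 queued.
slow_hours = [minutes_to_drain(9800, rate) / 60 for rate in (10, 15)]

# New settings: about 9800 down to 2442 in 20 minutes.
fast_rate = observed_rate(9800, 2442, 20)
remaining_minutes = minutes_to_drain(2442, fast_rate)

print(f"old settings: {slow_hours[1]:.1f} to {slow_hours[0]:.1f} hours to empty")
print(f"new settings: {fast_rate:.0f} msgs/min, "
      f"{remaining_minutes:.1f} more minutes to empty")
```

Under these assumptions the old rate would have needed roughly 11 to 16 hours to clear the backlog, against well under half an hour in total at the new rate of about 368 messages per minute.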

I would welcome any suggestions. The settings above are based on reading
TUNING_README.html, plus trial and error.

CP


From: Victor Duchovni on 25 Jun 2010 11:18

On Fri, Jun 25, 2010 at 01:53:46AM -0500, Stan Hoeppner wrote:

> Christian Purnomo put forth on 6/24/2010 11:33 PM:
>
> > /etc/postfix/transport:
> > server2.com: relay:[10.0.2.73]
> >
> > /etc/postfix/master.cf:
> > relay unix - - n - 200 smtp
> > -o smtp_helo_timeout=3s
> > -o smtp_connect_timeout=3s
> > -o disable_dns_lookups=yes
> > -o fallback_relay=
>
> This was answered by Wietse 4 years ago on this list. Took me ten seconds to
> find it via Google. Read the entire thread on Neohapsis carefully and you'll
> find your answer, which is to remove all this custom stuff and go back to the
> defaults. The first 2 of 4 above are the cause of your immediate problem, as
> they are waaaay too low. The other two are just unnecessary. And change
> max_proc back to 100. You're probably not getting close to 100 processes
> running anyway.

The connect timeout is actually reasonable for internal
destinations. The helo timeout is a bit light. Both are only useful
if there are multiple internal servers, which seems unlikely given the
"disable_dns_lookups=yes". Why is that setting there? It became obsolete
with Postfix 2.0 which was released 8 years ago.

The "fallback_relay" setting is correct, but even better is:

-o smtp_fallback_relay=

because the parameter has been renamed and the "fallback_relay" name
is a legacy alias, so is not always effective if the underlying real
variable is set in main.cf.

--
Viktor.

From: Stan Hoeppner on 25 Jun 2010 19:21

Christian Purnomo put forth on 6/25/2010 8:01 AM:

> With the settings above, the queue is now down to 2442 within 20
> minutes. It was at 21,000 mark when I sent my first email below
> (nearly 12 hours ago), so the progress has been very minimal until the
> change above. The bottleneck has now switched from Server1 queue to
> Server2 queue as server2 uses maildrop for local delivery.

Can you provide some more specs on server2? IIRC you said you had a multidisk
RAID array on serv2. What RAID level and how many disks? What filesystem?
Are you running Courier with maildrop or the standalone maildrop with another
IMAP server? What filtering, if any, are you doing with maildrop? Using mbox
or maildir storage? IIRC you previously said you're BCC'ing _everything_ into
a single mailbox (single address) on server2. Is this correct?

And, lastly, was server2 in production for any amount of time before these
problems occurred, prompting your post, or is this a new server that you just
brought online?

--
Stan
