From: Ram on
On my central Postfix server I typically do 100k mail transactions per
hour: Postfix 2.7 on a dual quad-core Xeon, 4 GB RAM, RHEL5 box.


Sometimes it happens that mail moves very slowly from the incoming queue
to the active queue.


I think I have the basic hygiene right:
this server has absolutely no header checks and no content checks, the
transport file (hash) has fewer than 2k lines, and syslog is not an issue
either (I sent the mail log to /dev/null and tested that).


I suspect that the machine is starving on I/O, but "iostat" shows an
iowait of only 10%.


From the qshape README
http://www.postfix.com/QSHAPE_README.html
"If the problem is I/O starvation, consider striping the queue over more
disks"

Does that mean I can spread them over different partitions on different
disks? I had initially assumed the whole Postfix spool must be on the
same partition.





Thanks
Ram

From: lst_hoe02 on
Quoting Ram <ram(a)netcore.co.in>:

> On my central Postfix server I typically do 100k mail transactions per
> hour: Postfix 2.7 on a dual quad-core Xeon, 4 GB RAM, RHEL5 box.
>
>
> Sometimes it happens that mail moves very slowly from the incoming queue
> to the active queue.
>
>
> I think I have the basic hygiene right:
> this server has absolutely no header checks and no content checks, the
> transport file (hash) has fewer than 2k lines, and syslog is not an issue
> either (I sent the mail log to /dev/null and tested that).


I guess you have read http://www.postfix.org/QSHAPE_README.html#incoming_queue


> I suspect that the machine is starving on I/O, but "iostat" shows an
> iowait of only 10%.

From my point of view 10% can be quite a lot, taking into account that
it is averaged over all processes on the machine. As there is only one
"qmgr" but many "smtpd" processes, it is possible that qmgr is limited
by I/O on your machine.
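
To see whether the spool disk itself is the bottleneck rather than the
box as a whole, it may help to watch the extended per-device statistics;
a rough check (the device holding /var/spool/postfix will differ on your
machine):

    iostat -x 5

and keep an eye on %util and await for that device.
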
For a possible band-aid have a look at in_flow_delay. Also check that
mail is leaving fast enough; otherwise the active queue will max out, by
default at 20000 messages.
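
A minimal main.cf sketch of the two knobs mentioned above (the values
shown are just the defaults, not a recommendation; check what you run
now with "postconf in_flow_delay qmgr_message_active_limit"):

    # /etc/postfix/main.cf
    # pause before accepting each new message when mail arrives
    # faster than it leaves
    in_flow_delay = 1s
    # upper bound on the number of messages in the active queue
    qmgr_message_active_limit = 20000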

> From the qshape README
> http://www.postfix.com/QSHAPE_README.html
> "If the problem is I/O starvation, consider striping the queue over more
> disks"
>
> Does that mean I can spread them over different partitions on different
> disks? I had initially assumed the whole Postfix spool must be on the
> same partition.

From my understanding the spool must be on the same partition. The
"different disks" part is meant to be RAID 0/1/10 or whatever, or a
separate disk for the spool.
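
In other words the queue directory stays one tree; you just put a faster
device underneath it. A rough sketch, assuming a hypothetical dedicated
device /dev/sdb1 and the default queue_directory:

    postfix stop
    mkfs -t ext3 /dev/sdb1                # /dev/sdb1 is a placeholder
    mount /dev/sdb1 /mnt/newspool
    cp -a /var/spool/postfix/. /mnt/newspool/
    umount /mnt/newspool
    mount /dev/sdb1 /var/spool/postfix    # add to /etc/fstab as well
    postfix start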

Regards

Andreas

From: Stan Hoeppner on
lst_hoe02(a)kwsoft.de put forth on 6/22/2010 6:50 AM:
> Quoting Ram <ram(a)netcore.co.in>:

>> Does that mean I can spread them over different partitions on different
>> disks? I had initially assumed the whole Postfix spool must be on the
>> same partition.
>
> From my understanding the spool must be on the same partition. The
> "different disks" part is meant to be RAID 0/1/10 or whatever, or a
> separate disk for the spool.

This isn't even fully correct. Replace the word "partition" with
"filesystem". Postfix has no knowledge of disk provisioning, whether
partitions or logical volumes created by something like LVM, etc. Postfix
reads from and writes to a filesystem, period.

The suggestion in the documentation assumes the reader is educated with
respect to *nix disk subsystems and filesystems. The suggestion is to
create a filesystem on a partition or logical volume on a disk subsystem
comprised of multiple disks and some form of striping to increase
read/write throughput. Striping allows reading from and writing to
multiple disks simultaneously. One would put the Postfix spool directory
on the resulting filesystem. In descending order of performance the
preferred striping RAID levels would be:

1. RAID 0
2. RAID 10
3. RAID 5
4. RAID 50
5. RAID 6

RAID 0 decreases reliability but has vastly superior performance. If one
disk fails, the whole array fails, from the OS's perspective. Options 2-5
all offer increased reliability, but 3-5 have decreased performance due to
parity calculations, especially RAID 6, which calculates parity _twice_.
My advice would be to use 3-5 only on a good hardware RAID controller with
256MB or more of cache.

Due to performance vs reliability factors, my recommendation would be to use 4
drives in a software or hardware RAID 10, preferably 10k or 15k RPM drives of
the SCSI or SAS flavor. SATA will work also but will be 30% to 100% slower.
From what the OP states, I'd say he'd probably do best with at least 73GB
drives, which would yield 146GB of RAID 10 storage for the spool. This will
yield twice the throughput of a single disk with double or more the
reliability. RAID 10 can suffer two disk failures simultaneously, but they
must be the "right" two disks because of the way the striping and mirroring is
performed. Normally one would simply count on RAID 10 to gracefully suffer
one disk failure, just like a mirrored set.
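
As a rough illustration of the software RAID 10 route (device names are
hypothetical; adjust for your hardware, and copy the existing spool over
before pointing Postfix at the new filesystem):

    # /dev/sd[b-e] are placeholders for the four new drives
    mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde
    mkfs -t ext3 /dev/md0
    mount /dev/md0 /var/spool/postfix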

--
Stan