From: JonB on
Hi,

We have a number of sendmail servers that have particularly 'deep'
queues (30-60k messages) with queue run times often in tens of minutes
(a lot more for very deep queues).

We're looking at the best strategy for handling this.

At the moment we just run:

sendmail -q30s


Although 30 seconds may sound too aggressive - despite the deep queues
the machines aren't heavily disk bound, so they do seem to cope.

Can someone confirm that the .cf setting (with no queue groups apart
from the default one defined):

O MaxRunnersPerQueue=80

Will dictate the maximum number of queue runners the above will ever
have running in parallel?


Also the above gives a very 'slow start' should sendmail be restarted
(assuming that the MaxRunnersPerQueue limit is adhered to) - it could
take up to 40 minutes for us to be back to the 80 runners again.
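
The 40-minute figure is simple arithmetic if one assumes, as the post does, that the daemon adds one new queue runner per -q interval until the cap is reached. A minimal sketch of that estimate follows; the function name is hypothetical and the one-runner-per-interval model is an assumption, not confirmed sendmail behaviour.

```python
def ramp_up_seconds(max_runners: int, interval_s: int) -> int:
    """Upper bound on the time until max_runners are running,
    ASSUMING exactly one new runner is started per interval."""
    return max_runners * interval_s


# 80 runners, one started every 30 seconds
print(ramp_up_seconds(80, 30) / 60)  # 40.0 (minutes)
```

Under the same assumption, halving the interval or the runner cap halves the ramp-up time.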


We've also looked at 'sendmail -qp' - with a different strategy of
starting up, say, 80 of these '-qp' processes (obviously staggered).

This would appear to get us a much faster 'ramp up' time - we realise
there'll be a chance that at some point multiple sendmails are going
to be going through the queue files at the same time but that can
happen with the current system anyway.

Would that work?

I guess the 'best' strategy would be to somehow split the queue
equally between the 80 runners (using a modulo of the queue ID would
be ideal) - but I can't see any way to do that unless we had queue
groups (and if we do that - I can't see any way to get sendmail to
'round robin' between the queue groups when it receives the mail) -
only filter to a particular group based on domain, priority etc.
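
To make the modulo idea concrete, here is a rough sketch of how queue IDs could be spread evenly over 80 buckets. This is purely an illustration of the arithmetic and not something sendmail does by itself; the function names are hypothetical, and because queue IDs are not plain numbers the sketch hashes them with CRC32 before taking the modulo (an assumption of this example).

```python
import zlib


def bucket_for(queue_id: str, n_buckets: int = 80) -> int:
    """Map a queue ID to a bucket number with a stable hash (CRC32) modulo n_buckets."""
    return zlib.crc32(queue_id.encode("ascii")) % n_buckets


def split_queue(queue_ids, n_buckets: int = 80):
    """Return n_buckets lists; each queue ID lands in exactly one of them."""
    buckets = [[] for _ in range(n_buckets)]
    for qid in queue_ids:
        buckets[bucket_for(qid, n_buckets)].append(qid)
    return buckets


# Placeholder IDs for illustration only, not real sendmail queue IDs
ids = [f"msg{i:05d}" for i in range(1000)]
buckets = split_queue(ids)
print(sum(len(b) for b in buckets))  # 1000
```

Because the hash depends only on the ID, the same message always maps to the same bucket, so two runners given different bucket numbers would never pick up the same file.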

-Jon
From: Jose-Marcio Martins da Cruz on

Hi,

JonB wrote:
> Hi,
>
> We have a number of sendmail servers that have particularly 'deep'
> queues (30-60k messages) with queue run times often in tens of minutes
> (a lot more for very deep queues).
>
> We're looking at the best strategy for handling this.
>
> At the moment we just run:
>
> sendmail -q30s

The right strategy depends on where you sit between two extreme situations:

* You run a big list server and send out many messages which are delivered quickly,
i.e., most of them go out on the first tries.

* Most of the messages fail temporarily and stay in the queue for a very long time.

In other words: what is the mean time messages stay on your server, and why do they stay that
long?

Here is one idea, and only an idea: you should adapt it, tune it, and add some home-made sauce
(queue runners and other things).

Instead of running the queue every 30 seconds (too often), set something like this:

O MinQueueAge=30m

This way no message in the queue is retried at intervals shorter than 30 minutes (or even
longer, if you prefer). Then run the queue less often, say every 5 or 10 minutes at most:

sendmail -q10m
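
To see how the two settings interact, here is a small simulation of a message that keeps failing, with a queue run every 10 minutes and MinQueueAge=30m. It is a simplified model: the function names are hypothetical, and the rule "skip the message unless at least MinQueueAge has passed since the last attempt" is my reading of the option, not sendmail's actual code.

```python
def eligible(now_s: int, last_attempt_s: int, min_queue_age_s: int) -> bool:
    """A message is retried only if MinQueueAge has elapsed since its last attempt."""
    return now_s - last_attempt_s >= min_queue_age_s


def attempt_times(run_interval_s: int, min_queue_age_s: int, horizon_s: int):
    """Queue-run times (in seconds) at which a permanently deferred message is retried."""
    attempts = []
    last = None
    for now in range(0, horizon_s + 1, run_interval_s):
        if last is None or eligible(now, last, min_queue_age_s):
            attempts.append(now)
            last = now
    return attempts


# Queue run every 10 minutes, MinQueueAge of 30 minutes, over two hours
minutes = [t // 60 for t in attempt_times(10 * 60, 30 * 60, 2 * 3600)]
print(minutes)  # [0, 30, 60, 90, 120]
```

In this model the queue is still scanned every 10 minutes, so new mail goes out quickly, but each deferred message costs a delivery attempt only on every third run.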

You can also have additional queues, depending on message age. The idea is: sendmail puts the
message in the normal queue, which you run as explained above. Every hour you scan the queue and
move messages older than, say, 12 or 18 hours to another queue with lower priority. You still run
this second queue with "sendmail -q10m", but with something like "O MinQueueAge=2h", so older
messages are retried less frequently. To move messages from one queue to the other you can use
qtool.pl, a Perl script which you'll find in the contrib directory. These queues are just
different directories.
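
As a rough sketch of the hourly selection step, the snippet below decides whether a message is old enough to be moved. It assumes the qf control file records the creation time as a line starting with "T" followed by seconds since the epoch; the function names and the sample text are made up for illustration. It only selects candidates and moves nothing: the actual move is better left to a tool such as qtool.pl.

```python
def created_at(qf_text: str):
    """Return the epoch seconds from the first 'T' line, or None if there is none."""
    for line in qf_text.splitlines():
        if line.startswith("T") and line[1:].strip().isdigit():
            return int(line[1:])
    return None


def should_move(qf_text: str, now_s: int, max_age_s: int = 12 * 3600) -> bool:
    """True if the message was created more than max_age_s seconds before now_s."""
    created = created_at(qf_text)
    return created is not None and now_s - created > max_age_s


# Trimmed, made-up control file content (not a complete qf file)
sample = "V8\nT1000000\nN2\n"
print(should_move(sample, now_s=1000000 + 13 * 3600))  # True
print(should_move(sample, now_s=1000000 + 1 * 3600))   # False
```

Using the creation time rather than the file's modification time matters here, since the control file may be rewritten on each delivery attempt.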

Also, take a look at the Sendmail book by Bryan Costales. There are some hints there. Many years
ago there was an interesting book (Sendmail Performance Tuning), but it covers old sendmail
versions (the ideas are still valid, though) and it's out of print.

Hope this helps

From: David F. Skoll on
JonB wrote:

> We have a number of sendmail servers that have particularly 'deep'
> queues (30-60k messages) with queue run times often in tens of minutes
> (a lot more for very deep queues).

> We're looking at the best strategy for handling this.

We use a variety of strategies, some of which might not be appropriate for
you:

1) We use define(`confQUEUE_SORT_ORDER',`random')dnl (or `none')
so the queue runner doesn't have to read all the qf files before starting up.
This is not appropriate if out-of-order delivery can't be tolerated.
It does greatly reduce the "ramp-up" time, though.

2) We sometimes limit confMAX_QUEUE_RUN_SIZE on very large queues just
to pick them off a manageable chunk at a time. This also greatly
reduces the ramp-up time.

3) We use a fallback MX host (with appropriately-tuned queue settings)
to keep the queues on our main mail server small.

Regards,

David.
From: David F. Skoll on
Chih-Cherng Chin wrote:

> It is my understanding that traditionally, unix performs badly when
> there are too many files in a single directory.

I don't think that's true any more for modern file systems. Even ext3
has a "dir_index" option that speeds up operations in large directories
(and most distros enable it.)

Regards,

David.
From: JonB on
On Jan 25, 8:16 pm, Andrzej Adam Filip <a...(a)onet.eu> wrote:

> You use hoststatus directory to avoid excessive retries to "inaccessible
> sites", do not you?

We did look at using that - but it caused more issues than it solved,
firstly because some of the sites we deliver to use load-balancers,
with a single published MX (not nice - as Sendmail thinks that IP is
down - and will 'not try again' for some time, when in reality
subsequent connects are likely to get through).

Also - some sites will return 4xx defers for certain email addresses.
Sendmail appears to cache this fact, and again - will not even attempt
the other addresses for that MX for a time period. Sure, this is
correct behaviour if the destination MX is having issues, but it hurts if
it's just that one destination address that's having issues.

(In fact, I've just posted separately asking for confirmation about
Timeout.hoststatus vs. 4xx and 5xx responses.)

-Jon