From: Attila Nagy on
Hello,

I have a somewhat busy mail relay running Postfix 2.7, which has
problems with one slow destination.
The symptom: the incoming queue grows large, while the active queue is
always at qmgr_message_active_limit and contains only (well, mostly)
messages for the slow domain.
What I have already tried:
- increasing qmgr_message_active_limit, which of course could only help
  if it were set so high that it pulled in every message in the
  incoming queue
- defining a separate transport for the slow domain and setting
  destination concurrency limits on it

I don't really see the point of this, but I guess I've just overlooked
something. Why is it good to move as many as qmgr_message_active_limit
messages for the same domain into the active queue, without taking the
outbound bandwidth into account? If Postfix sees that it can't deliver
messages for a given domain as fast as it moves them into the active
queue, it will lock out (or slow down) everybody else, as in the case
below, and inflate both the size of the incoming queue and the delivery
times for other destinations.

I can't limit the number (or rate) of incoming e-mails for that domain,
and I can't increase the throughput of the destination, because I don't
operate it (OK, that may not be entirely true, because Postfix's
destination concurrency adjustments can push delivery below what the
destination could actually accept).

So:
- is there any way to let other domains get into the active queue in a
  fair manner?
- is it possible to adjust the incoming->active rate according to the
  active->removed (delivered) rate? (Reading through the docs, I gather
  the basic idea is to move mail into the deferred queue instead when
  the target behaves oddly, by blacklisting it and decreasing the
  concurrency, but in this case that doesn't help; if anything, it
  makes things worse.)

qshape output (the incoming listing is truncated; it contains many more
destinations):

# qshape active
                     T      5     10     20     40     80    160    320    640   1280  1280+
          TOTAL  19994      0      9     64    212    704   2108   2066   4746   8557   1528
   citromail.hu  19994      0      9     64    212    704   2108   2066   4746   8557   1528


# qshape incoming
                     T      5     10     20     40     80    160    320    640   1280  1280+
          TOTAL 372213  14382   5276  10390  19830  31805  55481  46843 103169  59415  25622
   citromail.hu 125378   2645    919   1830   3649   5775  10539   9705  23019  41749  25548
    freemail.hu 123731   6264   2280   4482   8526  13907  26275  17530  37212   7255      0
      gmail.com  26613   1402    547   1094   2139   3149   4453   4200   8135   1494      0
    hotmail.com   8384    524    181    349    636   1019   1340   1323   2515    497      0
      yahoo.com   7261    228     91    171    450    596   2489    930   2079    227      0
     vipmail.hu   6925    416    157    271    505    747   1174   1032   2182    441      0
    t-online.hu   4413    193     86    176    307    479    795    592   1602    183      0
      chello.hu   1737    104     43     72    127    186    289    237    590     84      5
    indamail.hu   1120     61     20     48     78    138    151    198    367     59      0
     invitel.hu    949     36     19     38     59     88    163    165    346     35      0
     mailbox.hu    724     48     14     34     57     78    112    107    247     27      0
     t-email.hu    645     35      8     30     36     68    107     91    206     40     24
windowslive.com    623     41     13     22     60     65     67    114    186     55      0
        msn.com    612     35     17     20     38     70     77     73    256     26      0
       index.hu    561     39     11     28     43     83     88     72    157     40      0
   fibermail.hu    547     30      9     13     36     46    102     58    225     24      4
          c2.hu    521     30      9     32     29     53     79     86    174     29      0
[...]
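To put the two listings side by side, here is a small Python sketch that turns qshape-style text into each domain's share of the TOTAL row. It assumes the default qshape layout (a header row starting with T, then one row per domain with the total count in the first numeric column); domain_shares is a hypothetical helper, not part of the Postfix tools.

```python
from __future__ import annotations


def domain_shares(qshape_text: str) -> dict[str, float]:
    """Return each domain's fraction of the TOTAL row's T (total) column.

    Assumes qshape's default layout: a header row whose first field is "T",
    then rows of "<domain> <T> <per-age-bucket counts...>".
    """
    totals: dict[str, int] = {}
    for line in qshape_text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, the "# qshape ..." prompt and
        # elision markers such as "[...]".
        if len(fields) < 2 or fields[0] == "T" or not fields[1].isdigit():
            continue
        totals[fields[0]] = int(fields[1])
    grand_total = totals.pop("TOTAL")
    return {domain: count / grand_total for domain, count in totals.items()}


# Rows copied from the listings above; whitespace does not matter here.
ACTIVE = """\
# qshape active
T 5 10 20 40 80 160 320 640 1280 1280+
TOTAL 19994 0 9 64 212 704 2108 2066 4746 8557 1528
citromail.hu 19994 0 9 64 212 704 2108 2066 4746 8557 1528
"""

INCOMING = """\
# qshape incoming
T 5 10 20 40 80 160 320 640 1280 1280+
TOTAL 372213 14382 5276 10390 19830 31805 55481 46843 103169 59415 25622
citromail.hu 125378 2645 919 1830 3649 5775 10539 9705 23019 41749 25548
freemail.hu 123731 6264 2280 4482 8526 13907 26275 17530 37212 7255 0
gmail.com 26613 1402 547 1094 2139 3149 4453 4200 8135 1494 0
[...]
"""

for queue, text in (("active", ACTIVE), ("incoming", INCOMING)):
    for domain, share in domain_shares(text).items():
        print(f"{queue:>8} {domain:>15} {share:6.1%}")
```

On these numbers citromail.hu is about a third of the incoming queue (125378 of 372213 messages) but the whole active queue.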

I think qmgr would be "fair" if the table above contained the same line
for citromail as it does now, but a lot of zeroes in the 5 and older
columns for the other destinations (and, of course, lower numbers in
the first column as well, because their mail could get out quickly).

For example, once messages do get into the active queue, deliveries to
the other destinations are fast:
Mar 19 15:44:15 mail postfix/smtp[31804]: E55A981133: to=<@freemail.hu>,
relay=fmx.freemail.hu[195.228.245.2]:25, delay=7161,
delays=7160/0.01/0.19/1, dsn=2.0.0, status=sent (250 ok 1269009853 qp 89615)
Mar 19 15:47:17 mail postfix/smtp[33163]: E8F598BD97: to=<@gmail.com>,
relay=gmail-smtp-in.l.google.com[209.85.210.81]:25, delay=5222,
delays=5221/0.01/0.35/0.92, dsn=2.0.0, status=sent (250 2.0.0 OK
1269010037 13si2214707yxe.45)
Mar 19 15:47:12 mail postfix/smtp[33144]: E8FEA90176: to=<@hotmail.com>,
relay=mx1.hotmail.com[65.54.188.126]:25, delay=4103,
delays=4102/0/0.53/0.64, dsn=2.0.0, status=sent (250
<26885169.544531269005928867.JavaMail.noreply(a)be> Queued mail for delivery)

And this is one for citromail:
Mar 19 15:47:47 mail postfix/smtp[33147]: 28E47768F4:
to=<@citromail.hu>, relay=server03.citromail.hu[91.83.45.3]:25,
conn_use=76, delay=9538, delays=5062/4475/0/0.33, dsn=2.0.0, status=sent
(250 ok 1269010067 qp 29585)
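The delays= field in these entries already shows where the time goes. A minimal sketch that splits it, assuming the four-part delays=a/b/c/d format that Postfix logs since 2.3 (a: before the queue manager, b: in the queue manager, c: connection setup, d: transmission); split_delays and the field names are made up for illustration.

```python
from __future__ import annotations

import re

# Meaning of the four delays= fields (Postfix 2.3 and later):
# a = time before the queue manager, including receiving the message,
# b = time in the queue manager (roughly, waiting in the active queue),
# c = connection setup (DNS, HELO, TLS), d = message transmission.
DELAY_FIELDS = ("before_qmgr", "in_qmgr", "conn_setup", "transmission")
DELAYS_RE = re.compile(r"\bdelays=([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)")


def split_delays(log_line: str) -> dict[str, float]:
    """Extract the delays=a/b/c/d breakdown from one smtp log line."""
    match = DELAYS_RE.search(log_line)
    if match is None:
        raise ValueError("no delays= field in log line")
    return dict(zip(DELAY_FIELDS, map(float, match.groups())))


# The freemail.hu and citromail.hu entries above, each joined into one line.
FREEMAIL = ("Mar 19 15:44:15 mail postfix/smtp[31804]: E55A981133: "
            "to=<@freemail.hu>, relay=fmx.freemail.hu[195.228.245.2]:25, "
            "delay=7161, delays=7160/0.01/0.19/1, dsn=2.0.0, status=sent "
            "(250 ok 1269009853 qp 89615)")
CITROMAIL = ("Mar 19 15:47:47 mail postfix/smtp[33147]: 28E47768F4: "
             "to=<@citromail.hu>, relay=server03.citromail.hu[91.83.45.3]:25, "
             "conn_use=76, delay=9538, delays=5062/4475/0/0.33, dsn=2.0.0, "
             "status=sent (250 ok 1269010067 qp 29585)")

for name, line in (("freemail.hu", FREEMAIL), ("citromail.hu", CITROMAIL)):
    d = split_delays(line)
    print(f"{name}: {d['before_qmgr']:.0f}s before the queue manager, "
          f"{d['in_qmgr']:.2f}s in it")
```

For the freemail.hu message almost all of the 7161 s was spent before the queue manager (7160 s) and only 0.01 s in it, while the citromail.hu message spent 5062 s before and 4475 s in the queue manager, so it held an active queue slot for well over an hour.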

From: Wietse Venema on
Attila Nagy:
> So:
> - is there any way to let other domains get into the active queue in a

No.

Just as ordinary programs read large files sequentially using a
limited amount of intermediate buffer space, the Postfix queue
manager "reads" a large queue sequentially using a limited amount
of buffer space called the "active queue".
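A toy model of this analogy: the active queue as a fixed-size buffer refilled from the incoming queue strictly in arrival order. This is not Postfix's actual queue manager algorithm (the real one also schedules per destination, applies concurrency limits, handles deferred mail and so on); the tick-based rates, the 200-message limit and the 34% slow share are assumptions, chosen only to resemble the qshape numbers in the original post.

```python
from __future__ import annotations

import random
from collections import deque


def simulate(active_limit: int = 200, ticks: int = 500,
             arrivals_per_tick: int = 30, slow_share: float = 0.34,
             slow_rate: int = 5, fast_rate: int = 100, seed: int = 1):
    """Toy FIFO model of an active queue; every parameter is made up."""
    rng = random.Random(seed)
    incoming: deque[bool] = deque()  # True = message for the slow domain
    active: list[bool] = []
    for _ in range(ticks):
        # New mail lands in the incoming queue.
        for _ in range(arrivals_per_tick):
            incoming.append(rng.random() < slow_share)
        # Refill the active queue in arrival order; the model does not
        # look at the destination before admitting a message.
        while incoming and len(active) < active_limit:
            active.append(incoming.popleft())
        # Deliver: the slow domain accepts slow_rate messages per tick,
        # all other domains together accept up to fast_rate.
        slow_budget, fast_budget = slow_rate, fast_rate
        still_active = []
        for is_slow in active:
            if is_slow and slow_budget > 0:
                slow_budget -= 1
            elif not is_slow and fast_budget > 0:
                fast_budget -= 1
            else:
                still_active.append(is_slow)
        active = still_active
    return active, incoming


active, incoming = simulate()
print("active queue:  ", sum(active), "slow,", len(active) - sum(active), "fast")
print("incoming queue:", sum(incoming), "slow,", len(incoming) - sum(incoming), "fast")
```

With these made-up rates the active queue ends up holding only slow-domain mail while mail for the fast domains piles up in the incoming queue, even though the fast domains could accept far more per tick than arrives. In this model only about slow_rate / slow_share messages per tick (roughly 15 here) enter the active queue, because each slot freed by a slow delivery stays open only until the next slow message reaches the head of the incoming queue.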

There is no mechanism to prioritize which messages will enter the
active queue. If the active queue is congested by slow destinations,
then you have a few options:

- Find out what is slowing down the deliveries. If a receiving
site is smart, then it will rightfully rate-limit mail from
strangers that send lots of mail without prior arrangements.

- Use a transport map that routes mail to problem domains to a
"graveyard" MTA, so that it won't clog up the deliveries to
"fast" destinations. With a bit of scripting fu, you can kludge
up transport maps on the fly by looking at mailq output.

- Increase the size of the Postfix active queue, making it large
  enough that it will pick up enough good destinations (besides
  bad ones) to keep mail flowing.
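A rough sketch of the "scripting fu" from the second option: count queued recipients per domain in mailq output and emit transport map entries that send the busiest domains to a separate relay. The threshold, the relay name graveyard.example.net and the sample queue listing are all hypothetical, and the parser assumes mailq's usual layout (queue-ID lines in column 0, indented recipients, deferral reasons in parentheses) rather than a documented format.

```python
from __future__ import annotations

from collections import Counter


def recipient_domains(mailq_text: str) -> Counter[str]:
    """Count queued recipients per domain in mailq (postqueue -p) output."""
    counts: Counter[str] = Counter()
    for line in mailq_text.splitlines():
        if not line[:1].isspace():
            continue  # header, queue-ID/sender line, summary or blank line
        entry = line.strip()
        if entry.startswith("(") or "@" not in entry:
            continue  # deferral reason or an address without a domain
        counts[entry.rsplit("@", 1)[1].lower()] += 1
    return counts


def transport_entries(counts: Counter[str], threshold: int, nexthop: str) -> list[str]:
    """Emit transport(5) lines routing busy domains to a separate relay."""
    return [f"{domain}\tsmtp:[{nexthop}]"
            for domain, n in sorted(counts.items()) if n >= threshold]


# Invented sample in mailq's layout; queue IDs and addresses are made up.
SAMPLE = """\
-Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient-------
A1B2C3D4E5*    2048 Fri Mar 19 13:44:15  sender@example.org
                                         user1@citromail.hu
                                         user2@gmail.com

F6A7B8C9D0     1024 Fri Mar 19 13:45:02  sender@example.org
        (connect to server03.citromail.hu[91.83.45.3]:25: Connection timed out)
                                         user3@citromail.hu

-- 3 Kbytes in 2 Requests.
"""

for entry in transport_entries(recipient_domains(SAMPLE), threshold=2,
                               nexthop="graveyard.example.net"):
    print(entry)
```

The output uses transport(5) syntax, where the square brackets suppress MX lookups for the next hop; after writing it to a map file you would rebuild it with postmap and reference it from transport_maps in main.cf.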

Wietse

From: Wietse Venema on
Attila Nagy:
> I only wrote this because I was sure that somebody would miss it.
> This destination is not slow because of slow delivery times on the
> already open connections, but because of connection timeouts (I can
> observe this on other, mostly silent systems, which send only a few
> messages there) and artificial limits on the recipient side.
> I'm aware of this, and we are always trying to make that better, but
> what I would like to know is why Postfix behaves this way.
> Is this a built-in DoS feature that could easily be fixed, or am I
> missing something?

Perhaps you have a suggestion for how Postfix would decide which
of thousands of queue files contain a recipient in a slow or fast
domain. Remember, one message may have any number of recipients,
not just one, and all this needs to be accomplished while using a
finite amount of memory, and in a manner that allows fast recovery
from crash (i.e. no global database state with information about
every message and recipient).

Wietse