From: Florin Andrei
On 06/14/2010 11:54 AM, Florin Andrei wrote:
>
> Well, that does it. I got RPM packages with 2.7 from two different
> sources. Time for testing, then upgrade, and I'll keep y'all posted with
> the results.

And here it is, the status update.

I got the 2.7.0 src.rpm packages made by Simon J Mudd
(http://ftp.wl0.org/official/) and rebuilt them on my devel instance.
Installed them on the gateways and migrated the old custom settings.

This upgrade alone, from 2.3.3 (the default version that comes with
Red Hat 5) to 2.7.0, increased our delivery success to Yahoo by roughly
50%, I'd guess. Now a lot more messages get delivered before we have to
purge the queue because it's too late. Still not perfect, but better.

Related and relevant non-default settings:

master.cf:
fragile unix - - n - 5 smtp
  -o smtp_helo_timeout=5 -o smtp_connect_timeout=5

main.cf:
transport_maps = hash:/etc/postfix/transport
fragile_destination_concurrency_limit = 2
fragile_destination_concurrency_failed_cohort_limit = 1
fragile_destination_rate_delay = 2s

transport:
yahoo.com fragile:
# bunch of other Yahoo domains here
hotmail.com fragile:
comcast.net fragile:

Before the upgrade, delivery to Yahoo would get interrupted for two
reasons: Yahoo getting annoyed by our delivery rate and refusing
messages, and users clicking the Spam button (unjustified, IMO, but
that's another story).
After the upgrade, it appears that the only reason Yahoo stops
accepting messages (it still does, even now) is users clicking Spam.

Things I'm planning to try from now on:

1. Use slightly more aggressive *_destination_* settings, as indicated
by Mike Hutchinson earlier in this thread.

2. Separate the various finicky destinations into their own pathways in
master.cf, instead of lumping them together under "fragile". Perhaps
even dump some of them back into general delivery, since only Yahoo
seems to really cause trouble.

3. Use two outbound email gateways instead of one. This might double the
delivery rate "for free".

4. Upgrade to 2.7.1, try to customize other parameters, etc.
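
For item 2, splitting the "fragile" pathway could look something like
the sketch below. The service names "yahoo" and "hotmail" are
hypothetical, the Yahoo values are simply copied from the current
"fragile" setup, the looser Hotmail limits are an assumption based on
only Yahoo causing real trouble, and comcast.net is shown dropped back
into general delivery.

```
# /etc/postfix/master.cf (service names "yahoo" and "hotmail" are hypothetical)
yahoo     unix  -       -       n       -       5       smtp
  -o smtp_helo_timeout=5 -o smtp_connect_timeout=5
hotmail   unix  -       -       n       -       5       smtp
  -o smtp_helo_timeout=5 -o smtp_connect_timeout=5

# /etc/postfix/main.cf (parameter prefix = service name from master.cf)
yahoo_destination_concurrency_limit = 2
yahoo_destination_concurrency_failed_cohort_limit = 1
yahoo_destination_rate_delay = 2s
# assumption: Hotmail tolerates more, so no rate delay here
hotmail_destination_concurrency_limit = 2

# /etc/postfix/transport
yahoo.com       yahoo:
# other Yahoo domains here, also mapped to yahoo:
hotmail.com     hotmail:
# comcast.net removed: it falls back to the default smtp transport
```

Because transport_maps uses a hash: table, the transport file has to be
rebuilt with "postmap /etc/postfix/transport" after editing, followed by
"postfix reload".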

Suggestions are welcome.

I'll keep you posted if I find anything new and notable.

--
Florin Andrei
http://florin.myip.org/

From: Victor Duchovni
On Fri, Jun 18, 2010 at 02:05:36PM -0700, Florin Andrei wrote:

> main.cf:
> transport_maps = hash:/etc/postfix/transport
> fragile_destination_concurrency_limit = 2
> fragile_destination_concurrency_failed_cohort_limit = 1
> fragile_destination_rate_delay = 2s

Try:

# Change from 1 above
fragile_destination_concurrency_failed_cohort_limit = 5
# New, more stable feedback controls from 2.5
fragile_destination_concurrency_positive_feedback = 1/3
fragile_destination_concurrency_negative_feedback = 1/8

I think that will significantly reduce the rate at which delivery is
unnecessarily throttled.

--
Viktor.

From: Stefan Foerster
* Florin Andrei <florin(a)andrei.myip.org>:
> Looking at the Postfix queue graphs in Munin, one thing I noticed is
> that when the scheduled emails go out (not as a continuous trickle
> but in batches; that's just how the software works), a fraction,
> maybe 25%, goes into the active queue right away, and the rest
> seem to be dropped into deferred either immediately or very quickly.
> Then they stay in deferred a long time. Then they move to active for
> a short while, being delivered at a slow rate, and then they fall
> back into deferred.

I might be wrong here, but what you describe - "the rest seem to be
dropped into deferred either immediately or very quickly" - sounds a
little like the "use case" described in

http://www.postfix.org/QSHAPE_README.html#backlog

The deferrals on Yahoo's part might cause qmgr(8) to mark that
destination as "dead" (we can't know that for sure, since you didn't
post the rejection logs; note also that
_destination_concurrency_failed_cohort_limit only applies to
connection and handshake failures).

I don't send any large volumes to Yahoo, but I had to use a dedicated
transport which ignored many more errors for a popular German freemail
provider. Since you are using rate delays, your concurrency limit will
basically be one, and this might very well be related to what you see.

I don't know if you need to reload postfix and/or requeue the messages
with "postsuper -r" after changing
transport_destination_concurrency_failed_cohort_limit.


Stefan

From: Wietse Venema
Stefan Foerster:
> I don't send any large volumes to Yahoo, but I had to use a dedicated
> transport which ignored many more errors for a popular German freemail
> provider. Since you are using rate delays, your concurrency limit will
> basically be one, and this might very well be related to what you see.

This is a good point.

To compensate for this unwanted side effect of reduced concurrency
INCREASE the fragile_destination_concurrency_failed_cohort_limit
to 10-20 or so (or REDUCE fragile_destination_concurrency_negative_feedback
to 1/10 or 1/20).
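
Applied to the "fragile" transport from this thread, the two
alternatives might look like the fragment below. The values 10 and 1/10
are just the low ends of the ranges suggested above, not tested
settings, and only one option is enabled at a time.

```
# /etc/postfix/main.cf
# Option 1: suspend delivery only after a run of failed pseudo-cohorts
fragile_destination_concurrency_failed_cohort_limit = 10
# Option 2 (alternative): back off more gently after each failure
#fragile_destination_concurrency_negative_feedback = 1/10
```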

> I don't know if you need to reload postfix and/or requeue the messages
> with "postsuper -r" after changing
> transport_destination_concurrency_failed_cohort_limit.

No. Use "postfix reload" to update the queue manager.

Wietse

From: Wietse Venema
Wietse Venema:
> Stefan Foerster:
> > I don't send any large volumes to Yahoo, but I had to use a dedicated
> > transport which ignored many more errors for a popular German freemail
> > provider. Since you are using rate delays, your concurrency limit will
> > basically be one, and this might very well be related to what you see.
>
> This is a good point.
>
> To compensate for this unwanted side effect of reduced concurrency
> INCREASE the fragile_destination_concurrency_failed_cohort_limit
> to 10-20 or so (or REDUCE fragile_destination_concurrency_negative_feedback
> to 1/10 or 1/20).

I just did a few experiments to confirm this.

With the default fragile_destination_concurrency_failed_cohort_limit
(1), the scheduler defers all mail after one connection/handshake failure.

With fragile_destination_concurrency_failed_cohort_limit > 1, the
scheduler defers all mail only after multiple connection/handshake
failures in a row, which may be more desirable in the Yahoo scenario.

For now, I'll add a note to the documentation. A clever software
solution is not obvious: when a home office user needs to limit
their output rate, it may not be a good idea to keep sending mail
after the ISP starts pushing back.
