From: Victor Duchovni on 8 Jul 2010 23:50
On Thu, Jul 08, 2010 at 01:37:08PM -0700, Florin Andrei wrote:
> On 07/06/2010 01:10 PM, Victor Duchovni wrote:
>> So you have multiple exit points with non-uniform latency, but the more
>> severe congestion is downstream, so you want to load the exit points
>> uniformly. Yes, the solution is to disable the connection cache, and
>> set reasonably low connection and helo timeouts in the transport feeding
>> the two exit points, so that when one is down and non-responsive (no TCP
>> reset), you don't suffer excessive hand-off latency for 50% of deliveries.
> I did that.
> You know what? It's amazingly accurate, actually. After tens of thousands of
> messages, the logs on the two exit points showed almost exactly the same
> number of messages relayed - within 1.2% or so. That was a very nice result
> to contemplate.
> After disabling the connection cache for internal delivery, it looks like
> we took a 2x performance hit internally, which is exactly what I expected.
> But that's ok, the internal rate is orders of magnitude above the Yahoo
> rate anyway. From an external perspective, things are actually much better
That performance hit is why the connection cache works the way it does:
in most cases, the sensible way to load multiple MX hosts is to ensure
concurrency fairness rather than message-count fairness, so that slow
servers do less work, resulting in better overall throughput. It is
nice to see the default algorithm clearly working as intended
(speeding delivery by a factor of two or so).
Of course, in your edge case the per-sending-IP limits enforced against
your exit gateways are the more severe bottleneck, which makes optimizing
first-hop throughput counterproductive. But that edge case is easily
handled with non-default settings (short timeouts plus a disabled
connection cache).
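
For reference, those non-default settings can be expressed as per-transport
overrides in master.cf. This is only a sketch: the transport name
"exitrelay" and the 5s timeout values are illustrative, not taken from the
original discussion.

```
# master.cf: dedicated transport feeding the two exit points
# (transport name and timeout values are illustrative)
exitrelay  unix  -  -  n  -  -  smtp
    -o smtp_connection_cache_on_demand=no
    -o smtp_connection_cache_destinations=
    -o smtp_connect_timeout=5s
    -o smtp_helo_timeout=5s
```

Mail for the relevant destinations would then be routed to this transport
via transport_maps, so the short timeouts and disabled cache apply only to
deliveries through the exit points, not to internal traffic.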