From: Richard Mealing on
Hi everyone,

I have a relay server that passes mail on to a cluster of filter
servers, and I'm seeing lots of "deferred: connection reset by [filter
server]" errors on my relay server.

The filter servers (running MailScanner) are not very busy, and I have
been playing with timeouts for ages now trying to get the number of
deferred connections down. I just wondered if anyone had any ideas?

Here are the timeouts for the relay server that initially receives the
mail.

define(`confTO_COMMAND', `5m')dnl
define(`confTO_IDENT',`0s')dnl
define(`confTO_ICONNECT', `20s')dnl
define(`confTO_CONNECT', `4m')dnl
define(`confTO_HELO', `2m')dnl
define(`confTO_MAIL', `4m')dnl
define(`confTO_RCPT', `4m')dnl
define(`confTO_DATAINIT', `3m')dnl
define(`confTO_DATABLOCK', `3m')dnl
define(`confTO_DATAFINAL', `10m')dnl
define(`confTO_RSET', `1m')dnl
define(`confTO_QUIT', `1m')dnl
define(`confTO_MISC', `1m')dnl

There don't seem to be any issues connecting to this server.

Here are the timeouts that I have on one of my filter servers:

define(`confTO_COMMAND',`3m')dnl
define(`confTO_ICONNECT', `15s')dnl
define(`confTO_CONNECT', `4m')dnl
define(`confTO_HELO', `2m')dnl
define(`confTO_MAIL', `4m')dnl
define(`confTO_RCPT', `4m')dnl
define(`confTO_DATAINIT', `3m')dnl
define(`confTO_DATABLOCK', `10m')dnl
define(`confTO_DATAFINAL', `10m')dnl
define(`confTO_RSET', `1m')dnl
define(`confTO_QUIT', `1m')dnl
define(`confTO_MISC', `1m')dnl


Both servers seem to have around 150 to 250+ sendmail processes running
at any one time. Do you think I just need to add more servers, or play
with the timeouts more?
Sometimes when you send an email directly to the filter cluster it
takes ages to go through. Plus, our monitoring software reports that it
can't connect on port 25, probably twice a day on each server in that
cluster.

Any comments on my timeouts?
From: Richard Mealing on
On Dec 23, 2:12 pm, Richard Mealing <richard.meal...(a)gmail.com> wrote:
> Hi everyone,
>
> I have a relay server that passes mail onto a cluster of filter
> servers and I'm seeing lots of deferred, connection reset by [filter
> server] on my relay server.
>
> The filter servers (running mailscanner) are not very busy and I have
> been playing with timeouts for ages now trying to get the numbers of
> deferred connections down. I just wondered if anyone had any ideas?
>
> Here are the timeouts for the relay server that initially receives the
> mail.
>
> define(`confTO_COMMAND', `5m')dnl
> define(`confTO_IDENT',`0s')dnl
> define(`confTO_ICONNECT', `20s')dnl
> define(`confTO_CONNECT', `4m')dnl
> define(`confTO_HELO', `2m')dnl
> define(`confTO_MAIL', `4m')dnl
> define(`confTO_RCPT', `4m')dnl
> define(`confTO_DATAINIT', `3m')dnl
> define(`confTO_DATABLOCK', `3m')dnl
> define(`confTO_DATAFINAL', `10m')dnl
> define(`confTO_RSET', `1m')dnl
> define(`confTO_QUIT', `1m')dnl
> define(`confTO_MISC', `1m')dnl
>
> There doesn't seem to be any issues connecting to this server.
>
> Here's the timeouts that I have on my filter servers (one of them) -
>
> define(`confTO_COMMAND',`3m')dnl
> define(`confTO_ICONNECT', `15s')dnl
> define(`confTO_CONNECT', `4m')dnl
> define(`confTO_HELO', `2m')dnl
> define(`confTO_MAIL', `4m')dnl
> define(`confTO_RCPT', `4m')dnl
> define(`confTO_DATAINIT', `3m')dnl
> define(`confTO_DATABLOCK', `10m')dnl
> define(`confTO_DATAFINAL', `10m')dnl
> define(`confTO_RSET', `1m')dnl
> define(`confTO_QUIT', `1m')dnl
> define(`confTO_MISC', `1m')dnl
>
> Both servers seem to have around 150 - 250+ sendmail processes running
> at any one time. Do you think I just need to add more servers or play
> with the timeouts more?
> Sometimes when you send an email directly to the filter cluster it
> takes ages for it to go through, Plus I'm getting issues on our
> monitoring software saying it can't connect on port 25, probably twice
> a day on each server in that cluster.
>
>  Any comments on my timeouts?

Sorry, I thought I would mention that I am using Barracuda and SpamCop as
spam DNSBLs. I'm doing queuewarn and stuff too. It's just the timeouts
I need help with, if you kind people can help!

Oh, and Merry Christmas, sendmail people!
From: Andrzej Adam Filip on
Richard Mealing <richard.mealing(a)gmail.com> wrote:
> On Dec 23, 2:12 pm, Richard Mealing <richard.meal...(a)gmail.com> wrote:
>> Hi everyone,
>>
>> I have a relay server that passes mail onto a cluster of filter
>> servers and I'm seeing lots of deferred, connection reset by [filter
>> server] on my relay server.
>>
>> The filter servers (running mailscanner) are not very busy and I have
>> been playing with timeouts for ages now trying to get the numbers of
>> deferred connections down. I just wondered if anyone had any ideas?
>>
>> Here are the timeouts for the relay server that initially receives the
>> mail.
>>
>> define(`confTO_COMMAND', `5m')dnl
>> define(`confTO_IDENT',`0s')dnl
>> define(`confTO_ICONNECT', `20s')dnl
>> define(`confTO_CONNECT', `4m')dnl
>> define(`confTO_HELO', `2m')dnl
>> define(`confTO_MAIL', `4m')dnl
>> define(`confTO_RCPT', `4m')dnl
>> define(`confTO_DATAINIT', `3m')dnl
>> define(`confTO_DATABLOCK', `3m')dnl
>> define(`confTO_DATAFINAL', `10m')dnl
>> define(`confTO_RSET', `1m')dnl
>> define(`confTO_QUIT', `1m')dnl
>> define(`confTO_MISC', `1m')dnl
>>
>> There doesn't seem to be any issues connecting to this server.
>>
>> Here's the timeouts that I have on my filter servers (one of them) -
>>
>> define(`confTO_COMMAND',`3m')dnl
>> define(`confTO_ICONNECT', `15s')dnl
>> define(`confTO_CONNECT', `4m')dnl
>> define(`confTO_HELO', `2m')dnl
>> define(`confTO_MAIL', `4m')dnl
>> define(`confTO_RCPT', `4m')dnl
>> define(`confTO_DATAINIT', `3m')dnl
>> define(`confTO_DATABLOCK', `10m')dnl
>> define(`confTO_DATAFINAL', `10m')dnl
>> define(`confTO_RSET', `1m')dnl
>> define(`confTO_QUIT', `1m')dnl
>> define(`confTO_MISC', `1m')dnl
>>
>> Both servers seem to have around 150 - 250+ sendmail processes running
>> at any one time. Do you think I just need to add more servers or play
>> with the timeouts more?
>> Sometimes when you send an email directly to the filter cluster it
>> takes ages for it to go through, Plus I'm getting issues on our
>> monitoring software saying it can't connect on port 25, probably twice
>> a day on each server in that cluster.
>>
>>  Any comments on my timeouts?
>
> Sorry, I thought I would mention I am using barracuda and spamcop for
> spam dnsbl. I'm doing queuewarn and stuff too. It's just the timeouts
> I need help with if you kind people can help.!

Have you considered limiting the number of concurrent SMTP (TCP) connections
to the scanners instead of playing with timeouts?

For example, queue all messages to the scanners (e.g. using an "expensive"
mailer) and use more frequent queue runs (with MinQueueAge) or persistent
queue runners.
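
In sendmail.mc terms, the suggestion above might look roughly like the
sketch below. It is only an illustration: the values (5m, 10) are
assumptions rather than recommendations, and it assumes the relay reaches
the scanners through the standard SMTP-family mailers. The "e" mailer flag
marks a mailer as expensive, and confCON_EXPENSIVE (the HoldExpensive
option) makes sendmail queue mail for expensive mailers instead of
connecting immediately, so delivery to the scanners happens from queue
runs, whose concurrency can be capped.

```m4
dnl Sketch for the relay's sendmail.mc; all values are illustrative assumptions.
dnl Mark the SMTP-family mailers as "expensive" (place before the MAILER(smtp) line).
define(`SMTP_MAILER_FLAGS', `e')dnl
dnl HoldExpensive: queue mail for expensive mailers instead of connecting at once.
define(`confCON_EXPENSIVE', `True')dnl
dnl MinQueueAge: minimum time a message sits in the queue between delivery attempts.
define(`confMIN_QUEUE_AGE', `5m')dnl
dnl MaxQueueChildren: cap on concurrent queue-runner processes.
define(`confMAX_QUEUE_CHILDREN', `10')dnl
```

The queue-run interval itself is set on the daemon command line rather
than in the .mc file, e.g. -q1m for a run every minute, or -qp for
persistent queue runners.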

--
[pl>en Andrew] Andrzej Adam Filip : anfi(a)onet.eu : Andrzej.Filip(a)gmail.com
Open-Sendmail: http://open-sendmail.sourceforge.net/
I know it's weird, but it does make it easier to write poetry in perl. :-)
-- Larry Wall in <7865(a)jpl-devvax.JPL.NASA.GOV>
[ http://groups.google.com/groups?selm=8ege7n6ry6-9CN(a)cynthia.brudna.chmurka.net ]
From: Jose-Marcio Martins da Cruz on

Richard Mealing wrote:
> Hi everyone,
>
> I have a relay server that passes mail onto a cluster of filter
> servers and I'm seeing lots of deferred, connection reset by [filter
> server] on my relay server.
>
> The filter servers (running mailscanner) are not very busy and I have
> been playing with timeouts for ages now trying to get the numbers of
> deferred connections down. I just wondered if anyone had any ideas?
>
> Here are the timeouts for the relay server that initially receives the
> mail.

> define(`confTO_ICONNECT', `20s')dnl

>
> There doesn't seem to be any issues connecting to this server.
>
> Here's the timeouts that I have on my filter servers (one of them) -
>
> define(`confTO_COMMAND',`3m')dnl
> define(`confTO_ICONNECT', `15s')dnl

....

>
>
> Both servers seem to have around 150 - 250+ sendmail processes running
> at any one time. Do you think I just need to add more servers or play
> with the timeouts more?
> Sometimes when you send an email directly to the filter cluster it
> takes ages for it to go through, Plus I'm getting issues on our
> monitoring software saying it can't connect on port 25, probably twice
> a day on each server in that cluster.
>
> Any comments on my timeouts?

I suppose you were inspired by the old edition of the bat book's chapter on performance tuning.

There's a bug there. First of all, change confTO_ICONNECT to its default value, or simply remove
that line.
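
Assuming the sendmail.mc still contains the line quoted above, one way to
"simply remove" it while keeping a record of the old value is to comment
it out with a leading dnl, then regenerate sendmail.cf and restart
sendmail. This is a sketch, not a tested configuration:

```m4
dnl Commented out so that Timeout.iconnect falls back to sendmail's default.
dnl define(`confTO_ICONNECT', `20s')dnl
```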


From: Richard Mealing on
On Dec 28 2009, 12:33 pm, Jose-Marcio Martins da Cruz
<Jose-Marcio.Mart...(a)ensmp.fr> wrote:
> Richard Mealing wrote:
> > Hi everyone,
>
> > I have a relay server that passes mail onto a cluster of filter
> > servers and I'm seeing lots of deferred, connection reset by [filter
> > server] on my relay server.
>
> > The filter servers (running mailscanner) are not very busy and I have
> > been playing with timeouts for ages now trying to get the numbers of
> > deferred connections down. I just wondered if anyone had any ideas?
>
> > Here are the timeouts for the relay server that initially receives the
> > mail.
> > define(`confTO_ICONNECT', `20s')dnl
>
> > There doesn't seem to be any issues connecting to this server.
>
> > Here's the timeouts that I have on my filter servers (one of them) -
>
> > define(`confTO_COMMAND',`3m')dnl
> > define(`confTO_ICONNECT', `15s')dnl
>
> ...
>
>
>
> > Both servers seem to have around 150 - 250+ sendmail processes running
> > at any one time. Do you think I just need to add more servers or play
> > with the timeouts more?
> > Sometimes when you send an email directly to the filter cluster it
> > takes ages for it to go through, Plus I'm getting issues on our
> > monitoring software saying it can't connect on port 25, probably twice
> > a day on each server in that cluster.
>
> >  Any comments on my timeouts?
>
> I suppose you were inspired by the old edition of bat book chapter on performance tuning.
>
> There's a bug there. First of all, change confTO_ICONNECT to its default value, or simply remove
> that line.

Really? It's undefined on the sendmail timeout pages, and I was told to
set it quite low to weed out slow hosts.

Why do you say there is a bug, and where is this documented?