From: Jose-Marcio Martins da Cruz on
Richard Mealing wrote:
> On Jan 7, 9:19 am, Richard Mealing <richard.meal...(a)gmail.com> wrote:
>> On Jan 6, 11:12 pm, Res <r...(a)ausics.net> wrote:
>>
>>> On Wed, 6 Jan 2010, Richard Mealing wrote:
>>>> Hi,
>>>> I have 3 servers that are not very busy, all running mailscanner and
>>>> sendmail. I don't know why but all the time the servers sendmail seems
>>>> to crash and not accept any mail, even though processes seem fine and
>>>> there are 150 - 200 sendmail processes running. Sometimes it just
>>>> starts accepting mail again, other times it just doesn't and I have to
>>>> killall -9 sendmail and restart the mta.
>>> Is your DNS OK?
>>> You are not using some defunct or high latency RBL ?
>>> Oh yes you are...
>>>> FEATURE(`dnsbl',`list.dsbl.org')dnl
>>> ^^^^^^^^^^^^^^
>>> PING... This list has been dead for some time! Remove it!
>>> --
>>> Res
>>> "What does Windows have that Linux doesn't?" - One hell of a lot of bugs!
>> Res,
>>
>> Thank-you. Well spotted. I wonder if that cures it..
>>
>> I will report back tomorrow.!
>
> Hi Res,
>
> As far as I can see that's made no difference unfortunately. I think
> Jose-Marcio is correct by thinking it could be the server load.
>
> What I see is - Deferred: Connection reset by myhost.com - from my
> relay server. I also have some monitoring software called nagios and I
> repeatedly get smtp critical 141 codes maybe twice a day from each of
> the 3 servers in the cluster.
>
> When I push the mail through it goes through fine, just now I tried
> and it didn't go through until the 3rd time I pushed some.
> I don't think we ever had this issue until I came along and started
> adding extra clamav signatures and things to improve the spam
> scanning.

How frequently do you update your clamav signatures? I don't know how your filter interfaces with
clamav, but if I'm right, clamd stops virus checking while it reloads its signature database into
memory after downloading it. This can take many seconds. During this interval, clamd accepts
connections but defers handling them. You can see how long it takes in the clamd log file. Check
whether it happens at the same time as the problems. Try updating the clamav signatures less
frequently to see if the problem goes away. Or temporarily disable the additional signatures,
especially if the signature file is huge (this will reduce the time needed to reload signatures
into memory).
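
A rough sketch of how the reload times could be pulled out of the clamd log, so they can be compared
with the times of the "Connection reset" errors. It assumes that clamd logs with timestamps
(LogTime enabled) in the form "Thu Jan  7 09:00:01 2010 -> message", and that a reload shows up as
a "Reading databases from ..." line followed by a "Database correctly reloaded ..." line. Check both
assumptions against your own log; the function name is only illustrative.

```python
from datetime import datetime

# Assumed timestamp layout of a clamd log line (requires LogTime in clamd.conf):
#   Thu Jan  7 09:00:01 2010 -> Reading databases from /var/lib/clamav
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def reload_durations(lines):
    """Return a list of (start_time, seconds) for each signature reload."""
    results = []
    started = None
    for line in lines:
        stamp, sep, message = line.partition(" -> ")
        if not sep:
            continue  # line without a timestamp separator
        try:
            # split/join collapses the double space before one-digit days
            when = datetime.strptime(" ".join(stamp.split()), TIME_FORMAT)
        except ValueError:
            continue  # timestamp not in the assumed format
        if message.startswith("Reading databases from"):
            started = when
        elif message.startswith("Database correctly reloaded") and started:
            results.append((started, (when - started).total_seconds()))
            started = None
    return results
```

To use it, open the clamd log (its location depends on the LogFile setting in clamd.conf), pass the
file object to reload_durations(), and print each start time with its duration.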

Also, if I'm right, mailscanner manages its own mail queue, so it can generate a lot of disk
activity and wait times.

Try using iostat to evaluate the disk activity. top can give you some idea of CPU usage: see
"wait" and "idle". Also, you can put the timeout values back to their defaults on the cluster side.
It's useful to tune them on the mail relay connected to the internet, but less useful on your
internal servers, as they talk only to local peers.
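
If you want to record the "wait" and "idle" figures over time rather than watch top by hand, here is
a minimal sketch that computes them from two readings of the first line of /proc/stat. It assumes a
Linux 2.6 or later kernel, where the "cpu" line holds user, nice, system, idle, iowait, irq, softirq
and steal counters in that order. The function names are only illustrative.

```python
import time


def cpu_shares(before, after):
    """Percent of CPU time spent idle and in iowait between two 'cpu' lines."""
    a = [int(x) for x in before.split()[1:]]
    b = [int(x) for x in after.split()[1:]]
    deltas = [y - x for x, y in zip(a, b)]
    # user nice system idle iowait irq softirq steal (guest time is part of user)
    total = sum(deltas[:8])
    if total <= 0:
        return {"idle": 0.0, "iowait": 0.0}
    return {"idle": 100.0 * deltas[3] / total,
            "iowait": 100.0 * deltas[4] / total}


def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line, which is the first line of /proc/stat."""
    with open(path) as f:
        return f.readline()


def sample(interval=5.0):
    """Take two readings 'interval' seconds apart and return the shares."""
    first = read_cpu_line()
    time.sleep(interval)
    return cpu_shares(first, read_cpu_line())
```

Calling sample() from cron or a loop and logging the result with a timestamp would show whether a
high iowait lines up with the moments sendmail stops answering.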

I'm only guessing... 8-)

JM

>
> But when it happens, I look into the processes on top and nothing is
> really doing much. I check the messages logs and there are some mx
> issues but nothing big. I was having dns high memory loads but I've
> fixed that now and it's still doing this.
>
> I guess if there are no issues with my mc file it's got to be load.
From: Res on
On Thu, 7 Jan 2010, Richard Mealing wrote:

> Hi Res,
>
> As far as I can see that's made no difference unfortunately. I think
> Jose-Marcio is correct by thinking it could be the server load.


Hmm, that's a concern given you said the servers are not really that busy.
We used to use sendmail up front protecting a qmail-based backend (we have
since made it more efficient by changing to postfix, since it has native
mysql support and sendmail doesn't and never will). The sendmail
boxes ran mailscanner and easily handled 800 concurrent connections each.


> I don't think we ever had this issue until I came along and started
> adding extra clamav signatures and things to improve the spam
> scanning.

How are you using clam? Via the clamd method? Some methods of
using clamav with mailscanner (and amavisd, and I suspect all others) are
resource pigs :( What are your MailScanner batch sizes set to?

Key values to watch I think are...

Max Unscanned Bytes Per Scan = 100m
Max Unsafe Bytes Per Scan = 30m
Max Unscanned Messages Per Scan = 100
Max Unsafe Messages Per Scan = 30

Max Normal Queue Size = 1000


Virus Scanners = clamd

ClamAVmodule Maximum Recursion Level = 50
ClamAVmodule Maximum Files = 5000
ClamAVmodule Maximum File Size = 22000000
ClamAVmodule Maximum Compression Ratio = 250


....and using clamd
Clamd Port = 3310
Clamd Socket = /var/run/clamd/clamd.sock
Clamd Lock File = # /var/lock/subsys/clamd
Clamd Use Threads = no
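
To compare your own values with the ones above, a small sketch like this can pull the same settings
out of a MailScanner.conf. It assumes the plain "Name = value" layout shown above and matches names
without regard to case or extra spaces; the WATCHED list and the function name are only
illustrative.

```python
# Setting names taken from the list above.
WATCHED = (
    "Max Unscanned Bytes Per Scan",
    "Max Unsafe Bytes Per Scan",
    "Max Unscanned Messages Per Scan",
    "Max Unsafe Messages Per Scan",
    "Max Normal Queue Size",
    "Virus Scanners",
    "Clamd Use Threads",
)


def read_settings(lines, wanted=WATCHED):
    """Return {setting name: value} for the wanted settings found in lines."""
    by_key = {name.lower(): name for name in wanted}
    found = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # blank line, comment, or not a setting
        key, _, value = line.partition("=")
        name = by_key.get(" ".join(key.split()).lower())
        if name:
            found[name] = value.strip()
    return found
```

Pass it the open configuration file (commonly /etc/MailScanner/MailScanner.conf, but check your
install) and print the returned dictionary.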



> But when it happens, I look into the processes on top and nothing is
> really doing much. I check the messages logs and there are some mx
> issues but nothing big. I was having dns high memory loads but I've
> fixed that now and it's still doing this.


I'll dig out my old sendmail.mc......


define(`confDEF_USER_ID',``8:13'')dnl
define(`confTRUSTED_USER', `smmsp')dnl
define(`confTRY_NULL_MX_LIST',true)dnl
define(`confDONT_PROBE_INTERFACES',true)dnl
define(`ALIAS_FILE', `/etc/mail/aliases,/etc/mail/ecartis.aliases')dnl
define(`STATUS_FILE', `/etc/mail/statistics')dnl
define(`confLOG_LEVEL',`9')dnl
define(`confMAX_MESSAGE_SIZE', `20480000')dnl
define(`confUSERDB_SPEC', `/etc/mail/userdb.db')dnl
define(`confPRIVACY_FLAGS', `goaway,restrictqrun,restrictmailq')dnl
define(`confCONNECTION_RATE_THROTTLE', `150')dnl
define(`confMAX_DAEMON_CHILDREN',`400')dnl
define(`confMAX_QUEUE_CHILDREN',`800')dnl
dnl define(`confQUEUE_SORT_ORDER', `none')dnl
define(`confBAD_RCPT_THROTTLE',`2')dnl
define(`confTO_CONNECT', `5m')dnl
define(`confTO_MAIL', `5m')dnl
define(`confTO_DATAINIT', `3m')dnl
define(`confTO_DATABLOCK', `3m')dnl
define(`confTO_DATAFINAL', `10m')dnl
define(`confTO_RCPT', `5m')dnl
define(`confTO_COMMAND', `5m')dnl
define(`confTO_IDENT', `0s')dnl
define(`confTO_QUEUEWARN', `6h')dnl
define(`confTO_QUEUERETURN', `7d')dnl
define(`confQUEUE_LA', `50')dnl
define(`confREFUSE_LA', `100')dnl
define(`confSEPARATE_PROC',`True')dnl
define(`confDOUBLE_BOUNCE_ADDRESS',`')dnl
FEATURE(`no_default_msa',`dnl')dnl
FEATURE(`mailertable',`hash -o /etc/mail/mailertable.db')dnl
FEATURE(redirect)dnl
FEATURE(always_add_domain)dnl
FEATURE(use_cw_file)dnl
FEATURE(use_ct_file)dnl
FEATURE(local_procmail,`',`procmail -t -Y -a $h -d $u')dnl
FEATURE(`access_db',`hash -T<TMPF> -o /etc/mail/access.db')dnl
FEATURE(`blacklist_recipients')dnl
FEATURE(`greet_pause',`4000')dnl
FEATURE(`delay_checks')dnl
FEATURE(`compat_check')dnl
FEATURE(`require_rdns')dnl
FEATURE(`badmx')dnl
FEATURE(`block_bad_helo')dnl
FEATURE(`smrsh',`/usr/sbin/smrsh')
EXPOSED_USER(`root')dnl
DAEMON_OPTIONS(`Port=25, Name=MTA')dnl

MAILER(`local')dnl
MAILER(`smtp')dnl
MAILER(procmail)dnl

INPUT_MAIL_FILTER(`milter-regex',`S=unix:/var/run/milter-regex.sock, T=S:30s;R:2m')dnl
INPUT_MAIL_FILTER(`smf-spf', `S=unix:/var/run/smfs/smf-spf.sock, T=S:30s;R:1m')dnl


--
Res

"What does Windows have that Linux doesn't?" - One hell of a lot of bugs!