From: John Marshall on
Context: Sendmail 8.14.4 on FreeBSD 8.0

During recent months we've had a few instances (on different machines)
of a queue runner stopping. Mail continues to queue in the affected
queue and never gets processed: the responsible queue runner stays in
an IDLE state and never wakes up. The queue can be flushed manually.
Re-starting sendmail "fixes" the problem.

I don't know how to troubleshoot this. Does anyone have any
suggestions?

I noticed this on one of our relays this morning. This is what the
sendmail processes look like. The one in the "Idle" state
(responsible for the "rw2" queue group) is the one which has stopped
working.

rwsrv04# ps `ps -x | awk '(/sendmail/) {print $1}'`
  PID  TT  STAT      TIME COMMAND
64281  ??  Ss     0:06.23 sendmail: accepting connections (sendmail)
64282  ??  S      0:28.39 sendmail: running queue: /var/spool/mqueue/qd1/df (sendmail)
64283  ??  S      0:17.84 sendmail: running queue: /var/spool/mqueue/mby/df (sendmail)
64284  ??  S      0:17.21 sendmail: running queue: /var/spool/mqueue/oz/df (sendmail)
64285  ??  I      0:03.05 sendmail: running queue: /var/spool/mqueue/rw2/df (sendmail)
64286  ??  S      0:17.53 sendmail: running queue: /var/spool/mqueue/hold/df (sendmail)

rwsrv04# grep -i queue /etc/mail/sendmail.cf | grep -v ^#
O MaxQueueChildren=20
O MaxRunnersPerQueue=5
O MinQueueAge=15
O QueueDirectory=/var/spool/mqueue
O Timeout.queuereturn=5d
O Timeout.queuewarn=4h
Qmqueue, P=/var/spool/mqueue/qd*, R=5, r=10, F=f
Qhold, P=/var/spool/mqueue/hold, R=0
Qmby, P=/var/spool/mqueue/mby, R=1
Qoz, P=/var/spool/mqueue/oz, R=1
Qrw2, P=/var/spool/mqueue/rw2, R=1
Squeuegroup

--
John Marshall
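
A minimal sketch for picking the idle runner out of a listing like the one above, rather than reading it by eye. It assumes the column order shown in that listing (PID, TT, STAT, TIME, COMMAND); the function name idle_queue_runners is made up for illustration. An "I" state only means the process has been sleeping for a while, so a match is a candidate for a closer look, not proof that the runner is stuck.

```shell
#!/bin/sh
# Hypothetical helper: read `ps` output on stdin and print the PID of every
# sendmail queue runner whose STAT column starts with "I" (idle).
# Assumed column order: PID TT STAT TIME COMMAND.
idle_queue_runners() {
    awk '$3 ~ /^I/ && /sendmail: running queue:/ { print $1 }'
}
```

Possible usage on the relay: ps -ax | idle_queue_runners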
From: Claus Aßmann on
John Marshall wrote:

> During recent months we've had a few instances (on different machines)
> of a queue runner stopping. Mail continues to queue in the affected

> I don't know how to troubleshoot this. Does anyone have any
> suggestions?

Attach a debugger to the stuck process and see where it hangs.
Post the results here. Thanks.
From: John Marshall on
On 2 Feb 2010 01:52:21 GMT, Claus Aßmann wrote:
> Attach a debugger to the stuck process and see where it hangs.
> Post the results here. Thanks.

I tried figuring out by myself how to do that. I got it wrong (invoked
gdb without pointing it at sendmail) and it fell down in a screaming
heap when I attached to the stuck queue runner process. When I quit gdb
I lost the stuck process :-(

If it happens again, I'll know what to do. What would you like to see
after the attach, just 'bt'?

Sorry for the missed opportunity.

--
John Marshall
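
For the next occurrence, the attach described above could be scripted roughly as follows. This is a sketch, not a tested procedure for this setup: it hands gdb the sendmail binary along with the PID, runs 'bt', and detaches explicitly so the runner is left alive when gdb exits. The binary path is an assumption about the FreeBSD base system and should be checked locally; write_gdb_commands and backtrace_pid are names invented here. Without debugging symbols the backtrace may be sparse.

```shell
#!/bin/sh
# Assumption: location of the real sendmail binary on FreeBSD; verify it
# on your machine, or override it by setting SENDMAIL_BIN.
SENDMAIL_BIN=${SENDMAIL_BIN:-/usr/libexec/sendmail/sendmail}

# Write the commands gdb should run after attaching: print a backtrace,
# then detach so the process keeps running, then quit.
write_gdb_commands() {
    printf '%s\n' 'bt' 'detach' 'quit' > "$1"
}

# Hypothetical helper: attach to the PID given as $1 and print a backtrace.
backtrace_pid() {
    cmds=$(mktemp /tmp/gdbcmds.XXXXXX) || return 1
    write_gdb_commands "$cmds"
    gdb -batch -x "$cmds" "$SENDMAIL_BIN" "$1"
    rm -f "$cmds"
}
```

Possible usage, as root, with the PID of the stuck runner: backtrace_pid 64285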
From: David F. Skoll on
John Marshall wrote:

> I tried figuring out by myself how to do that. I got it wrong (invoked
> gdb without pointing it at sendmail) and it fell down in a screaming
> heap when I attached to the stuck queue runner process. When I quit gdb
> I lost the stuck process :-(

gdb is a pretty heavy tool... what does "strace" say? (or truss or ktrace
or whatever the system-call tracer is on your OS.)

Regards,

David.
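
A rough sketch of the system-call tracing suggested above, assuming either truss (FreeBSD) or strace (Linux) is installed; both accept -p <pid> to attach to a running process. The function names are invented for illustration. ktrace is left out because it works differently: it records to a file that is read afterwards with kdump.

```shell
#!/bin/sh
# Hypothetical helper: print the first command from the argument list that
# exists on this system; return non-zero if none of them does.
first_available() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: attach a system-call tracer to the PID given as $1.
# Runs until interrupted with Ctrl-C.
trace_pid() {
    tracer=$(first_available truss strace) || {
        echo "no system-call tracer found" >&2
        return 1
    }
    "$tracer" -p "$1"
}
```

Possible usage, as root: trace_pid 64285. A process blocked in a single system call may print very little, which is itself a hint about where it is waiting.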
From: hume.spamfilter on
David F. Skoll <dfs(a)roaringpenguin.com> wrote:
> gdb is a pretty heavy tool... what does "strace" say? (or truss or ktrace
> or whatever the system-call tracer is on your OS.)

Also, I believe "pstack" is available from sysutils in the ports tree.
It might make getting a stack trace much easier.

# pstack <pid>

--
Brandon Hume - hume -> BOFH.Ca, http://WWW.BOFH.Ca/