From: sunckell on
On Mar 13, 9:17 am, Andrzej Adam Filip <a...(a)onet.eu> wrote:
> sunckell<sunck...(a)gmail.com> wrote:
> > Got a Strange one here.
>
> >     We are running Sendmail 8.13.8+Sun/8.13.8 on a fully patched
> > Solaris 10 server.  Some business users wanted notifications if
> > sendmail were to stop responding.
>
> > So we added a keep-alive to our
> > CSS. All it does is connect to the port, issue a HELO, and look for a
> > 250 in return.  This runs every 5 secs.  If it can't connect to the
> > port or does not receive a 250 response, it marks it down and
> > notifies.
>
> How long do you wait for the greeting?
> In some *super nasty* cases involving "out of date aliases" it could
> take up to 10 minutes to get the greeting. Making the test script wait
> *at least* 2 minutes is a sensible minimum.
> Do you record "time to greeting"?
>
> >     We added this last Friday, and since then we are seeing several
> > "failures".  The strange part is, the service is still running, there
> > is nothing odd in the logs, except you don't see the CSS connects
> > during the time period the CSS sees the port as down (or any other
> > traffic for that matter.)
>
> > We have seen time gaps in the logs up to 40 secs.
> > We have snooped the traffic and everything appears to be correct
> > from an ACK/SYN/FIN standpoint.
>
> Have you ruled out short DNS resolution problems?
>   40s gaps may be caused by it.
> Have you tried to correlate gaps with updates of aliases?
>
> > [...]
> > I am hoping someone else out there has experience in troubleshooting
> > this sort of issue and can give me some suggestions.
>
> It may be a variation on Sendmail-FAQ-3.12, giving "from time to
> time" delays: http://www.sendmail.org/faq/section3.html#3.12
>
> --
> [pl>en: Andrew] Andrzej Adam Filip : a...(a)priv.onet.pl : a...(a)xl.wp.pl
> Open-Sendmail:http://open-sendmail.sourceforge.net/
> "We have the right to survive!"
> "Not by killing others."
>   -- Deela and Kirk, "Wink of An Eye", stardate 5710.5



Just to follow up on this issue: we finally found out what was
causing the failures.

nscd (the name service cache daemon) was running on that server. As
soon as we stopped and disabled it, the problems went away.

Maybe it will help someone in the future.
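
For anyone chasing similar symptoms, here is a rough sketch of a probe
that records the "time to greeting" Andrzej asked about. It is not our
CSS keep-alive, just an illustration in Python. The function names, the
HELO name probe.example.invalid, and the 120 second timeout (taken from
the "at least 2 minutes" suggestion above) are all my own assumptions.
Name-lookup stalls on the mail server itself should show up as a large
greeting_s, while resolve_s only covers the lookup done by the probe.

```python
import socket
import time


def read_reply(f):
    """Read one SMTP reply (possibly multi-line); return (code, lines)."""
    lines = []
    while True:
        raw = f.readline()
        if not raw:
            raise ConnectionError("connection closed mid-reply")
        line = raw.decode("ascii", "replace").rstrip("\r\n")
        lines.append(line)
        # A continuation line has '-' after the code; the last line does not.
        if len(line) < 4 or line[3] != "-":
            return int(line[:3]), lines  # ValueError if the reply is garbage


def smtp_dialogue(sock, helo_name="probe.example.invalid"):
    """Greeting + HELO + QUIT on an already connected socket, with timings."""
    t0 = time.monotonic()
    with sock.makefile("rwb") as f:
        greeting_code, _ = read_reply(f)
        t1 = time.monotonic()
        f.write(b"HELO " + helo_name.encode("ascii") + b"\r\n")
        f.flush()
        helo_code, _ = read_reply(f)
        t2 = time.monotonic()
        f.write(b"QUIT\r\n")
        f.flush()
    return {
        "greeting_code": greeting_code,
        "helo_code": helo_code,
        "greeting_s": t1 - t0,   # time spent waiting for the 220 banner
        "helo_s": t2 - t1,       # time spent waiting for the 250
        "up": greeting_code == 220 and helo_code == 250,
    }


def probe(host, port=25, timeout=120.0):
    """Resolve, connect, and run the dialogue; timeout is per socket call."""
    t0 = time.monotonic()
    family, kind, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    t1 = time.monotonic()
    with socket.socket(family, kind, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(addr)
        t2 = time.monotonic()
        result = smtp_dialogue(sock)
    result["resolve_s"] = t1 - t0   # probe-side name lookup only
    result["connect_s"] = t2 - t1
    return result
```

Logging the dict returned by probe("your.mail.host") on every run, rather
than just up/down, makes it easier to see whether a "failure" was a slow
greeting or a real connection problem.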

sunckell