From: Kal-El on
We are running a Solaris 2.8 NIS server on an Ultra 10. We have several
clients that talk to it. At various times throughout the day, we'll
notice the _server_ (let's call it sun1 here and our NIS domain sun.yp)
reporting: NIS server not responding for domain "sun.yp".

Maybe two or three times a day, we'll have a window of 15 minutes or so when
we'll get these errors... again, _from the NIS server_ on the NIS
server. So, when that happens, and if that happens for a long enough
period, the clients start reporting that as well. But, it usually
starts with the NIS server first reporting that it can't see itself.

Usually, the error will occur, and then, sometimes within the same
second, it starts to work again, like:

May 10 12:22:48 sun1 ypbind[12604]: [ID 337329 daemon.error] NIS server
not responding for domain "sun.yp"; still trying
May 10 12:22:58 sun1 ypbind[12659]: [ID 337329 daemon.error] NIS server
not responding for domain "sun.yp"; still trying
May 10 12:23:02 sun1 ypbind[12727]: [ID 647655 daemon.error] NIS server
for domain "sun.yp" OK
May 10 12:40:42 sun1 ypbind[19420]: [ID 337329 daemon.error] NIS server
not responding for domain "sun.yp"; still trying
May 10 12:40:42 sun1 ypbind[19475]: [ID 647655 daemon.error] NIS server
for domain "sun.yp" OK

So, for example, this happened today at 4:10am, 9:20am, 10:20am, and
(as seen above) 12:22pm, 12:23pm, and 12:40pm.
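
One way to turn ypbind messages like these into a list of outage windows is sketched below. This is a minimal Python sketch and not something from the thread: it assumes each syslog entry sits on a single unwrapped line, that the year is supplied by hand (syslog timestamps carry none), and the name outage_windows is made up for illustration.

```python
import re
from datetime import datetime

# Leading syslog timestamp, e.g. "May 10 12:22:48".
STAMP_RE = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s")


def outage_windows(lines, year):
    """Pair each first 'not responding' message with the next 'OK' message.

    Returns a list of (start, end) datetime tuples. The year is an
    assumption passed in by the caller, because syslog omits it.
    """
    windows = []
    start = None
    for line in lines:
        match = STAMP_RE.match(line)
        if not match or "ypbind[" not in line:
            continue  # not a ypbind entry
        when = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        if "not responding" in line:
            if start is None:
                start = when  # keep the first failure of this window
        elif line.rstrip().endswith("OK") and start is not None:
            windows.append((start, when))
            start = None
    return windows
```

Feeding it the lines of the system log (often /var/adm/messages on Solaris, depending on syslog.conf) gives one (start, end) pair per outage, which makes the times of day easier to compare.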

The problem usually goes away for a few hours then comes back and fixes
itself. However, as I mentioned before, if the NIS server loses
connection to its _own_ NIS service for a few seconds or more, the
clients can start to notice too, and then hang.

Anyone know what's going on here? How can an NIS server lose connection
to itself?

Note that the NIS server also runs sendmail, bind, spamassassin, and is
an IMAP and POP server. This server has been in place for years, but we
hadn't really noticed these problems until a few months ago.

I've tried to find something in the logs that is consistently there at
the time these problems occur, but haven't found a pattern. For
example, today, I see a lot of sendmail errors that happen to be
occurring at the same time:

sendmail[540]: [ID 801593 mail.crit] NOQUEUE: SYSERR(root):
getrequests: accept: Software caused connection abort

But, it's not always the case that the NIS server loses connection to
itself at the same times we get errors like the sendmail one above.
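
A rough way to check how often the two kinds of errors actually coincide is sketched below. It is only an illustration in Python: the function name outages_near_events and the 60-second slack are assumptions, the year is arbitrary, and the sendmail timestamps are hypothetical because the post does not give them.

```python
from datetime import datetime, timedelta


def outages_near_events(outage_starts, event_times, slack=timedelta(seconds=60)):
    """Return the outage starts that have at least one event within `slack`."""
    return [
        start
        for start in outage_starts
        if any(abs(start - event) <= slack for event in event_times)
    ]


# Two outage starts taken from the log excerpt; the year and the sendmail
# error times are hypothetical, since the original post gives neither.
outages = [datetime(2000, 5, 10, 12, 22, 48), datetime(2000, 5, 10, 12, 40, 42)]
sendmail_errors = [datetime(2000, 5, 10, 12, 22, 30), datetime(2000, 5, 10, 9, 0, 0)]

matched = outages_near_events(outages, sendmail_errors)
print(f"{len(matched)} of {len(outages)} outages had a sendmail error within 60 s")
```

If only a small share of outages match over several days, the sendmail errors are more likely a side effect of the same load than the cause.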

Any ideas? Any bugs? Needed patches?

Thanks!

Kal

From: Chris Cox on
Kal-El wrote:
> We are running a Solaris 2.8 NIS server on an Ultra 10. We have several
> clients that talk to it. At various times throughout the day, we'll
> notice the _server_ (let's call it sun1 here and our NIS domain sun.yp)
> reporting: NIS server not responding for domain "sun.yp".
>
> Maybe two or three times a day, we'll have a window of 15 minutes or so when
> we'll get these errors... again, _from the NIS server_ on the NIS
> server. So, when that happens, and if that happens for a long enough
> period, the clients start reporting that as well. But, it usually
> starts with the NIS server first reporting that it can't see itself.
>
> Usually, the error will occur, and then, sometimes within the same
> second, it starts to work again, like:
>
> May 10 12:22:48 sun1 ypbind[12604]: [ID 337329 daemon.error] NIS server
> not responding for domain "sun.yp"; still trying
> May 10 12:22:58 sun1 ypbind[12659]: [ID 337329 daemon.error] NIS server
> not responding for domain "sun.yp"; still trying
> May 10 12:23:02 sun1 ypbind[12727]: [ID 647655 daemon.error] NIS server
> for domain "sun.yp" OK
> May 10 12:40:42 sun1 ypbind[19420]: [ID 337329 daemon.error] NIS server
> not responding for domain "sun.yp"; still trying
> May 10 12:40:42 sun1 ypbind[19475]: [ID 647655 daemon.error] NIS server
> for domain "sun.yp" OK
>
....
>
> sendmail[540]: [ID 801593 mail.crit] NOQUEUE: SYSERR(root):
> getrequests: accept: Software caused connection abort
>
....
>
> Any ideas? Any bugs? Needed patches?

Network problem?

Is the NIS accessing itself via localhost?? How is it
reaching itself? Could still be network related.

Could be something else (hardware, etc., as you have said).

Anything weird running via cron?

.... a mystery...

From: Kal-El on
Nothing weird in cron... Cron has been the same for years, and this
started kicking in when we began loading more onto the machine (like
spamassassin, and IMAP access to large inboxes of 100 MB or more). I'm
really thinking there's just too much running on the system.

Not a network problem as far as I can see... I even moved the server and
one of its clients to a single switch.

Not sure what you're asking when you say, "Is the NIS accessing itself
via localhost??". In this specific case, the server and the client
are the same. The client accesses the NIS service on the machine by
the hostname of the computer.

Again, things work great most of the time, but then we have these
periods where we get these "cannot connect" errors to a service running
on the same machine as the client.

Steve

From: Chris Cox on
Kal-El wrote:
> Nothing weird in cron... Cron has been the same for years, and this
> started kicking in when we began loading more onto the machine (like
> spamassassin, and IMAP access to large inboxes of 100 MB or more). I'm
> really thinking there's just too much running on the system.
>
> Not a network problem as far as I can see... I even moved the server and
> one of its clients to a single switch.
>
> Not sure what you're asking when you say, "Is the NIS accessing itself
> via localhost??". In this specific case, the server and the client
> are the same. The client accesses the NIS service on the machine by
> the hostname of the computer.

Does a ypwhich yield localhost or the machine by name? It "might"
make a difference.
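
The check could be scripted along these lines. This is a hedged Python sketch: it assumes ypwhich is on the PATH and, run with no arguments, prints the name of the server the host is bound to; the helper names and the set of loopback names are made up for illustration.

```python
import subprocess

# Names treated as "via localhost" here; an assumption, extend as needed.
LOOPBACK_NAMES = {"localhost", "127.0.0.1"}


def current_nis_server():
    """Run ypwhich with no arguments and return the server name it prints."""
    result = subprocess.run(
        ["ypwhich"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def binding_kind(server_name):
    """Classify a ypwhich answer as 'loopback' or 'by name'."""
    return "loopback" if server_name.lower() in LOOPBACK_NAMES else "by name"
```

Calling binding_kind(current_nis_server()) on the server itself shows which of the two cases applies.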

>
> Again, things work great most of the time, but then we have these
> periods where we get these "cannot connect" errors to a service running
> on the same machine as the client.

Still a mystery to me...

From: gorman on
Hi,

Two questions:
Do you have any NIS slaves which are unreachable?
At which line in /etc/hosts is the NIS master's entry?

We had a similar problem a while ago which was a combination of these
two.
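
To answer the second question quickly, something like the following Python sketch could report where a host's entry sits in the hosts file. The function name find_host_entries is hypothetical, and the addresses in the sample are invented for illustration.

```python
def find_host_entries(hosts_text, hostname):
    """Return (line_number, address) for each hosts line that names `hostname`.

    Line numbers are 1-based; comments (everything after '#') are ignored.
    """
    found = []
    for number, raw in enumerate(hosts_text.splitlines(), start=1):
        fields = raw.split("#", 1)[0].split()
        # fields[0] is the address, the rest are the name and its aliases
        if len(fields) >= 2 and hostname in fields[1:]:
            found.append((number, fields[0]))
    return found


# Hypothetical hosts file contents; the addresses are made up.
sample = (
    "# Internet host table\n"
    "127.0.0.1 localhost\n"
    "192.168.1.10 sun1 loghost\n"
    "192.168.1.11 sun2\n"
)
print(find_host_entries(sample, "sun1"))
```

On the real machine, the text would come from reading /etc/hosts, with the master's hostname in place of sun1.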
