From: D.M. Procida on
I have a server that periodically stops responding for several minutes;
a few minutes later it comes back.

It's one of three identical virtual servers on the same machine
(identical in the sense that they were cloned from each other).

When I say it stops responding - I mean to ssh, web requests and other
remote connections. Existing connections are lost.

The other two virtual servers remain fine.

It seems to me it might be really hanging, or alternatively just not
receiving requests because of something to do with virtualisation
(perhaps the virtual network).

How can I best work out what it's doing?

Daniele
From: unruh on
On 2010-04-24, D.M. Procida <real-not-anti-spam-address(a)apple-juice.co.uk> wrote:
> I have a server that periodically stops responding for several minutes;
> a few minutes later it comes back.
>
> It's one of three identical virtual servers on the same machine
> (identical in the sense that they were cloned from each other).
>
> When I say it stops responding - I mean to ssh, web requests and other
> remote connections. Existing connections are lost.
>
> The other two virtual servers remain fine.
>
> It seems to me it might be really hanging, or alternatively just not
> receiving requests because of something to do with virtualisation
> (perhaps the virtual network).
>
> How can I best work out what it's doing?

Examine the logs; start with /var/log/messages.
Are the existing connections active during that time? That is, if you have
sshed in, can you continue typing into that terminal, and does it echo the
characters and run the commands? If so, it is not a network problem.
Why in the world do you have three identical virtual servers on the
same machine?

>
> Daniele
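One way to act on the "examine the logs" advice is to pull out only the lines stamped close to a known outage. The following is a minimal sketch, not something posted in the thread: it assumes traditional syslog lines that begin with a 15-character timestamp such as "Apr 24 10:15:01" (which carries no year, so the year is passed in), and the function name lines_near and the sample lines in the usage note are made up for illustration.

```python
from datetime import datetime, timedelta


def lines_near(log_lines, outage, window_minutes=5, year=2010):
    """Return the log lines stamped within window_minutes of the outage time."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in log_lines:
        # Traditional syslog timestamps have no year, so prepend the assumed one.
        try:
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a parsable timestamp
        if abs(stamp - outage) <= window:
            hits.append(line)
    return hits
```

For example, with three placeholder lines stamped 10:02:11, 10:31:40 and 11:45:00 on Apr 24 and an outage noted at 10:30, only the 10:31:40 line is returned. To run it against a real file, pass the open file object for /var/log/messages as log_lines.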
From: D.M. Procida on
unruh <unruh(a)wormhole.physics.ubc.ca> wrote:

> On 2010-04-24, D.M. Procida <real-not-anti-spam-address(a)apple-juice.co.uk> wrote:
> > I have a server that periodically stops responding for several minutes;
> > a few minutes later it comes back.
> >
> > It's one of three identical virtual servers on the same machine
> > (identical in the sense that they were cloned from each other).
> >
> > When I say it stops responding - I mean to ssh, web requests and other
> > remote connections. Existing connections are lost.
> >
> > The other two virtual servers remain fine.
> >
> > It seems to me it might be really hanging, or alternatively just not
> > receiving requests because of something to do with virtualisation
> > (perhaps the virtual network).
> >
> > How can I best work out what it's doing?
>
> Examine the logs-- start with /var/log/messages.

There's nothing in messages - the most recent message is from several
hours ago (rsyslogd was HUPed).

> Are the existing connections active during that time-- ie if you have
> sshed in, can you continue typing into that terminal and it returns the
> characters, and runs the commands? If so it is not a network problem.

As I said, existing connections are lost.

Though having said that, on this occasion it has just come back after
several minutes and the ssh terminal session I had open has come back
too, whereas usually the sessions are lost completely.

> Why in the world do you have three identical virtual servers on that
> same machine.

One is the real live web server, one is the development server, and one
is regularly re-cloned from the live server so that any new stuff that
seems ready for deployment can be tested against it.

Daniele
From: Chris Davies on
D.M. Procida <real-not-anti-spam-address(a)apple-juice.co.uk> wrote:
> I have a server that periodically stops responding for several minutes;
> a few minutes later it comes back.

> It's one of three identical virtual servers on the same machine
> (identical in the sense that they were cloned from each other).

What virtualisation technology? VMs on our ESX server (VMware) hang
for a good minute while they're being backed up - long enough to lose
ssh connections. Our VMware Sysadmin got round this by moving the backup
slot to 5am (12 hours earlier than previously).


> It seems to me it might be really hanging, or alternatively just not
> receiving requests because of something to do with virtualisation
> (perhaps the virtual network).

How often is this "periodically"? (Daily, hourly, more often?) If
it's really quite frequent, check that the VMs' ethernet MAC addresses
are unique. Particularly as you say they were cloned from each other.

Chris
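The MAC uniqueness check suggested above can be sketched as a small comparison once the addresses have been collected from each guest (on a Linux guest they can be read from /sys/class/net/<interface>/address). This is only an illustrative sketch: the function name find_duplicate_macs, the VM labels and the MAC values in the usage note are assumptions, not details from the thread.

```python
from collections import defaultdict


def find_duplicate_macs(macs_by_vm):
    """Map each MAC seen on more than one VM to the sorted list of VMs using it."""
    owners = defaultdict(set)
    for vm, macs in macs_by_vm.items():
        for mac in macs:
            # Normalise case and separators so 00-0C-29-... matches 00:0c:29:...
            owners[mac.lower().replace("-", ":")].add(vm)
    return {mac: sorted(vms) for mac, vms in owners.items() if len(vms) > 1}
```

Calling it with a dictionary such as {"live": [...], "dev": [...], "staging": [...]} returns an empty dictionary when every address is unique, and otherwise lists which VMs share each duplicated address.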
From: D.M. Procida on
Chris Davies <chris-usenet(a)roaima.co.uk> wrote:

> D.M. Procida <real-not-anti-spam-address(a)apple-juice.co.uk> wrote:
> > I have a server that periodically stops responding for several minutes;
> > a few minutes later it comes back.
>
> > It's one of three identical virtual servers on the same machine
> > (identical in the sense that they were cloned from each other).
>
> What virtualisation technology? VMs on our ESX server (VMware) hang
> for a good minute while they're being backed up - long enough to lose
> ssh connections. Our VMware Sysadmin got round this by moving the backup
> slot to 5am (12 hours earlier than previously).

It is indeed ESX. But, it's not backups that are causing it, unless
someone changed something yesterday without telling me.

> > It seems to me it might be really hanging, or alternatively just not
> > receiving requests because of something to do with virtualisation
> > (perhaps the virtual network).
>
> How often is this "periodically"? (Daily, hourly, more often?) If
> it's really quite frequent, check that the VMs' ethernet MAC addresses
> are unique. Particularly as you say they were cloned from each other.

It seems to happen every 30 minutes or so.

It first happened yesterday, out of the blue; everything has been
working happily for months.

I have a little Python script running now:

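import time
from datetime import datetime
for x in range(1000000):
    # Indent by the last digit of x so a stalled tick stands out visually.
    print(" " * (x % 10), datetime.now())
    time.sleep(1)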

and I can see that it doesn't miss a beat, even when things stop working
(when they start again, the output catches up). So that shows that the
machine itself is still ticking away; the problem is connecting to it.

Daniele