From: Pádraig Brady on
On 23/02/10 15:40, Simon Kagstrom wrote:
> Currently, the watchdog is turned off when the system shuts down or the
> module is unloaded. If nowayout has been selected, this makes no sense
> and fails to restart the system if it hangs during reboot, so make it
> conditional.
>
> Signed-off-by: Simon Kagstrom<simon.kagstrom(a)netinsight.net>
> ---
> We have a system which has such a hang, and therefore want the watchdog
> to be on until the bitter end.
>
> drivers/watchdog/iTCO_wdt.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
> index 4bdb7f1..927df26 100644
> --- a/drivers/watchdog/iTCO_wdt.c
> +++ b/drivers/watchdog/iTCO_wdt.c
> @@ -839,7 +839,8 @@ static int __devexit iTCO_wdt_remove(struct platform_device *dev)
>
> static void iTCO_wdt_shutdown(struct platform_device *dev)
> {
> - iTCO_wdt_stop();
> + if (!nowayout)
> + iTCO_wdt_stop();
> }
>
> #define iTCO_wdt_suspend NULL

I see the issue; however, what happens if you're
rebooting into a system that doesn't then re-enable the watchdog?
I've seen systems where the hardware watchdog is not reset
during the reboot process, in which case you'll get a
reboot while running the other system.

If you have a read-only system, then perhaps you
can just use WDIOC_SETTIMEOUT to set the hardware watchdog timeout to 1s
and wait for it to reboot the system?
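For reference, the WDIOC_SETTIMEOUT approach could look roughly like the following userspace C sketch. It assumes the standard Linux watchdog ioctl interface from <linux/watchdog.h>; the helper name arm_watchdog_reset() is hypothetical. A driver may round the requested timeout or reject a value as small as 1s with EINVAL, so the sketch returns whatever timeout the driver reports back rather than assuming 1s was applied.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Hypothetical helper: shorten the watchdog timeout, then stop feeding it.
 * Returns the timeout the driver accepted, or -1 with errno set. */
int arm_watchdog_reset(const char *path, int timeout_s)
{
    /* Opening the device starts the watchdog on most drivers. */
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;            /* errno set by open() */

    int t = timeout_s;        /* the driver writes back the timeout it applied */
    if (ioctl(fd, WDIOC_SETTIMEOUT, &t) < 0) {
        int saved = errno;
        /* No magic 'V' was written, so this close() normally leaves the
         * watchdog running with its previous timeout. */
        close(fd);
        errno = saved;
        return -1;
    }

    /* Deliberately no WDIOC_KEEPALIVE and no magic close: the descriptor
     * stays open and the hardware resets the machine after about t s. */
    return t;
}
```

A caller would pass "/dev/watchdog" and then simply do no further work until the hardware resets the machine.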

cheers,
Pádraig.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Simon Kagstrom on
On Tue, 23 Feb 2010 16:24:52 +0000
> > + if (!nowayout)
> > + iTCO_wdt_stop();
> > }
> >
> > #define iTCO_wdt_suspend NULL
>
> I see the issue; however, what happens if you're
> rebooting into a system that doesn't then re-enable the watchdog?
> I've seen systems where the hardware watchdog is not reset
> during the reboot process, in which case you'll get a
> reboot while running the other system.

Well, in that case I would run without nowayout. I just think it's
a bit strange that the watchdog is turned off at all when nowayout
is set.

Thanks for your suggestion though!

// Simon
From: Wim Van Sebroeck on
Hi,

> Currently, the watchdog is turned off when the system shuts down or the
> module is unloaded. If nowayout has been selected, this makes no sense
> and fails to restart the system if it hangs during reboot, so make it
> conditional.

The nowayout option is there to make sure that the watchdog keeps running
as long as the system is running. If, however, you do a system shutdown (which
means that you are going to reboot your server in a controlled fashion and thus
not as the result of a crash or hang), then either the shutdown function
of your platform_driver or the reboot_notifier call will be executed.
In the case of a watchdog device driver we then stop the watchdog to
prevent reboots during the fsck that might happen after the reboot.
If you run into a reboot during an fsck, then chances are very high that
after the reboot your system will again be rebooted during the next fsck.
To prevent this fsck-reboot loop we turn off the watchdog when rebooting.

Because of this, I'm not going to apply this patch.

> We have a system which has such a hang, and therefore want the watchdog
> to be on until the bitter end.

Hmm, the correct question here should be: why do we have a hang in a clean boot?
Do you have more info on what exactly happens? This might be an initialization problem.

If we better understand what happens, then we might consider having an option to keep
the watchdog on after a reboot, or even after power-on (which has nothing to do with
the nowayout functionality, imho).

Kind regards,
Wim.

From: Simon Kagstrom on
On Sun, 7 Mar 2010 16:16:56 +0100
Wim Van Sebroeck <wim(a)iguana.be> wrote:

> > Currently, the watchdog is turned off when the system shuts down or the
> > module is unloaded. If nowayout has been selected, this makes no sense
> > and fails to restart the system if it hangs during reboot, so make it
> > conditional.
>
> The nowayout option is there to make sure that the watchdog keeps running
> as long as the system is running. If, however, you do a system shutdown (which
> means that you are going to reboot your server in a controlled fashion and thus
> not as the result of a crash or hang), then either the shutdown function
> of your platform_driver or the reboot_notifier call will be executed.
> In the case of a watchdog device driver we then stop the watchdog to
> prevent reboots during the fsck that might happen after the reboot.
> If you run into a reboot during an fsck, then chances are very high that
> after the reboot your system will again be rebooted during the next fsck.
> To prevent this fsck-reboot loop we turn off the watchdog when rebooting.

At least on the system I run on, the watchdog is turned off by the
reboot itself, so it won't trigger on the next start anyway. But from
Pádraig's mail earlier I understand that this isn't the case everywhere,
so it's a valid concern.

However, I still think it would be nice to have this option available
for those who need it. Perhaps some option like "noshutdown" to keep
it running during reboots.
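A minimal sketch of the option suggested above, written as plain userspace C so the decision logic can be checked outside the kernel. The name "noshutdown" and all helper names are hypothetical (iTCO_wdt has no such option); in a real driver the flag would presumably be a module parameter and the stub would be the driver's own stop routine. The point it illustrates is that the reboot-time decision becomes its own knob, independent of nowayout.

```c
#include <stdbool.h>

/* Hypothetical policy: stop the timer on shutdown unless "noshutdown"
 * is set. nowayout is passed in only to make explicit that it plays no
 * part here; it governs whether userspace may stop the watchdog. */
static bool wdt_stop_on_shutdown(bool nowayout, bool noshutdown)
{
    (void)nowayout;
    return !noshutdown;   /* default: stop, avoiding the fsck-reboot loop */
}

/* Stand-in for the driver's stop routine; counts how often it ran. */
static int stop_calls;
static void wdt_stop_stub(void) { stop_calls++; }

/* Shape a driver's shutdown hook could take under this option. */
static void wdt_shutdown_model(bool nowayout, bool noshutdown)
{
    if (wdt_stop_on_shutdown(nowayout, noshutdown))
        wdt_stop_stub();
}
```

With noshutdown left at its default, behaviour matches the current driver, so the fsck-reboot loop concern only applies to users who opt in.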

> > We have a system which has such a hang, and therefore want the watchdog
> > to be on until the bitter end.
>
> Hmm, the correct question here should be: why do we have a hang in a clean boot?
> Do you have more info on what exactly happens? This might be an initialization problem.

Sorry, I should have been clearer here: the system hangs during
shutdown (for reboot), not during the next bootup (when the watchdog is
turned off anyway). So what my patch was trying to protect against is a
hang before the restart.

You are of course right - the core issue is the hang itself. The hang
occurs very rarely, and I don't have a way to reproduce it. We've seen
it on both an old 2.6.23 kernel and 2.6.31 (what we use currently).
I've manually inspected all shutdown calls and reboot notifiers that
get called on reboot, but haven't seen any obvious places where the system
can hang. I suspect some interaction with an interrupt handler or
similar, but I can't really tell.

The patch I sent provides protection against this hang, and it's
something we really need until we've found the real issue.

Unfortunately, iTCO_wdt is the first driver whose shutdown() is called,
so the hang could be in any of the other shutdown() calls. I could
perhaps also go with a solution where the watchdog is guaranteed to be
turned off last, right before the reboot.

// Simon