combine nmi_watchdog and softlockup [Kernel]

Prev: [PATCH v3] lockdep: Make lockstats counting per cpu
Next: [PATCH] rcu: make dead code really dead

From: Frederic Weisbecker on 8 Apr 2010 20:10

On Tue, Apr 06, 2010 at 07:31:15PM +0400, Cyrill Gorcunov wrote:
> > I fear the cpu clock is not going to help you detecting any hard lockups.
> > If you're stuck in an interrupt or an irq disabled loop, your cpu clock is
> > not going to fire.
> >
>
> I guess it's not supposed to. For such cases only nmi irqs may help for which
> the perf events are there (/me need to check if we program apic timer for anything
> like that). But it should help for other deadlocks. Or I miss something?

Actually not. What the hardlockup detector does it to check the progression
of irqs.

So it detects true hardlockups: stuck in an irq disabled section.
If you don't have NMI to detect that (here this made by hardware clock based
on cpu cycles overflows), then you're screwed. The hardlockup detector is
useless with a maskable irq based clock.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Frederic Weisbecker on 8 Apr 2010 20:30

On Tue, Apr 06, 2010 at 02:59:38PM -0400, Don Zickus wrote:
> > I fear the cpu clock is not going to help you detecting any hard lockups.
> > If you're stuck in an interrupt or an irq disabled loop, your cpu clock is
> > not going to fire.
>
> Yup. I put that in the changelog of some nmi code I posted a few weeks
> ago. I believe Ingo was looking to have a fallback plan in case hw perf
> events wasn't available.

I'm not sure a fallback plan is welcome. What we want is a single
implementation.
If archs already have their own hardlockup detectors, then good
for them as they have this fallback implementation already.

For those who haven't it or who want to profit from a generic
code that does a good deal of the error prone work for them already,
they can just implement a basic perf support for the cpu-cycle
events.

Perf events gives a chance to provide a generic support for hardlockup
detection and supporting a fallback solution from generic code would
be a waste of time.

> With this code, moving to sw perf events, kinda kills the ability to
> detect hard lockups, but you can still detect softlockups.

Totally. Actually the hardlockup part should be even ifdef'ed.
What is not nice is that we have a dependency to CONFIG_PERF_EVENT
for the softlockup detector now while that could be avoided easily.

You have the following layer:

- A task that updates softlockup last touch
- An hrtimer that wakes up this task and logs the irq progression
- A (in the best case) NMI that checks the irq progression and the softlockup
last touch progression.

Why not moving the softlockup last touch progression check from the hrtimer too?
This way you avoid the perf events dependency for the softlockup detector.
The new layer can then be:

- The task
- hrtimer: check softlockup last touch progression (is_softlockup()) and
then wake up the task again. Log irq progression.
- NMI: check irq progression.

This way can just ifdef all the perf events related things, only useful
for the hardlockup detector. And the softlockup detector can still be used
by archs that don't have perf events support.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Frederic Weisbecker on 8 Apr 2010 21:10

On Mon, Apr 05, 2010 at 10:11:22AM -0400, Don Zickus wrote:
> On Tue, Mar 23, 2010 at 05:33:38PM -0400, Don Zickus wrote:
> > The new nmi_watchdog (which uses the perf event subsystem) is very
> > similar in structure to the softlockup detector. Using Ingo's suggestion,
> > I combined the two functionalities into one file, kernel/watchdog.c.
> >
> > Now both the nmi_watchdog (or hardlockup detector) and softlockup detector
> > sit on top of the perf event subsystem, which is run every 60 seconds or so
> > to see if there are any lockups.
>
> I raised some questions privately to Ingo, he asked I re-iterate them with
> Peter Z. and Frederic W. cc'd.
>
> > Ok thanks. When you get a chance I had a couple of questions I was hoping
> > you could answer for me.
> >
> > - does the hrtimer stuff look ok?

IMO, only partially, as explained in my previous mail.

> > - I wanted to merge the hung task detector code into watchdog.c. The main
> > logic of the code is to walk the task list which i thought about doing
> > in the watchdog kthread. I assume that is the right way to go, but i was a
> > little confused on how the scheduler worked. I thought the watchdog kthread
> > would be scheduled very frequently (being a high priority task) but it seems
> > to only schedule when the code wakes it up. Is that right?

Yeah but high-prio doesn't mean that it is scheduled often.
It means that once it is in a runnable state (TASK_RUNNING), it
will have a higher priority to get into the cpu (lower prio tasks
will have less time in the cpu than the higher prio until the higher prio
get to sleep). Especially here this is a SCHED_FIFO class, so usual
tasks (SCHED_OTHER) won't ever run until it goes to sleep.

But when it goes to sleep, it doesn't need the cpu, so other tasks
can run.
And it is only woken up every 30 secs, just to call
__touch_softlockup_watchdog() and then it goes to sleep again
until the timer wakes it up. That's why it doesn't run often.
The high priority is just here to ensure it will do its job
without too much latency, may be even to avoid rt-tasks to
trigger spurious soft lockups just because the softlockup task
couldn't run because of them taking the cpu for too long.
If it starves because of a higher priority task running for
too long, it can't touch the softlockup_touch_ts, and the timer
will think there is a softlockup.

Concerning the hung task detector, I think it should be left as is in
its own file and dedicated task. IIRC the hung task and softlockup
detectors were in the same file before but they were split up.

We can't factorize both in the same task. The softlockup detector
needs to be a real time task for the reasons stated above. And it's fine
because it does very few things so it doesn't bother the other tasks
with its high prio (unless there are strong rt requirement elsewhere).
But the hung task detector must be a normal task, because it doesn't
have latency requirements, it just checks if a task is blocked for too
long, it's not like the softlockup detector that really needs to keep
up with a timer. Also it does too much things to be an rt task (walking
through the entire task list).

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Don Zickus on 9 Apr 2010 09:40

On Fri, Apr 09, 2010 at 03:02:53AM +0200, Frederic Weisbecker wrote:
> On Mon, Apr 05, 2010 at 10:11:22AM -0400, Don Zickus wrote:
> > On Tue, Mar 23, 2010 at 05:33:38PM -0400, Don Zickus wrote:
> > > The new nmi_watchdog (which uses the perf event subsystem) is very
> > > similar in structure to the softlockup detector. Using Ingo's suggestion,
> > > I combined the two functionalities into one file, kernel/watchdog.c.
> > >
> > > Now both the nmi_watchdog (or hardlockup detector) and softlockup detector
> > > sit on top of the perf event subsystem, which is run every 60 seconds or so
> > > to see if there are any lockups.
> >
> > I raised some questions privately to Ingo, he asked I re-iterate them with
> > Peter Z. and Frederic W. cc'd.
> >
> > > Ok thanks. When you get a chance I had a couple of questions I was hoping
> > > you could answer for me.
> > >
> > > - does the hrtimer stuff look ok?
>
>
> IMO, only partially, as explained in my previous mail.

Yup, makes sense, thanks.

>
>
> > > - I wanted to merge the hung task detector code into watchdog.c. The main
> > > logic of the code is to walk the task list which i thought about doing
> > > in the watchdog kthread. I assume that is the right way to go, but i was a
> > > little confused on how the scheduler worked. I thought the watchdog kthread
> > > would be scheduled very frequently (being a high priority task) but it seems
> > > to only schedule when the code wakes it up. Is that right?
>
>
> Yeah but high-prio doesn't mean that it is scheduled often.
> It means that once it is in a runnable state (TASK_RUNNING), it
> will have a higher priority to get into the cpu (lower prio tasks
> will have less time in the cpu than the higher prio until the higher prio
> get to sleep). Especially here this is a SCHED_FIFO class, so usual
> tasks (SCHED_OTHER) won't ever run until it goes to sleep.
>
> But when it goes to sleep, it doesn't need the cpu, so other tasks
> can run.
> And it is only woken up every 30 secs, just to call
> __touch_softlockup_watchdog() and then it goes to sleep again
> until the timer wakes it up. That's why it doesn't run often.
> The high priority is just here to ensure it will do its job
> without too much latency, may be even to avoid rt-tasks to
> trigger spurious soft lockups just because the softlockup task
> couldn't run because of them taking the cpu for too long.
> If it starves because of a higher priority task running for
> too long, it can't touch the softlockup_touch_ts, and the timer
> will think there is a softlockup.

Ok.

>
>
> Concerning the hung task detector, I think it should be left as is in
> its own file and dedicated task. IIRC the hung task and softlockup
> detectors were in the same file before but they were split up.

I was doing that work based on a request by Ingo. Ingo, thoughts?

>
> We can't factorize both in the same task. The softlockup detector
> needs to be a real time task for the reasons stated above. And it's fine
> because it does very few things so it doesn't bother the other tasks
> with its high prio (unless there are strong rt requirement elsewhere).
> But the hung task detector must be a normal task, because it doesn't
> have latency requirements, it just checks if a task is blocked for too
> long, it's not like the softlockup detector that really needs to keep
> up with a timer. Also it does too much things to be an rt task (walking
> through the entire task list).

Ok. Makes sense.

Cheers,
Don

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Cyrill Gorcunov on 9 Apr 2010 11:00

On Fri, Apr 09, 2010 at 02:00:38AM +0200, Frederic Weisbecker wrote:
> On Tue, Apr 06, 2010 at 07:31:15PM +0400, Cyrill Gorcunov wrote:
> > > I fear the cpu clock is not going to help you detecting any hard lockups.
> > > If you're stuck in an interrupt or an irq disabled loop, your cpu clock is
> > > not going to fire.
> > >
> >
> > I guess it's not supposed to. For such cases only nmi irqs may help for which
> > the perf events are there (/me need to check if we program apic timer for anything
> > like that). But it should help for other deadlocks. Or I miss something?
>
>
> Actually not. What the hardlockup detector does it to check the progression
> of irqs.
>

yup, i know what nmi-watchdog is doing. I guess you've misunderstood me. I meant
that sw-driven detector is not supposed to guard against the cases you're
referring to. I don't remember the details but someone proposed to make a
fallback to sw-watchdog if there is no ability to use nmi from perf-events
(for any reason) which eventually being implemented in Don's patch. And
there will be a message that watchdog has been switched to sw-driven
scaffold. So user will (or should) see this message and mark it I believe.
This sw-watchdog is like "ok, we've been trying our best but there is a
problem and the only solution we could offer -- is to use sw-watchdog".
That is how I understand the reason for sw-watchdog there.

>
> So it detects true hardlockups: stuck in an irq disabled section.
> If you don't have NMI to detect that (here this made by hardware clock based
> on cpu cycles overflows), then you're screwed. The hardlockup detector is
> useless with a maskable irq based clock.
>
-- Cyrill
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First | Prev | Next | Last
Pages: 1 2 3 4
Prev: [PATCH v3] lockdep: Make lockstats counting per cpu
Next: [PATCH] rcu: make dead code really dead