Unified NMI delayed call mechanism [Kernel]


From: Peter Zijlstra on 18 Jun 2010 08:50

On Fri, 2010-06-18 at 14:25 +0200, Andi Kleen wrote:
> > So aside from the should this be perf or not, the above is utter
> > gibberish. Whoever came up with this nonsense?
>
> This is pretty much how softirqs (and before them bottom halves) work.
> I believe Linus invented that scheme originally back in the early
> days of Linux.

Doesn't mean it's the right abstraction for this.

> It's actually quite simple and works well

And adds more code than it removes whilst providing a very limited
service.

You generally want to pass more information along anyway, now your
callback function needs to go look for it. Much better to pass a
work_struct like thing around that is contained in the state it needs.
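
To illustrate the point about embedding the work item in the state it
needs, here is a minimal userspace sketch. It is not code from the patch
set or from the kernel: struct delayed_work_item, struct error_record and
run_delayed() are hypothetical names, and container_of is redefined
locally so the example stands alone. The callback receives only the
embedded item and recovers its enclosing object from it, so it never has
to go looking for its data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work item: just a callback that receives the item itself. */
struct delayed_work_item {
    void (*func)(struct delayed_work_item *item);
};

/* Recover the enclosing object from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* The state the callback needs, with the work item embedded in it. */
struct error_record {
    int bank;
    unsigned long status;
    int handled;
    struct delayed_work_item work;
};

/* Callback: gets from the work item back to the record that contains it. */
void error_record_process(struct delayed_work_item *item)
{
    struct error_record *rec = container_of(item, struct error_record, work);

    rec->handled = 1;
}

/* Stand-in for "run this later, once interrupts are enabled again". */
void run_delayed(struct delayed_work_item *item)
{
    assert(item != NULL && item->func != NULL);
    item->func(item);
}
```

A caller would set rec.work.func = error_record_process and hand
&rec.work to whatever defers the call; no global lookup is involved.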

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Ingo Molnar on 18 Jun 2010 08:50

* huang ying <huang.ying.caritas(a)gmail.com> wrote:

> Hi, Ingo,
>
> On Fri, Jun 18, 2010 at 5:48 PM, Ingo Molnar <mingo(a)elte.hu> wrote:
> >
> > * Hidetoshi Seto <seto.hidetoshi(a)jp.fujitsu.com> wrote:
> >
> >> (2010/06/12 19:25), Ingo Molnar wrote:
> >> >
> >> > * Huang Ying <ying.huang(a)intel.com> wrote:
> >> >
> >> >> NMI can be triggered even when IRQ is masked. So it is not safe for NMI
> >> >> handler to call some functions. One solution is to delay the call via self
> >> >> interrupt, so that the delayed call can be done once the interrupt is
> >> >> enabled again. This has been implemented in MCE and perf event. This patch
> >> >> provides a unified version and makes it easier for other NMI-semantic
> >> >> handlers to make use of the delayed call.
> >> >
> >> > Instead of introducing this extra intermediate facility please use the same
> >> > approach the unified NMI watchdog is using (see latest -tip): a perf event
> >> > callback gives all the extra functionality needed.
> >> >
> >> > The MCE code needs to be updated to use that - and then it will be integrated
> >> > into the events framework.
> >>
> >> Hi Ingo,
> >>
> >> I think this "NMI delayed call mechanism" could be a part of "the events
> >> framework" that we are planning to get in kernel soon. [...]
> >
> > > My request was to make it part of perf events - which is a generic event
> > > logging framework. We don't really need/want a second 'events framework' as
> > > we have one already ;-)
>
> This patchset is simple and straightforward, [...]

We wouldn't want to add another workqueue or memory allocation mechanism
either, even if it was 'simple and straightforward'. We try to make things
more generally useful.

> [...] it is just a delayed execution mechanism, not another 'events
> framework'. There are several other NMI users other than perf, should we
> integrate all NMI users into perf framework?

We already did so with the NMI watchdog. What other significant NMI event
users do you have in mind?

> >> [...] At least APEI will use NMI to report some hardware events (likely
> >> error) to kernel. So I suppose we will go to have a delayed call as an
> >> event handler for APEI.
> >
> > Yep, that makes sense. I wasn't arguing against the functionality itself, I
> > was arguing against the illogical layering that limits its utility. By
> > making it part of perf events it becomes a generic part of that framework
> > and can be used by anything that deals with events and uses that
> > framework.
>
> I think the 'layering' in the patchset helps instead of 'limits' its
> utility. It is designed to be as general as possible, so that it can be used
> by both perf and other NMI users. Do you think so?

What other NMI users do you mean? EDAC/MCE is going to utilize events as
well (away from the horrible /dev/mcelog interface), the NMI watchdog already
did it and the perf tool obviously does as well. There are a few leftovers like
kcrash which isn't really event centric and I don't think it needs to be
converted.

Thanks,

Ingo

From: Andi Kleen on 18 Jun 2010 09:10

> You generally want to pass more information along anyway, now your
> callback function needs to go look for it. Much better to pass a
> work_struct like thing around that is contained in the state it needs.

But how would you allocate the work queue in an NMI?

If it's only a single instance (like this bit) it can be always put
into a per cpu variable.

-Andi

From: Peter Zijlstra on 18 Jun 2010 09:20

On Fri, 2010-06-18 at 15:09 +0200, Andi Kleen wrote:
> > You generally want to pass more information along anyway, now your
> > callback function needs to go look for it. Much better to pass a
> > work_struct like thing around that is contained in the state it needs.
>
> But how would you allocate the work queue in an NMI?
>
> If it's only a single instance (like this bit) it can be always put
> into a per cpu variable.

Pre-allocate. For the perf-event stuff we use the perf_event allocated
at creation time. But yeah, per-cpu storage also works.
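
One way to picture the pre-allocation idea is the following userspace
sketch, written with C11 atomics. It is an assumption about how such a
scheme could look, not the perf or MCE implementation: struct
pending_item, pending_queue() and pending_run() are hypothetical names,
and a single list head stands in for what would be per-CPU storage. The
queueing side only links an already existing item into a list, so it
needs neither an allocator nor a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-allocated item, embedded in or allocated with its owner. */
struct pending_item {
    struct pending_item *next;
    atomic_bool queued;
    void (*func)(struct pending_item *item);
};

/* One head per CPU in a real design; a single head in this sketch. */
static _Atomic(struct pending_item *) pending_head;

/* "NMI" side: no allocation, no locks, only atomic operations. */
bool pending_queue(struct pending_item *item)
{
    assert(item->func != NULL);     /* sketch-only sanity check */

    /* Claim the item; if it is already queued there is nothing to do. */
    if (atomic_exchange(&item->queued, true))
        return false;

    /* Push onto the list head with a compare-and-swap loop. */
    struct pending_item *old = atomic_load(&pending_head);
    do {
        item->next = old;
    } while (!atomic_compare_exchange_weak(&pending_head, &old, item));
    return true;
}

/* Later, from a context where the callbacks are safe to run. */
int pending_run(void)
{
    int ran = 0;
    /* Detach the whole list in one step, then walk it privately. */
    struct pending_item *item =
        atomic_exchange(&pending_head, (struct pending_item *)NULL);

    while (item != NULL) {
        struct pending_item *next = item->next;

        atomic_store(&item->queued, false);  /* item may be queued again */
        item->func(item);
        item = next;
        ran++;
    }
    return ran;
}

/* Example callback for demonstration: counts how often it ran. */
int example_runs;

void example_callback(struct pending_item *item)
{
    (void)item;
    example_runs++;
}
```

Queueing the same item twice before it runs is a no-op the second time,
which is what makes a single pre-allocated instance sufficient.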

From: Peter Zijlstra on 18 Jun 2010 09:30

On Fri, 2010-06-18 at 15:23 +0200, Andi Kleen wrote:
> On Fri, Jun 18, 2010 at 03:12:49PM +0200, Peter Zijlstra wrote:
> > On Fri, 2010-06-18 at 15:09 +0200, Andi Kleen wrote:
> > > > You generally want to pass more information along anyway, now your
> > > > callback function needs to go look for it. Much better to pass a
> > > > work_struct like thing around that is contained in the state it needs.
> > >
> > > But how would you allocate the work queue in an NMI?
> > >
> > > If it's only a single instance (like this bit) it can be always put
> > > into a per cpu variable.
> >
> > Pre-allocate. For the perf-event stuff we use the perf_event allocated
> > at creation time. But yeah, per-cpu storage also works.
>
> So you could just preallocate the bits instead ?

You mean the bits in your function array? Those are limited to 32 and
you'd need a secondary lookup to match them to your data object, not
very useful.
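
For comparison, a softirq-style bit scheme of the kind being discussed
might look like the userspace sketch below. The patch itself is not
quoted in this thread, so every name here (delayed_call_register(),
delayed_call_schedule(), delayed_call_run()) is hypothetical. It shows
both limits mentioned above: one 32-bit word gives at most 32 callbacks,
and a callback receives no argument, so it has to find its data object
through some other lookup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DELAYED_CALLS 32    /* one bit per callback in a 32-bit word */

typedef void (*delayed_call_fn)(void);

static delayed_call_fn call_table[MAX_DELAYED_CALLS];
static int nr_calls;
static _Atomic uint32_t pending_mask;   /* per-CPU in a real design */

/* Returns the assigned id, or -1 once all 32 bits are taken. */
int delayed_call_register(delayed_call_fn fn)
{
    assert(fn != NULL);
    if (nr_calls >= MAX_DELAYED_CALLS)
        return -1;
    call_table[nr_calls] = fn;
    return nr_calls++;
}

/* "NMI" side: only sets a bit, so it carries no per-event data. */
void delayed_call_schedule(int id)
{
    atomic_fetch_or(&pending_mask, UINT32_C(1) << id);
}

/* Later: run each callback whose bit is set, once per set bit. */
int delayed_call_run(void)
{
    uint32_t mask = atomic_exchange(&pending_mask, 0);
    int ran = 0;

    for (int id = 0; id < MAX_DELAYED_CALLS; id++) {
        if (mask & (UINT32_C(1) << id)) {
            call_table[id]();
            ran++;
        }
    }
    return ran;
}

/* Example callback: it must reach its state through a global. */
int example_hits;

void example_call(void)
{
    example_hits++;
}
```

Scheduling the same id twice before delayed_call_run() still results in
a single call, since a bit can only record that something is pending.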

