From: Ingo Molnar on

* Huang Ying <ying.huang(a)intel.com> wrote:

> NMI can be triggered even when IRQ is masked. So it is not safe for NMI
> handler to call some functions. One solution is to delay the call via self
> interrupt, so that the delayed call can be done once the interrupt is
> enabled again. This has been implemented in MCE and perf event. This patch
> provides a unified version and makes it easier for other NMI semantic handlers
> to make use of the delayed call.
>
> Signed-off-by: Huang Ying <ying.huang(a)intel.com>
> ---
> arch/x86/include/asm/entry_arch.h | 1
> arch/x86/include/asm/hw_irq.h | 1
> arch/x86/include/asm/irq_vectors.h | 5 +
> arch/x86/include/asm/nmi.h | 7 ++
> arch/x86/kernel/entry_64.S | 3 +
> arch/x86/kernel/irqinit.c | 3 +
> arch/x86/kernel/traps.c | 104 +++++++++++++++++++++++++++++++++++++
> 7 files changed, 124 insertions(+)

Instead of introducing this extra intermediate facility please use the same
approach the unified NMI watchdog is using (see latest -tip): a perf event
callback gives all the extra functionality needed.

The MCE code needs to be updated to use that - and then it will be integrated
into the events framework.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Huang Ying on
On Sat, 2010-06-12 at 18:25 +0800, Ingo Molnar wrote:
> * Huang Ying <ying.huang(a)intel.com> wrote:
>
> > NMI can be triggered even when IRQ is masked. So it is not safe for NMI
> > handler to call some functions. One solution is to delay the call via self
> > interrupt, so that the delayed call can be done once the interrupt is
> > enabled again. This has been implemented in MCE and perf event. This patch
> > provides a unified version and makes it easier for other NMI semantic handlers
> > to make use of the delayed call.
> >
> > Signed-off-by: Huang Ying <ying.huang(a)intel.com>
> > ---
> > arch/x86/include/asm/entry_arch.h | 1
> > arch/x86/include/asm/hw_irq.h | 1
> > arch/x86/include/asm/irq_vectors.h | 5 +
> > arch/x86/include/asm/nmi.h | 7 ++
> > arch/x86/kernel/entry_64.S | 3 +
> > arch/x86/kernel/irqinit.c | 3 +
> > arch/x86/kernel/traps.c | 104 +++++++++++++++++++++++++++++++++++++
> > 7 files changed, 124 insertions(+)
>
> Instead of introducing this extra intermediate facility please use the same
> approach the unified NMI watchdog is using (see latest -tip): a perf event
> callback gives all the extra functionality needed.

Sorry, but if my understanding is correct, the perf event overflow callback
runs in NMI context rather than in a delayed context (such as IRQ, softirq,
or process context). That is, the backtrace of watchdog_overflow_callback
should look something like this:

x86_pmu_handle_irq
perf_event_overflow
__perf_event_overflow
watchdog_overflow_callback

I do not see any delayed-call mechanism here.

> The MCE code needs to be updated to use that - and then it will be integrated
> into the events framework.

MCE is NMI-like, and there are other NMI users too. I think some of them
will need some kind of delayed call mechanism. In fact, perf itself uses its
own NMI delayed call mechanism; I just want to generalize it for other users
as well.
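
For illustration only, the delayed call idea described above could be modeled
in userspace roughly as follows. This is a sketch under stated assumptions,
not the code from the patch: every name in it (delayed_call_register,
delayed_call_schedule, model_nmi, and so on) is hypothetical, and the
interrupt flag and the self interrupt are simulated with plain variables.

```c
/* Userspace model of an NMI delayed call. Not kernel code.
 * All names are hypothetical and are not taken from the patch. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_DELAYED_CALLS 32

typedef void (*delayed_call_fn)(void);

static delayed_call_fn delayed_calls[MAX_DELAYED_CALLS];
static int nr_delayed_calls;
static atomic_uint pending_mask;   /* one bit per registered call */
static bool irq_masked;            /* models the CPU interrupt flag */
static bool self_irq_raised;       /* models a latched self interrupt */

/* Register a function; returns its id, or -1 if the table is full. */
int delayed_call_register(delayed_call_fn fn)
{
    if (nr_delayed_calls >= MAX_DELAYED_CALLS)
        return -1;
    delayed_calls[nr_delayed_calls] = fn;
    return nr_delayed_calls++;
}

/* The modeled interrupt handler: runs every call whose bit is set. */
static void delayed_call_irq(void)
{
    unsigned int mask = atomic_exchange(&pending_mask, 0u);

    for (int id = 0; id < nr_delayed_calls; id++)
        if (mask & (1u << id))
            delayed_calls[id]();
}

/* Deliver the latched self interrupt, but only if IRQs are enabled. */
static void deliver_pending_irq(void)
{
    if (!irq_masked && self_irq_raised) {
        self_irq_raised = false;
        delayed_call_irq();
    }
}

/* Meant for NMI context: only sets a bit and latches the self interrupt. */
void delayed_call_schedule(int id)
{
    assert(id >= 0 && id < nr_delayed_calls);
    atomic_fetch_or(&pending_mask, 1u << id);
    self_irq_raised = true;
}

void model_irq_disable(void) { irq_masked = true; }
void model_irq_enable(void)  { irq_masked = false; deliver_pending_irq(); }

/* An NMI runs its handler even when IRQs are masked; any latched
 * interrupt can only be delivered after the NMI has returned. */
void model_nmi(void (*handler)(void))
{
    handler();
    deliver_pending_irq();
}

/* Demo callback and NMI handler, used to exercise the model. */
int demo_runs;
int demo_id;
void demo_call(void)        { demo_runs++; }
void demo_nmi_handler(void) { delayed_call_schedule(demo_id); }
```

In this model, an NMI that arrives between model_irq_disable() and
model_irq_enable() only marks demo_call as pending; the call itself runs when
interrupts are enabled again. The overflow callback in the backtrace above has
no such step and runs entirely in NMI context.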

Best Regards,
Huang Ying


From: Hidetoshi Seto on
(2010/06/12 19:25), Ingo Molnar wrote:
>
> * Huang Ying <ying.huang(a)intel.com> wrote:
>
>> NMI can be triggered even when IRQ is masked. So it is not safe for NMI
>> handler to call some functions. One solution is to delay the call via self
>> interrupt, so that the delayed call can be done once the interrupt is
>> enabled again. This has been implemented in MCE and perf event. This patch
>> provides a unified version and makes it easier for other NMI semantic handlers
>> to make use of the delayed call.
>
> Instead of introducing this extra intermediate facility please use the same
> approach the unified NMI watchdog is using (see latest -tip): a perf event
> callback gives all the extra functionality needed.
>
> The MCE code needs to be updated to use that - and then it will be integrated
> into the events framework.

Hi Ingo,

I think this "NMI delayed call mechanism" could be a part of "the events
framework" that we are planning to get into the kernel soon. At least APEI
will use NMI to report some hardware events (most likely errors) to the
kernel, so I suppose we will end up with a delayed call as an event handler
for APEI.

Generally speaking, an "event" can occur regardless of the current context.
An NMI can tell us about some external events that call for an urgent
reaction, but we cannot do everything in NMI context. Or we might suddenly
need to generate an internal event while interrupts are disabled.

I agree that generating a self interrupt is a reasonable solution.
Note that both "MCE handled (=event log should be delivered to userland
asap)" and "perf events pending (=pending events should be handled asap)"
can be seen as kinds of internal events that require urgent handling in
non-NMI kernel context. One question here is why we should have different
vectors for these events when they use the same mechanism.

How about calling the vector LOCAL_EVENT_VECTOR or so?
I guess there would be a better name if it is possible to inject an event
into other CPUs via IPI with this vector...


Thanks,
H.Seto


From: Don Zickus on
On Mon, Jun 14, 2010 at 12:45:21PM +0900, Hidetoshi Seto wrote:
> (2010/06/12 19:25), Ingo Molnar wrote:
> >
> > * Huang Ying <ying.huang(a)intel.com> wrote:
> >
> >> NMI can be triggered even when IRQ is masked. So it is not safe for NMI
> >> handler to call some functions. One solution is to delay the call via self
> >> interrupt, so that the delayed call can be done once the interrupt is
> >> enabled again. This has been implemented in MCE and perf event. This patch
> >> provides a unified version and makes it easier for other NMI semantic handlers
> >> to make use of the delayed call.
> >
> > Instead of introducing this extra intermediate facility please use the same
> > approach the unified NMI watchdog is using (see latest -tip): a perf event
> > callback gives all the extra functionality needed.
> >
> > The MCE code needs to be updated to use that - and then it will be integrated
> > into the events framework.
>
> Hi Ingo,
>
> I think this "NMI delayed call mechanism" could be a part of "the events
> framework" that we are planning to get into the kernel soon. At least APEI
> will use NMI to report some hardware events (most likely errors) to the
> kernel, so I suppose we will end up with a delayed call as an event handler
> for APEI.
>
> Generally speaking, an "event" can occur regardless of the current context.
> An NMI can tell us about some external events that call for an urgent
> reaction, but we cannot do everything in NMI context. Or we might suddenly
> need to generate an internal event while interrupts are disabled.
>
> I agree that generating a self interrupt is a reasonable solution.
> Note that both "MCE handled (=event log should be delivered to userland
> asap)" and "perf events pending (=pending events should be handled asap)"
> can be seen as kinds of internal events that require urgent handling in
> non-NMI kernel context. One question here is why we should have different
> vectors for these events when they use the same mechanism.

I think the perf event subsystem can already log events in NMI context and
deliver them to userspace when the NMI is done. This is why I think Ingo
wants MCE to be updated to sit on top of the perf event subsystem, to avoid
reinventing everything.

Then again I do not know enough about the MCE stuff to understand what you
mean when an event comes in but you can't handle it in an NMI-safe
context. An example would be helpful.

Cheers,
Don
From: Andi Kleen on
> I think the perf event subsystem can already log events in NMI context and
> deliver them to userspace when the NMI is done. This is why I think Ingo
> wants MCE to be updated to sit on top of the perf event subsystem, to avoid
> reinventing everything.

perf is not solving the problem this is trying to solve.

> Then again I do not know enough about the MCE stuff to understand what you
> mean when an event comes in but you can't handle it in an NMI-safe
> context. An example would be helpful.

At least for MCE, hwpoison recovery needs to sleep, and you obviously cannot
sleep in an NMI-like context. The way it's done is to first do a self
interrupt, then a work queue wakeup, and finally the sleeping operations.
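
As a rough illustration of those three stages, here is a single-threaded
userspace model. It is only a sketch under assumptions: the function names,
the ring buffer, and the "pfn" payload are hypothetical and are not taken
from the MCE code, and real code would need proper atomic operations where
this model uses plain variables.

```c
/* Userspace model of the three stages. Not kernel code.
 * All names are hypothetical; single-threaded, so no atomics are used. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8   /* the ring holds at most RING_SIZE - 1 entries */

enum context { CTX_NMI_LIKE, CTX_IRQ, CTX_PROCESS };

static unsigned long ring[RING_SIZE];
static size_t ring_head, ring_tail;
static bool self_irq_pending;
static bool work_queued;
static enum context current_ctx = CTX_PROCESS;
int pages_recovered;

/* Store one entry; returns false (entry dropped) if the ring is full. */
static bool ring_put(unsigned long pfn)
{
    size_t next = (ring_head + 1) % RING_SIZE;

    if (next == ring_tail)
        return false;
    ring[ring_head] = pfn;
    ring_head = next;
    return true;
}

/* Fetch the oldest entry; returns false if the ring is empty. */
static bool ring_get(unsigned long *pfn)
{
    if (ring_tail == ring_head)
        return false;
    *pfn = ring[ring_tail];
    ring_tail = (ring_tail + 1) % RING_SIZE;
    return true;
}

/* Stage 1: NMI-like context. Only records the event and raises the
 * self interrupt; nothing here may sleep. */
void stage1_mce_like(unsigned long pfn)
{
    enum context saved = current_ctx;

    current_ctx = CTX_NMI_LIKE;
    if (ring_put(pfn))
        self_irq_pending = true;
    current_ctx = saved;
}

/* Stage 2: the self interrupt. Still may not sleep, so it only
 * queues work for process context. */
void stage2_self_irq(void)
{
    if (!self_irq_pending)
        return;
    self_irq_pending = false;
    current_ctx = CTX_IRQ;
    work_queued = true;
    current_ctx = CTX_PROCESS;
}

/* Stands in for a recovery step that is allowed to sleep. */
static void recover_page(unsigned long pfn)
{
    assert(current_ctx == CTX_PROCESS);
    (void)pfn;
    pages_recovered++;
}

/* Stage 3: the worker, in process context, drains the ring. */
void stage3_worker(void)
{
    unsigned long pfn;

    if (!work_queued)
        return;
    work_queued = false;
    while (ring_get(&pfn))
        recover_page(pfn);
}
```

The point of the model is that the sleeping step is reached only through
stage3_worker(), and that the worker has nothing to do until the self
interrupt in stage 2 has queued it.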

perf does not fit into this because it has no way to process such an event
inside the kernel.

Anyway, this just cleans up the existing mechanism to share some code.

-Andi

--
ak(a)linux.intel.com -- Speaking for myself only.