From: Andi Kleen on
> > In fact I'm pretty sure it worked originally. Perhaps it regressed?
>
> I'd first ask the obvious to Perf authors: does perf issue vmalloc_sync_all()
> > between percpu data allocation and tracing activation? The generic ring buffer
> library I posted last week does it already as a precaution for this very
> specific reason (making sure NMIs never trigger page faults).

I suspect the low level per cpu allocation functions should
just call it.

-Andi

--
ak(a)linux.intel.com -- Speaking for myself only.
From: H. Peter Anvin on
On 07/14/2010 01:07 PM, Andi Kleen wrote:
>>> In fact I'm pretty sure it worked originally. Perhaps it regressed?
>>
>> I'd first ask the obvious to Perf authors: does perf issue vmalloc_sync_all()
>> between percpu data allocation and tracing activation? The generic ring buffer
>> library I posted last week does it already as a precaution for this very
>> specific reason (making sure NMIs never trigger page faults).
>
> I suspect the low level per cpu allocation functions should
> just call it.
>

Yes, specifically the point at which we allocate new per cpu memory blocks.

-hpa
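
A minimal sketch of the caller-side variant Mathieu describes above
(sync right after allocation, before the memory can ever be touched from
NMI context). The wrapper and struct names are invented for illustration;
vmalloc_sync_all() and alloc_percpu() are the interfaces under discussion.
Andi and hpa would instead push the call down into the percpu allocator,
at the point where a new chunk is mapped.

#include <linux/percpu.h>
#include <linux/vmalloc.h>

/* Hypothetical per-CPU state that NMI handlers will dereference later. */
struct trace_cpu_buf {
        unsigned long head;
        unsigned long tail;
};

/*
 * Sketch only: allocate a vmalloc-backed per-cpu block and pre-fault
 * the kernel mappings on every pgd, so an NMI handler touching the
 * block can never take a vmalloc page fault.
 */
static struct trace_cpu_buf __percpu *trace_alloc_percpu_buf(void)
{
        struct trace_cpu_buf __percpu *buf = alloc_percpu(struct trace_cpu_buf);

        if (!buf)
                return NULL;

        vmalloc_sync_all();     /* sync the new mapping into all page tables */
        return buf;
}

Doing it inside the allocator spares every caller from remembering the
sync, at the price of putting NMI-motivated code into generic percpu
paths, which is exactly what Linus objects to further down the thread.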

From: Mathieu Desnoyers on
* Linus Torvalds (torvalds(a)linux-foundation.org) wrote:
> On Wed, Jul 14, 2010 at 12:36 PM, Frederic Weisbecker
> <fweisbec(a)gmail.com> wrote:
> >
> > There is also the fact we need to handle the lost NMI, by deferring its
> > treatment or so. That adds even more complexity.
>
> I don't think you read my proposal very deeply. It already handles
> them by taking a fault on the iret of the first one (that's why we
> point to the stack frame - so that we can corrupt it and force a
> fault).

It only handles the case of a single NMI coming in. What happens in this
scenario?

- NMI (1) comes in.
- takes a fault
  - iret
- NMI (2) comes in.
  - nesting detected, popf/ret
- takes another fault
- NMI (3) comes in.
  - nesting detected, popf/ret
- iret faults
  - executes only one extra NMI handler

We miss NMI (3) here. I think this is an important change from the semantics
where, AFAIK, the hardware should be allowed to assume that the CPU will
execute as many NMI handlers as there are NMIs acknowledged.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
From: Mathieu Desnoyers on
* Linus Torvalds (torvalds(a)linux-foundation.org) wrote:
[...]
> In fact, I wonder if we couldn't just do a software NMI disable
> instead? Have a per-cpu variable (in the _core_ percpu areas that get
> allocated statically) that points to the NMI stack frame, and just
> make the NMI code itself do something like
>
> NMI entry:

Let's try to figure out how far we can go with this idea. First, to answer
Ingo's criticism, let's assume we do a stack frame copy before entering the
"generic" NMI handler routine.

> - load percpu NMI stack frame pointer
> - if non-zero we know we're nested, and should ignore this NMI:
> - we're returning to kernel mode, so return immediately by using
> "popf/ret", which also keeps NMI's disabled in the hardware until the
> "real" NMI iret happens.

Maybe incrementing a per-cpu missed-NMI count would be appropriate here, so we
know how many NMIs should be replayed at iret?
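
A rough C model of that bookkeeping, with invented names (the real NMI
entry path is assembly, not C):

#include <linux/types.h>
#include <linux/percpu.h>

static DEFINE_PER_CPU(void *, nmi_frame_ptr);     /* non-NULL while an NMI runs */
static DEFINE_PER_CPU(unsigned int, nmi_missed);  /* NMIs swallowed while nested */

/*
 * Model of the top of the NMI entry path. The non-nested path is
 * assumed to set nmi_frame_ptr to its saved stack frame on entry and
 * clear it on exit; a nested NMI only bumps the missed count and
 * returns via popf/ret.
 */
static bool nmi_entry_is_nested(void)
{
        if (__this_cpu_read(nmi_frame_ptr)) {
                __this_cpu_inc(nmi_missed);
                return true;
        }
        return false;
}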

> - before the popf/iret, use the NMI stack pointer to make the NMI
> return stack be invalid and cause a fault

I assume you mean "popf/ret" here. So assuming we use a frame copy, we should
change the nmi stack pointer in the nesting 0 nmi stack copy, so the nesting 0
NMI iret will trigger the fault.

> - set the NMI stack pointer to the current stack pointer

That would mean bringing back the NMI stack pointer to the (nesting - 1) nmi
stack copy.

>
> NMI exit (not the above "immediate exit because we nested"):
> clear the percpu NMI stack pointer

This would instead be:
- Copy the nesting 0 stack copy back onto the real NMI stack.
- Clear the percpu NMI stack pointer.

** !

> Just do the iret.

Which presumably faults if we changed the return stack to an invalid one, and
executes as many NMIs as there are "missed NMIs" counted (the missed-NMI count
should probably be read with an xchg() instruction).
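
Continuing that model, the drain side could look roughly like this.
nmi_handler_body() and nmi_replay_missed() are stand-ins, not real
kernel symbols:

#include <linux/percpu.h>
#include <linux/ptrace.h>
#include <asm/system.h>         /* xchg() on x86 of this era */

DECLARE_PER_CPU(unsigned int, nmi_missed);

/* Stand-in for the arch NMI handler body. */
extern void nmi_handler_body(struct pt_regs *regs);

/* Called from the fault taken on the outer NMI's doctored iret frame. */
static void nmi_replay_missed(struct pt_regs *regs)
{
        /* Grab and clear atomically so an NMI landing now starts a fresh count. */
        unsigned int missed = xchg(this_cpu_ptr(&nmi_missed), 0);

        while (missed--)
                nmi_handler_body(regs);
}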

So, one question persists, regarding the "** !" comment: what do we do if an NMI
comes in exactly at that point? I'm afraid it will overwrite the "real" NMI
stack, and therefore drop all the "pending" NMIs by setting the NMI stack return
address to a valid one.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
From: Linus Torvalds on
On Wed, Jul 14, 2010 at 1:17 PM, Mathieu Desnoyers
<mathieu.desnoyers(a)efficios.com> wrote:
>
> It only handles the case of a single NMI coming in. What happens in this
> scenario?

[ two nested NMI's ]

The _right_ thing happens.

What do you think the hardware would have done itself? The NMI was
blocked. It wouldn't get replayed twice. If you have two NMI's
happening while another NMI is active, you will get a single NMI after
the first NMI has completed.

So stop these _idiotic_ complaints. And face the music:

- NMI's aren't that important. They are a _hell_ of a lot less
important than the standard page fault path, for example.

- We do _not_ want to add more NMI magic outside of the NMI
codepaths. It's much better to handle NMI special cases in the NMI
code, rather than sprinkle them in random other codepaths (like percpu
allocators) that have NOTHING WHAT-SO-EVER to do with NMI's!

Linus

>
> - NMI (1) comes in.
> - takes a fault
>   - iret
> - NMI (2) comes in.
>   - nesting detected, popf/ret
> - takes another fault
> - NMI (3) comes in.
>   - nesting detected, popf/ret
> - iret faults
>   - executes only one extra NMI handler
>
> We miss NMI (3) here. I think this is an important change from the semantics
> where, AFAIK, the hardware should be allowed to assume that the CPU will
> execute as many NMI handlers as there are NMIs acknowledged.
>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com
>