x86_64 page fault NMI-safe [Kernel]

Prev: [PATCH] enable readback to get HPET working on ATI SB4x00, kernel 2.6.35_rc5
Next: input: Fix wrong dimensions check for synaptics

From: H. Peter Anvin on 15 Jul 2010 18:30

On 07/15/2010 03:16 PM, Linus Torvalds wrote:
>
>> This code assumes NMIs have a separate stack.
>
> It also needs to be made per-cpu (and the flags be per-cpu).
>
> Then you could in fact possibly test the stack pointer for whether it
> is in the NMI stack area, and use the value of %rsp itself as the
> flag. So you could avoid the flag entirely. Because testing %rsp is
> valid - testing %rip is not.
>
> That would also avoid the race, because %rsp (as a flag) now gets
> cleared atomically by the "iret". So that might actually solve things.
>

This seems really clean to me.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Linus Torvalds on 15 Jul 2010 18:30

On Thu, Jul 15, 2010 at 3:16 PM, Linus Torvalds
<torvalds(a)linux-foundation.org> wrote:
>
> Then you could in fact possibly test the stack pointer for whether it
> is in the NMI stack area, and use the value of %rsp itself as the
> flag. So you could avoid the flag entirely. Because testing %rsp is
> valid - testing %rip is not.
>
> That would also avoid the race, because %rsp (as a flag) now gets
> cleared atomically by the "iret". So that might actually solve things.

Hmm. So on x86-32, it's easy: if the NMI is nested, you can literally
look at the current %rsp value, and see if it's within the NMI stack
region.

But on x86-64, due to IST, you need to look at the saved-rsp value on
the stack, since the %rsp always gets reset to the NMI stack region
regardless of where it was before.

Why do we force IST use for NMI, btw? Maybe we shouldn't, and just use
the normal kernel stack mechanisms?

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Mathieu Desnoyers on 15 Jul 2010 18:40

* Linus Torvalds (torvalds(a)linux-foundation.org) wrote:
> On Thu, Jul 15, 2010 at 3:01 PM, Mathieu Desnoyers
> <mathieu.desnoyers(a)efficios.com> wrote:
> >
> > . NMI exit code
> > and fake NMI entry are made reentrant with respect to NMI handler interruption
> > by testing, at the very beginning of the NMI handler, if a NMI is nested over
> > the whole nmi_atomic .. nmi_atomic_end code region.
>
> That is totally bogus. The NMI can be nested by exceptions and
> function calls - the whole _point_ of this thing. So testing "rip" for
> anything else than the specific final "iret" is meaningless. You will
> be in an NMI region regardless of what rip is.

There are 2 tests done on NMI handler entry:

1) test if nested over nmi_atomic region (which is a very restrained region
around nmi_exit, which does not do any function call nor take traps).
2) test if the per-cpu nmi_nesting flag is set.

Test #2 takes care of NMIs nested over functions called and traps.

>
> > This code assumes NMIs have a separate stack.
>
> It also needs to be made per-cpu (and the flags be per-cpu).

Sure, that was implied ;)

>
> Then you could in fact possibly test the stack pointer for whether it
> is in the NMI stack area, and use the value of %rsp itself as the
> flag. So you could avoid the flag entirely. Because testing %rsp is
> valid - testing %rip is not.

That could be used as a way to detect "nesting over NMI", but I'm not entirely
sure it would deal with the "we need a fake NMI" flag set/clear (more or less
equivalent to setting CS to 0 in your implementation and then back to some other
value). The "set" is done with NMIs disabled, but the "clear" is done at fake
NMI entry, where NMIs are active.

>
> That would also avoid the race, because %rsp (as a flag) now gets
> cleared atomically by the "iret". So that might actually solve things.

Well, I'm still unconvinced there is anything to solve, as I built my NMI entry
with 2 tests: one for "nmi_atomic" code range and the other for per-cpu nesting
flag. Given that I set/clear the per-cpu nesting flag either with NMIs off or
within the nmi_atomic code range, this should all work fine.

Unless I am missing something else ?

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: H. Peter Anvin on 15 Jul 2010 18:50

On 07/15/2010 03:26 PM, Linus Torvalds wrote:
> On Thu, Jul 15, 2010 at 3:16 PM, Linus Torvalds
> <torvalds(a)linux-foundation.org> wrote:
>>
>> Then you could in fact possibly test the stack pointer for whether it
>> is in the NMI stack area, and use the value of %rsp itself as the
>> flag. So you could avoid the flag entirely. Because testing %rsp is
>> valid - testing %rip is not.
>>
>> That would also avoid the race, because %rsp (as a flag) now gets
>> cleared atomically by the "iret". So that might actually solve things.
>
> Hmm. So on x86-32, it's easy: if the NMI is nested, you can literally
> look at the current %rsp value, and see if it's within the NMI stack
> region.
>
> But on x86-64, due to IST, you need to look at the saved-rsp value on
> the stack, since the %rsp always gets reset to the NMI stack region
> regardless of where it was before.
>
> Why do we force IST use for NMI, btw? Maybe we shouldn't, and just use
> the normal kernel stack mechanisms?
>

The reasons for using TSS (32 bits) or IST (64 bits) are: concern about
the size of the regular kernel stack, and a concern that the kernel
stack pointer may not be in a usable state. The former is not a problem
here: we're doing a stack switch anyway, and so the additional overhead
on the main stack is pretty minimal, but the latter may be.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Andi Kleen on 15 Jul 2010 19:00

> Why do we force IST use for NMI, btw? Maybe we shouldn't, and just use
> the normal kernel stack mechanisms?

If you don't use IST the SYSCALL entry is racy during the window
when RSP is not set up yet (same for MCE etc.)

-Andi
--
ak(a)linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Prev: [PATCH] enable readback to get HPET working on ATI SB4x00, kernel 2.6.35_rc5
Next: input: Fix wrong dimensions check for synaptics