From: Maciej W. Rozycki on
On Wed, 14 Jul 2010, Mathieu Desnoyers wrote:

> This patch makes all faults, traps and exceptions safe to be called from NMI
> context *except* single-stepping, which requires iret to restore the TF (trap
> flag) and jump to the return address in a single instruction. Sorry, no kprobes

Watch out for the RF flag too, that is not set correctly by POPFD -- that
may be important for faulting instructions that also have a hardware
breakpoint set at their address.

> support in NMI handlers because of this limitation. This cannot be emulated
> with popf/lret, because lret would be single-stepped. It does not apply to
> "immediate values" because they do not use single-stepping. This code detects if
> the TF flag is set and uses the iret path for single-stepping, even if it
> reactivates NMIs prematurely.

What about the VM flag for VM86 tasks? It cannot be changed by POPFD
either.
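
To make the two remarks above concrete, here is a minimal C model of how POPFD differs from IRET when restoring a saved EFLAGS image. It assumes CPL 0 in 32-bit protected mode and follows my reading of the SDM description of POPFD (VM, VIF and VIP are left unaffected, RF is cleared); reserved bits are ignored. The helper names model_popfd, model_iretd and popfd_is_equivalent are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* EFLAGS bit masks as documented in the Intel SDM. */
#define X86_EFLAGS_TF  0x00000100u /* trap flag (single-step) */
#define X86_EFLAGS_RF  0x00010000u /* resume flag */
#define X86_EFLAGS_VM  0x00020000u /* virtual-8086 mode */
#define X86_EFLAGS_VIF 0x00080000u /* virtual interrupt flag */
#define X86_EFLAGS_VIP 0x00100000u /* virtual interrupt pending */

/*
 * Hypothetical model of POPFD at CPL 0 in protected mode (an assumption
 * of this sketch): VM, VIF and VIP keep their current value and RF ends
 * up cleared, whatever the popped image says.
 */
static uint32_t model_popfd(uint32_t cur, uint32_t image)
{
    const uint32_t kept = X86_EFLAGS_VM | X86_EFLAGS_VIF | X86_EFLAGS_VIP;
    uint32_t result = (image & ~kept) | (cur & kept);

    result &= ~X86_EFLAGS_RF;
    return result;
}

/* Hypothetical model of IRET: the whole saved image is restored. */
static uint32_t model_iretd(uint32_t cur, uint32_t image)
{
    (void)cur;
    return image;
}

/* Non-zero when restoring the image via POPFD gives the same flags as IRET. */
static int popfd_is_equivalent(uint32_t cur, uint32_t image)
{
    return model_popfd(cur, image) == model_iretd(cur, image);
}
```

In this model an image with only TF set round-trips through POPFD, while an image with RF or VM set does not, which is the concern raised above.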

How about only using the special return path when a nested exception is
about to return to the NMI handler? You'd avoid all the odd cases then
that do not happen in the NMI context.

Maciej
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Mathieu Desnoyers on
* Maciej W. Rozycki (macro(a)linux-mips.org) wrote:
> On Wed, 14 Jul 2010, Mathieu Desnoyers wrote:
>
> > This patch makes all faults, traps and exceptions safe to be called from NMI
> > context *except* single-stepping, which requires iret to restore the TF (trap
> > flag) and jump to the return address in a single instruction. Sorry, no kprobes
>
> Watch out for the RF flag too, that is not set correctly by POPFD -- that
> may be important for faulting instructions that also have a hardware
> breakpoint set at their address.
>
> > support in NMI handlers because of this limitation. This cannot be emulated
> > with popf/lret, because lret would be single-stepped. It does not apply to
> > "immediate values" because they do not use single-stepping. This code detects if
> > the TF flag is set and uses the iret path for single-stepping, even if it
> > reactivates NMIs prematurely.
>
> What about the VM flag for VM86 tasks? It cannot be changed by POPFD
> either.
>
> How about only using the special return path when a nested exception is
> about to return to the NMI handler? You'd avoid all the odd cases then
> that do not happen in the NMI context.

This is exactly what this patch does :-)

It selects the return path with

+ testl $NMI_MASK,TI_preempt_count(%ebp)
+ jz resume_kernel /* Not nested over NMI ? */

In addition, regarding the use of int3 breakpoints in the kernel: AFAIK the
handler does not explicitly set the RF flag, and the breakpoint instruction
(int3) appears not to set it either (from my understanding of Intel's
Intel Architecture Software Developer's Manual Volume 3: System Programming,
15.3.1.1. INSTRUCTION-BREAKPOINT EXCEPTION C).

So it should be safe to set an int3 breakpoint in an NMI handler with this patch.
It's just the "single-stepping" feature of kprobes which is problematic.
Luckily, only int3 is needed for code patching bypass.

Thanks,

Mathieu


--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
From: Maciej W. Rozycki on
On Wed, 14 Jul 2010, Mathieu Desnoyers wrote:

> > How about only using the special return path when a nested exception is
> > about to return to the NMI handler? You'd avoid all the odd cases then
> > that do not happen in the NMI context.
>
> This is exactly what this patch does :-)

Ah, OK then -- I understood you actually tested the value of TF in the
image to be restored.

> It selects the return path with
>
> + testl $NMI_MASK,TI_preempt_count(%ebp)
> + jz resume_kernel /* Not nested over NMI ? */
>
> In addition, regarding the use of int3 breakpoints in the kernel: AFAIK the
> handler does not explicitly set the RF flag, and the breakpoint instruction
> (int3) appears not to set it either (from my understanding of Intel's
> Intel Architecture Software Developer's Manual Volume 3: System Programming,
> 15.3.1.1. INSTRUCTION-BREAKPOINT EXCEPTION C).

The CPU only sets RF itself in the image saved in certain cases -- you'd
see it set in the page fault handler for example, so that once the handler
has finished any instruction breakpoint does not hit (presumably again,
because the instruction breakpoint debug exception has the highest
priority). You mentioned the need to handle these faults.

> So it should be safe to set an int3 breakpoint in an NMI handler with this patch.
>
> It's just the "single-stepping" feature of kprobes which is problematic.
> Luckily, only int3 is needed for code patching bypass.

Actually, the breakpoint exception handler should probably set RF
explicitly, but that depends on the exact debugging scenario, so I can't
comment on it further. I don't know how INT3 is used in this context, so
I'm just noting this may be a danger zone.

Maciej
From: Mathieu Desnoyers on
* Maciej W. Rozycki (macro(a)linux-mips.org) wrote:
> On Wed, 14 Jul 2010, Mathieu Desnoyers wrote:
>
> > > How about only using the special return path when a nested exception is
> > > about to return to the NMI handler? You'd avoid all the odd cases then
> > > that do not happen in the NMI context.
> >
> > This is exactly what this patch does :-)
>
> Ah, OK then -- I understood you actually tested the value of TF in the
> image to be restored.

It tests it too. When it detects that the return path is about to return to an
NMI handler, it checks whether the TF flag is set. If it is, then "iret" is
really needed, because TF can only single-step the intended instruction when it
is set by "iret". With the popf/ret scheme, the trap would instead be taken at
the "ret" instruction that follows popf. In any case, single-stepping is
strongly discouraged in NMI handlers, because there is no way to avoid the iret.
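
As a rough illustration of the selection logic just described, here is a minimal C sketch. The real code is the assembly in the patch; this only mirrors the decision. SKETCH_NMI_MASK is a placeholder standing in for the kernel's NMI_MASK, and the enum and function names are hypothetical.

```c
#include <stdint.h>

#define X86_EFLAGS_TF   0x00000100u /* trap flag (single-step) */
/* Placeholder for this sketch only; not taken from the kernel headers. */
#define SKETCH_NMI_MASK 0x04000000u

enum return_path {
    PATH_NORMAL,   /* not nested over NMI: usual resume_kernel path */
    PATH_POPF_RET, /* nested over NMI: return without iret */
    PATH_IRET      /* nested over NMI but TF set: iret is unavoidable */
};

/*
 * Hypothetical helper: pick the exception return path from the
 * preempt count and the EFLAGS image that is about to be restored.
 */
static enum return_path pick_return_path(uint32_t preempt_count,
                                         uint32_t saved_eflags)
{
    if (!(preempt_count & SKETCH_NMI_MASK))
        return PATH_NORMAL;
    if (saved_eflags & X86_EFLAGS_TF)
        return PATH_IRET; /* reactivates NMIs early, but needed for TF */
    return PATH_POPF_RET;
}
```

Only the last case avoids iret, so only that case keeps NMIs masked until the NMI handler itself returns.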

>
> > It selects the return path with
> >
> > + testl $NMI_MASK,TI_preempt_count(%ebp)
> > + jz resume_kernel /* Not nested over NMI ? */
> >
> > In addition, regarding the use of int3 breakpoints in the kernel: AFAIK the
> > handler does not explicitly set the RF flag, and the breakpoint instruction
> > (int3) appears not to set it either (from my understanding of Intel's
> > Intel Architecture Software Developer's Manual Volume 3: System Programming,
> > 15.3.1.1. INSTRUCTION-BREAKPOINT EXCEPTION C).
>
> The CPU only sets RF itself in the image saved in certain cases -- you'd
> see it set in the page fault handler for example, so that once the handler
> has finished any instruction breakpoint does not hit (presumably again,
> because the instruction breakpoint debug exception has the highest
> priority). You mentioned the need to handle these faults.

Well, the only case where I think it might make sense to allow a breakpoint in
NMI handler code would be to temporarily replace a static branch, which should
in no way be able to trigger any other fault.

>
> > So it should be safe to set an int3 breakpoint in an NMI handler with this patch.
> >
> > It's just the "single-stepping" feature of kprobes which is problematic.
> > Luckily, only int3 is needed for code patching bypass.
>
> Actually, the breakpoint exception handler should probably set RF
> explicitly, but that depends on the exact debugging scenario, so I can't
> comment on it further. I don't know how INT3 is used in this context, so
> I'm just noting this may be a danger zone.

In the case of temporary bypass, the int3 is only there to divert the
instruction execution flow to somewhere else, and we come back to the original
code at the address following the instruction which has the breakpoint. So
basically, we never come back to the original instruction, ever. We might as
well just clear the RF flag from the EFLAGS image before popf.
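
A minimal C sketch of that fixup, assuming a simplified saved-register frame. struct sketch_regs and bypass_fixup are hypothetical names for this illustration and are not the kernel's struct pt_regs or any existing handler.

```c
#include <stdint.h>

#define X86_EFLAGS_RF 0x00010000u /* resume flag */

/* Hypothetical, simplified frame: only the fields this sketch needs. */
struct sketch_regs {
    uint32_t ip;    /* saved return address */
    uint32_t flags; /* saved EFLAGS image */
};

/*
 * Divert execution to the bypass code. Since we never return to the
 * instruction that carried the breakpoint, RF serves no purpose and
 * can be cleared in the image before it is restored with popf.
 */
static void bypass_fixup(struct sketch_regs *regs, uint32_t bypass_target)
{
    regs->ip = bypass_target;
    regs->flags &= ~X86_EFLAGS_RF;
}
```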

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
From: Maciej W. Rozycki on
On Wed, 14 Jul 2010, Mathieu Desnoyers wrote:

> It tests it too. When it detects that the return path is about to return to an
> NMI handler, it checks whether the TF flag is set. If it is, then "iret" is
> really needed, because TF can only single-step the intended instruction when it
> is set by "iret". With the popf/ret scheme, the trap would instead be taken at
> the "ret" instruction that follows popf. In any case, single-stepping is
> strongly discouraged in NMI handlers, because there is no way to avoid the iret.

Hmm, with Pentium Pro and more recent processors there is actually a
nasty hack that will let you get away with POPF/RET and TF set. ;) You
can try it if you like and can arrange for an appropriate scenario.

> In the case of temporary bypass, the int3 is only there to divert the
> instruction execution flow to somewhere else, and we come back to the original
> code at the address following the instruction which has the breakpoint. So
> basically, we never come back to the original instruction, ever. We might as
> well just clear the RF flag from the EFLAGS image before popf.

Yes, if you return to elsewhere, then that's actually quite desirable
IMHO.

This RF flag is quite complicated to handle and there are some errata
involved too. If I understand it correctly, all fault-class exception
handlers are expected to set it manually in the image to be restored if
they return to the original faulting instruction (that includes the debug
exception handler if it was invoked as a fault, i.e. in response to an
instruction breakpoint). Then all trap-class exception handlers are
expected to clear the flag (and that includes the debug exception handler
if it was invoked as a trap, e.g. in response to a data breakpoint or a
single step). I haven't checked if Linux gets these bits right, but it
may be worth doing so.
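
The rule as I understand it can be written down as a small C sketch. It encodes only the convention described above (set RF when a fault handler returns to the faulting instruction, clear it otherwise) and says nothing about what Linux actually does. The names exc_class and fixup_rf are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_RF 0x00010000u /* resume flag */

enum exc_class {
    EXC_FAULT, /* e.g. page fault, or debug exception on an instruction breakpoint */
    EXC_TRAP   /* e.g. debug exception on a data breakpoint or single step */
};

/*
 * Hypothetical helper: return the EFLAGS image a handler should restore.
 * - fault returning to the faulting instruction: set RF, so that an
 *   instruction breakpoint at that address does not hit again;
 * - trap, or fault returning elsewhere: clear RF.
 */
static uint32_t fixup_rf(uint32_t saved_eflags, enum exc_class cls,
                         bool returns_to_faulting_insn)
{
    if (cls == EXC_FAULT && returns_to_faulting_insn)
        return saved_eflags | X86_EFLAGS_RF;
    return saved_eflags & ~X86_EFLAGS_RF;
}
```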

For the record -- GDB hardly cares: when it is asked to resume execution at an
instruction that has a breakpoint set on it, it first removes any instruction
breakpoints, single-steps the instruction with all the other threads locked
out and then reinserts the breakpoints so that they can hit again. Then it
proceeds with whatever should be done next to fulfil the execution request.
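
A toy C model of that remove / single-step / reinsert sequence. It is not GDB code: the target is reduced to a program counter and a list of breakpoint addresses, other threads are simply assumed to be stopped, and all names (sketch_target, sketch_step, resume_from_breakpoint) are hypothetical.

```c
#include <stddef.h>

#define SKETCH_MAX_BP 8

/* Hypothetical toy target: one thread, instructions are one unit long. */
struct sketch_target {
    size_t pc;                     /* current instruction address */
    size_t bp_addr[SKETCH_MAX_BP]; /* instruction breakpoint addresses */
    size_t nbp;                    /* number of valid entries in bp_addr */
    int armed;                     /* non-zero while breakpoints are inserted */
    int hits;                      /* how many times a breakpoint has hit */
};

/*
 * Execute one instruction. If an armed breakpoint sits at pc it hits and
 * pc does not advance (returns 0); otherwise pc advances (returns 1).
 */
static int sketch_step(struct sketch_target *t)
{
    if (t->armed) {
        for (size_t i = 0; i < t->nbp; i++) {
            if (t->bp_addr[i] == t->pc) {
                t->hits++;
                return 0;
            }
        }
    }
    t->pc++;
    return 1;
}

/* Resume from an instruction that has a breakpoint set on it. */
static int resume_from_breakpoint(struct sketch_target *t)
{
    t->armed = 0;                  /* 1. remove the breakpoints */
    int advanced = sketch_step(t); /* 2. single-step the instruction */
    t->armed = 1;                  /* 3. reinsert so they can hit again */
    return advanced;
}
```

Because the breakpoint is absent while the instruction is stepped, the debugger never depends on RF to get past it.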

Maciej