A question of perf NMI handler [Kernel]

Prev: [tip:timers/core] Checkpatch: Warn about unexpectedly long msleep's
Next: genhd, efi: add efi partition metadata to hd_structs

From: Cyrill Gorcunov on 4 Aug 2010 11:20

On Wed, Aug 04, 2010 at 05:02:41PM +0200, Peter Zijlstra wrote:
> On Wed, 2010-08-04 at 10:52 -0400, Don Zickus wrote:
> > > Right so I looked up your thing and while that limits the damage in that
> > > at some point it will let NMIs pass, it will still consume too many.
> > > Meaning that Yinghai will have to potentially press his NMI button
> > > several times before it registers.
> >
> > Ok. Thanks for reviewing. How does it consume to many? I probably don't
> > understand how perf is being used in the non-simple scenarios.
>
> Suppose you have 4 counters (AMD, intel-nhm+), when more than 2 overflow
> the first will raise the PMI, if the other 2+ overflow before we disable
> the PMU it will try to raise 2+ more PMIs, but because hardware only has
> a single interrupt pending bit it will at most cause a single extra
> interrupt after we finish servicing the first one.
>
> So then the first interrupt will see 3+ overflows, return 3+, and will
> thus eat 2+ NMIs, only one of which will be the pending interrupt,
> leaving 1+ NMIs from other sources to consume unhandled.
>
> In which case Yinghai will have to press his NMI button 2+ times before
> it registers.
>
> That said, that might be a better situation than always consuming
> unknown NMIs..
>

Well, first I guess having Yinghai CC'ed is a bonus ;)
The second thing is that I don't get why perf handler can't be _last_
call in default_do_nmi, if there were any nmi with reason (serr or parity)
I think they should be calling first which of course don't eliminate
the former issue but somewhat make it weaken.

-- Cyrill
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Don Zickus on 4 Aug 2010 11:50

On Wed, Aug 04, 2010 at 05:02:41PM +0200, Peter Zijlstra wrote:
> So then the first interrupt will see 3+ overflows, return 3+, and will
> thus eat 2+ NMIs, only one of which will be the pending interrupt,
> leaving 1+ NMIs from other sources to consume unhandled.
>
> In which case Yinghai will have to press his NMI button 2+ times before
> it registers.
>
> That said, that might be a better situation than always consuming
> unknown NMIs..

Well if the worse case scenario is only one extra NMI, then I can change
the logic in my patch to eat a max of 1 possible NMI instead of 2 as in the
example you gave above.

It still won't be 100% accurate but how often are people running perf
where they need 4 counters, have to hit an external nmi button or run into
broken firmware all at the same time? :-)

Cheers,
Don

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Don Zickus on 4 Aug 2010 12:00

On Wed, Aug 04, 2010 at 07:18:58PM +0400, Cyrill Gorcunov wrote:
> On Wed, Aug 04, 2010 at 05:02:41PM +0200, Peter Zijlstra wrote:
> > On Wed, 2010-08-04 at 10:52 -0400, Don Zickus wrote:
> > > > Right so I looked up your thing and while that limits the damage in that
> > > > at some point it will let NMIs pass, it will still consume too many.
> > > > Meaning that Yinghai will have to potentially press his NMI button
> > > > several times before it registers.
> > >
> > > Ok. Thanks for reviewing. How does it consume to many? I probably don't
> > > understand how perf is being used in the non-simple scenarios.
> >
> > Suppose you have 4 counters (AMD, intel-nhm+), when more than 2 overflow
> > the first will raise the PMI, if the other 2+ overflow before we disable
> > the PMU it will try to raise 2+ more PMIs, but because hardware only has
> > a single interrupt pending bit it will at most cause a single extra
> > interrupt after we finish servicing the first one.
> >
> > So then the first interrupt will see 3+ overflows, return 3+, and will
> > thus eat 2+ NMIs, only one of which will be the pending interrupt,
> > leaving 1+ NMIs from other sources to consume unhandled.
> >
> > In which case Yinghai will have to press his NMI button 2+ times before
> > it registers.
> >
> > That said, that might be a better situation than always consuming
> > unknown NMIs..
> >
>
> Well, first I guess having Yinghai CC'ed is a bonus ;)
> The second thing is that I don't get why perf handler can't be _last_
> call in default_do_nmi, if there were any nmi with reason (serr or parity)
> I think they should be calling first which of course don't eliminate
> the former issue but somewhat make it weaken.

Because the reason registers are never set. If they were, then the code
wouldn't have to walk the notify_chain. :-)

Unknown nmis are unknown nmis, nobody is claiming them. Even worse, there
are customers that want to register their nmi handler below the perf
handler to claim all the unknown nmis, so they can be logged on the system
before being rebooted.

Cheers,
Don
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Cyrill Gorcunov on 4 Aug 2010 12:20

On Wed, Aug 04, 2010 at 11:50:02AM -0400, Don Zickus wrote:
....
> >
> > Well, first I guess having Yinghai CC'ed is a bonus ;)
> > The second thing is that I don't get why perf handler can't be _last_
> > call in default_do_nmi, if there were any nmi with reason (serr or parity)
> > I think they should be calling first which of course don't eliminate
> > the former issue but somewhat make it weaken.
>
> Because the reason registers are never set. If they were, then the code
> wouldn't have to walk the notify_chain. :-)
>

maybe we're talking about different things. i meant that if there is nmi
with a reason (from 0x61) the handling of such nmi should be done before
notify_die I think (if only I not miss something behind).

> Unknown nmis are unknown nmis, nobody is claiming them. Even worse, there
> are customers that want to register their nmi handler below the perf
> handler to claim all the unknown nmis, so they can be logged on the system
> before being rebooted.

well, perhaps we might need some kind of perf_chain in notifier code and
call for it after die_nmi chain, so the customers you mention may add own
chain for being called last.

>
> Cheers,
> Don
>
-- Cyrill
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Don Zickus on 4 Aug 2010 12:30

On Wed, Aug 04, 2010 at 08:10:46PM +0400, Cyrill Gorcunov wrote:
> On Wed, Aug 04, 2010 at 11:50:02AM -0400, Don Zickus wrote:
> ...
> > >
> > > Well, first I guess having Yinghai CC'ed is a bonus ;)
> > > The second thing is that I don't get why perf handler can't be _last_
> > > call in default_do_nmi, if there were any nmi with reason (serr or parity)
> > > I think they should be calling first which of course don't eliminate
> > > the former issue but somewhat make it weaken.
> >
> > Because the reason registers are never set. If they were, then the code
> > wouldn't have to walk the notify_chain. :-)
> >
>
> maybe we're talking about different things. i meant that if there is nmi
> with a reason (from 0x61) the handling of such nmi should be done before
> notify_die I think (if only I not miss something behind).

No we are talking about the same thing. :-) And that code is already
there. The problem is the bits in register 0x61 are not always set
correctly in the case of SERRs (well at least in all the cases I have
dealt with). So you can easily can a flood of unknown nmis from an SERR
and register 0x61 would have the PERR/SERR bits set to 0. Fun, huh?

Cheers,
Don

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First | Prev | Next | Last
Pages: 1 2 3 4 5
Prev: [tip:timers/core] Checkpatch: Warn about unexpectedly long msleep's
Next: genhd, efi: add efi partition metadata to hd_structs