From: Don Zickus on
On Wed, Aug 11, 2010 at 04:03:07PM +0200, Robert Richter wrote:
> On 11.08.10 08:44:43, Don Zickus wrote:
> > On Wed, Aug 11, 2010 at 01:10:46PM +0200, Robert Richter wrote:
> > > > > + *
> > > > > + * NOTE: We have no way of knowing if a second NMI was
> > > > > + * actually triggered, so we may accidentally skip a valid
> > > > > + * unknown nmi later.
> > > > > + */
> > > > > + __get_cpu_var(perfctr_skip) +=1;
> > >
> > > ... but this will not work. You have to mark the *absolute* nmi number
> > > here. If you only raise a flag, the next unknown nmi will be dropped,
> > > every. Because, in between there could have been other nmis that
> > > stopped the chain and thus the 'unknown' path is not executed. The
> > > trick in my patch is that you *know*, which nmi you want to skip.
> >
> > I guess I am confused. The way I read your patch was that you assumed the
> > next NMI would be the one you skip and if there was another NMI in between
> > the handled one and the one to skip, you would not skip it (nmi count !=
> > prev + 1) and it would produce an accidental unknown nmi.
>
> That's how it works.
>
> > I tried to change that with my patch by setting the skip flag which would
> > be drained on the next unknown nmi, independent of where it is in the NMI
> > backlog of NMIs.
>
> "setting the skip flag which would be drained on the next unknown nmi"
>
> That's what is wrong, it drops every unknown nmi, no matter when it is

Well as Frederic pointed out the skip variable will never go past one, so
it will only drop at most one unknown nmi.

> detected. In between there could be 1000's of valid other nmis
> handled. You even could have been returned from nmi mode. But still,
> the next unknown nmi will be dropped. Your patch could accumulate also

That was the intent. Can we guarantee that in the rare cases where the
perfctr is generating two nmis, that they will be back-to-back?

I think Huang tried to cap my approach even further my creating a time
window in which the two nmis had to happen. That gives us the flexibility
to handle nmis that are not back to back, but yet deal with the case where
two perfctrs fired but we are unsure if it generated a second nmi and we
falsely set the skip flag.

Cheers,
Don

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Robert Richter on
On 12.08.10 00:00:58, Robert Richter wrote:
> I was debuging this a little more, see version 2 below.

I was testing the patch further, it properly filters perfctr
back-to-back nmis. I was able to reliable detect unknown nmis
triggered by the nmi button during high load perf sessions with
multiple counters, no false positives.

>
> -Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Robert Richter on
On 10.08.10 15:24:28, Cyrill Gorcunov wrote:
> On Tue, Aug 10, 2010 at 09:05:41PM +0200, Robert Richter wrote:
> > On 10.08.10 13:24:51, Cyrill Gorcunov wrote:
> >
> > > It gets masked on NMI arrival, at least for some models (Core Duo, P4,
> > > P6 M and I suspect more theh that, that was the reason oprofile has
> > > it, also there is a note in SDM V3a page 643).
> >
> > Yes, that's right, I never noticed that. Maybe it is better to
> > implement the apic write it in model specific code then.
> >
>
> Perhaps we can make it simplier I think, ie like it was before -- we just
> move it under your new DIE_NMIUNKNOWN, in separate patch of course. Though
> I'm fine with either way.

I do not understand why you want to put this in the 'unknown'
path. Isn't it necessary to unmask the vector with every call of the
nmi handler?

-Robert

>
> (actually it's interesting to know wouldn't we leave lvt masked when
> we hit 'second delayed nmi has arrived' situation, I guess we didn't
> hit it before in real yet :-)
>
> > -Robert
> >
> > --
> > Advanced Micro Devices, Inc.
> > Operating System Research Center
> >
> -- Cyrill
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo(a)vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Don Zickus on
On Thu, Aug 12, 2010 at 12:00:58AM +0200, Robert Richter wrote:
> I was debuging this a little more, see version 2 below.
>
> -Robert
>
> --
>
> From 8bb831af56d118b85fc38e0ddc2e516f7504b9fb Mon Sep 17 00:00:00 2001
> From: Robert Richter <robert.richter(a)amd.com>
> Date: Thu, 5 Aug 2010 16:19:59 +0200
> Subject: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
>
> When perfctrs are running it is valid to have unhandled nmis, two
> events could trigger 'simultaneously' raising two back-to-back
> NMIs. If the first NMI handles both, the latter will be empty and daze
> the CPU.
>
> The solution to avoid an 'unknown nmi' massage in this case was simply
> to stop the nmi handler chain when perfctrs are runnning by stating
> the nmi was handled. This has the drawback that a) we can not detect
> unknown nmis anymore, and b) subsequent nmi handlers are not called.
>
> This patch addresses this. Now, we check this unknown NMI if it could
> be a perfctr back-to-back NMI. Otherwise we pass it and let the kernel
> handle the unknown nmi.

Seems to cover my concerns. Great work!

ACK

Cheers,
Don
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Don Zickus on
On Thu, Aug 12, 2010 at 03:10:07PM +0200, Robert Richter wrote:
> On 12.08.10 00:00:58, Robert Richter wrote:
> > I was debuging this a little more, see version 2 below.
>
> I was testing the patch further, it properly filters perfctr
> back-to-back nmis. I was able to reliable detect unknown nmis
> triggered by the nmi button during high load perf sessions with
> multiple counters, no false positives.

For my own curiousity, what type of high load perf sessions are you using
to test this. I don't know perf well enough to have it generate events
across multiple perfctrs.

Thanks,
Don
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/