perf: Precise task / softirq / hardirq filtered stats/profiles [Kernel]


From: Ingo Molnar on 21 May 2010 11:20

* Frederic Weisbecker <fweisbec(a)gmail.com> wrote:

> Hi,
>
> The new task and irq exclusion handling can let you
> confine tracing and profiling to about everything you
> want.

I fixed the subject line ;-)

'exclusion' is the ABI detail. The feature your patches
implement is to allow 'softirq limited' or 'task-context
limited' or 'hardirq profiling' - which is way cool.

One thing i'd like to see in this feature is for it to
work on pure event counting - i.e. 'perf stat' as well.

This would allow some _very_ precise stats, without IRQ
noise. For example, today we have this kind of noise
in instruction counting:

$ for ((i=0;i<10;i++)); do perf stat -e instructions /bin/true 2>&1 | grep instructions; done
217161 instructions # 0,000 IPC
218591 instructions # 0,000 IPC
223268 instructions # 0,000 IPC
217112 instructions # 0,000 IPC
219392 instructions # 0,000 IPC
216801 instructions # 0,000 IPC
217501 instructions # 0,000 IPC
218565 instructions # 0,000 IPC
218682 instructions # 0,000 IPC
218523 instructions # 0,000 IPC

It's not all that bad at ~2% jitter, but many improvements
we are working on in the kernel are much smaller than 1%.
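As a rough check of the noise level, the run-to-run spread can be computed directly from the ten counts quoted above. This is only a sketch: the helper name `spread` is made up for illustration, and the percentage depends on which measure is used (peak-to-peak here, which is dominated by the single outlier run).

```python
# Instruction counts copied from the ten 'perf stat' runs quoted above.
counts = [217161, 218591, 223268, 217112, 219392,
          216801, 217501, 218565, 218682, 218523]


def spread(samples):
    """Return (min, max, mean, peak-to-peak spread as a percentage of the mean)."""
    lo, hi = min(samples), max(samples)
    mean = sum(samples) / len(samples)
    return lo, hi, mean, 100.0 * (hi - lo) / mean


lo, hi, mean, pct = spread(counts)
print(f"min={lo} max={hi} mean={mean:.1f} peak-to-peak={hi - lo} ({pct:.2f}%)")
```

The peak-to-peak difference is a few thousand instructions, which is the scale of noise that context filtering would have to remove.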

If we extended your feature to perf stat, we might be able
to get a lot more precise measurements in terms of kernel
optimizations (and kernel bloat).

I'm really curious how accurate your scheme could become
that way. From the above 'few thousand instructions'
noise we might be able to get down to a 'hundreds of
instructions' noise? If so then it would allow us to
measure micro-optimizations in a radically more precise
way.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Peter Zijlstra on 21 May 2010 12:20

On Fri, 2010-05-21 at 17:12 +0200, Ingo Molnar wrote:
> 'exclusion' is the ABI detail. The feature your patches
> implement is to allow 'softirq limited' or 'task-context
> limited' or 'hardirq profiling' - which is way cool.
>
> One thing i'd like to see in this feature is for it to
> work on pure event counting - i.e. 'perf stat' as well.

It's not really exclusion, all it does is discard samples when in the
wrong context (which happens to work reasonably well for all the
swevents, except for the timer ones).
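The difference between discarding samples and excluding counts can be illustrated with a toy model in user-space Python. Everything here is hypothetical (the `simulate` function, the context names, the sampling period); it is not how the kernel code is written, only a sketch of the behaviour described: a sample taken in the wrong context is dropped, but the underlying event count keeps running in every context.

```python
def simulate(stream, period, wanted):
    """Toy model of sample discarding.

    stream: list of context names, one entry per event occurrence.
    period: a sample is taken every `period` events.
    wanted: the only context whose samples are kept.
    Returns (total_event_count, kept_samples).
    """
    total = 0
    kept = []
    for ctx in stream:
        total += 1                      # the counter advances in every context
        if total % period == 0:         # sampling point reached
            if ctx == wanted:
                kept.append((total, ctx))
            # otherwise the sample is simply discarded
    return total, kept


# 6 events in task context, 4 in hardirq context, then 2 more in task context.
stream = ["task"] * 6 + ["hardirq"] * 4 + ["task"] * 2
total, kept = simulate(stream, period=3, wanted="task")
print(total, kept)
```

In this model the profile contains only task-context samples, yet the total still includes the four hardirq events, which is why discarding alone does not help pure counting.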

If you really want to do exclusion you have to disable/enable on *IRQ
entry/exit, but I guess that gets to be prohibitive on costs.

Implementing it shouldn't be too hard, just add some hooks to
irq_enter() irq_exit() and __do_softirq(). Each such hook should loop
over all active events and call ->stop/->start.
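A minimal sketch of that hook scheme, again as a user-space toy model rather than kernel code. The names `ToyEvent`, `toy_irq_enter`, `toy_irq_exit` and `tick_all` are invented for illustration, and nested interrupts are deliberately ignored.

```python
class ToyEvent:
    """Hypothetical stand-in for a counting event with stop/start callbacks."""

    def __init__(self):
        self.running = True
        self.count = 0

    def stop(self):
        self.running = False

    def start(self):
        self.running = True

    def tick(self, n):
        # A stopped event does not accumulate.
        if self.running:
            self.count += n


active_events = [ToyEvent(), ToyEvent()]


def toy_irq_enter():
    # Hook on interrupt entry: stop every active event.
    for ev in active_events:
        ev.stop()


def toy_irq_exit():
    # Hook on interrupt exit: restart every active event.
    for ev in active_events:
        ev.start()


def tick_all(n):
    for ev in active_events:
        ev.tick(n)


tick_all(100)        # task context: counted
toy_irq_enter()
tick_all(40)         # interrupt context: excluded
toy_irq_exit()
tick_all(60)         # back in task context: counted
print([ev.count for ev in active_events])
```

The cost concern is visible even in the toy: every entry and exit walks the whole list of active events.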

The only real problem would be poking at the hrtimer events from an
hrtimer interrupt :-)


From: Ingo Molnar on 21 May 2010 14:40

* Peter Zijlstra <peterz(a)infradead.org> wrote:

> On Fri, 2010-05-21 at 17:12 +0200, Ingo Molnar wrote:
> > 'exclusion' is the ABI detail. The feature your patches
> > implement is to allow 'softirq limited' or 'task-context
> > limited' or 'hardirq profiling' - which is way cool.
> >
> > One thing i'd like to see in this feature is for it to
> > work on pure event counting - i.e. 'perf stat' as well.
>
> It's not really exclusion, all it does is discard samples
> when in the wrong context (which happens to work
> reasonably well for all the swevents, except for the
> timer ones).
>
> If you really want to do exclusion you have to
> disable/enable on *IRQ entry/exit, but I guess that gets
> to be prohibitive on costs.

Yeah, i know - this is what i tried to allude to in the
other part of my reply:

> > If we extended your feature to perf stat, we might be
> > able to get a lot more precise measurements in terms
> > of kernel optimizations (and kernel bloat).

Right, so there are two ways to do it: one is the
disable/enable that you mention, the other would be to
read the count on entry, read it again on exit, and
subtract the delta.

( the RDPMC based delta method can be made to work for
sampling as well, even if the NMI hits in the middle of
the softirq or hardirq. )

Two reads might be cheaper than a disable+enable.
Especially if it's done using RDPMC.

We should do it like that, not by discarding samples, and
overhead should be OK as long as we don't do the
disable/enable (or delta read) if the feature is off.
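The delta-read variant can be sketched as a user-space toy model. `ToyCounter.read()` stands in for an RDPMC-style read of a free-running counter; `DeltaFilter` and its method names are hypothetical, and nested interrupts are not handled. The `enabled` flag models skipping the two reads entirely when the feature is off.

```python
class ToyCounter:
    """Free-running counter that is never stopped."""

    def __init__(self):
        self.raw = 0

    def advance(self, n):
        self.raw += n

    def read(self):
        # Stand-in for a cheap counter read (an assumption of this sketch).
        return self.raw


class DeltaFilter:
    """Subtract whatever the counter accumulated between irq entry and exit."""

    def __init__(self, counter, enabled=True):
        self.counter = counter
        self.enabled = enabled
        self.excluded = 0
        self._entry = None

    def irq_enter(self):
        if not self.enabled:
            return                      # feature off: no extra work
        self._entry = self.counter.read()

    def irq_exit(self):
        if not self.enabled or self._entry is None:
            return
        self.excluded += self.counter.read() - self._entry
        self._entry = None

    def task_count(self):
        return self.counter.read() - self.excluded


counter = ToyCounter()
flt = DeltaFilter(counter)
counter.advance(100)     # task context
flt.irq_enter()
counter.advance(40)      # interrupt context
flt.irq_exit()
counter.advance(60)      # task context
print(counter.read(), flt.task_count())
```

The counter itself keeps running throughout; only the reported value is corrected, which is what makes the approach two reads instead of a disable plus an enable.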

If a simple enable/disable or read/read costs too much
then we need to prod hw makers about it. But it should be
OK i think.

Ingo
