From: Peter Zijlstra on
On Tue, 2010-05-04 at 14:38 +0200, Tejun Heo wrote:
>
> This patchset does the following two things.
>
> * 0001-0007: Unify the three tracers (tracepoints, perf_events and
> preempt/sched notifiers) in scheduler.

Right, so I really don't like the SCHED_EVENT() thing much, esp the 3
different function postfixes.

> * 0008-0012: Move perf hooks in sched on top of tracepoints if TPs are
> enabled.

This seems to add a terrible amount of overhead for no particular
reason.

I know Ingo was pushing to consolidate things, but imho this cure is
worse than the ailment.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Wed, 2010-05-05 at 07:00 +0200, Tejun Heo wrote:
> Hello,
>
> On 05/04/2010 07:29 PM, Peter Zijlstra wrote:
> >> * 0001-0007: Unify the three tracers (tracepoints, perf_events and
> >> preempt/sched notifiers) in scheduler.
> >
> > Right, so I really don't like the SCHED_EVENT() thing much, esp the 3
> > different function postfixes.
>
> Yeap, it's not the prettiest thing. I thought about renaming all of
> them so that they share the same postfix but then again I need a way
> to tell the script which to enable which is the easiest with
> specifying postfixes as macro argument. Any better ideas?

Add more NOP functions for the missing bits I guess, but that gets
cumbersome too.

> >> * 0008-0012: Move perf hooks in sched on top of tracepoints if TPs are
> >> enabled.
> >
> > This seems to add a terrible amount of overhead for no particular
> > reason.
>
> Hmm... What overhead?

Well, the perf_{inc,dec}_nr_events() stuff, that's massively painful. If
that is the price to pay for using tracepoints then I'd rather not.

Also, the whole magic dance to keep !CONFIG_TRACEPOINTS working doesn't
make the code any saner.

> What matters more here is avoiding the overhead when perf or whatever
> tracing mechanism is disabled. When both perf and TPs are compiled in
> but not enabled, moving perf hooks on top of TPs means that the sched
> core code doesn't have to call into extra functions for perf and TPs
> can be nooped. ie. less overhead. Also, even when perf is enabled,
> there isn't any inherent extra runtime overhead other than during
> enabling/disabling which again is a much colder path.

Well, in the current code we get the call overhead, but all perf
functions bail on !nr_events. We could invert that and have an inline
function check nr_events and only then do the call.

> The only place where noticeable overhead is added is the extra pair of
> irq enable/disable that I added for sched_out hook. After glancing
> through what perf does during context switch, I thought that overhead
> wouldn't be anything noticeable but if it is, they can definitely be
> separated. The only purpose of that change was colocating sched
> notifiers and perf hooks.

Right, recent Intel chips seem to do much better at IRQ disable than say
P4 and IA64, and poking at the PMU MSRs is way more painful, not sure
how it would balance out for PowerPC (or anything other) though.

> Also, if a few more perf hooks are converted to use the same
> mechanism, it would be possible to put perf properly on top of TPs
> other than init/exit code paths which will be cleaner for the rest of
> the kernel and there won't be any extra runtime overhead.
>
> The code will be much cleaner if perf depends on TPs. With perf hooks
> completely removed, there won't be much point in SCHED_EVENT and the
> messy ifdefs in perf can also go away. Would this be a possibility?

Well, I already utterly hate that x86 can't build with !
CONFIG_PERF_EVENTS, also requiring CONFIG_TRACEPOINTS is going the wrong
way.

I've always explicitly avoided depending on tracepoints, and I'd very
much like to keep it that way.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Wed, 2010-05-05 at 11:32 +0200, Tejun Heo wrote:
>
> > Well, I already utterly hate that x86 can't build with !
> > CONFIG_PERF_EVENTS, also requiring CONFIG_TRACEPOINTS is going the wrong
> > way.
> >
> > I've always explicitly avoided depending on tracepoints, and I'd very
> > much like to keep it that way.
>
> I was wondering the other way around - ie. the possibility to make
> perf optional and maybe even as a module which depends on TPs, which
> would be nicer than the current situation and make the code less
> cluttered too.

I really really hate making perf rely on tracepoints.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Wed, 2010-05-05 at 11:54 +0200, Tejun Heo wrote:
> Hello,
>
> On 05/05/2010 11:51 AM, Peter Zijlstra wrote:
> >> I was wondering the other way around - ie. the possibility to make
> >> perf optional and maybe even as a module which depends on TPs, which
> >> would be nicer than the current situation and make the code less
> >> cluttered too.
> >
> > I really really hate making perf rely on tracepoints.
>
> Hmmm.... may I ask why?

Because it will bloat the kernel for no reason. And I don't like the
endless indirections either.

Like said, I already really dislike x86 won't even build without
CONFIG_PERF_EVENTS, adding a hard requirement on TRACEEVENTS will only
make the whole thing even worse.

Sure, most distro configs will include both perf and tracepoints, but I
don't want to burden everybody who wants to use perf with the tracepoint
overhead.

As it stands I'd argue to simply drop this whole idea. The SCHED_EVENT()
thing doesn't look like its worth the obfuscation and I'm very much
opposed to making perf and sched_notifiers rely on tracepoints.

I don't see the few direct functions calls we have now as a big problem
and it sure as hell is a lot easier to read than the whole tracepoint
indirection mess.

If you care about the call overhead of the perf hooks, we can do the
same immediate value optimization to them as would be done to the
tracepoints.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Wed, 2010-05-05 at 18:55 +0200, Ingo Molnar wrote:
> * Tejun Heo <tj(a)kernel.org> wrote:
>
> > Hello,
> >
> > On 05/05/2010 01:38 PM, Peter Zijlstra wrote:
> > > As it stands I'd argue to simply drop this whole idea. The SCHED_EVENT()
> > > thing doesn't look like its worth the obfuscation and I'm very much
> > > opposed to making perf and sched_notifiers rely on tracepoints.
> >
> > Hmmm... okay. Well, this thing is born out of Ingo's suggestion that for
> > more sched notifiers to be added notification mechanisms need to be
> > consolidated. As long as I can get those two notifiers needed for cmwq,
> > it's okay with me. Ingo, what do you think?
>
> I'd much rather see the *_EVENT() APIs used - but enhanced to address Peter's
> objections. One change would be to add a DEFINE_EVENT_ABI() variant, which
> would be called via trace_abi_*() calls. That way we always know they are
> 'hardwired' events in the extreme.
>
> That then would allow the software events to be consolidated.
>
> Peter?

Well, I'd much rather just see a direct call in the code than having to
reverse engineer wth hangs onto that _EVENT() junk.

Also, we don't need ABI muck for this.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/