perf: Implement perf_output

From: Steven Rostedt on 19 May 2010 11:40

On Wed, 2010-05-19 at 17:05 +0200, Peter Zijlstra wrote:
> On Wed, 2010-05-19 at 10:47 -0400, Steven Rostedt wrote:
> > On Wed, 2010-05-19 at 09:58 +0200, Peter Zijlstra wrote:
> > > On Wed, 2010-05-19 at 09:21 +0200, Frederic Weisbecker wrote:
> > >
> > > > I'm still not sure what you mean here by this multiplexing. Is
> > > > this about per cpu multiplexing?
> > >
> > > Suppose there's two events attached to the same tracepoint. Will you
> > > write the tracepoint twice and risk different data in each, or will you
> > > do it once and copy it into each buffer?
> >
> > Is this because the same function deals with the same tracepoint, and
> > has difficulty in knowing which event it is dealing with?
>
> No, but suppose the tracepoint has a racy expression in it. Having to
> evaluate { assign; } multiple times could yield different results, which
> in turn means you have to run the filter multiple times too, etc..

I'm still a bit confused by what you mean here. Could you show an
example?

>
> Although I suppose you could delay the commit of the first event and copy
> from there into the next events, but that might give rather messy code.
>
> > Note, the shrinking of the TRACE_EVENT() code that I pushed (and I'm
> > hoping makes it to 35 since it lays the ground work for lots of features
> > on top of TRACE_EVENT()), allows you to pass private data to each probe
> > registered to the tracepoint. Letting the same function handle two
> > different activities, or different tracepoints.
>
> tracepoint_probe_register() is useless, it requires scheduling. I
> currently register a probe on perf_event creation and then maintain a
> per-cpu hlist of active events.

When is perf_event creation? When the user runs the code or at boot up?

>
> > > > There is another problem. We need something like
> > > > perf_output_discard() in case the filter rejects the event (which
> > > > must be filled for this check to happen).
> > >
> > > Yeah, I utterly hate that, I opted to let anything with a filter take
> > > the slow path. Not only would I have to add a discard, but I'd have to
> > > decrement the counter as well, which is a big no-no.
> >
> > Hmm, this would impact performance on system wide recording of events
> > that are filtered. One would think adding a filter would speed things
> > up, not slow it down.
>
> Depends, actually running the filter and backing out might take more
> time than simply logging it, esp if you've already done all of the work
> and only lack a commit.

Hmm, could be, don't know for sure. I just want to keep the macro magic
to a minimum ;-)

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Peter Zijlstra on 19 May 2010 12:00

On Wed, 2010-05-19 at 11:38 -0400, Steven Rostedt wrote:

> > No, but suppose the tracepoint has a racy expression in it. Having to
> > evaluate { assign; } multiple times could yield different results, which
> > in turn means you have to run the filter multiple times too, etc..
>
> I'm still a bit confused by what you mean here. Could you show an
> example?

Well, suppose { assign; } contains:

entry->foo = atomic_read(&bar);

Now suppose you have multiple active consumers of the tracepoint, either
you do the evaluation once and copy that around, or you do it multiple
times and end up with different results.
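
The two options can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the perf code: struct entry, struct consumer, emit_once_and_copy() and emit_per_consumer() are hypothetical names, and C11 atomic_load() stands in for the kernel's atomic_read().

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define SLOTS 8

/* Hypothetical record; 'foo' mirrors the field used in the example above. */
struct entry {
	int foo;
};

/* Hypothetical consumer: one per active event, each with its own buffer. */
struct consumer {
	struct entry buf[SLOTS];
	unsigned int head;
};

/* Stand-in for the racy value that { assign; } reads. */
static atomic_int bar;

/* Option 1: evaluate the assignment once and copy the result around. */
static void emit_once_and_copy(struct consumer *c, unsigned int n)
{
	struct entry tmp;
	unsigned int i;

	tmp.foo = atomic_load(&bar);	/* the only read of bar */
	for (i = 0; i < n; i++) {
		assert(c[i].head < SLOTS);
		memcpy(&c[i].buf[c[i].head++], &tmp, sizeof(tmp));
	}
}

/*
 * Option 2: evaluate once per consumer. If bar changes between two
 * iterations, the consumers end up with different records.
 */
static void emit_per_consumer(struct consumer *c, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		assert(c[i].head < SLOTS);
		c[i].buf[c[i].head++].foo = atomic_load(&bar);
	}
}
```

With option 1 every consumer sees the same value by construction; with option 2 that only holds if nothing modifies bar while the loop runs.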

> > Although I suppose you could delay the commit of the first event and copy
> > from there into the next events, but that might give rather messy code.
> >
> > > Note, the shrinking of the TRACE_EVENT() code that I pushed (and I'm
> > > hoping makes it to 35 since it lays the ground work for lots of features
> > > on top of TRACE_EVENT()), allows you to pass private data to each probe
> > > registered to the tracepoint. Letting the same function handle two
> > > different activities, or different tracepoints.
> >
> > tracepoint_probe_register() is useless, it requires scheduling. I
> > currently register a probe on perf_event creation and then maintain a
> > per-cpu hlist of active events.
>
> When is perf_event creation? When the user runs the code or at boot up?

sys_perf_counter_open()

And an event could be per task, so it needs to be scheduled along with
the task context, try doing that with probes ;-)

> Hmm, could be, don't know for sure. I just want to keep the macro magic
> to a minimum ;-)

Right, but filters evaluated at the point where you've basically already
done all the hard work simply don't make much sense in my book.

From: Steven Rostedt on 19 May 2010 12:10

On Wed, 2010-05-19 at 17:50 +0200, Peter Zijlstra wrote:
> On Wed, 2010-05-19 at 11:38 -0400, Steven Rostedt wrote:
>
> > > No, but suppose the tracepoint has a racy expression in it. Having to
> > > evaluate { assign; } multiple times could yield different results, which
> > > in turn means you have to run the filter multiple times too, etc..
> >
> > I'm still a bit confused by what you mean here. Could you show an
> > example?
>
> Well, suppose { assign; } contains:
>
> entry->foo = atomic_read(&bar);
>
> Now suppose you have multiple active consumers of the tracepoint, either
> you do the evaluation once and copy that around, or you do it multiple
> times and end up with different results.

OK, this is where I'm getting a bit lost. The "multiple active
consumers". Is this multiple instances of perf? Or perf doing multiple
things with that event using different buffers?

>
> > > Although I suppose you could delay the commit of the first event and copy
> > > from there into the next events, but that might give rather messy code.
> > >
> > > > Note, the shrinking of the TRACE_EVENT() code that I pushed (and I'm
> > > > hoping makes it to 35 since it lays the ground work for lots of features
> > > > on top of TRACE_EVENT()), allows you to pass private data to each probe
> > > > registered to the tracepoint. Letting the same function handle two
> > > > different activities, or different tracepoints.
> > >
> > > tracepoint_probe_register() is useless, it requires scheduling. I
> > > currently register a probe on perf_event creation and then maintain a
> > > per-cpu hlist of active events.
> >
> > When is perf_event creation? When the user runs the code or at boot up?
>
> sys_perf_counter_open()
>
> And an event could be per task, so it needs to be scheduled along with
> the task context, try doing that with probes ;-)

Ah, this is basically the same thing that ftrace does too. It only
enables the tracepoint (or function tracer) at initiation of the trace,
and uses things like a hash table to determine if the event (or
function) should be traced or not.

>
> > Hmm, could be, don't know for sure. I just want to keep the macro magic
> > to a minimum ;-)
>
> Right, but filters evaluated at the point where you've basically already
> done all the hard work simply don't make much sense in my book.

Well, the hard work was just to reserve the buffer, which is under 100ns
to do. But we still need the assign, because the filters compare the
result of those assigns.

I guess you are saying that if we have a filter, we need to do the
assign to a temporary buffer, evaluate, and then decide if we should
record it (via copy) or not.
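
A rough userspace sketch of that flow, to make the cost visible: the assign goes to a scratch record, the filter runs on the scratch copy, and only accepted events are copied into the buffer, so no discard or counter rollback is needed. All names here (struct ring, record_filtered(), foo_is_positive()) are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SLOTS 4

struct entry {
	int foo;
};

/* Hypothetical stand-in for the real ring buffer. */
struct ring {
	struct entry slots[SLOTS];
	size_t head;
};

typedef bool (*filter_fn)(const struct entry *e);

/* Returns true if the event was recorded, false if filtered or dropped. */
static bool record_filtered(struct ring *rb, int value, filter_fn filter)
{
	struct entry tmp;

	assert(rb != NULL);
	tmp.foo = value;		/* the { assign; } step, into scratch */

	if (filter && !filter(&tmp))
		return false;		/* rejected: buffer never touched */

	if (rb->head == SLOTS)
		return false;		/* full: drop (no wrap in this sketch) */

	/* Accepted: this copy is the extra cost of the temporary buffer. */
	memcpy(&rb->slots[rb->head++], &tmp, sizeof(tmp));
	return true;
}

/* Example filter: keep only events with a positive foo. */
static bool foo_is_positive(const struct entry *e)
{
	return e->foo > 0;
}
```

The trade-off is one extra copy per accepted event in exchange for never having to back out a reservation.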

-- Steve

From: Peter Zijlstra on 19 May 2010 12:20

On Wed, 2010-05-19 at 12:08 -0400, Steven Rostedt wrote:
> > Now suppose you have multiple active consumers of the tracepoint, either
> > you do the evaluation once and copy that around, or you do it multiple
> > times and end up with different results.
>
> OK, this is where I'm getting a bit lost. The "multiple active
> consumers". Is this multiple instances of perf? Or perf doing multiple
> things with that event using different buffers?

Multiple perf events of the same tracepoint, basically what you would end
up with if you were to allow multiple buffers.

Say task A and B both sample C's sched:sched_wakeup events. Then the
tracepoint will have two active perf_events hanging from it and we need
to fill two buffers.

From: Steven Rostedt on 19 May 2010 12:30

On Wed, 2010-05-19 at 18:15 +0200, Peter Zijlstra wrote:
> On Wed, 2010-05-19 at 12:08 -0400, Steven Rostedt wrote:
> > > Now suppose you have multiple active consumers of the tracepoint, either
> > > you do the evaluation once and copy that around, or you do it multiple
> > > times and end up with different results.
> >
> > OK, this is where I'm getting a bit lost. The "multiple active
> > consumers". Is this multiple instances of perf? Or perf doing multiple
> > things with that event using different buffers?
>
> Multiple perf events of the same tracepoint, basically what you would end
> up with if you were to allow multiple buffers.
>
> Say task A and B both sample C's sched:sched_wakeup events. Then the
> tracepoint will have two active perf_events hanging from it and we need
> to fill two buffers.

OK, so I would let them evaluate separately. If they do have two
different results, then that's fine, because the view of an event could
possibly be different. The &bar may change in the two instances, but how
much does that matter? Which version of &bar is correct anyway?

How do you handle the multiple readers then? The call to record the
event copies to each buffer that is registered for that event?

If more than one buffer is attached to an event, you could also write
directly to one, and then copy from that buffer to the others.
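
Something like the following userspace sketch, where the assign is done in place in the first buffer and the other buffers get a copy of that record. The names (struct buffer, reserve(), record_to_all()) are hypothetical and only illustrate the idea; they are not existing perf or ftrace functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOTS 4

struct entry {
	int foo;
};

/* Hypothetical per-event buffer. */
struct buffer {
	struct entry slots[SLOTS];
	size_t head;
};

/* Reserve the next slot in a buffer (no wrap-around in this sketch). */
static struct entry *reserve(struct buffer *b)
{
	assert(b->head < SLOTS);
	return &b->slots[b->head++];
}

/* Write the event into the first buffer, then copy it to the rest. */
static void record_to_all(struct buffer **bufs, size_t n, int value)
{
	struct entry *first;
	size_t i;

	assert(n > 0);
	first = reserve(bufs[0]);
	first->foo = value;	/* the single { assign; } evaluation */

	for (i = 1; i < n; i++)
		memcpy(reserve(bufs[i]), first, sizeof(*first));
}
```

With a single buffer attached this degenerates to the plain direct write, so the common case pays nothing extra.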

-- Steve