perf: Take a hot regs snapshot for trace events [Kernel]

From: Steven Rostedt on 3 Mar 2010 12:50

On Wed, 2010-03-03 at 18:16 +0100, Peter Zijlstra wrote:

> > This is what I actually was wondering about. Why is it a "perf only"
> > trace point instead of a TRACE_EVENT()?
>
> Because I wanted to make perf usable without having to rely on funny
> tracepoints. That is, I am less worried about committing software
> counters to ABI than I am about TRACE_EVENT(), which still gives me a
> terribly uncomfortable feeling.
>
> Also, building with all CONFIG_TRACE_*=n will still yield a usable perf,
> which is something the embedded people might fancy, all that TRACE stuff
> adds lots of code.

We could make TRACE_EVENT() into a perf only trace point with
CONFIG_TRACE_*=n.

Just saying that it would be nice if ftrace could also see page faults
and such.

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Steven Rostedt on 4 Mar 2010 10:20

On Thu, 2010-03-04 at 12:25 +0100, Ingo Molnar wrote:
> * Peter Zijlstra <peterz(a)infradead.org> wrote:
>
> > On Wed, 2010-03-03 at 12:07 -0500, Steven Rostedt wrote:
> > > oops, my bad :-), I thought this was in the x86 arch directory. For the
> > > University, I was helping them with adding trace points for page faults
> > > when I came across this in arch/x86/mm/fault.c:
> > >
> > > perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, 0, regs, address);
> > >
> > >
> > > This is what I actually was wondering about. Why is it a "perf only" trace
> > > point instead of a TRACE_EVENT()?
> >
> > Because I wanted to make perf usable without having to rely on funny
> > tracepoints. That is, I am less worried about committing software counters
> > to ABI than I am about TRACE_EVENT(), which still gives me a terribly
> > uncomfortable feeling.
>
> I'd still like a much less error-prone and work-intense way of doing it.
>
> I'd suggest we simply add a TRACE_EVENT_ABI() for such cases, where we really
> want to expose a tracepoint to tooling, programmatically. Maybe even change
> the usage sites to trace_foo_ABI(), to make it really clear and to make people
> aware of the consequences.

Would this still be available as a normal trace event?

>
> > Also, building with all CONFIG_TRACE_*=n will still yield a usable perf,
> > which is something the embedded people might fancy, all that TRACE stuff
> > adds lots of code.
>
> Not a real issue i suspect when you do lock profiling ...
>
> Or if it is, some debloating might be in order - and the detaching of event
> enumeration and ftrace TRACE_EVENT infrastructure from other ftrace bits. (i
> suggested an '/eventfs' special filesystem before, for a nicely laid out
> hierarchy of ftrace/perf events.)

Actually, we already have a way to decouple it.

include/trace/define_trace.h is the file that just adds the tracepoint
that is needed.

include/trace/ftrace.h is the file that does the magic and adds the code
for callbacks and tracing.

The perf hooks probably should not have gone into that file. They should
have been put into an include/trace/perf.h file, and then in
define_trace.h we would add:

#ifdef CONFIG_EVENT_TRACING
#include <trace/ftrace.h>
#endif

+#ifdef CONFIG_PERF_EVENTS
+#include <trace/perf.h>
+#endif

This should be done anyway. But it would also let you decouple ftrace
trace events from perf trace events but still let the two use the same
trace points.
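
A minimal userspace sketch of that idea, not kernel code: one trace point
keeps a list of probes, and each consumer is compiled in only when its
config symbol is defined. The struct, the helper names and the two
counters are hypothetical; the CONFIG_* macros are defined by hand here,
whereas a real build would get them from Kconfig.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for Kconfig symbols; undefine one to drop
 * that consumer while the trace point itself stays in place. */
#define CONFIG_EVENT_TRACING 1
#define CONFIG_PERF_EVENTS 1

#define MAX_PROBES 4

typedef void (*probe_fn)(unsigned long address);

/* One trace point: a small list of probes, called in registration order. */
struct tracepoint {
	probe_fn probes[MAX_PROBES];
	size_t nr_probes;
};

static void tracepoint_register(struct tracepoint *tp, probe_fn fn)
{
	assert(fn != NULL);
	assert(tp->nr_probes < MAX_PROBES);
	tp->probes[tp->nr_probes++] = fn;
}

/* Fire the trace point once; every attached consumer sees the event. */
static void tracepoint_fire(const struct tracepoint *tp, unsigned long address)
{
	for (size_t i = 0; i < tp->nr_probes; i++)
		tp->probes[i](address);
}

/* Illustrative counters so the effect of each consumer is observable. */
static unsigned long ftrace_hits;
static unsigned long perf_hits;

#ifdef CONFIG_EVENT_TRACING
static void ftrace_probe(unsigned long address)
{
	(void)address;
	ftrace_hits++;
}
#endif

#ifdef CONFIG_PERF_EVENTS
static void perf_probe(unsigned long address)
{
	(void)address;
	perf_hits++;
}
#endif

/* Attach whichever consumers were compiled in to the same trace point. */
static void attach_consumers(struct tracepoint *tp)
{
#ifdef CONFIG_EVENT_TRACING
	tracepoint_register(tp, ftrace_probe);
#endif
#ifdef CONFIG_PERF_EVENTS
	tracepoint_register(tp, perf_probe);
#endif
}
```

With both symbols defined, attaching the consumers to a zero-initialized
struct tracepoint and firing it once bumps each counter by one. Removing
either #define leaves the other consumer working on the same trace point.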

-- Steve

From: Steven Rostedt on 4 Mar 2010 11:00

On Thu, 2010-03-04 at 16:36 +0100, Ingo Molnar wrote:

> > This should be done anyway. But it would also let you decouple ftrace trace
> > events from perf trace events but still let the two use the same trace
> > points.
>
> I think the main thing would be to have a decoupled /eventfs - basically
> /debug/tracing/events/ moved to "/eventfs" or maybe to "/proc/events/". This
> would make them available more widely, and in a standardized way.

I know Greg once proposed a /tracefs directory. I don't really care how
things work as long as we don't lose functionality. Perhaps we should
have a standard tracefs dir, and have:

/sys/kernel/trace
/sys/kernel/trace/events
/sys/kernel/trace/ftrace
/sys/kernel/trace/perf

This would keep things nicely grouped but separate.

I could also decouple the printing of the formats from ftrace.h and then
in define_trace.h:

#ifdef CONFIG_EVENTS
# include <trace/events.h>
# ifdef CONFIG_FTRACE_EVENTS
#  include <trace/ftrace.h>
# endif
# ifdef CONFIG_PERF_EVENTS
#  include <trace/perf.h>
# endif
#endif

Have the trace/events.h file create the files for the event directory.

But what about the enable and filter files in the event directory? How
would they be attached? Currently these modify the way ftrace works. I'm
assuming that perf enables these with the syscall. Should these files
still be specific to ftrace if enabled?

-- Steve

From: Steven Rostedt on 4 Mar 2010 16:40

On Thu, 2010-03-04 at 22:17 +0100, Ingo Molnar wrote:
> * Steven Rostedt <rostedt(a)goodmis.org> wrote:
>

> No, we want to decouple it from 'tracing'. It's events, not tracing. Events
> are broader, they can be used for RAS, profiling, counting, etc. - not
> just tracing.
>
> Furthermore, we only want /debug/tracing/events really, not the various
> dynamic ftrace controls - those could remain in /debug/tracing/.

I was talking about the files in the events directory:

events/sched/sched_switch/{id,format,enable,filter}

Seems only the format file should go in, and perhaps the id.

I can keep the debug/tracing/events/* as is too, where the format and id
just call the same routines that the eventfs calls, but add the enable
and filter to be specific to ftrace.
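
Roughly, as a userspace sketch rather than kernel code: one shared
routine produces the format text, both directory trees call it, and the
enable switch stays on the ftrace side. All names here are hypothetical,
and the output layout only loosely imitates a real format file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical event description; the id value is made up by the caller. */
struct event {
	int id;
	const char *name;
	const char *format;
};

/* Shared routine: the single place that knows how to print a format. */
static int event_format_show(const struct event *ev, char *buf, size_t len)
{
	assert(ev != NULL && buf != NULL);
	return snprintf(buf, len, "name: %s\nID: %d\nformat:\n%s\n",
			ev->name, ev->id, ev->format);
}

/* The eventfs "format" file and the debug/tracing/events "format" file
 * are thin wrappers around the same routine. */
static int eventfs_format_read(const struct event *ev, char *buf, size_t len)
{
	return event_format_show(ev, buf, len);
}

static int debugfs_format_read(const struct event *ev, char *buf, size_t len)
{
	return event_format_show(ev, buf, len);
}

/* ftrace-specific state: only the debug/tracing side exposes "enable". */
static int ftrace_event_enabled;

static void ftrace_enable_write(int on)
{
	ftrace_event_enabled = on ? 1 : 0;
}
```

Both read wrappers return identical text for the same event, so the two
trees cannot drift apart, while writing to enable touches only the
ftrace state.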

-- Steve

From: Thomas Gleixner on 4 Mar 2010 17:00

On Thu, 4 Mar 2010, Frederic Weisbecker wrote:

> On Thu, Mar 04, 2010 at 04:30:38PM -0500, Steven Rostedt wrote:
> > On Thu, 2010-03-04 at 22:17 +0100, Ingo Molnar wrote:
> > > * Steven Rostedt <rostedt(a)goodmis.org> wrote:
> > >
> >
> > > No, we want to decouple it from 'tracing'. It's events, not tracing. Events
> > > are broader, they can be used for RAS, profiling, counting, etc. - not
> > > just tracing.
> > >
> > > Furthermore, we only want /debug/tracing/events really, not the various
> > > dynamic ftrace controls - those could remain in /debug/tracing/.
> >
> > I was talking about the files in the events directory:
> >
> > events/sched/sched_switch/{id,format,enable,filter}
> >
> >
> > Seems only the format file should go in, and perhaps the id.
> >
> > I can keep the debug/tracing/events/* as is too, where the format and id
> > just call the same routines that the eventfs calls, but add the enable
> > and filter to be specific to ftrace.
>
>
> The /debug/tracing/events could contain symlinks for the format
> files so that the rest can stay there.

Are you proposing to create another sysfs symlink maze?

Thanks,

tglx