From: Li Zefan
Ian Munsie wrote:
> From: Ian Munsie <imunsie(a)au1.ibm.com>
>
> Previously, when tracing was activated through debugfs, the
> probe_sched_switch and probe_sched_wakeup probes from the
> sched_switch plugin would be activated regardless of which tracing
> plugin (if any) was in use. This appears to have been a hack to use
> them to record the command lines of active processes as they were
> scheduled.
>
> That approach suffered when many processes that were not generating
> events were being scheduled: each would consume an entry in the
> saved_cmdlines buffer that could otherwise have been used by a
> process that was actually generating events.
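>
> (saved_cmdlines is a small fixed-size pid -> comm cache; the sketch
> below shows the idea only, with the locking and the reverse
> cmdline -> pid map of kernel/trace/trace.c left out:)
>
> 	#define SAVED_CMDLINES	128
> 	#define NO_CMDLINE_MAP	UINT_MAX
>
> 	static unsigned map_pid_to_cmdline[PID_MAX_DEFAULT + 1];
> 	static char saved_cmdlines[SAVED_CMDLINES][TASK_COMM_LEN];
> 	static int cmdline_idx;
>
> 	static void trace_save_cmdline(struct task_struct *tsk)
> 	{
> 		unsigned idx = map_pid_to_cmdline[tsk->pid];
>
> 		if (idx == NO_CMDLINE_MAP) {
> 			/* evict the next slot, whoever owned it */
> 			idx = (cmdline_idx + 1) % SAVED_CMDLINES;
> 			map_pid_to_cmdline[tsk->pid] = idx;
> 			cmdline_idx = idx;
> 		}
>
> 		memcpy(saved_cmdlines[idx], tsk->comm, TASK_COMM_LEN);
> 	}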
>
> It also meant that events could be mis-attributed: in the common
> case of a process forking and then execing a new program, the change
> of command would not be noticed until the process was next
> scheduled, which could be some time after the exec.
>
> If the trace was read after the fact this would generally go unnoticed
> because at some point the process would be scheduled and the entry in
> the saved_cmdlines buffer would be updated so that the new command would
> be reported when the trace was eventually read. However, if the events
> were being read live (e.g. through trace_pipe), the events just after
> the exec and before the process was next scheduled would show the
> incorrect command (though the PID would be correct).
>
> This patch removes the sched_switch hack altogether and instead records
> the commands at a more appropriate moment - at the same time the PID of
> the process is recorded (i.e. when an entry on the ring buffer is
> reserved). This means that the recorded command line is much more likely
> to be correct when the trace is read, either live or after the fact, so
> long as the command line still resides in the saved_cmdlines buffer.
>
> It is still not guaranteed to be correct in all situations, for
> instance when the trace is read after the fact rather than live.
> Consider events generated by a process just before an exec: in the
> example below they would be attributed to sleep rather than
> stealpid, since the entry in saved_cmdlines would have changed by
> the time the events were read. This is no different from the
> current situation, however, and the alternative would be to store
> the command line with each and every event.
>
....
>
> Signed-off-by: Ian Munsie <imunsie(a)au1.ibm.com>
> ---
> kernel/trace/trace.c | 3 +--
> kernel/trace/trace_events.c | 11 -----------
> kernel/trace/trace_functions.c | 2 --
> kernel/trace/trace_functions_graph.c | 2 --
> kernel/trace/trace_sched_switch.c | 10 ----------
> 5 files changed, 1 insertions(+), 27 deletions(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 4b1122d..f8458c3 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -1023,8 +1023,6 @@ void tracing_stop(void)
> spin_unlock_irqrestore(&tracing_start_lock, flags);
> }
>
> -void trace_stop_cmdline_recording(void);
> -
> static void trace_save_cmdline(struct task_struct *tsk)
> {
> unsigned pid, idx;
> @@ -1112,6 +1110,7 @@ tracing_generic_entry_update(struct trace_entry *entry, unsigned long flags,
> {
> struct task_struct *tsk = current;
>
> + tracing_record_cmdline(tsk);

Now this function is called every time a tracepoint is triggered, so
did you run any benchmarks to see whether the performance improved or
got worse?

Another problem with this patch is that tracing_generic_entry_update()
is also called by perf, but cmdline recording is not needed in perf.
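
If the call does stay in the ftrace path, one option might be to do
the recording where the ring buffer entry is reserved for ftrace only,
rather than in the helper shared with perf. A rough sketch of the idea
against trace_buffer_lock_reserve() (the exact placement is an
assumption on my part, not a tested patch):

	struct ring_buffer_event *
	trace_buffer_lock_reserve(struct ring_buffer *buffer, int type,
				  unsigned long len,
				  unsigned long flags, int pc)
	{
		struct ring_buffer_event *event;

		event = ring_buffer_lock_reserve(buffer, len);
		if (event != NULL) {
			struct trace_entry *ent = ring_buffer_event_data(event);

			/* perf does not reserve through here */
			tracing_record_cmdline(current);
			tracing_generic_entry_update(ent, flags, pc);
			ent->type = type;
		}

		return event;
	}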

> entry->preempt_count = pc & 0xff;
> entry->pid = (tsk) ? tsk->pid : 0;
> entry->lock_depth = (tsk) ? tsk->lock_depth : 0;
....