From: Stephane Eranian on
On Tue, May 11, 2010 at 4:48 PM, Peter Zijlstra <peterz(a)infradead.org> wrote:
> On Tue, 2010-05-11 at 16:04 +0200, Stephane Eranian wrote:
>> Hi,
>>
>>
>> I am confused by the inheritance cmd line option of perf record:
>>
>> $ perf record -h
>>  usage: perf record [<options>] [<command>]
>>     or: perf record [<options>] -- <command> [<options>]
>>
>>     -e, --event <event>   event selector. use 'perf list' to list available events
>>         --filter <filter>
>>                           event filter
>>     -p, --pid <n>         record events on existing process id
>>     -t, --tid <n>         record events on existing thread id
>>     -r, --realtime <n>    collect data with this RT SCHED_FIFO priority
>>     -R, --raw-samples     collect raw sample records from all opened counters
>>     -a, --all-cpus        system-wide collection from all CPUs
>>     -A, --append          append to the output file to do incremental profiling
>>     -C, --profile_cpu <n>
>>                           CPU to profile on
>>     -f, --force           overwrite existing data file (deprecated)
>>     -c, --count           event period to sample
>>     -o, --output <file>   output file name
>>     -i, --inherit         child tasks inherit counters
>>
>> This leads me to believe that inheritance in children is off by default.
>>
>> However, builtin-record.c says:
>>
>> static bool                     inherit                         =   true;
>>
>> If that's the case, what's the point of the -i option?
>
> Right, I think we should invert that, does --no-inherit work?
>
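To make the -i question concrete, here is a minimal sketch of a boolean
option whose default is already true. parse_inherit() is a hypothetical
helper written for illustration; it is not perf's option parser, and
whether perf itself accepts --no-inherit is exactly the open question
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not perf's parser: returns the effective inherit
 * setting for an argument vector, starting from the same default that
 * builtin-record.c uses (true).
 */
static bool parse_inherit(int argc, char **argv)
{
	bool inherit = true;
	int i;

	assert(argv != NULL);

	for (i = 1; i < argc; i++) {
		if (!strcmp(argv[i], "--"))
			break;	/* the rest belongs to the profiled command */
		if (!strcmp(argv[i], "-i") || !strcmp(argv[i], "--inherit"))
			inherit = true;		/* no-op: already the default */
		else if (!strcmp(argv[i], "--no-inherit"))
			inherit = false;	/* only this spelling changes anything */
	}

	return inherit;
}
```

With a true default, passing -i and passing nothing give the same result,
so only the negated form carries information.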
>> Another side effect of inheritance is that in per-thread mode,
>> perf creates as many "sessions" as you have CPUs. So
>> on a 16-way processor, sampling on cycles, perf creates
>> 16 events and 16 x 2-page sampling buffers. That's a lot of
>> resources consumed if I am just interested in monitoring
>> a single-threaded workload.
>
> Right, but I think the default of inherit is right, and once you do that
> you basically have to do the per-task-per-cpu thing, otherwise your
> fancy 16-way will start spending most of its time in cacheline bounces.
>
In that case, don't you think you should also ensure that the buffer is
allocated on the NUMA node of the designated per-thread-per-cpu?
I don't think it is the case today.
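
For reference, the resource arithmetic behind my 16-way example can be
sketched as below. This is only a back-of-the-envelope model: the struct
and function names are made up for illustration, it counts just the
sampling pages mentioned above (no control page), and the 4096-byte page
size in the usage note is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of what one sampled event costs in per-thread mode. */
struct perf_cost {
	size_t nr_events;	/* events created */
	size_t nr_buffer_pages;	/* sampling buffer pages across all events */
	size_t buffer_bytes;	/* the same, in bytes */
};

/*
 * With inheritance on, one event and one buffer are created per CPU.
 * Passing nr_cpus = 1 models a single event with a single buffer.
 */
static struct perf_cost per_task_per_cpu_cost(size_t nr_cpus,
					      size_t pages_per_buffer,
					      size_t page_size)
{
	struct perf_cost c;

	assert(page_size > 0);

	c.nr_events = nr_cpus;
	c.nr_buffer_pages = nr_cpus * pages_per_buffer;
	c.buffer_bytes = c.nr_buffer_pages * page_size;
	return c;
}
```

Under those assumptions, per_task_per_cpu_cost(16, 2, 4096) gives 16
events and 32 pages (131072 bytes), against 1 event and 2 pages for a
single buffer.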
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Mon, 2010-05-17 at 16:25 +0200, Stephane Eranian wrote:
> > Right, but I think the default of inherit is right, and once you do that
> > you basically have to do the per-task-per-cpu thing, otherwise your
> > fancy 16-way will start spending most of its time in cacheline bounces.
> >
> In that case, don't you think you should also ensure that the buffer is
> allocated on the NUMA node of the designated per-thread-per-cpu?
> I don't think it is the case today.

Yeah, something like the below ought to do I guess..

Almost-Signed-off-by: Peter Zijlstra <a.p.zijlstra(a)chello.nl>
---
kernel/perf_event.c | 17 +++++++++++++++--
1 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 9dbe8cd..85e2d32 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -2288,6 +2288,19 @@ perf_mmap_to_page(struct perf_mmap_data *data, unsigned long pgoff)
 	return virt_to_page(data->data_pages[pgoff - 1]);
 }

+static void *perf_mmap_alloc_page(int cpu)
+{
+	struct page *page;
+	int node;
+
+	node = (cpu == -1) ? cpu : cpu_to_node(cpu);
+	page = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
+	if (!page)
+		return NULL;
+
+	return page_address(page);
+}
+
 static struct perf_mmap_data *
 perf_mmap_data_alloc(struct perf_event *event, int nr_pages)
 {
@@ -2304,12 +2317,12 @@ perf_mmap_data_alloc(struct perf_event *event, int nr_pages)
 	if (!data)
 		goto fail;

-	data->user_page = (void *)get_zeroed_page(GFP_KERNEL);
+	data->user_page = perf_mmap_alloc_page(event->cpu);
 	if (!data->user_page)
 		goto fail_user_page;

 	for (i = 0; i < nr_pages; i++) {
-		data->data_pages[i] = (void *)get_zeroed_page(GFP_KERNEL);
+		data->data_pages[i] = perf_mmap_alloc_page(event->cpu);
 		if (!data->data_pages[i])
 			goto fail_data_pages;
 	}
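
The node selection in perf_mmap_alloc_page() can be tried out in user
space with a toy model like the one below. Everything here is an
assumption made for illustration: the 16-CPU, 4-CPUs-per-node topology is
invented, and model_cpu_to_node() merely stands in for the kernel's
cpu_to_node(). The point is only that a per-task event (cpu == -1) keeps
-1 as its node, leaving placement to the allocator, while a per-cpu event
gets the node of its CPU.

```c
#include <assert.h>

#define MODEL_NR_CPUS		16	/* hypothetical machine size */
#define MODEL_CPUS_PER_NODE	4	/* hypothetical topology */

/* Stand-in for the kernel's cpu_to_node(); the mapping is made up. */
static int model_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return cpu / MODEL_CPUS_PER_NODE;
}

/* Mirrors the node choice in the patch: -1 passes through unchanged. */
static int buffer_node(int cpu)
{
	return (cpu == -1) ? -1 : model_cpu_to_node(cpu);
}
```

In this model buffer_node(5) is 1 and buffer_node(-1) stays -1, so the
per-cpu buffers of a per-task-per-cpu session end up spread across nodes
instead of all landing wherever the mmap() caller happens to run.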
