From: Masami Hiramatsu on
Peter Zijlstra wrote:
> PEBS always reports the IP+1, that is the instruction after the one
> that got sampled, cure this by using the LBR to reliably rewind the
> instruction stream.
>
> CC: Masami Hiramatsu <mhiramat(a)redhat.com>
> CC: Yanmin Zhang <yanmin_zhang(a)linux.intel.com>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra(a)chello.nl>
> LKML-Reference: <new-submission>
> ---
> arch/x86/include/asm/perf_event.h | 19 ++++++
> arch/x86/kernel/cpu/perf_event.c | 70 ++++++++++++-------------
> arch/x86/kernel/cpu/perf_event_intel.c | 4 -
> arch/x86/kernel/cpu/perf_event_intel_ds.c | 84 +++++++++++++++++++++++++++++-
> include/linux/perf_event.h | 6 ++
> 5 files changed, 144 insertions(+), 39 deletions(-)
>
[...]
> Index: linux-2.6/arch/x86/include/asm/perf_event.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/asm/perf_event.h
> +++ linux-2.6/arch/x86/include/asm/perf_event.h
> @@ -136,6 +136,25 @@ extern void perf_events_lapic_init(void)
>
> #define PERF_EVENT_INDEX_OFFSET 0
>
> +/*
> + * Abuse bit 3 of the cpu eflags register to indicate proper PEBS IP fixups.
> + * This flag is otherwise unused and ABI specified to be 0, so nobody should
> + * care what we do with it.
> + */
> +#define PERF_EFLAGS_EXACT (1UL << 3)
> +
> +#define perf_misc_flags(regs) \
> +({ int misc = 0; \
> + if (user_mode(regs)) \
> + misc |= PERF_RECORD_MISC_USER; \
> + else \
> + misc |= PERF_RECORD_MISC_KERNEL; \
> + if (regs->flags & PERF_EFLAGS_EXACT) \
> + misc |= PERF_RECORD_MISC_EXACT; \
> + misc; })
> +
> +#define perf_instruction_pointer(regs) ((regs)->ip)

Hmm, why don't you use instruction_pointer() defined in asm/ptrace.h?
And I couldn't find any user of this macro in this patch...

Others looks good to me :)

Thank you,

--
Masami Hiramatsu
e-mail: mhiramat(a)redhat.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Thu, 2010-03-04 at 11:21 -0500, Masami Hiramatsu wrote:
> Peter Zijlstra wrote:

> > +#define perf_misc_flags(regs) \
> > +({ int misc = 0; \
> > + if (user_mode(regs)) \
> > + misc |= PERF_RECORD_MISC_USER; \
> > + else \
> > + misc |= PERF_RECORD_MISC_KERNEL; \
> > + if (regs->flags & PERF_EFLAGS_EXACT) \
> > + misc |= PERF_RECORD_MISC_EXACT; \
> > + misc; })
> > +
> > +#define perf_instruction_pointer(regs) ((regs)->ip)
>
> Hmm, why don't you use instruction_pointer() defined in asm/ptrace.h?
> And I couldn't find any user of this macro in this patch...

perf_instruction_pointer() is used in kernel/perf_event.c, and yeah I
could have used instruction_pointer() but that's yet another wrapper.

Anyway, Yanmin is poking at doing kvm-guest profiling and will likely
rewrite all of the perf_misc() and perf_instruction_pointer() stuff
soon.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Masami Hiramatsu on
Peter Zijlstra wrote:
> On Thu, 2010-03-04 at 11:21 -0500, Masami Hiramatsu wrote:
>> Peter Zijlstra wrote:
>
>>> +#define perf_misc_flags(regs) \
>>> +({ int misc = 0; \
>>> + if (user_mode(regs)) \
>>> + misc |= PERF_RECORD_MISC_USER; \
>>> + else \
>>> + misc |= PERF_RECORD_MISC_KERNEL; \
>>> + if (regs->flags & PERF_EFLAGS_EXACT) \
>>> + misc |= PERF_RECORD_MISC_EXACT; \
>>> + misc; })
>>> +
>>> +#define perf_instruction_pointer(regs) ((regs)->ip)
>>
>> Hmm, why don't you use instruction_pointer() defined in asm/ptrace.h?
>> And I couldn't find any user of this macro in this patch...
>
> perf_instruction_pointer() is used in kernel/perf_event.c, and yeah I
> could have used instruction_pointer() but that's yet another wrapper.
>
> Anyway, Yanmin is poking at doing kvm-guest profiling and will likely
> rewrite all of the perf_misc() and perf_instruction_pointer() stuff
> soon.

Hmm, still I can't find where it is used (in your patches).
Anyway, would you mean that perf_instruction_pointer() will
have different implementation by Yanmin?

Thank you,

--
Masami Hiramatsu
e-mail: mhiramat(a)redhat.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Thu, 2010-03-04 at 15:54 -0500, Masami Hiramatsu wrote:

> Anyway, would you mean that perf_instruction_pointer() will
> have different implementation by Yanmin?

Yes, he will also be checking if we came from guest context:

http://www.mail-archive.com/kvm(a)vger.kernel.org/msg29830.html

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Stephane Eranian on
On Thu, Mar 4, 2010 at 9:57 AM, Peter Zijlstra <peterz(a)infradead.org> wrote:
> On Wed, 2010-03-03 at 22:50 +0100, Stephane Eranian wrote:
>
>> I think systematically and transparently using LBR to correct PEBS off-by-one
>> problem is not such a good idea. You've basically highjacked LBR and user
>> cannot use it in a different way.
>
> Well, they could, it just makes scheduling the stuff more interesting.
>
>> There are PEBS+LBR measurements where you care about extracting the LBR data.
>> There are PEBS measurements where you don't care about getting the correct IP.
>> I don't necessarily want to pay the price, especially when this could
>> be done offline in the tool.
>
> There are some people who argue that fixing up that +1 insn issue is
> critical, sadly they don't appear to want to argue their case in public.
> What we can do is make it optional I guess.

I can see why they would want IP, instead of IP+1. But what I am saying
is that there are certain measurements where you need to use LBR in
another way. For instance, you may want to combine PEBS + LBR to capture the
path that leads to a cache miss. For that you would need to configure LBR
to record only call branches. Then you would do the correction of the IP offline
in the tool. In this case, the patch is more important than the IP+1 error.

This is why I think you need to provide a config field to disable IP+1
correction,
and thus free LBR for other usage. I understand this also means you cannot
share the LBR with other competing events (on the same or distinct CPUs),
but that's what event scheduling is good for.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/