From: Masami Hiramatsu on
Peter Zijlstra wrote:
> PEBS always reports the IP+1, that is the instruction after the one
> that got sampled, cure this by using the LBR to reliably rewind the
> instruction stream.

Hmm, does PEBS always report one byte after the end address of the
sampled instruction? Or the instruction which will be executed next
step?

[...]
> +#include <asm/insn.h>
> +
> +#define MAX_INSN_SIZE 16

Hmm, we'd better integrate these kinds of definitions into
asm/insn.h... (several features define it)

> +
> +static void intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
> +{
> +#if 0
> + /*
> + * Borken, makes the machine expode at times trying to
> + * derefence funny userspace addresses.
> + *
> + * Should we always fwd decode from @to, instead of trying
> + * to rewind as implemented?
> + */
> +
> + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> + unsigned long from = cpuc->lbr_entries[0].from;
> + unsigned long to = cpuc->lbr_entries[0].to;

Ah, I see. For branch instruction case, we can use LBR to
find previous IP...

> + unsigned long ip = regs->ip;
> + u8 buf[2*MAX_INSN_SIZE];
> + u8 *kaddr;
> + int i;
> +
> + if (from && to) {
> + /*
> + * We sampled a branch insn, rewind using the LBR stack
> + */
> + if (ip == to) {
> + regs->ip = from;
> + return;
> + }
> + }
> +
> + if (user_mode(regs)) {
> + int bytes = copy_from_user_nmi(buf,
> + (void __user *)(ip - MAX_INSN_SIZE),
> + 2*MAX_INSN_SIZE);
> +

maybe, you'd better check the source address range is within
the user address range. e.g. ip < MAX_INSN_SIZE.

> + /*
> + * If we fail to copy the insn stream, give up
> + */
> + if (bytes != 2*MAX_INSN_SIZE)
> + return;
> +
> + kaddr = buf;
> + } else
> + kaddr = (void *)(ip - MAX_INSN_SIZE);

It also needs to be checked this address within kernel text.

> +
> + /*
> + * Try to find the longest insn ending up at the given IP
> + */
> + for (i = MAX_INSN_SIZE; i > 0; i--) {
> + struct insn insn;
> +
> + kernel_insn_init(&insn, kaddr + MAX_INSN_SIZE - i);
> + insn_get_length(&insn);
> + if (insn.length == i) {
> + regs->ip -= i;
> + return;
> + }
> + }

Hmm, this will not work correctly on x86, since the decoder can
miss-decode the tail bytes of previous instruction as prefix bytes. :(

Thus, if you want to rewind instruction stream, you need to decode
a function (or basic block) entirely.

Thank you,

> +
> + /*
> + * We failed to find a match for the previous insn.. give up
> + */
> +#endif
> +}
> +
> static int intel_pmu_save_and_restart(struct perf_event *event);
> static void intel_pmu_disable_event(struct perf_event *event);
>
> @@ -458,6 +532,8 @@ static void intel_pmu_drain_pebs_core(st
>
> PEBS_TO_REGS(at, &regs);
>
> + intel_pmu_pebs_fixup_ip(&regs);
> +
> if (perf_event_overflow(event, 1, data, &regs))
> intel_pmu_disable_event(event);
>
> @@ -519,6 +595,7 @@ static void intel_pmu_drain_pebs_nhm(str
> data->period = event->hw.last_period;
>
> PEBS_TO_REGS(at, &regs);
> + intel_pmu_pebs_fixup_ip(&regs);
>
> if (perf_event_overflow(event, 1, data, &regs))
> intel_pmu_disable_event(event);
>
> --

--
Masami Hiramatsu
e-mail: mhiramat(a)redhat.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Wed, 2010-03-03 at 13:05 -0500, Masami Hiramatsu wrote:
> Peter Zijlstra wrote:
> > PEBS always reports the IP+1, that is the instruction after the one
> > that got sampled, cure this by using the LBR to reliably rewind the
> > instruction stream.
>
> Hmm, does PEBS always report one byte after the end address of the
> sampled instruction? Or the instruction which will be executed next
> step?

The next instruction, its trap like.

> [...]
> > +#include <asm/insn.h>
> > +
> > +#define MAX_INSN_SIZE 16
>
> Hmm, we'd better integrate these kinds of definitions into
> asm/insn.h... (several features define it)

Agreed, I'll look at doing a patch to collect them all into asm/insn.h
if nobody beats me to it :-)

> > +
> > +static void intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
> > +{
> > +#if 0
> > + /*
> > + * Borken, makes the machine expode at times trying to
> > + * derefence funny userspace addresses.
> > + *
> > + * Should we always fwd decode from @to, instead of trying
> > + * to rewind as implemented?
> > + */
> > +
> > + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> > + unsigned long from = cpuc->lbr_entries[0].from;
> > + unsigned long to = cpuc->lbr_entries[0].to;
>
> Ah, I see. For branch instruction case, we can use LBR to
> find previous IP...

Right, we use the LBR to find the basic block.

> > + unsigned long ip = regs->ip;
> > + u8 buf[2*MAX_INSN_SIZE];
> > + u8 *kaddr;
> > + int i;
> > +
> > + if (from && to) {
> > + /*
> > + * We sampled a branch insn, rewind using the LBR stack
> > + */
> > + if (ip == to) {
> > + regs->ip = from;
> > + return;
> > + }
> > + }
> > +
> > + if (user_mode(regs)) {
> > + int bytes = copy_from_user_nmi(buf,
> > + (void __user *)(ip - MAX_INSN_SIZE),
> > + 2*MAX_INSN_SIZE);
> > +
>
> maybe, you'd better check the source address range is within
> the user address range. e.g. ip < MAX_INSN_SIZE.

Not only that, I realized user_mode() checks regs->cs, which is not set
by the PEBS code, so I added some helpers.

> > +
> > + /*
> > + * Try to find the longest insn ending up at the given IP
> > + */
> > + for (i = MAX_INSN_SIZE; i > 0; i--) {
> > + struct insn insn;
> > +
> > + kernel_insn_init(&insn, kaddr + MAX_INSN_SIZE - i);
> > + insn_get_length(&insn);
> > + if (insn.length == i) {
> > + regs->ip -= i;
> > + return;
> > + }
> > + }
>
> Hmm, this will not work correctly on x86, since the decoder can
> miss-decode the tail bytes of previous instruction as prefix bytes. :(
>
> Thus, if you want to rewind instruction stream, you need to decode
> a function (or basic block) entirely.

Something like the below?

#ifdef CONFIG_X86_32
static bool kernel_ip(unsigned long ip)
{
return ip > TASK_SIZE;
}
#else
static bool kernel_ip(unsigned long ip)
{
return (long)ip < 0;
}
#endif

static int intel_pmu_pebs_fixup_ip(unsigned long *ipp)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
unsigned long from = cpuc->lbr_entries[0].from;
unsigned long old_to, to = cpuc->lbr_entries[0].to;
unsigned long ip = *ipp;
int i;

/*
* We don't need to fixup if the PEBS assist is fault like
*/
if (!x86_pmu.intel_perf_capabilities.pebs_trap)
return 0;

if (!cpuc->lbr_stack.nr || !from || !to)
return 0;

if (ip < to)
return 0;

/*
* We sampled a branch insn, rewind using the LBR stack
*/
if (ip == to) {
*ipp = from;
return 1;
}

do {
struct insn insn;
u8 buf[MAX_INSN_SIZE];
void *kaddr;

old_to = to;
if (!kernel_ip(ip)) {
int bytes = copy_from_user_nmi(buf, (void __user *)to,
MAX_INSN_SIZE);

if (bytes != MAX_INSN_SIZE)
return 0;

kaddr = buf;
} else kaddr = (void *)to;

kernel_insn_init(&insn, kaddr);
insn_get_length(&insn);
to += insn.length;
} while (to < ip);

if (to == ip) {
*ipp = old_to;
return 1;
}

return 0;
}

I thought about exposing the success of this fixup as a PERF_RECORD_MISC
bit.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Masami Hiramatsu on
Peter Zijlstra wrote:
> On Wed, 2010-03-03 at 13:05 -0500, Masami Hiramatsu wrote:
>> Peter Zijlstra wrote:
>>> PEBS always reports the IP+1, that is the instruction after the one
>>> that got sampled, cure this by using the LBR to reliably rewind the
>>> instruction stream.
>>
>> Hmm, does PEBS always report one byte after the end address of the
>> sampled instruction? Or the instruction which will be executed next
>> step?
>
> The next instruction, its trap like.
>
>> [...]
>>> +#include <asm/insn.h>
>>> +
>>> +#define MAX_INSN_SIZE 16
>>
>> Hmm, we'd better integrate these kinds of definitions into
>> asm/insn.h... (several features define it)
>
> Agreed, I'll look at doing a patch to collect them all into asm/insn.h
> if nobody beats me to it :-)

At least kprobes doesn't :)

>>> +
>>> +static void intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
>>> +{
>>> +#if 0
>>> + /*
>>> + * Borken, makes the machine expode at times trying to
>>> + * derefence funny userspace addresses.
>>> + *
>>> + * Should we always fwd decode from @to, instead of trying
>>> + * to rewind as implemented?
>>> + */
>>> +
>>> + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
>>> + unsigned long from = cpuc->lbr_entries[0].from;
>>> + unsigned long to = cpuc->lbr_entries[0].to;
>>
>> Ah, I see. For branch instruction case, we can use LBR to
>> find previous IP...
>
> Right, we use the LBR to find the basic block.

Hm, that's a good idea :)

>>> + unsigned long ip = regs->ip;
>>> + u8 buf[2*MAX_INSN_SIZE];
>>> + u8 *kaddr;
>>> + int i;
>>> +
>>> + if (from && to) {
>>> + /*
>>> + * We sampled a branch insn, rewind using the LBR stack
>>> + */
>>> + if (ip == to) {
>>> + regs->ip = from;
>>> + return;
>>> + }
>>> + }
>>> +
>>> + if (user_mode(regs)) {
>>> + int bytes = copy_from_user_nmi(buf,
>>> + (void __user *)(ip - MAX_INSN_SIZE),
>>> + 2*MAX_INSN_SIZE);
>>> +
>>
>> maybe, you'd better check the source address range is within
>> the user address range. e.g. ip < MAX_INSN_SIZE.
>
> Not only that, I realized user_mode() checks regs->cs, which is not set
> by the PEBS code, so I added some helpers.
>
>>> +
>>> + /*
>>> + * Try to find the longest insn ending up at the given IP
>>> + */
>>> + for (i = MAX_INSN_SIZE; i > 0; i--) {
>>> + struct insn insn;
>>> +
>>> + kernel_insn_init(&insn, kaddr + MAX_INSN_SIZE - i);
>>> + insn_get_length(&insn);
>>> + if (insn.length == i) {
>>> + regs->ip -= i;
>>> + return;
>>> + }
>>> + }
>>
>> Hmm, this will not work correctly on x86, since the decoder can
>> miss-decode the tail bytes of previous instruction as prefix bytes. :(
>>
>> Thus, if you want to rewind instruction stream, you need to decode
>> a function (or basic block) entirely.
>
> Something like the below?

Great! it looks good to me.
Yeah, LBR.to may always smaller than current ip (if no one disabled LBR).

Thank you,

>
> #ifdef CONFIG_X86_32
> static bool kernel_ip(unsigned long ip)
> {
> return ip > TASK_SIZE;
> }
> #else
> static bool kernel_ip(unsigned long ip)
> {
> return (long)ip < 0;
> }
> #endif
>
> static int intel_pmu_pebs_fixup_ip(unsigned long *ipp)
> {
> struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> unsigned long from = cpuc->lbr_entries[0].from;
> unsigned long old_to, to = cpuc->lbr_entries[0].to;
> unsigned long ip = *ipp;
> int i;
>
> /*
> * We don't need to fixup if the PEBS assist is fault like
> */
> if (!x86_pmu.intel_perf_capabilities.pebs_trap)
> return 0;
>
> if (!cpuc->lbr_stack.nr || !from || !to)
> return 0;
>
> if (ip < to)
> return 0;
>
> /*
> * We sampled a branch insn, rewind using the LBR stack
> */
> if (ip == to) {
> *ipp = from;
> return 1;
> }
>
> do {
> struct insn insn;
> u8 buf[MAX_INSN_SIZE];
> void *kaddr;
>
> old_to = to;
> if (!kernel_ip(ip)) {
> int bytes = copy_from_user_nmi(buf, (void __user *)to,
> MAX_INSN_SIZE);
>
> if (bytes != MAX_INSN_SIZE)
> return 0;
>
> kaddr = buf;
> } else kaddr = (void *)to;
>
> kernel_insn_init(&insn, kaddr);
> insn_get_length(&insn);
> to += insn.length;
> } while (to < ip);
>
> if (to == ip) {
> *ipp = old_to;
> return 1;
> }
>
> return 0;
> }
>
> I thought about exposing the success of this fixup as a PERF_RECORD_MISC
> bit.
>

--
Masami Hiramatsu
e-mail: mhiramat(a)redhat.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Stephane Eranian on
I think systematically and transparently using LBR to correct PEBS off-by-one
problem is not such a good idea. You've basically highjacked LBR and user
cannot use it in a different way.

There are PEBS+LBR measurements where you care about extracting the LBR data.
There are PEBS measurements where you don't care about getting the correct IP.
I don't necessarily want to pay the price, especially when this could
be done offline
in the tool.


On Wed, Mar 3, 2010 at 10:11 PM, Masami Hiramatsu <mhiramat(a)redhat.com> wrote:
> Peter Zijlstra wrote:
>> On Wed, 2010-03-03 at 13:05 -0500, Masami Hiramatsu wrote:
>>> Peter Zijlstra wrote:
>>>> PEBS always reports the IP+1, that is the instruction after the one
>>>> that got sampled, cure this by using the LBR to reliably rewind the
>>>> instruction stream.
>>>
>>> Hmm, does PEBS always report one byte after the end address of the
>>> sampled instruction? Or the instruction which will be executed next
>>> step?
>>
>> The next instruction, its trap like.
>>
>>> [...]
>>>> +#include <asm/insn.h>
>>>> +
>>>> +#define MAX_INSN_SIZE      16
>>>
>>> Hmm, we'd better integrate these kinds of definitions into
>>> asm/insn.h... (several features define it)
>>
>> Agreed, I'll look at doing a patch to collect them all into asm/insn.h
>> if nobody beats me to it :-)
>
> At least kprobes doesn't :)
>
>>>> +
>>>> +static void intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
>>>> +{
>>>> +#if 0
>>>> +   /*
>>>> +    * Borken, makes the machine expode at times trying to
>>>> +    * derefence funny userspace addresses.
>>>> +    *
>>>> +    * Should we always fwd decode from @to, instead of trying
>>>> +    * to rewind as implemented?
>>>> +    */
>>>> +
>>>> +   struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
>>>> +   unsigned long from = cpuc->lbr_entries[0].from;
>>>> +   unsigned long to = cpuc->lbr_entries[0].to;
>>>
>>> Ah, I see. For branch instruction case, we can use LBR to
>>> find previous IP...
>>
>> Right, we use the LBR to find the basic block.
>
> Hm, that's a good idea :)
>
>>>> +   unsigned long ip = regs->ip;
>>>> +   u8 buf[2*MAX_INSN_SIZE];
>>>> +   u8 *kaddr;
>>>> +   int i;
>>>> +
>>>> +   if (from && to) {
>>>> +           /*
>>>> +            * We sampled a branch insn, rewind using the LBR stack
>>>> +            */
>>>> +           if (ip == to) {
>>>> +                   regs->ip = from;
>>>> +                   return;
>>>> +           }
>>>> +   }
>>>> +
>>>> +   if (user_mode(regs)) {
>>>> +           int bytes = copy_from_user_nmi(buf,
>>>> +                           (void __user *)(ip - MAX_INSN_SIZE),
>>>> +                           2*MAX_INSN_SIZE);
>>>> +
>>>
>>> maybe, you'd better check the source address range is within
>>> the user address range. e.g. ip < MAX_INSN_SIZE.
>>
>> Not only that, I realized user_mode() checks regs->cs, which is not set
>> by the PEBS code, so I added some helpers.
>>
>>>> +
>>>> +   /*
>>>> +    * Try to find the longest insn ending up at the given IP
>>>> +    */
>>>> +   for (i = MAX_INSN_SIZE; i > 0; i--) {
>>>> +           struct insn insn;
>>>> +
>>>> +           kernel_insn_init(&insn, kaddr + MAX_INSN_SIZE - i);
>>>> +           insn_get_length(&insn);
>>>> +           if (insn.length == i) {
>>>> +                   regs->ip -= i;
>>>> +                   return;
>>>> +           }
>>>> +   }
>>>
>>> Hmm, this will not work correctly on x86, since the decoder can
>>> miss-decode the tail bytes of previous instruction as prefix bytes. :(
>>>
>>> Thus, if you want to rewind instruction stream, you need to decode
>>> a function (or basic block) entirely.
>>
>> Something like the below?
>
> Great! it looks good to me.
> Yeah, LBR.to may always smaller than current ip (if no one disabled LBR).
>
> Thank you,
>
>>
>> #ifdef CONFIG_X86_32
>> static bool kernel_ip(unsigned long ip)
>> {
>>         return ip > TASK_SIZE;
>> }
>> #else
>> static bool kernel_ip(unsigned long ip)
>> {
>>         return (long)ip < 0;
>> }
>> #endif
>>
>> static int intel_pmu_pebs_fixup_ip(unsigned long *ipp)
>> {
>>         struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
>>         unsigned long from = cpuc->lbr_entries[0].from;
>>         unsigned long old_to, to = cpuc->lbr_entries[0].to;
>>         unsigned long ip = *ipp;
>>         int i;
>>
>>         /*
>>          * We don't need to fixup if the PEBS assist is fault like
>>          */
>>         if (!x86_pmu.intel_perf_capabilities.pebs_trap)
>>                 return 0;
>>
>>         if (!cpuc->lbr_stack.nr || !from || !to)
>>                 return 0;
>>
>>         if (ip < to)
>>                 return 0;
>>
>>         /*
>>          * We sampled a branch insn, rewind using the LBR stack
>>          */
>>         if (ip == to) {
>>                 *ipp = from;
>>                 return 1;
>>         }
>>
>>         do {
>>                 struct insn insn;
>>                 u8 buf[MAX_INSN_SIZE];
>>                 void *kaddr;
>>
>>                 old_to = to;
>>                 if (!kernel_ip(ip)) {
>>                         int bytes = copy_from_user_nmi(buf, (void __user *)to,
>>                                         MAX_INSN_SIZE);
>>
>>                         if (bytes != MAX_INSN_SIZE)
>>                                 return 0;
>>
>>                         kaddr = buf;
>>                 } else kaddr = (void *)to;
>>
>>                 kernel_insn_init(&insn, kaddr);
>>                 insn_get_length(&insn);
>>                 to += insn.length;
>>         } while (to < ip);
>>
>>         if (to == ip) {
>>                 *ipp = old_to;
>>                 return 1;
>>         }
>>
>>         return 0;
>> }
>>
>> I thought about exposing the success of this fixup as a PERF_RECORD_MISC
>> bit.
>>
>
> --
> Masami Hiramatsu
> e-mail: mhiramat(a)redhat.com
>



--
Stephane Eranian  | EMEA Software Engineering
Google France | 38 avenue de l'Opéra | 75002 Paris
Tel : +33 (0) 1 42 68 53 00
This email may be confidential or privileged. If you received this
communication by mistake, please
don't forward it to anyone else, please erase all copies and
attachments, and please let me know that
it went to the wrong person. Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Wed, 2010-03-03 at 22:50 +0100, Stephane Eranian wrote:

> I think systematically and transparently using LBR to correct PEBS off-by-one
> problem is not such a good idea. You've basically highjacked LBR and user
> cannot use it in a different way.

Well, they could, it just makes scheduling the stuff more interesting.

> There are PEBS+LBR measurements where you care about extracting the LBR data.
> There are PEBS measurements where you don't care about getting the correct IP.
> I don't necessarily want to pay the price, especially when this could
> be done offline in the tool.

There are some people who argue that fixing up that +1 insn issue is
critical, sadly they don't appear to want to argue their case in public.
What we can do is make it optional I guess.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/