From: Avi Kivity on
On 03/16/2010 11:28 AM, Zhang, Yanmin wrote:
> Sorry. I found currently --pid isn't process but a thread (main thread).
>
> Ingo,
>
> Is it possible to support a new parameter or extend --inherit, so 'perf record' and
> 'perf top' could collect data on all threads of a process when the process is running?
>

That seems like a worthwhile addition regardless of this thread.
Profile all current threads and any new ones. It probably makes sense
to call this --pid and rename the existing --pid to --thread.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Avi Kivity on
On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
>
>> Excellent, support for guest kernel != host kernel is critical (I can't
>> remember the last time I ran same kernels).
>>
>> How would we support multiple guests with different kernels?
>>
> With the patch, 'perf kvm report --sort pid" could show
> summary statistics for all guest os instances. Then, use
> parameter --pid of 'perf kvm record' to collect single problematic instance data.
>

That certainly works, though automatic association of guest data with
guest symbols is friendlier.

>>> diff -Nraup linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c
>>> --- linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c 2010-03-16 08:59:11.825295404 +0800
>>> +++ linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c 2010-03-16 09:01:09.976084492 +0800
>>> @@ -26,6 +26,7 @@
>>> #include<linux/sched.h>
>>> #include<linux/moduleparam.h>
>>> #include<linux/ftrace_event.h>
>>> +#include<linux/perf_event.h>
>>> #include "kvm_cache_regs.h"
>>> #include "x86.h"
>>>
>>> @@ -3632,6 +3633,43 @@ static void update_cr8_intercept(struct
>>> vmcs_write32(TPR_THRESHOLD, irr);
>>> }
>>>
>>> +DEFINE_PER_CPU(int, kvm_in_guest) = {0};
>>> +
>>> +static void kvm_set_in_guest(void)
>>> +{
>>> + percpu_write(kvm_in_guest, 1);
>>> +}
>>> +
>>> +static int kvm_is_in_guest(void)
>>> +{
>>> + return percpu_read(kvm_in_guest);
>>> +}
>>>
>>>
>>
>
>> There is already PF_VCPU for this.
>>
> Right, but there is a scope between kvm_guest_enter and really running
> in guest os, where a perf event might overflow. Anyway, the scope is very
> narrow, I will change it to use flag PF_VCPU.
>

There is also a window between setting the flag and calling 'int $2'
where an NMI might happen and be accounted incorrectly.

Perhaps separate the 'int $2' into a direct call into perf and another
call for the rest of NMI handling. I don't see how it would work on svm
though - AFAICT the NMI is held whereas vmx swallows it. I guess NMIs
will be disabled until the next IRET so it isn't racy, just tricky.

>>> +static struct perf_guest_info_callbacks kvm_guest_cbs = {
>>> + .is_in_guest = kvm_is_in_guest,
>>> + .is_user_mode = kvm_is_user_mode,
>>> + .get_guest_ip = kvm_get_guest_ip,
>>> + .reset_in_guest = kvm_reset_in_guest,
>>> +};
>>>
>>>
>> Should be in common code, not vmx specific.
>>
> Right. I discussed with Yangsheng. I will move above data structures and
> callbacks to file arch/x86/kvm/x86.c, and add get_ip, a new callback to
> kvm_x86_ops.
>

You will need access to the vcpu pointer (kvm_rip_read() needs it), you
can put it in a percpu variable. I guess if it's not null, you know
you're in a guest, so no need for PF_VCPU.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar on

* Zhang, Yanmin <yanmin_zhang(a)linux.intel.com> wrote:

> On Tue, 2010-03-16 at 15:48 +0800, Zhang, Yanmin wrote:
> > On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote:
> > > On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> > > > From: Zhang, Yanmin<yanmin_zhang(a)linux.intel.com>
> > > >
> > > > Based on the discussion in KVM community, I worked out the patch to support
> > > > perf to collect guest os statistics from host side. This patch is implemented
> > > > with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
> > > > critical bug and provided good suggestions with other guys. I really appreciate
> > > > their kind help.
> > > >
> > > > The patch adds new subcommand kvm to perf.
> > > >
> > > > perf kvm top
> > > > perf kvm record
> > > > perf kvm report
> > > > perf kvm diff
> > > >
> > > > The new perf could profile guest os kernel except guest os user space, but it
> > > > could summarize guest os user space utilization per guest os.
> > > >
> > > > Below are some examples.
> > > > 1) perf kvm top
> > > > [root(a)lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> > > > --guestmodules=/home/ymzhang/guest/modules top
> > > >
> > > >
> > >
> > Thanks for your kind comments.
> >
> > > Excellent, support for guest kernel != host kernel is critical (I can't
> > > remember the last time I ran same kernels).
> > >
> > > How would we support multiple guests with different kernels?
> > With the patch, 'perf kvm report --sort pid" could show
> > summary statistics for all guest os instances. Then, use
> > parameter --pid of 'perf kvm record' to collect single problematic instance data.
> Sorry. I found currently --pid isn't process but a thread (main thread).
>
> Ingo,
>
> Is it possible to support a new parameter or extend --inherit, so 'perf
> record' and 'perf top' could collect data on all threads of a process when
> the process is running?
>
> If not, I need add a new ugly parameter which is similar to --pid to filter
> out process data in userspace.

Yeah. For maximum utility i'd suggest to extend --pid to include this, and
introduce --tid for the previous, limited-to-a-single-task functionality.

Most users would expect --pid to work like a 'late attach' - i.e. to work like
strace -f or like a gdb attach.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar on

* Avi Kivity <avi(a)redhat.com> wrote:

> On 03/16/2010 09:24 AM, Ingo Molnar wrote:
> >* Avi Kivity<avi(a)redhat.com> wrote:
> >
> >>On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> >>>From: Zhang, Yanmin<yanmin_zhang(a)linux.intel.com>
> >>>
> >>>Based on the discussion in KVM community, I worked out the patch to support
> >>>perf to collect guest os statistics from host side. This patch is implemented
> >>>with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
> >>>critical bug and provided good suggestions with other guys. I really appreciate
> >>>their kind help.
> >>>
> >>>The patch adds new subcommand kvm to perf.
> >>>
> >>> perf kvm top
> >>> perf kvm record
> >>> perf kvm report
> >>> perf kvm diff
> >>>
> >>>The new perf could profile guest os kernel except guest os user space, but it
> >>>could summarize guest os user space utilization per guest os.
> >>>
> >>>Below are some examples.
> >>>1) perf kvm top
> >>>[root(a)lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> >>>--guestmodules=/home/ymzhang/guest/modules top
> >>>
> >>Excellent, support for guest kernel != host kernel is critical (I
> >>can't remember the last time I ran same kernels).
> >>
> >>How would we support multiple guests with different kernels? Perhaps a
> >>symbol server that perf can connect to (and that would connect to guests in
> >>turn)?
> >The highest quality solution would be if KVM offered a 'guest extension' to
> >the guest kernel's /proc/kallsyms that made it easy for user-space to get this
> >information from an authorative source.
> >
> >That's the main reason why the host side /proc/kallsyms is so popular and so
> >useful: while in theory it's mostly redundant information which can be gleaned
> >from the System.map and other sources of symbol information, it's easily
> >available and is _always_ trustable to come from the host kernel.
> >
> >Separate System.map's have a tendency to go out of sync (or go missing when a
> >devel kernel gets rebuilt, or if a devel package is not installed), and server
> >ports (be that a TCP port space server or an UDP port space mount-point) are
> >both a configuration hassle and are not guest-transparent.
> >
> >So for instrumentation infrastructure (such as perf) we have a large and well
> >founded preference for intrinsic, built-in, kernel-provided information: i.e.
> >a largely 'built-in' and transparent mechanism to get to guest symbols.
>
> The symbol server's client can certainly access the bits through vmchannel.

Ok, that would work i suspect.

Would be nice to have the symbol server in tools/perf/ and also make it easy
to add it to the initrd via a .config switch or so.

That would have basically all of the advantages of being built into the kernel
(availability, configurability, transparency, hackability), while having all
the advantages of a user-space approach as well (flexibility, extensibility,
robustness, ease of maintenance, etc.).

If only we had tools/xorg/ integrated via the initrd that way ;-)

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Avi Kivity on
On 03/16/2010 11:53 AM, Ingo Molnar wrote:
> * Avi Kivity<avi(a)redhat.com> wrote:
>
>
>> On 03/16/2010 09:24 AM, Ingo Molnar wrote:
>>
>>> * Avi Kivity<avi(a)redhat.com> wrote:
>>>
>>>
>>>> On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
>>>>
>>>>> From: Zhang, Yanmin<yanmin_zhang(a)linux.intel.com>
>>>>>
>>>>> Based on the discussion in KVM community, I worked out the patch to support
>>>>> perf to collect guest os statistics from host side. This patch is implemented
>>>>> with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
>>>>> critical bug and provided good suggestions with other guys. I really appreciate
>>>>> their kind help.
>>>>>
>>>>> The patch adds new subcommand kvm to perf.
>>>>>
>>>>> perf kvm top
>>>>> perf kvm record
>>>>> perf kvm report
>>>>> perf kvm diff
>>>>>
>>>>> The new perf could profile guest os kernel except guest os user space, but it
>>>>> could summarize guest os user space utilization per guest os.
>>>>>
>>>>> Below are some examples.
>>>>> 1) perf kvm top
>>>>> [root(a)lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
>>>>> --guestmodules=/home/ymzhang/guest/modules top
>>>>>
>>>>>
>>>> Excellent, support for guest kernel != host kernel is critical (I
>>>> can't remember the last time I ran same kernels).
>>>>
>>>> How would we support multiple guests with different kernels? Perhaps a
>>>> symbol server that perf can connect to (and that would connect to guests in
>>>> turn)?
>>>>
>>> The highest quality solution would be if KVM offered a 'guest extension' to
>>> the guest kernel's /proc/kallsyms that made it easy for user-space to get this
>>> information from an authorative source.
>>>
>>> That's the main reason why the host side /proc/kallsyms is so popular and so
>>> useful: while in theory it's mostly redundant information which can be gleaned
>>>
>> >from the System.map and other sources of symbol information, it's easily
>>
>>> available and is _always_ trustable to come from the host kernel.
>>>
>>> Separate System.map's have a tendency to go out of sync (or go missing when a
>>> devel kernel gets rebuilt, or if a devel package is not installed), and server
>>> ports (be that a TCP port space server or an UDP port space mount-point) are
>>> both a configuration hassle and are not guest-transparent.
>>>
>>> So for instrumentation infrastructure (such as perf) we have a large and well
>>> founded preference for intrinsic, built-in, kernel-provided information: i.e.
>>> a largely 'built-in' and transparent mechanism to get to guest symbols.
>>>
>> The symbol server's client can certainly access the bits through vmchannel.
>>
> Ok, that would work i suspect.
>
> Would be nice to have the symbol server in tools/perf/ and also make it easy
> to add it to the initrd via a .config switch or so.
>
> That would have basically all of the advantages of being built into the kernel
> (availability, configurability, transparency, hackability), while having all
> the advantages of a user-space approach as well (flexibility, extensibility,
> robustness, ease of maintenance, etc.).
>

Note, I am not advocating building the vmchannel client into the host
kernel. While that makes everything simpler for the user, it increases
the kernel footprint with all the disadvantages that come with that (any
bug is converted into a host DoS or worse).

So, perf would connect to qemu via (say) a well-known unix domain
socket, which would then talk to the guest kernel.

I know you won't like it, we'll continue to disagree on this unfortunately.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/