From: Avi Kivity on
On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> From: Zhang, Yanmin<yanmin_zhang(a)linux.intel.com>
>
> Based on the discussion in the KVM community, I worked out a patch that lets
> perf collect guest OS statistics from the host side. The patch was implemented
> with kind help from Ingo, Peter and others. Yang Sheng pointed out a critical
> bug and, together with others, provided good suggestions. I really appreciate
> their kind help.
>
> The patch adds a new subcommand, kvm, to perf.
>
> perf kvm top
> perf kvm record
> perf kvm report
> perf kvm diff
>
> The new perf can profile the guest OS kernel, but not guest OS user space;
> it can, however, summarize guest OS user space utilization per guest OS.
>
> Below are some examples.
> 1) perf kvm top
> [root(a)lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> --guestmodules=/home/ymzhang/guest/modules top
>
>

Excellent; support for guest kernel != host kernel is critical (I can't
remember the last time I ran the same kernels).

How would we support multiple guests with different kernels? Perhaps a
symbol server that perf can connect to (and that would connect to guests
in turn)?

> diff -Nraup linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c
> --- linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c 2010-03-16 08:59:11.825295404 +0800
> +++ linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c 2010-03-16 09:01:09.976084492 +0800
> @@ -26,6 +26,7 @@
> #include<linux/sched.h>
> #include<linux/moduleparam.h>
> #include<linux/ftrace_event.h>
> +#include<linux/perf_event.h>
> #include "kvm_cache_regs.h"
> #include "x86.h"
>
> @@ -3632,6 +3633,43 @@ static void update_cr8_intercept(struct
> vmcs_write32(TPR_THRESHOLD, irr);
> }
>
> +DEFINE_PER_CPU(int, kvm_in_guest) = {0};
> +
> +static void kvm_set_in_guest(void)
> +{
> + percpu_write(kvm_in_guest, 1);
> +}
> +
> +static int kvm_is_in_guest(void)
> +{
> + return percpu_read(kvm_in_guest);
> +}
>

There is already PF_VCPU for this.

> +static struct perf_guest_info_callbacks kvm_guest_cbs = {
> + .is_in_guest = kvm_is_in_guest,
> + .is_user_mode = kvm_is_user_mode,
> + .get_guest_ip = kvm_get_guest_ip,
> + .reset_in_guest = kvm_reset_in_guest,
> +};
>

This should be in common code, not VMX-specific.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar on

* Avi Kivity <avi(a)redhat.com> wrote:

> On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> >From: Zhang, Yanmin<yanmin_zhang(a)linux.intel.com>
> >
> >Based on the discussion in the KVM community, I worked out a patch that lets
> >perf collect guest OS statistics from the host side. The patch was implemented
> >with kind help from Ingo, Peter and others. Yang Sheng pointed out a critical
> >bug and, together with others, provided good suggestions. I really appreciate
> >their kind help.
> >
> >The patch adds a new subcommand, kvm, to perf.
> >
> > perf kvm top
> > perf kvm record
> > perf kvm report
> > perf kvm diff
> >
> >The new perf can profile the guest OS kernel, but not guest OS user space;
> >it can, however, summarize guest OS user space utilization per guest OS.
> >
> >Below are some examples.
> >1) perf kvm top
> >[root(a)lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> >--guestmodules=/home/ymzhang/guest/modules top
> >
>
> Excellent; support for guest kernel != host kernel is critical (I
> can't remember the last time I ran the same kernels).
>
> How would we support multiple guests with different kernels? Perhaps a
> symbol server that perf can connect to (and that would connect to guests in
> turn)?

The highest quality solution would be if KVM offered a 'guest extension' to
the guest kernel's /proc/kallsyms that made it easy for user space to get this
information from an authoritative source.

That's the main reason why the host-side /proc/kallsyms is so popular and so
useful: while in theory it's mostly redundant information which can be gleaned
from the System.map and other sources of symbol information, it's easily
available and can _always_ be trusted to come from the host kernel.

Separate System.map files have a tendency to go out of sync (or go missing when
a devel kernel gets rebuilt, or if a devel package is not installed), and server
ports (be that a TCP port space server or a UDP port space mount-point) are
both a configuration hassle and are not guest-transparent.

So for instrumentation infrastructure (such as perf) we have a large and
well-founded preference for intrinsic, built-in, kernel-provided information:
i.e. a largely 'built-in' and transparent mechanism to get to guest symbols.
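However the guest symbols reach the host (a guest-provided kallsyms extension as proposed here, a copied file passed to --guestkallsyms, or a symbol server), perf ultimately has to map a sampled guest IP to the nearest preceding symbol. The C sketch below shows that lookup in user space, assuming the usual kallsyms line format of hex address, type letter and name; parse_kallsyms() and resolve_ip() are hypothetical helpers, not perf's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One entry from a kallsyms-style listing: "<hex address> <type> <name>". */
struct ksym {
	unsigned long long addr;
	char name[128];
};

/* qsort comparator: ascending by address. */
static int ksym_cmp(const void *a, const void *b)
{
	const struct ksym *x = a, *y = b;

	return (x->addr > y->addr) - (x->addr < y->addr);
}

/*
 * Parse up to 'max' symbols from a NUL-terminated kallsyms-format buffer,
 * then sort them by address. Lines that do not match the format are skipped.
 * Returns the number of symbols stored in 'syms'.
 */
static size_t parse_kallsyms(const char *text, struct ksym *syms, size_t max)
{
	const char *line = text;
	size_t n = 0;
	char buf[256];

	while (*line && n < max) {
		size_t len = strcspn(line, "\n");
		size_t copy = len < sizeof(buf) - 1 ? len : sizeof(buf) - 1;
		char type;

		memcpy(buf, line, copy);
		buf[copy] = '\0';
		if (sscanf(buf, "%llx %c %127s", &syms[n].addr, &type,
			   syms[n].name) == 3)
			n++;
		line += len;
		if (*line == '\n')
			line++;
	}
	qsort(syms, n, sizeof(*syms), ksym_cmp);
	return n;
}

/* Binary search for the symbol with the highest address <= ip (NULL if none). */
static const struct ksym *resolve_ip(const struct ksym *syms, size_t n,
				     unsigned long long ip)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (syms[mid].addr <= ip)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo ? &syms[lo - 1] : NULL;
}
```

For example, with symbols at ...1000 and ...2000, an IP of ...1234 resolves to the symbol at ...1000. A fuller version would probably also need symbol sizes (or the next symbol's address) to reject IPs past the end of the last function, plus handling for module symbols.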

Thanks,

Ingo
From: Zhang, Yanmin on
On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote:
> On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> > From: Zhang, Yanmin<yanmin_zhang(a)linux.intel.com>
> >
> > Based on the discussion in the KVM community, I worked out a patch that lets
> > perf collect guest OS statistics from the host side. The patch was implemented
> > with kind help from Ingo, Peter and others. Yang Sheng pointed out a critical
> > bug and, together with others, provided good suggestions. I really appreciate
> > their kind help.
> >
> > The patch adds a new subcommand, kvm, to perf.
> >
> > perf kvm top
> > perf kvm record
> > perf kvm report
> > perf kvm diff
> >
> > The new perf can profile the guest OS kernel, but not guest OS user space;
> > it can, however, summarize guest OS user space utilization per guest OS.
> >
> > Below are some examples.
> > 1) perf kvm top
> > [root(a)lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> > --guestmodules=/home/ymzhang/guest/modules top
> >
> >
>
Thanks for your kind comments.

> Excellent; support for guest kernel != host kernel is critical (I can't
> remember the last time I ran the same kernels).
>
> How would we support multiple guests with different kernels?
With the patch, 'perf kvm report --sort pid' can show summary
statistics for all guest OS instances. Then use the --pid parameter
of 'perf kvm record' to collect data for a single problematic instance.
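Each KVM guest runs as a host process, so sorting by pid gives one row per guest instance. As a rough illustration of that kind of summary (not perf's report code), the sketch below buckets samples by host pid and keeps guest kernel, guest user and host samples apart; struct sample, its ctx field and summarize_by_pid() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified sample: the host pid that was running (for a
 * guest, its qemu-kvm process) and the context the sample landed in. */
enum sample_ctx { CTX_HOST, CTX_GUEST_KERNEL, CTX_GUEST_USER };

struct sample {
	int pid;
	enum sample_ctx ctx;
};

/* Per-pid totals: one row per guest instance (or host process). */
struct pid_summary {
	int pid;
	unsigned long guest_kernel;
	unsigned long guest_user;
	unsigned long host;
};

/*
 * Bucket samples by pid in order of first appearance. Returns the number
 * of rows written to 'out'; samples for new pids beyond 'max' rows are dropped.
 */
static size_t summarize_by_pid(const struct sample *s, size_t n,
			       struct pid_summary *out, size_t max)
{
	size_t used = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j = 0;

		/* Linear search is fine for a handful of guests. */
		while (j < used && out[j].pid != s[i].pid)
			j++;
		if (j == used) {
			if (used == max)
				continue;
			out[used++] = (struct pid_summary){ .pid = s[i].pid };
		}
		switch (s[i].ctx) {
		case CTX_GUEST_KERNEL:
			out[j].guest_kernel++;
			break;
		case CTX_GUEST_USER:
			out[j].guest_user++;
			break;
		case CTX_HOST:
			out[j].host++;
			break;
		}
	}
	return used;
}
```

Guest user samples are only counted here, never symbolized, which mirrors the patch's stated limitation: guest user space is summarized per guest but not profiled.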

> Perhaps a
> symbol server that perf can connect to (and that would connect to guests
> in turn)?

>
> > diff -Nraup linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c
> > --- linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c 2010-03-16 08:59:11.825295404 +0800
> > +++ linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c 2010-03-16 09:01:09.976084492 +0800
> > @@ -26,6 +26,7 @@
> > #include<linux/sched.h>
> > #include<linux/moduleparam.h>
> > #include<linux/ftrace_event.h>
> > +#include<linux/perf_event.h>
> > #include "kvm_cache_regs.h"
> > #include "x86.h"
> >
> > @@ -3632,6 +3633,43 @@ static void update_cr8_intercept(struct
> > vmcs_write32(TPR_THRESHOLD, irr);
> > }
> >
> > +DEFINE_PER_CPU(int, kvm_in_guest) = {0};
> > +
> > +static void kvm_set_in_guest(void)
> > +{
> > + percpu_write(kvm_in_guest, 1);
> > +}
> > +
> > +static int kvm_is_in_guest(void)
> > +{
> > + return percpu_read(kvm_in_guest);
> > +}
> >
>

> There is already PF_VCPU for this.
Right, but there is a window between kvm_guest_enter and actually running
in the guest OS during which a perf event might overflow. Anyway, the window
is very narrow, so I will change it to use the PF_VCPU flag.
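A sketch of deriving is_in_guest from a task flag instead of the per-CPU kvm_in_guest variable, simulated in user space. SKETCH_PF_VCPU, struct task and current_task are hypothetical stand-ins for PF_VCPU, task_struct and current, and the flag value is arbitrary here. A counter overflow that lands after the flag is set but before the real VM entry would be attributed to the guest, which is the narrow window described above.

```c
#include <assert.h>

/* Hypothetical stand-in for PF_VCPU; the value is arbitrary in this sketch. */
#define SKETCH_PF_VCPU 0x10u

/* Hypothetical stand-ins for task_struct and 'current'. */
struct task {
	unsigned int flags;
};

static struct task vcpu_thread;
static struct task *current_task = &vcpu_thread;

/* Set the flag before guest execution, as kvm_guest_enter() does. */
static void sketch_guest_enter(void)
{
	current_task->flags |= SKETCH_PF_VCPU;
	/*
	 * In the real code the actual VM entry happens some time after this;
	 * an overflow taken in between is still reported as "in guest".
	 */
}

/* Clear the flag after leaving the guest, as kvm_guest_exit() does. */
static void sketch_guest_exit(void)
{
	current_task->flags &= ~SKETCH_PF_VCPU;
}

/* is_in_guest callback derived from the task flag, not a per-CPU variable. */
static int sketch_is_in_guest(void)
{
	return (current_task->flags & SKETCH_PF_VCPU) != 0;
}
```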

>
> > +static struct perf_guest_info_callbacks kvm_guest_cbs = {
> > + .is_in_guest = kvm_is_in_guest,
> > + .is_user_mode = kvm_is_user_mode,
> > + .get_guest_ip = kvm_get_guest_ip,
> > + .reset_in_guest = kvm_reset_in_guest,
> > +};
> >
>
> This should be in common code, not VMX-specific.
Right. I discussed this with Yangsheng. I will move the above data structures
and callbacks to arch/x86/kvm/x86.c and add get_ip, a new callback, to
kvm_x86_ops.
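A user-space sketch of the proposed split: vendor-neutral callbacks (as they might live in x86.c) that reach VMX- or SVM-specific state only through an ops table with a get_ip member. struct vcpu, struct x86_ops_sketch, its get_cpl member, current_vcpu and the *_sketch functions are all hypothetical simplifications, not the real kvm_x86_ops or perf_guest_info_callbacks definitions; treating CPL != 0 as user mode is likewise an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down vcpu: just the state the callbacks need. */
struct vcpu {
	unsigned long rip;
	int cpl;
};

/* Hypothetical vendor ops table; get_ip is the member proposed above. */
struct x86_ops_sketch {
	unsigned long (*get_ip)(struct vcpu *v);
	int (*get_cpl)(struct vcpu *v);
};

/* A "VMX" backend; an SVM backend would supply its own functions. */
static unsigned long vmx_get_ip_sketch(struct vcpu *v) { return v->rip; }
static int vmx_get_cpl_sketch(struct vcpu *v) { return v->cpl; }

static const struct x86_ops_sketch vmx_ops_sketch = {
	.get_ip = vmx_get_ip_sketch,
	.get_cpl = vmx_get_cpl_sketch,
};

/* Backend chosen once at init; vcpu currently running on this CPU (or NULL). */
static const struct x86_ops_sketch *ops = &vmx_ops_sketch;
static struct vcpu *current_vcpu;

/* Vendor-neutral callbacks, as they might live in common x86 code. */
static int common_is_in_guest(void)
{
	return current_vcpu != NULL;
}

static int common_is_user_mode(void)
{
	return current_vcpu != NULL && ops->get_cpl(current_vcpu) != 0;
}

static unsigned long common_get_guest_ip(void)
{
	return current_vcpu ? ops->get_ip(current_vcpu) : 0;
}

/* Shape modeled loosely on perf_guest_info_callbacks from the patch. */
struct guest_cbs_sketch {
	int (*is_in_guest)(void);
	int (*is_user_mode)(void);
	unsigned long (*get_guest_ip)(void);
};

static const struct guest_cbs_sketch cbs = {
	.is_in_guest = common_is_in_guest,
	.is_user_mode = common_is_user_mode,
	.get_guest_ip = common_get_guest_ip,
};
```

The design point is that only the small ops table differs between VMX and SVM, so the perf-facing callbacks are written once.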

Yanmin


From: Zhang, Yanmin on
On Tue, 2010-03-16 at 15:48 +0800, Zhang, Yanmin wrote:
> On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote:
> > On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> > > From: Zhang, Yanmin<yanmin_zhang(a)linux.intel.com>
> > >
> > > Based on the discussion in the KVM community, I worked out a patch that lets
> > > perf collect guest OS statistics from the host side. The patch was implemented
> > > with kind help from Ingo, Peter and others. Yang Sheng pointed out a critical
> > > bug and, together with others, provided good suggestions. I really appreciate
> > > their kind help.
> > >
> > > The patch adds a new subcommand, kvm, to perf.
> > >
> > > perf kvm top
> > > perf kvm record
> > > perf kvm report
> > > perf kvm diff
> > >
> > > The new perf can profile the guest OS kernel, but not guest OS user space;
> > > it can, however, summarize guest OS user space utilization per guest OS.
> > >
> > > Below are some examples.
> > > 1) perf kvm top
> > > [root(a)lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> > > --guestmodules=/home/ymzhang/guest/modules top
> > >
> > >
> >
> Thanks for your kind comments.
>
> > Excellent; support for guest kernel != host kernel is critical (I can't
> > remember the last time I ran the same kernels).
> >
> > How would we support multiple guests with different kernels?
> With the patch, 'perf kvm report --sort pid' can show summary
> statistics for all guest OS instances. Then use the --pid parameter
> of 'perf kvm record' to collect data for a single problematic instance.
Sorry, I found that --pid currently selects a single thread (the main thread),
not a whole process.

Ingo,

Is it possible to add a new parameter, or extend --inherit, so that 'perf record'
and 'perf top' can collect data on all threads of a process while it is running?

If not, I need to add a new (ugly) parameter, similar to --pid, to filter
the process's data in user space.
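Until perf can follow all threads of a process itself, one user-space workaround is to expand a PID into its thread IDs by listing /proc/<pid>/task and attaching to each TID. The sketch below covers only that enumeration step; list_threads() is a hypothetical helper. Threads created after the scan are missed, which is one reason a kernel-side option is attractive.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Collect the thread IDs of process 'pid' by listing /proc/<pid>/task.
 * Returns the number of TIDs stored in 'tids' (at most 'max'), or -1 if
 * the directory cannot be opened (e.g. the process has exited).
 */
static int list_threads(pid_t pid, pid_t *tids, int max)
{
	char path[64];
	DIR *dir;
	struct dirent *de;
	int n = 0;

	snprintf(path, sizeof(path), "/proc/%d/task", (int)pid);
	dir = opendir(path);
	if (!dir)
		return -1;
	while (n < max && (de = readdir(dir)) != NULL) {
		char *end;
		long tid = strtol(de->d_name, &end, 10);

		/* Skip "." and "..": only all-digit names are thread IDs. */
		if (end == de->d_name || *end != '\0')
			continue;
		tids[n++] = (pid_t)tid;
	}
	closedir(dir);
	return n;
}
```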

Yanmin


From: Avi Kivity on
On 03/16/2010 09:24 AM, Ingo Molnar wrote:
> * Avi Kivity<avi(a)redhat.com> wrote:
>
>
>> On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
>>
>>> From: Zhang, Yanmin<yanmin_zhang(a)linux.intel.com>
>>>
>>> Based on the discussion in the KVM community, I worked out a patch that lets
>>> perf collect guest OS statistics from the host side. The patch was implemented
>>> with kind help from Ingo, Peter and others. Yang Sheng pointed out a critical
>>> bug and, together with others, provided good suggestions. I really appreciate
>>> their kind help.
>>>
>>> The patch adds a new subcommand, kvm, to perf.
>>>
>>> perf kvm top
>>> perf kvm record
>>> perf kvm report
>>> perf kvm diff
>>>
>>> The new perf can profile the guest OS kernel, but not guest OS user space;
>>> it can, however, summarize guest OS user space utilization per guest OS.
>>>
>>> Below are some examples.
>>> 1) perf kvm top
>>> [root(a)lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
>>> --guestmodules=/home/ymzhang/guest/modules top
>>>
>>>
>> Excellent; support for guest kernel != host kernel is critical (I
>> can't remember the last time I ran the same kernels).
>>
>> How would we support multiple guests with different kernels? Perhaps a
>> symbol server that perf can connect to (and that would connect to guests in
>> turn)?
>>
> The highest quality solution would be if KVM offered a 'guest extension' to
> the guest kernel's /proc/kallsyms that made it easy for user space to get this
> information from an authoritative source.
>
> That's the main reason why the host-side /proc/kallsyms is so popular and so
> useful: while in theory it's mostly redundant information which can be gleaned
> from the System.map and other sources of symbol information, it's easily
> available and can _always_ be trusted to come from the host kernel.
>
> Separate System.map files have a tendency to go out of sync (or go missing when
> a devel kernel gets rebuilt, or if a devel package is not installed), and server
> ports (be that a TCP port space server or a UDP port space mount-point) are
> both a configuration hassle and are not guest-transparent.
>
> So for instrumentation infrastructure (such as perf) we have a large and
> well-founded preference for intrinsic, built-in, kernel-provided information:
> i.e. a largely 'built-in' and transparent mechanism to get to guest symbols.
>

The symbol server's client can certainly access the bits through vmchannel.

--
error compiling committee.c: too many arguments to function
