Enhance perf to collect KVM guest os statistics from host side [Kernel]


From: Frank Ch. Eigler on 16 Mar 2010 11:10

Ingo Molnar <mingo(a)elte.hu> writes:

> [...]
>> >I.e. we really want users to be able to:
>> >
>> > 1) have it all working with a single guest, without having to specify 'which'
>> > guest (qemu PID) to work with. That is the dominant usecase both for
>> > developers and for a fair portion of testers.
>>
>> That's reasonable if we can get it working simply.
>
> IMO such ease of use is reasonable and required, full stop.
> If it cannot be gotten simply then that's a bug: either in the code, or in the
> design, or in the development process that led to the design. Bugs need
> fixing. [...]

Perhaps the fact that kvm happens to deal with an interesting
application area (virtualization) is misleading here. As far as the
host kernel or other host userspace is concerned, qemu is just some
random unprivileged userspace program (with some *optional* /dev/kvm
services that might happen to require temporary root).

As such, perf trying to instrument qemu is no different than perf
trying to instrument any other userspace widget. Therefore, expecting
'trusted enumeration' of instances is just as sensible as using
'trusted ps' and 'trusted /var/run/FOO.pid files'.
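As a rough illustration of what such ps-style enumeration amounts to, here is a minimal Python sketch that scans /proc for processes whose command name looks like qemu. The function name find_qemu_pids and the "qemu" name prefix are assumptions made for this example; nothing here is part of perf or kvm, and the result is only as trustworthy as ps output would be.

```python
import os

def find_qemu_pids(proc_root="/proc", name_prefix="qemu"):
    """Return sorted PIDs whose <proc_root>/<pid>/comm starts with name_prefix.

    Both the helper name and the "qemu" prefix are illustrative assumptions.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo" or "self"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited or is unreadable; ignore it
        if comm.startswith(name_prefix):
            pids.append(int(entry))
    return sorted(pids)
```

On a Linux host, find_qemu_pids() returns the candidate PIDs. With exactly one entry the "single guest" case needs no further input; with several, the user still has to say which one is meant.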

- FChE
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Frank Ch. Eigler on 16 Mar 2010 12:10

Hi -

On Tue, Mar 16, 2010 at 04:52:21PM +0100, Ingo Molnar wrote:
> [...]
> > Perhaps the fact that kvm happens to deal with an interesting application
> > area (virtualization) is misleading here. As far as the host kernel or
> > other host userspace is concerned, qemu is just some random unprivileged
> > userspace program [...]

> You are quite mistaken: KVM isn't really a 'random unprivileged
> application' in this context, it is clearly an extension of
> system/kernel services.

I don't know what "extension of system/kernel services" means in this
context, beyond something running on the system/kernel, like every
other process. To clarify, to what extent do you consider your
classification similarly clear for a host that is running

* multiple kvm instances run as unprivileged users
* non-kvm OS simulators such as vmware or xen or gdb
* kvm instances running something other than linux

> ( Which can be seen from the simple fact that what started the
> discussion was 'how do we get /proc/kallsyms from the
> guest'. I.e. an extension of the existing host-space /proc/kallsyms
> was desired. )

(Sorry, that smacks of circular reasoning.)

It may be a charming convenience function for perf users to give them
shortcuts for certain favoured configurations (kvm running freshest
linux), but that says more about perf than kvm.

- FChE

From: Frank Ch. Eigler on 16 Mar 2010 20:50

Hi -

On Tue, Mar 16, 2010 at 06:04:10PM -0500, Anthony Liguori wrote:
> [...]
> The only way to really address this is to change the interaction.
> Instead of running perf externally to qemu, we should support a perf
> command in the qemu monitor that can then tie directly to the perf
> tooling. That gives us the best possible user experience.

To what extent could this be solved with less crossing of
isolation/abstraction layers, if the perfctr facilities were properly
virtualized? That way guests could run perf goo internally.
Optionally virt tools on the host side could aggregate data from
cooperating self-monitoring guests.

- FChE

From: Sheng Yang on 17 Mar 2010 05:30

On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
> > On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
> > > Right, but there is a scope between kvm_guest_enter and really running
> > > in guest os, where a perf event might overflow. Anyway, the scope is
> > > very narrow, I will change it to use flag PF_VCPU.
> >
> > There is also a window between setting the flag and calling 'int $2'
> > where an NMI might happen and be accounted incorrectly.
> >
> > Perhaps separate the 'int $2' into a direct call into perf and another
> > call for the rest of NMI handling. I don't see how it would work on svm
> > though - AFAICT the NMI is held whereas vmx swallows it.
> >
> > I guess NMIs
> > will be disabled until the next IRET so it isn't racy, just tricky.
>
> I'm not sure if vmexit does break NMI context or not. Hardware NMI context
> isn't reentrant till a IRET. YangSheng would like to double check it.

After checking further, I think VMX does not retain the NMI-blocked state for
the host. That means that if an NMI occurs while the processor is in VMX
non-root mode, it only results in a VMExit, with an exit reason indicating that
it was caused by an NMI, but no further state change in the host.

So in that sense, there _is_ a window between the VMExit and KVM handling the
NMI. Moreover, I think we _can't_ stop re-entrance of the NMI handling code,
because "int $2" does not block subsequent NMIs.

And if the NMI ordering is not important (I think it is not), then we need to
generate a real NMI in the current post-vmexit code. Having the APIC send an
NMI IPI to itself seems like a good idea.

I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to replace
"int $2". Something unexpected is happening...

--
regards
Yang, Sheng

From: Sheng Yang on 17 Mar 2010 06:00

On Wednesday 17 March 2010 17:41:58 Avi Kivity wrote:
> On 03/17/2010 11:28 AM, Sheng Yang wrote:
> >> I'm not sure if vmexit does break NMI context or not. Hardware NMI
> >> context isn't reentrant till a IRET. YangSheng would like to double
> >> check it.
> >
> > After checking further, I think VMX does not retain the NMI-blocked state
> > for the host. That means that if an NMI occurs while the processor is in
> > VMX non-root mode, it only results in a VMExit, with an exit reason
> > indicating that it was caused by an NMI, but no further state change in
> > the host.
> >
> > So in that sense, there _is_ a window between the VMExit and KVM handling
> > the NMI. Moreover, I think we _can't_ stop re-entrance of the NMI handling
> > code, because "int $2" does not block subsequent NMIs.
>
> That's pretty bad, as NMI runs on a separate stack (via IST). So if
> another NMI happens while our int $2 is running, the stack will be
> corrupted.

Though the hardware doesn't provide this kind of blocking, software would at
least warn about it... nmi_enter() would still be executed by "int $2", and
would result in a BUG() if we are already in NMI context (OK, that is a little
better than a mysterious crash due to a corrupted stack).
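To make the difference concrete, here is a toy Python model of the behaviour as described in this thread. It is not real hardware semantics and not kernel code: the class ToyCpu and its fields are invented for this sketch. It assumes only what was claimed above: a hardware-delivered NMI blocks further NMIs until IRET, "int $2" sets no such block, and nmi_enter() hits BUG() on nesting.

```python
class ToyCpu:
    """Toy model of NMI blocking as described in this thread (not real hardware)."""

    def __init__(self):
        self.nmi_blocked = False  # hardware-level "NMIs blocked until IRET"
        self.in_nmi = False       # software flag, as nmi_enter() would track
        self.log = []

    def nmi_enter(self):
        # Stand-in for the kernel check: nested NMI context is a bug.
        if self.in_nmi:
            raise RuntimeError("BUG: nested NMI")
        self.in_nmi = True

    def nmi_exit(self):
        self.in_nmi = False

    def deliver(self, via_hardware, nested_arrival=False):
        """Run the NMI handler; optionally another NMI arrives mid-handler."""
        if via_hardware:
            self.nmi_blocked = True  # real NMI delivery blocks further NMIs
        self.nmi_enter()
        if nested_arrival:
            if self.nmi_blocked:
                self.log.append("second NMI held pending")
            else:
                # Nothing blocks the second NMI, so the handler is re-entered.
                self.deliver(via_hardware=True)
        self.nmi_exit()
        self.nmi_blocked = False  # IRET unblocks NMIs
        self.log.append("handled")
```

In this model, deliver(via_hardware=True, nested_arrival=True) holds the second NMI pending, which corresponds to the self-IPI approach. deliver(via_hardware=False, nested_arrival=True) stands in for "int $2" and raises the nested-NMI error.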
>
> > And if the NMI ordering is not important (I think it is not), then we need
> > to generate a real NMI in the current post-vmexit code. Having the APIC
> > send an NMI IPI to itself seems like a good idea.
> >
> > I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
> > replace "int $2". Something unexpected is happening...
>
> I think you need DM_NMI for that to work correctly.
>
> An alternative is to call the NMI handler directly.

apic_send_IPI_self() already took care of APIC_DM_NMI.

And would the NMI handler block the following NMI?

--
regards
Yang, Sheng
