From: Avi Kivity on
On 03/24/2010 09:38 AM, Andi Kleen wrote:
>> If you're profiling a single guest it makes more sense to do this from
>> inside the guest - you can profile userspace as well as the kernel.
>>
> I'm interested in debugging the guest without guest cooperation.
>
> In many cases qemu's new gdb stub works for that, but in some cases
> I would prefer instruction/branch traces over standard gdb style
> debugging.
>

Isn't gdb supposed to be able to use branch traces? It makes sense to
expose them via the gdb stub then. Not to say an external tool doesn't
make sense.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Paolo Bonzini on
On 03/22/2010 08:13 AM, Avi Kivity wrote:
>
> (btw, why are you interested in desktop-on-desktop? one use case is
> developers, who don't really need fancy GUIs; a second is people who
> test out distributions, but that doesn't seem to be a huge population;
> and a third is people running Windows for some application that doesn't
> run on Linux - hopefully a small category as well.)

This third category is pretty well served by virt-manager. It has its
quirks and shortcomings, but at least it exists.

Paolo
From: Avi Kivity on
On 03/24/2010 01:59 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 06:57:47AM +0200, Avi Kivity wrote:
>
>> On 03/23/2010 08:21 PM, Joerg Roedel wrote:
>>
>>> This enumeration is a very small and non-intrusive feature. Making it
>>> aware of namespaces is easy too.
>>>
>>>
>> It's easier (and safer and all the other boring bits) not to do it at
>> all in the kernel.
>>
> For the KVM stack it doesn't matter where it is implemented. It is as
> easy in qemu or libvirt as in the kernel. I also don't see big risks. On
> the perf side and for its users it is a lot easier to have this in the
> kernel.
> I for example always use plain qemu when running kvm guests and never
> used libvirt. The only central entity I have here is the kvm kernel
> modules. I don't want to start using it only to be able to use perf kvm.
>

You can always provide the kernel and module paths as command line
parameters. It just won't be transparently usable, but if you're using
qemu from the command line, presumably you can live with that.

>>> Who would be the consumer of such notifications? A 'perf kvm list' can
>>> live without I guess. If we need them later we can still add them.
>>>
>> System-wide monitoring needs to work equally well for guests started
>> before or after the monitor.
>>
> Could be easily done using notifier chains already in the kernel.
> Probably implemented with much less than 100 lines of additional code.
>

And a userspace interface for that.
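[Editorial sketch: Joerg's notifier-chain suggestion maps onto a simple observer pattern. The following userspace analogue illustrates the idea only; the names `VmNotifier`, `VM_CREATED`, and `VM_DESTROYED` are hypothetical and do not correspond to any real kvm.ko or perf interface.]

```python
# Hypothetical userspace analogue of a kernel notifier chain that
# would tell interested parties (e.g. perf) about VM lifecycle events.
# All names here are invented for illustration.

VM_CREATED, VM_DESTROYED = range(2)

class VmNotifier:
    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        # Comparable in spirit to atomic_notifier_chain_register().
        self._callbacks.append(callback)

    def notify(self, event, vm_id):
        # Walk the chain, handing each registered consumer the event.
        for cb in self._callbacks:
            cb(event, vm_id)

seen = []
chain = VmNotifier()
chain.register(lambda ev, vm: seen.append((ev, vm)))
chain.notify(VM_CREATED, 42)
chain.notify(VM_DESTROYED, 42)
```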

>> Even disregarding that, if you introduce an API, people will start
>> using it and complaining if it's incomplete.
>>
> There is nothing wrong with that. We only need to define what this API
> should be used for to prevent unchecked growth. It could be an
> instrumentation-only API for example.
>

If we make an API, I'd like it to be generally useful.

It's a total headache. For example, we'd need security module hooks to
determine access permissions. So far we managed to avoid that since kvm
doesn't allow you to access any information beyond what you provided it
directly.


>>> My statement was not limited to enumeration, I should have been more
>>> clear about that. The guest filesystem access-channel is another
>>> affected part. The 'perf kvm top' command will access the guest
>>> filesystem regularly and going over qemu would be more overhead here.
>>>
>>>
>> Why? Also, the real cost would be accessing the filesystem, not copying
>> data over qemu.
>>
> When measuring cache-misses any additional (and in this case
> unnecessary) copy overhead results in less accurate measurements.
>

Copying the objects is a one time cost. If you run perf for more than a
second or two, it would fetch and cache all of the data. It's really
the same problem with non-guest profiling, only magnified a bit.

>>> Providing this in the KVM module directly also has the benefit that it
>>> would work out-of-the-box with different userspaces too. Or do we want
>>> to limit 'perf kvm' to the libvirt-qemu-kvm software stack?
>>>
>> Other userspaces can also provide this functionality, like they have to
>> provide disk, network, and display emulation. The kernel is not a huge
>> library.
>>
> This has nothing to do with a library. It is about entity and resource
> management, which is what OS kernels are about. The virtual machine is
> the entity (similar to a process) and we want to add additional access
> channels and names to it.
>

kvm.ko has only a small subset of the information that is used to define
a guest.

--
error compiling committee.c: too many arguments to function

From: Avi Kivity on
On 03/24/2010 02:50 PM, Joerg Roedel wrote:
>
>> You can always provide the kernel and module paths as command line
>> parameters. It just won't be transparently usable, but if you're using
>> qemu from the command line, presumably you can live with that.
>>
> I don't want the tool for myself only. A typical perf user expects that
> it works transparently.
>

A typical kvm user uses libvirt, so we can integrate it with that.

>>> Could be easily done using notifier chains already in the kernel.
>>> Probably implemented with much less than 100 lines of additional code.
>>>
>> And a userspace interface for that.
>>
> Not necessarily. The perf event is configured to measure systemwide kvm
> by userspace. The kernel side of perf takes care that it stays
> system-wide even with added vm instances. So in this case the consumer
> for the notifier would be the perf kernel part. No userspace interface
> required.
>

Someone needs to know about the new guest to fetch its symbols. Or do
you want that part in the kernel too?

>> If we make an API, I'd like it to be generally useful.
>>
> That's hard to do at this point since we don't know what people will use
> it for. We should keep it simple in the beginning and add new features
> as they are requested and make sense in this context.
>

IMO this use case is too rare to warrant its own API, especially as there
are alternatives.

>> It's a total headache. For example, we'd need security module hooks to
>> determine access permissions. So far we managed to avoid that since kvm
>> doesn't allow you to access any information beyond what you provided it
>> directly.
>>
> Depends on how it is designed. A filesystem approach was already
> mentioned. We could create /sys/kvm/ for example to expose information
> about virtual machines to userspace. This would not require any new
> security hooks.
>

Who would set the security context on those files? Plus, we need cgroup
support so you can't see one container's guests from an unrelated container.

>> Copying the objects is a one time cost. If you run perf for more than a
>> second or two, it would fetch and cache all of the data. It's really
>> the same problem with non-guest profiling, only magnified a bit.
>>
> I don't think we can cache filesystem data of a running guest on the
> host. It is too hard to keep such a cache coherent.
>

I don't see any choice. The guest can change its symbols at any time
(say by kexec), without any notification.

>>>> Other userspaces can also provide this functionality, like they have to
>>>> provide disk, network, and display emulation. The kernel is not a huge
>>>> library.
>>>>
> If two userspaces run in parallel, what is the single instance from
> which perf can get a list of guests?
>

I don't know. Surely that's solvable though.

>> kvm.ko has only a small subset of the information that is used to define
>> a guest.
>>
> The subset is not small. It contains all guest vcpus, the complete
> interrupt routing hardware emulation, and manages even the guest's
> memory.
>

It doesn't contain most of the mmio and pio address space. Integration
with qemu would allow perf to tell us that the guest is hitting the
interrupt status register of a virtio-blk device in pci slot 5 (the
information is already available through the kvm_mmio trace event, but
only qemu can decode it).
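[Editorial sketch: the decoding step described above, which only qemu can do, amounts to mapping a raw guest-physical address from a kvm_mmio trace event back to a device and register. The address map below is invented for the example; real layouts come from qemu's PCI emulation.]

```python
# Hypothetical mmio decode table: (start, length, device, registers by
# offset). The addresses and names are made up for illustration.
MMIO_MAP = [
    (0xfebf0000, 0x1000, "virtio-blk (PCI slot 5)",
     {0x13: "interrupt status register"}),
]

def decode_mmio(addr):
    # Find the device whose BAR range contains the faulting address,
    # then translate the offset into a register name if known.
    for start, length, device, regs in MMIO_MAP:
        if start <= addr < start + length:
            off = addr - start
            return device, regs.get(off, "offset 0x%x" % off)
    return None, None

dev, reg = decode_mmio(0xfebf0013)
```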

--
error compiling committee.c: too many arguments to function

From: Alexander Graf on
Avi Kivity wrote:
> On 03/24/2010 02:50 PM, Joerg Roedel wrote:
>>
>>> You can always provide the kernel and module paths as command line
>>> parameters. It just won't be transparently usable, but if you're using
>>> qemu from the command line, presumably you can live with that.
>>>
>> I don't want the tool for myself only. A typical perf user expects that
>> it works transparently.
>>
>
> A typical kvm user uses libvirt, so we can integrate it with that.
>
>>>> Could be easily done using notifier chains already in the kernel.
>>>> Probably implemented with much less than 100 lines of additional code.
>>>>
>>> And a userspace interface for that.
>>>
>> Not necessarily. The perf event is configured to measure systemwide kvm
>> by userspace. The kernel side of perf takes care that it stays
>> system-wide even with added vm instances. So in this case the consumer
>> for the notifier would be the perf kernel part. No userspace interface
>> required.
>>
>
> Someone needs to know about the new guest to fetch its symbols. Or do
> you want that part in the kernel too?


How about we add a virtio "guest file system access" device? The guest
would then expose its own file system using that device.

On the host side this would simply be a -virtioguestfs
unix:/tmp/guest.fs and you'd get a unix socket that gives you full
access to the guest file system by using commands. I envision something
like:

SEND: GET /proc/version
RECV: Linux version 2.6.27.37-0.1-default (geeko(a)buildhost) (gcc version
4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15
14:56:58 +0200

Now all we need is integration in perf to enumerate virtual machines
based on libvirt. If you want to run qemu-kvm directly, just go with
--guestfs=/tmp/guest.fs and perf could fetch all required information
automatically.

This should solve all issues while staying 100% in user space, right?
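[Editorial sketch: the proposed request/reply exchange can be simulated end to end in one process over a socketpair. In the proposal the server side would be a virtio device in the guest and the client a unix socket on the host; the newline-terminated "GET <path>" wire format is an assumption taken from the example above.]

```python
import socket

def serve_one(sock, fs):
    # fs: a dict standing in for the guest filesystem.
    # Read one "GET <path>" request and reply with the file contents.
    line = sock.makefile("r").readline().strip()
    if line.startswith("GET "):
        path = line[4:]
        sock.sendall(fs.get(path, "ENOENT").encode())

# Host and guest ends of the hypothetical -virtioguestfs channel.
host, guest = socket.socketpair()
fs = {"/proc/version": "Linux version 2.6.27.37-0.1-default"}
host.sendall(b"GET /proc/version\n")
serve_one(guest, fs)
reply = host.recv(4096).decode()
```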


Alex
