Unify KVM kernel-space and user-space code into a single project [Kernel]

From: Avi Kivity on 24 Mar 2010 10:00

On 03/24/2010 03:46 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 03:05:02PM +0200, Avi Kivity wrote:
>
>> On 03/24/2010 02:50 PM, Joerg Roedel wrote:
>>
>
>>> I don't want the tool for myself only. A typical perf user expects that
>>> it works transparently.
>>>
>> A typical kvm user uses libvirt, so we can integrate it with that.
>>
> Someone who uses libvirt and virt-manager by default is probably not
> interested in this feature at the same level a kvm developer is. And
> developers tend not to use libvirt for low-level kvm development. A
> number of developers have stated in this thread already that they would
> appreciate a solution for guest enumeration that would not involve
> libvirt.
>

So would I. But when I weigh the benefit of truly transparent
system-wide perf integration for users who don't use libvirt but do use
perf, versus the cost of transforming kvm from a single-process API to a
system-wide API with all the complications that I've listed, it comes
out in favour of not adding the API.

Those few users can probably script something to cover their needs.

>> Someone needs to know about the new guest to fetch its symbols. Or do
>> you want that part in the kernel too?
>>
> The samples will be tagged with the guest-name (and some additional
> information perf needs). Perf userspace can access the symbols then
> through /sys/kvm/guest0/fs/...
>

I take that as a yes? So we need a virtio-serial client in the kernel
(which might be exploitable by a malicious guest if buggy) and a
fs-over-virtio-serial client in the kernel (also exploitable).

>>> Depends on how it is designed. A filesystem approach was already
>>> mentioned. We could create /sys/kvm/ for example to expose information
>>> about virtual machines to userspace. This would not require any new
>>> security hooks.
>>>
>> Who would set the security context on those files?
>>
> An approach like: "The files are owned and only readable by the same
> user that started the vm." might be a good start. So a user can measure
> its own guests and root can measure all of them.
>

That's not how sVirt works. sVirt isolates a user's VMs from each
other, so if a guest breaks into qemu it can't break into other guests
owned by the same user.

The users who need this API (!libvirt and perf) probably don't care
about sVirt, but a new API must not break it.

>> Plus, we need cgroup support so you can't see one container's guests
>> from an unrelated container.
>>
> cgroup support is an issue but we can solve that too. It's in general
> still less complex than going through the whole libvirt-qemu-kvm stack.
>

It's a tradeoff. IMO, going through qemu is the better way, and also
provides more information.

>> Integration with qemu would allow perf to tell us that the guest is
>> hitting the interrupt status register of a virtio-blk device in pci
>> slot 5 (the information is already available through the kvm_mmio
>> trace event, but only qemu can decode it).
>>
> Yeah that would be interesting information. But it is more related to
> tracing than to pmu measurements.
> The information which you mentioned above is probably better
> captured by an extension of trace-events to userspace.
>

It's all related. You start with perf, see a problem with mmio, call up
a histogram of mmio or interrupts or whatever, then zoom in on the
misbehaving device.
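
The "histogram of mmio" step can be sketched in a few lines of Python. This is only an illustration, assuming the kvm_mmio trace event is printed in the form "kvm_mmio: mmio <direction> len <n> gpa 0x<addr> val 0x<val>"; the sample lines and the function name mmio_histogram are made up for the example. It also shows the limit discussed above: the trace gives guest-physical addresses only, and mapping an address back to a device and register is the part that needs qemu.

```python
import re
from collections import Counter

# Assumed text form of the kvm_mmio trace event, e.g.
#   "kvm_mmio: mmio write len 4 gpa 0xfebf1000 val 0x1"
MMIO_RE = re.compile(r"kvm_mmio: mmio ([\w-]+) len (\d+) gpa (0x[0-9a-fA-F]+)")


def mmio_histogram(lines):
    """Count kvm_mmio events per (direction, guest-physical address)."""
    hist = Counter()
    for line in lines:
        match = MMIO_RE.search(line)
        if match:
            direction, _length, gpa = match.groups()
            hist[(direction, int(gpa, 16))] += 1
    return hist


# Made-up sample input; a real run would read the trace output instead.
SAMPLE_TRACE = """\
 qemu-kvm-2301 [001] 100.000001: kvm_mmio: mmio write len 4 gpa 0xfebf1000 val 0x1
 qemu-kvm-2301 [001] 100.000002: kvm_mmio: mmio read len 1 gpa 0xfebf1013 val 0x0
 qemu-kvm-2301 [001] 100.000003: kvm_mmio: mmio read len 1 gpa 0xfebf1013 val 0x0
 qemu-kvm-2301 [001] 100.000004: kvm_entry: vcpu 0
""".splitlines()

if __name__ == "__main__":
    # Most frequently touched addresses first.
    for (direction, gpa), count in mmio_histogram(SAMPLE_TRACE).most_common():
        print(f"{count:6d}  {direction:5s}  0x{gpa:x}")
```

The most frequently hit address is where one would "zoom in"; naming the device behind it is outside what this sketch can do.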

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Avi Kivity on 24 Mar 2010 10:10

On 03/24/2010 03:53 PM, Alexander Graf wrote:
>
>> Someone needs to know about the new guest to fetch its symbols. Or do
>> you want that part in the kernel too?
>>
>
> How about we add a virtio "guest file system access" device? The guest
> would then expose its own file system using that device.
>
> On the host side this would simply be a -virtioguestfs
> unix:/tmp/guest.fs and you'd get a unix socket that gives you full
> access to the guest file system by using commands. I envision something
> like:
>

The idea is to use a dedicated channel over virtio-serial. If the
channel is present the file server can serve files over it.

> SEND: GET /proc/version
> RECV: Linux version 2.6.27.37-0.1-default (geeko(a)buildhost) (gcc version
> 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15
> 14:56:58 +0200
>
> Now all we need is integration in perf to enumerate virtual machines
> based on libvirt. If you want to run qemu-kvm directly, just go with
> --guestfs=/tmp/guest.fs and perf could fetch all required information
> automatically.
>
> This should solve all issues while staying 100% in user space, right?
>

Yeah, needs a fuse filesystem to populate the host namespace (kind of
sshfs over virtio-serial).

--
error compiling committee.c: too many arguments to function


From: Alexander Graf on 24 Mar 2010 10:30

Avi Kivity wrote:
> On 03/24/2010 03:53 PM, Alexander Graf wrote:
>>
>>> Someone needs to know about the new guest to fetch its symbols. Or do
>>> you want that part in the kernel too?
>>>
>>
>> How about we add a virtio "guest file system access" device? The guest
>> would then expose its own file system using that device.
>>
>> On the host side this would simply be a -virtioguestfs
>> unix:/tmp/guest.fs and you'd get a unix socket that gives you full
>> access to the guest file system by using commands. I envision something
>> like:
>>
>
> The idea is to use a dedicated channel over virtio-serial. If the
> channel is present the file server can serve files over it.

The file server being a kernel module inside the guest? We want to be
able to serve things as early and hassle free as possible, so in this
case I agree with Ingo that a kernel module is superior.

>
>> SEND: GET /proc/version
>> RECV: Linux version 2.6.27.37-0.1-default (geeko(a)buildhost) (gcc version
>> 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15
>> 14:56:58 +0200
>>
>> Now all we need is integration in perf to enumerate virtual machines
>> based on libvirt. If you want to run qemu-kvm directly, just go with
>> --guestfs=/tmp/guest.fs and perf could fetch all required information
>> automatically.
>>
>> This should solve all issues while staying 100% in user space, right?
>>
>
> Yeah, needs a fuse filesystem to populate the host namespace (kind of
> sshfs over virtio-serial).

I don't see why we need a fuse filesystem. We can of course create one
later on. But for now all you need is a user connecting to that socket.
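
A client for such a socket could be as small as the Python sketch below. The "GET <path>" request comes from the example earlier in the thread; everything else is an assumption made for illustration: the newline terminator, one request per connection, the reply running until EOF, and the helper names build_request, fetch and fetch_from_guest.

```python
import socket


def build_request(guest_path):
    """Format one request in the 'GET <path>' style of the example above."""
    if "\n" in guest_path:
        raise ValueError("path must not contain a newline")
    # Assumption: requests are newline-terminated.
    return f"GET {guest_path}\n".encode()


def fetch(sock, guest_path):
    """Send one request on a connected socket and read the reply until EOF."""
    sock.sendall(build_request(guest_path))
    # Assumption: one request per connection, so we can close our write side.
    sock.shutdown(socket.SHUT_WR)
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)


def fetch_from_guest(socket_path, guest_path):
    """Connect to the host-side unix socket and fetch one guest file."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        return fetch(sock, guest_path)
```

With the socket from the example this would be called as fetch_from_guest("/tmp/guest.fs", "/proc/version").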

Alex


From: Avi Kivity on 24 Mar 2010 11:10

On 03/24/2010 04:24 PM, Alexander Graf wrote:
> Avi Kivity wrote:
>
>> On 03/24/2010 03:53 PM, Alexander Graf wrote:
>>
>>>
>>>> Someone needs to know about the new guest to fetch its symbols. Or do
>>>> you want that part in the kernel too?
>>>>
>>>>
>>> How about we add a virtio "guest file system access" device? The guest
>>> would then expose its own file system using that device.
>>>
>>> On the host side this would simply be a -virtioguestfs
>>> unix:/tmp/guest.fs and you'd get a unix socket that gives you full
>>> access to the guest file system by using commands. I envision something
>>> like:
>>>
>>>
>> The idea is to use a dedicated channel over virtio-serial. If the
>> channel is present the file server can serve files over it.
>>
> The file server being a kernel module inside the guest? We want to be
> able to serve things as early and hassle free as possible, so in this
> case I agree with Ingo that a kernel module is superior.
>

No, just a daemon. If it's important enough we can get distributions to
package it by default, and then it will be hassle free. If "early
enough" is also so important, we can get it to start up on initrd. If
it's really critical, we can patch grub to serve the files as well.

>>> SEND: GET /proc/version
>>> RECV: Linux version 2.6.27.37-0.1-default (geeko(a)buildhost) (gcc version
>>> 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15
>>> 14:56:58 +0200
>>>
>>> Now all we need is integration in perf to enumerate virtual machines
>>> based on libvirt. If you want to run qemu-kvm directly, just go with
>>> --guestfs=/tmp/guest.fs and perf could fetch all required information
>>> automatically.
>>>
>>> This should solve all issues while staying 100% in user space, right?
>>>
>>>
>> Yeah, needs a fuse filesystem to populate the host namespace (kind of
>> sshfs over virtio-serial).
>>
> I don't see why we need a fuse filesystem. We can of course create one
> later on. But for now all you need is a user connecting to that socket.
>

If the perf app knows the protocol, no problem. But leave perf with
pure filesystem access and hide the details in fuse.

--
error compiling committee.c: too many arguments to function


From: Avi Kivity on 24 Mar 2010 11:20

On 03/24/2010 05:01 PM, Joerg Roedel wrote:
>
>> But when I weigh the benefit of truly transparent system-wide perf
>> integration for users who don't use libvirt but do use perf, versus
>> the cost of transforming kvm from a single-process API to a
>> system-wide API with all the complications that I've listed, it comes
>> out in favour of not adding the API.
>>
> It's not a transformation, it's an extension. The current per-process
> /dev/kvm stays mostly untouched. It's all about having something like
> this:
>
> $ cd /sys/kvm/guest0
> $ ls -l
> -r-------- 1 root root 0 2009-08-17 12:05 name
> dr-x------ 1 root root 0 2009-08-17 12:05 fs
> $ cat name
> guest0
> $ # ...
>
> The fs/ directory is used as the mount point for the guest root fs.
>

The problem is /sys/kvm, not /sys/kvm/fs.
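
For reference, the userspace side of the quoted layout would amount to something like the Python sketch below. It assumes only what the listing shows (one directory per guest containing a "name" file); /sys/kvm is the proposed path, not an existing interface, and list_guests is a made-up helper name. The system-wide namespace it walks is exactly the part under dispute.

```python
import os


def list_guests(base="/sys/kvm"):
    """Return {directory name: contents of its 'name' file} for readable guests.

    The default path is the layout proposed in the thread, not a real interface.
    """
    guests = {}
    try:
        entries = sorted(os.listdir(base))
    except FileNotFoundError:
        return guests
    for entry in entries:
        name_file = os.path.join(base, entry, "name")
        try:
            with open(name_file) as f:
                guests[entry] = f.read().strip()
        except (PermissionError, FileNotFoundError, NotADirectoryError):
            # Owner-only files of another user, or not a guest directory.
            continue
    return guests
```

Guests whose files are not readable by the caller are silently skipped, which matches the "owner and root only" permission model suggested above.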

>>> The samples will be tagged with the guest-name (and some additional
>>> information perf needs). Perf userspace can access the symbols then
>>> through /sys/kvm/guest0/fs/...
>>>
>> I take that as a yes? So we need a virtio-serial client in the kernel
>> (which might be exploitable by a malicious guest if buggy) and a
>> fs-over-virtio-serial client in the kernel (also exploitable).
>>
> What I meant was: perf-kernel puts the guest-name into every sample and
> perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
> symbols. I leave the question of how the guest-fs is exposed to the host
> out of this discussion. We should discuss this separately.
>

How I see it: perf-kernel puts the guest pid into every sample, and
perf-userspace uses that to resolve to a mountpoint served by fuse, or
to a unix domain socket that serves the files.
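
A rough Python sketch of the mountpoint variant, under stated assumptions: the registry (qemu pid to mountpoint) is hypothetical, since nothing here says how perf would learn that mapping, and the symbol file is assumed to be in the usual kallsyms text form "<hex address> <type> <name>". The function names are invented for the example.

```python
import bisect
import os


def guest_file(sample_pid, registry, guest_path):
    """Map (qemu pid, path inside the guest) to a path under its mountpoint."""
    mountpoint = registry[sample_pid]  # KeyError means an unknown guest
    return os.path.join(mountpoint, guest_path.lstrip("/"))


def load_symbols(path):
    """Parse kallsyms-style lines into parallel, address-sorted lists."""
    symbols = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 3:
                symbols.append((int(fields[0], 16), fields[2]))
    symbols.sort()
    addrs = [addr for addr, _ in symbols]
    names = [name for _, name in symbols]
    return addrs, names


def resolve(addr, addrs, names):
    """Return the last symbol starting at or below addr, or None."""
    i = bisect.bisect_right(addrs, addr) - 1
    return names[i] if i >= 0 else None
```

perf-userspace would call guest_file(pid, registry, "/proc/kallsyms") once per guest and then resolve() for each sample address. With the socket variant only the way the file contents are obtained would change.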

>>> An approach like: "The files are owned and only readable by the same
>>> user that started the vm." might be a good start. So a user can measure
>>> its own guests and root can measure all of them.
>>>
>> That's not how sVirt works. sVirt isolates a user's VMs from each
>> other, so if a guest breaks into qemu it can't break into other guests
>> owned by the same user.
>>
> If a vm breaks into qemu it can access the host file system which is the
> bigger problem. In this case there is no isolation anymore. From that
> context it can even kill other VMs of the same user independent of a
> hypothetical /sys/kvm/.
>

It cannot. sVirt labels the disk image and other files qemu needs with
the appropriate label, and everything else is off limits. Even if you
run the guest as root, it won't have access to other files.

>>> Yeah that would be interesting information. But it is more related to
>>> tracing than to pmu measurements. The information which you
>>> mentioned above is probably better captured by an extension of
>>> trace-events to userspace.
>>>
>> It's all related. You start with perf, see a problem with mmio, call up
>> a histogram of mmio or interrupts or whatever, then zoom in on the
>> misbehaving device.
>>
> Yes, but it's different from the implementation point-of-view. For the
> user it surely all plays together.
>

We need qemu to cooperate for mmio tracing, and we can cooperate with
qemu for symbol resolution. If it prevents adding another kernel API,
that's a win from my POV.

--
error compiling committee.c: too many arguments to function

