Unify KVM kernel-space and user-space code into a single project [Kernel]

From: Anthony Liguori on 22 Mar 2010 14:30

On 03/22/2010 11:59 AM, Ingo Molnar wrote:
>
> Ok, that sounds interesting! I'd rather see some raw mechanism that 'perf kvm'
> could use instead of having to require yet another library (which generally
> dampens adoption of a tool). So i think we can work from there.
>

You can access the protocol directly if you don't want a library dependency.
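As a rough illustration of what "accessing the protocol directly" can look like, here is a minimal sketch of a QMP client that needs nothing beyond the standard library. It assumes the QMP wire format as documented by QEMU: the server sends a `{"QMP": ...}` greeting, the client issues `qmp_capabilities`, and commands such as `query-name` and `query-uuid` are answered with a `"return"` object. The function names `qmp_execute` and `describe_guest` are made up for this example, and the UUID fallback for nameless guests follows the suggestion made further down in this message.

```python
import json
import socket


def qmp_execute(stream, command):
    """Send one QMP command and return the payload of its "return" reply."""
    stream.write(json.dumps({"execute": command}) + "\n")
    stream.flush()
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP peer closed the connection")
        reply = json.loads(line)
        if "event" in reply:       # asynchronous events may be interleaved
            continue
        if "error" in reply:
            raise RuntimeError(reply["error"])
        return reply["return"]


def describe_guest(sock):
    """Negotiate capabilities, then return the guest name (or its UUID)."""
    stream = sock.makefile("rw", encoding="utf-8", newline="\n")
    greeting = json.loads(stream.readline())   # the server speaks first
    if "QMP" not in greeting:
        raise RuntimeError("not a QMP endpoint")
    qmp_execute(stream, "qmp_capabilities")    # leave negotiation mode
    name = qmp_execute(stream, "query-name").get("name")
    uuid = qmp_execute(stream, "query-uuid")["UUID"]
    return name or uuid                        # nameless guest: use the UUID
```

`describe_guest` takes an already connected stream socket, so a caller would create an `AF_UNIX` socket, connect it to wherever the guest's QMP socket lives (that location is not settled in this thread), and pass it in.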

> Btw., have you considered using Qemu's command name (task->comm[]) as the
> symbolic name? That way we could see the guest name in 'top' on the host - a
> nice touch.
>

qemu-system-x86_64 -name Fedora,process=qemu-Fedora

Does exactly that. We don't make this the default, on the principle of
least surprise: many users expect to be able to do killall
qemu-system-x86, and if we did this by default, that wouldn't work.
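To make the effect of the process name concrete, the following sketch lists tasks by the command name the kernel exposes in `/proc/<pid>/comm`, which is the name tools like 'top' display. It relies on two facts about Linux: the comm file exists on 2.6.33 and later, and the kernel keeps at most 15 bytes of the name (`TASK_COMM_LEN` is 16, including the trailing NUL). The helper names `visible_comm` and `tasks_named`, and the `proc_root` parameter, are illustrative only.

```python
import os

TASK_COMM_LEN = 16  # kernel buffer size, including the trailing NUL


def visible_comm(name):
    """Return the part of an ASCII name that survives in task->comm[]."""
    return name[:TASK_COMM_LEN - 1]


def tasks_named(prefix, proc_root="/proc"):
    """Return sorted (pid, comm) pairs whose command name starts with prefix."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().rstrip("\n")
        except OSError:         # the task exited while we were scanning
            continue
        if comm.startswith(prefix):
            found.append((int(entry), comm))
    return sorted(found)
```

With the `-name Fedora,process=qemu-Fedora` invocation above, `tasks_named("qemu-")` would be expected to report the guest as `qemu-Fedora`, while a guest started without `process=` shows up under the truncated binary name given by `visible_comm("qemu-system-x86_64")`.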

>> The sockets are named based on UUID and you'll have to connect to a guest
>> and ask it for its name. Some guests don't have names so we'll have to
>> come up with a clever way to describe a nameless VM.
>>
> I think just exposing the UUID in that lazy case would be adequate? It creates
> pressure for VM launchers to use better symbolic names.
>

Yup.

>>> I.e.:
>>>
>>> - Easy default reference to guest instances, and a way for tools to
>>> reference them symbolically as well in the multi-guest case. Preferably
>>> something trustable and kernel-provided - not some indirect information
>>> like a PID file created by libvirt-manager or so.
>>>
>> A guest is not a KVM concept. It's a qemu concept so it needs to be
>> something provided by qemu. The other caveat is that you won't see guests
>> created by libvirt because we're implementing this in terms of a default QMP
>> device and libvirt will disable defaults. This is desired behaviour.
>> libvirt wants to be in complete control and doesn't want a tool like perf
>> interacting with a guest directly.
>>
> Hm, this sucks for multiple reasons. Firstly, perf isn't a tool that
> 'interacts', it's an observation tool: just like 'top' is an observation tool.
>
> We want to enable developers to see all activities on the system - regardless
> of who started the VM or who started the process. Imagine if we had a way to
> hide tasks from 'top'. It would be rather awful.
>
> Secondly, it tells us that the concept is fragile if it doesn't automatically
> enumerate all guests, regardless of how they were created.
>

Perf does interact with a guest though, because it queries the guest to
read its file system.

I understand the point you're making though. If, instead of a pull
interface where the host queries the guest for files, the guest pushed
a small set of files at startup which the host cached, then you could
potentially expose a "read-only" socket unconditionally, one that only
exposed limited information.

> Full system enumeration is generally best left to the kernel, as it can offer
> coherent access.
>

I don't see why qemu can't offer coherent access. The limitation today
is intentional and if it's overly restrictive, we can figure out a means
to change it.

Regards,

Anthony Liguori

> Ingo
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Anthony Liguori on 22 Mar 2010 14:40

On 03/22/2010 12:11 PM, Ingo Molnar wrote:
> * Anthony Liguori<anthony(a)codemonkey.ws> wrote:
>
>
>>> - Easy default reference to guest instances, and a way for tools to
>>> reference them symbolically as well in the multi-guest case. Preferably
>>> something trustable and kernel-provided - not some indirect information
>>> like a PID file created by libvirt-manager or so.
>>>
>> A guest is not a KVM concept. [...]
>>
> Well, in a sense a guest is a KVM concept too: it's in essence represented via
> the 'vcpu state attached to a struct mm' abstraction that is attached to the
> /dev/kvm file descriptor attached to a Linux process.
>
> Multiple vcpus can be started by the same process to represent SMP, but the
> whole guest notion is present: a Linux MM that carries KVM state.
>
> In that sense when we type 'perf kvm list' we'd like to get a list of all
> currently present guests that the developer has permission to profile: i.e.
> we'd like a list of all [debuggable] Linux tasks that have a KVM instance
> attached to them.
>
> A convenient way to do that would be to use the Qemu process's ->comm[] name,
> and to have a KVM ioctl that gets us a list of all vcpus that the querying
> task has ptrace permission to. [the standard permission check we do for
> instrumentation]
>
> No need for communication with Qemu for that - just an ioctl, and an
> always-guaranteed result that works fine on a whole-system and on a per user
> basis as well.
>

You need a way to interact with the guest which means you need some type
of device. All of the interesting devices are implemented in qemu so
you're going to have to interact with qemu if you want meaningful
interaction with a guest.
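For the enumeration half of this exchange (listing tasks that carry KVM state, as opposed to interacting with a guest), no dedicated ioctl is described in this thread. A rough user-space approximation is to look for tasks that hold a KVM VM file descriptor. The sketch below assumes that such descriptors appear under `/proc/<pid>/fd` as links reading `anon_inode:kvm-vm`; that link text is an observation about Linux kernels, not something stated in the thread. The function name `kvm_tasks` and the `proc_root` parameter are hypothetical.

```python
import os

KVM_VM_LINK = "anon_inode:kvm-vm"  # assumed link text for a KVM VM descriptor


def kvm_tasks(proc_root="/proc"):
    """Return sorted PIDs that hold at least one KVM VM file descriptor."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            targets = [os.readlink(os.path.join(fd_dir, fd))
                       for fd in os.listdir(fd_dir)]
        except OSError:         # no permission, or the task has exited
            continue
        if KVM_VM_LINK in targets:
            pids.append(int(entry))
    return sorted(pids)
```

Tasks whose descriptors the caller is not allowed to read are skipped, so an unprivileged user sees only their own guests while root sees all of them. This only enumerates; it says nothing about a guest's name or contents, which is where the qemu-side interaction discussed above comes in.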

Regards,

Anthony Liguori

> Thanks,
>
> Ingo
>

From: Anthony Liguori on 22 Mar 2010 14:50

On 03/22/2010 12:34 PM, Ingo Molnar wrote:
> This is really just the much-discredited microkernel approach for keeping
> global enumeration data that should be kept by the kernel ...
>
> Let's look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony.
> There are numerous ways that this can break:
>
> - Those special files can get corrupted, mis-setup, get out of sync, or can
> be hard to discover.
>
> - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
> design flaw: it is per user. When i'm root i'd like to query _all_ current
> guest images, not just the ones started by root. A system might not even
> have a notion of '${HOME}'.
>
> - Apps might start KVM vcpu instances without adhering to the
> ${HOME}/.qemu/qmp/ access method.
>
> - There is no guarantee for the Qemu process to reply to a request - while
> the kernel can always guarantee an enumeration result. I don't want 'perf
> kvm' to hang or misbehave just because Qemu has hung.
>

If your position basically boils down to "we can't trust userspace and
we can always trust the kernel, so I want to eliminate any userspace
path", then I can't really help you out.

I believe we can come up with an infrastructure that satisfies your
actual requirements within qemu but if you're also insisting upon the
above implementation detail then there's nothing I can do.

Regards,

Anthony Liguori

From: Avi Kivity on 22 Mar 2010 15:10

On 03/22/2010 04:54 PM, Ingo Molnar wrote:
> * Pekka Enberg<penberg(a)cs.helsinki.fi> wrote:
>
>
>> Hi Avi,
>>
>> On Mon, Mar 22, 2010 at 2:49 PM, Avi Kivity<avi(a)redhat.com> wrote:
>>
>>> Seems like perf is also split, with sysprof being developed outside the
>>> kernel. Will you bring sysprof into the kernel? Will every feature be
>>> duplicated in perf and sysprof?
>>>
>> I am glad you brought it up! Sysprof was historically outside of the kernel
>> (with its own kernel module, actually). While the GUI was nice, it was much
>> harder to set up compared to oprofile so it wasn't all that popular. Things
>> improved slightly when Ingo merged the custom kernel module but the
>> _userspace_ part of sysprof was lagging behind a bit. I don't know what
>> the situation is now that they've switched over to perf syscalls but you
>> probably get my point.
>>
>> It would be nice if the two projects merged but I honestly don't see any
>> fundamental problem with two (or more) co-existing projects. Friendly
>> competition will ultimately benefit the users (think KDE and Gnome here).
>>
> See my previous mail - what i see as the most healthy project model is to have
> a full solution reference implementation, connected to a flexible halo of
> plugins or sub-apps.
>
> Firefox does that, KDE does that, and Gnome as well to a certain degree.
>
> The 'halo' provides a constant feedback of new features, and it also provides
> competition and pressure on the 'main' code to be top-notch.
>
> The problem i see with KVM is that there's no reference implementation! There
> is _only_ the KVM kernel part which is not functional in itself. Surrounded by
> a 'halo' - where none of the entities is really 'the' reference implementation
> we call 'KVM'.
>

The reference implementation is qemu-kvm.git, in the future qemu.git.
Like the reference implementation of device-mapper is
lvm2/device-mapper, not tools/device-mapper.

> This causes constant quality problems as the developers of the main project
> don't have constant pressure towards good quality (it is not their
> responsibility to care about user-space bits after all),

The developers of the main project are very much aware that users don't
call the ioctls directly but instead use qemu.

> plus it causes a lack
> of focus as well: integration between (friendly) competing user-space
> components is a lot harder than integration within a single framework such as
> Firefox.
>

We are very focused, just not on what you think we should be focused on.

> I hope this explains my points about modularization a bit better! I suggested
> KVM to grow a user-space tool component in the kernel repo in tools/kvm/,
> which would become the reference implementation for tooling. User-space
> projects can still provide alternative tooling or can plug into this tooling,
> just like they are doing it now. So the main effect isn't even on those
> projects but on the kernel developers. The ABI remains and all the user-space
> packages and projects remain.
>

Seems like wanton duplication of effort. Can we throw so many
developer-years away on duplicate projects? Assuming not all are true
volunteers (85% for 2.6.33), who will fund this duplicate effort?

> Yes, i thought Qemu would be a prime candidate to be the baseline for
> tools/kvm/, but i guess that has become socially impossible now after this
> flamewar. It's not a big problem in the big scheme of things: tools/kvm/ is
> best grown up from a small towards larger size anyway ...
>

Qemu is open source; you can cp it into tools/kvm. Rewriting it from
scratch is a mammoth effort; there's a reason kvm, Xen, and virtualbox
all use qemu. Qemu itself copied code from bochs. Writing this stuff
is hard, especially if there is something already working.

You'll probably get much better threading (the qemu device model is
still single threaded), but it will take years to reach where qemu is
already at.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

From: Anthony Liguori on 22 Mar 2010 15:20

On 03/22/2010 02:10 PM, Ingo Molnar wrote:
>
> I posit that it's both: and that priorities can be communicated - if only you
> try as a maintainer. All i'm suggesting is to add 'usable, unified user-space'
> to the list of unfun priorities, because it's possible and because it matters.
>

I've spent the past few months dealing with customers using the
libvirt/qemu/kvm stack. Usability is a major problem and is a top
priority for me. That is definitely a shift but that occurred before
you started your thread.

But I disagree with your analysis of what the root of the problem is.
It's a very kernel-centric view and doesn't consider the interactions
between userspace components.

Regards,

Anthony Liguori

> Ingo
>
