Unify KVM kernel-space and user-space code into a single project [Kernel]


From: Avi Kivity on 21 Mar 2010 17:40

On 03/21/2010 10:31 PM, Ingo Molnar wrote:
> * Avi Kivity <avi(a)redhat.com> wrote:
>
>
>> On 03/21/2010 09:17 PM, Ingo Molnar wrote:
>>
>>> Adding any new daemon to an existing guest is a deployment and usability
>>> nightmare.
>>>
>> The logical conclusion of that is that everything should be built into the
>> kernel. [...]
>>
> Only if you apply it as a totalitarian rule.
>
> Furthermore, the logical conclusion of _your_ line of argument (applied in a
> totalitarian manner) is that 'nothing should be built into the kernel'.
>

I'm certainly a minimalist, but that doesn't follow. Things that
require privileged access, or access to the page cache, or that can't be
made to perform otherwise should certainly be in the kernel. That's why
I submitted kvm for inclusion in the first place.

If it's something that can work just as well in userspace but we can't
be bothered to fix any 'deployment nightmares', then they shouldn't be
in the kernel. Examples include lvm2 and mdadm (which truly are
'deployment nightmares' - you need to start them before you have access
to your filesystem - yet they work somehow).

> I.e. you are arguing for microkernel Linux, while you see me as arguing for a
> monolithic kernel.
>

No. I'm arguing for reducing bloat wherever possible. Kernel code is
more expensive than userspace code in every metric possible.

> Reality is that we are somewhere in between, we are neither black nor white:
> it's shades of grey.
>
> If we want to do a good job with all this then we observe subsystems, we see
> how they relate to the physical world and decide about how to shape them. We
> identify long-term changes and re-design modularization boundaries in
> hindsight - when we got them wrong initially. We don't try to rationalize the
> status-quo.
>

I'm not for the status quo either - I'm for reducing the kernel code
footprint wherever it doesn't impact performance or break clean interfaces.

> Let's see one example of that thought process in action: Oprofile.
>
> We saw that the modularization of oprofile was a total nightmare: a separate
> kernel-space and a separate user-space component, which was in constant
> version friction. The ABI between them was stifling: it was hard to change it
> (you needed to trickle that through the tool as well which was on a different
> release schedule, etc. etc.)
>
> The result was sucky usability that never went beyond some basic 'you can do
> profiling' threshold. The subsystem worked well within that design box, and it
> was worked on by highly competent people - but it was still far, far away from
> the potential it could have achieved.
>
> So we observed those problems and decided to do something about it:
>
> - We unified the two parts into a single maintenance domain. There's
> the kernel-side in kernel/perf_event.c and arch/*/*/perf_event.c,
> plus the user-side in tools/perf/. The two are connected by a very
> flexible, forwards and backwards compatible ABI.
>

That's useful because perf is still small. If it were a full-fledged
350KLOC GUI, then most of the development would concentrate on the GUI
and very little (relatively) would have to do with the kernel.

Qemu is in that state today. Please, please look at the recent commits
and check how many have actually anything to do with kvm, and how many
with everything else.

> - We moved much more code into the kernel, realizing that transparent
> and robust instrumentation should be offered instead of punting
> abstractions into user-space (which is in a disadvantaged position
> to implement system-wide abstractions).
>

No argument.

I have a similar experience with kvm. The user/kernel break is at the
cpu virtualization level - that is, kvm is solely responsible for
emulating a cpu and userspace is responsible for emulating devices. An
exception was made for the PIC/IOAPIC/PIT due to performance
considerations - they are emulated in the kernel as well.

A common FAQ is why do we not emulate real-mode instructions in qemu.
The answer is that the interface to kvm would be insane - it would
emulate a partial cpu. All other users of that interface would have to
implement an emulator (there is also a practical argument - the qemu
emulator does not implement atomics correctly wrt other threads).

> - We created a no-bullsh*t approach to usability. perf is by no means
> perfect, but it's written by developers for developers and if you report a
> bug to us we'll act on it before anything else. Furthermore the kernel
> developers do the user-space coding as well, so there's no chinese
> wall separating them. Kernel-space becomes aware of the intricacies of
> user-space and user-space developers become aware of the difficulties of
> kernel-space as well. It's a good mix in our experience.
>

Excellent. However qemu is written by developers for their users, and
their users are not worried about an eject button in the qemu SDL
interface, or about running the qemu command line by hand. They have
complicated management interfaces that do everything, so we concentrate,
for example, on a robust RPC interface for qemu. That means nothing for
command line users but is critical for our users.

I am not _against_ excellent support for command-line users, but I am
not going to divert the resources I control (=me) into something that is
not needed by my users. I encourage anyone who wants to improve
usability to subscribe to qemu-devel and contribute; they will receive a
warm welcome.

> The thing is (and i doubt you are surprised that i say that), i see a similar
> situation with KVM. The basic parameters are comparable to Oprofile: it has a
> kernel-space component and a KVM-specific user-space. By all practical means
> the two are one and the same, but are maintained as different projects.
>

There is tight cooperation between the maintainers and developers of
these two projects. Most developers are subscribed to both mailing lists
and many have contributed to both repositories. There does not appear
to be a problem with release schedules.

> I have followed KVM since its inception with great interest. I saw its good
> initial design, i tried it early on and even wrote various patches for it. So
> i care more about KVM than a random observer would, but this preference and
> passion for KVM's good technical sides does not cloud my judgement when it
> comes to its weaknesses.
>
> In fact the weaknesses are far more important to identify and express
> publicly, so i tend to concentrate on them. Don't take this as me blasting KVM,
> we both know the many good aspects of KVM.
>
> So, as i explained it earlier in greater detail the modularization of KVM into
> a separate kernel-space and user-space component is one of its worst current
> weaknesses, and it has become the main stifling force in the way of a better
> KVM experience to users.
>
> That, IMO, is the 'weakest link' of KVM today and no matter how well the rest
> of KVM gets improved those nice bits all get unfairly ignored when the user
> cannot have a usable and good desktop experience and thinks that KVM is
> crappy.
>

Thanks. I agree the user experience when launching qemu from the
command line is miles behind virtualbox and vmware workstation. What I
disagree with is that this is how a typical user will first experience kvm -
most distributions now integrate virt-manager which allows you much
better graphical interaction.

Unfortunately, virt-manager is still server-oriented (for example, it
uses VNC instead of displaying directly to X), and is hardly polished to
the same level as commercial tools. However, you cannot force someone
to write good desktop integration for qemu; it has to come from someone
with the itch, the experience, the capability, and the time.

> I think you should think outside the initial design box you have created 4
> years ago, you should consider iterating the model and you should consider the
> alternative i suggested: move (or create) KVM tooling to tools/kvm/ and treat
> it as a single project from there on.
>

Do you really think that tools/kvm/ would create a good GUI for kvm?
lkml is hardly the place where GUI developers and designers congregate.
Please, if any of you GUI experts are reading this, please consider
contributing to qemu directly.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Avi Kivity on 21 Mar 2010 17:50

On 03/21/2010 11:00 PM, Ingo Molnar wrote:
> * Avi Kivity <avi(a)redhat.com> wrote:
>
>
>> On 03/21/2010 09:59 PM, Ingo Molnar wrote:
>>
>>> Frankly, i was surprised (and taken slightly off base) by both Avi and Anthony
>>> suggesting such a clearly inferior "add a daemon to the guest space" solution.
>>> It's a usability and deployment non-starter.
>>>
>> It's only clearly inferior if you ignore every consideration against it.
>> It's definitely not a deployment non-starter, see the tons of daemons that
>> come with any Linux system. [...]
>>
> Avi, please don't put arguments into my mouth that i never made.
>

Sorry, that was not the intent. I meant that putting things into the
kernel has disadvantages that must be considered.

> My (clearly expressed) argument was that:
>
> _a new guest-side daemon is a transparent instrumentation non-starter_
>
> What is so hard to understand about that simple concept? Instrumentation is
> good if it's as transparent as possible.
>
> Of course lots of other features can be done via a new user-space package ...
>

I believe you can deploy this daemon via a (default) package, without
any hassle to users.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


From: Avi Kivity on 21 Mar 2010 17:50

On 03/21/2010 10:55 PM, Ingo Molnar wrote:
>
> Of course you could say the following:
>
> ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not
> able to add this to the v2.6.35 kernel queue anymore as the ongoing
> usability work already takes up all of the project's maintainer and
> testing bandwidth. If you want the feature to be merged sooner than that
> then please help us cut down on the TODO and BUGS list that can be found
> at XYZ. There's quite a few low hanging fruits there. '
>

That would be shooting at my own foot as well as the contributor's since
I badly want that RCU stuff, and while a GUI would be nice, that itch
isn't on my back.

You're asking a developer and a maintainer to put off the work they're
interested in, in order to work on something someone else is interested
in, but not contributing the work.

> Although this RCU example is 'worst' possible example, as it's a pure speedup
> change with no functionality effect.
>
> Consider the _other_ examples that are a lot more clear:
>
> ' If you expose paravirt spinlocks via KVM please also make sure the KVM
> tooling can make use of it, has an option for it to configure it, and
> that it has sufficient efficiency statistics displayed in the tool for
> admins to monitor.'
>
> ' If you create this new paravirt driver then please also make sure it can
> be configured in the tooling. '
>
> ' Please also add a testcase for this bug to tools/kvm/testcases/ so we don't
> repeat this same mistake in the future. '
>

All three happen quite commonly in qemu/kvm development. Of course
someone who develops a feature also develops a patch that exposes it in
qemu. There are several test cases in qemu-kvm.git/kvm/user/test.

> I'd say most of the high-level feature work in KVM has tooling impact.
>

Usually, pretty low. Plumbing down a feature is usually trivial. There
are exceptions, of course - smp is only supported in qemu-kvm.git, not
in upstream qemu.git, for example. In any case of course the work is
done in both qemu and kvm - do you think people develop features to see
them bitrot?

> And note the important argument that the 'eject button' thing would not occur
> naturally in a project that is well designed and has a good quality balance.
> It would only occur in the transitionary period if a big lump of lower-quality
> code is unified with higher-quality code. Then indeed a lot of pressure gets
> created on the people working on the high-quality portion to go over and fix
> the low-quality portion.
>

It's a matter of priorities.

> Which, btw., is an unconditionally good thing ...
>
> But even an RCU speedup can be fairly linked/ordered to more pressing needs of
> a project.
>

Pressing to whom?

> Really, the unification of two tightly related pieces of code has numerous
> clear advantages. Please give it some thought before rejecting it.
>

I'm not blind to the advantages. Dropping tcg would be the biggest of
them by far (much more than moving the repository, IMO). But there are
disadvantages as well.

Around two years ago I seriously considered forking qemu; at this time I
do not think it is a good idea.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


From: Ingo Molnar on 21 Mar 2010 18:00

* Avi Kivity <avi(a)redhat.com> wrote:

> On 03/21/2010 10:55 PM, Ingo Molnar wrote:
> >
> >Of course you could say the following:
> >
> > ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not
> > able to add this to the v2.6.35 kernel queue anymore as the ongoing
> > usability work already takes up all of the project's maintainer and
> > testing bandwidth. If you want the feature to be merged sooner than that
> > then please help us cut down on the TODO and BUGS list that can be found
> > at XYZ. There's quite a few low hanging fruits there. '
>
> That would be shooting at my own foot as well as the contributor's since I
> badly want that RCU stuff, and while a GUI would be nice, that itch isn't on
> my back.

I think this sums up the root cause of all the problems i see with KVM pretty
well.

Thanks,

Ingo

From: Ingo Molnar on 21 Mar 2010 18:00

* Avi Kivity <avi(a)redhat.com> wrote:

> > I.e. you are arguing for microkernel Linux, while you see me as arguing
> > for a monolithic kernel.
>
> No. I'm arguing for reducing bloat wherever possible. Kernel code is more
> expensive than userspace code in every metric possible.

1)

One of the primary design arguments of the micro-kernel design as well was to
push as much into user-space as possible without impacting performance too
much, so you very much seem to be arguing for a micro-kernel design for the
kernel.

I think history has given us the answer for that fight between microkernels
and monolithic kernels.

Furthermore, to not engage in hypotheticals about microkernels: by your
argument the Oprofile design was perfect (it was minimalistic kernel-space,
with all the complexity in user-space), while perf was over-complex (which
does many things in the kernel that could have been done in user-space).

Practical results suggest the exact opposite happened - Oprofile is being
replaced by perf. How do you explain that?

2)

In your analysis you again ignore the package boundary costs and artifacts as
if they didn't exist.

That was my main argument, and that is what we saw with oprofile and perf:
while maintaining more kernel-code may be more expensive, it sure pays off for
getting us a much better solution in the end.

And getting a 'much better solution' to users is the goal of all this, isn't
it?

I don't mind what you call 'bloat' per se if it's for a purpose that users find
to be a good deal. I have quite a bit of RAM in most of my systems; having 50K
more or less included in the kernel image is far less important than having a
healthy and vibrant development model and having satisfied users ...

Ingo
