Unify KVM kernel-space and user-space code into a single project [Kernel]

Prev: Irish 2010 Grant Winner
Next: [PATCH] staging: winbond: mds_f.h whitespace and CamelCase corrections.

From: Avi Kivity on 22 Mar 2010 15:50

On 03/22/2010 09:27 PM, Ingo Molnar wrote:
>
>> If your position basically boils down to, we can't trust userspace
>> and we can always trust the kernel, I want to eliminate any
>> userspace path, then I can't really help you out.
>>
> Why would you want to 'help me out'? I can tell a good solution from a bad one
> just fine.
>

You are basically making a kernel implementation a requirement, instead
of something that follows from the requirement.

> You should instead read the long list of disadvantages above, invert them and
> list them as advantages for the kernel-based vcpu enumeration solution, apply
> common sense and go admit to yourself that indeed in this situation a kernel
> provided enumeration of vcpu contexts is the most robust solution.
>

Having qemu enumerate guests one way or another is not a good idea IMO
since it is focused on one guest and doesn't have a system-wide entity.
A userspace system-wide entity will work just as well as a kernel
implementation, without its disadvantages.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Ingo Molnar on 22 Mar 2010 16:00

* Alexander Graf <agraf(a)suse.de> wrote:

> Yes. I think the point was that every layer in between brings potential
> slowdown and loss of features.

Exactly. The more 'fragmented' a project is into sub-projects, without a
single, unified, functional reference implementation in the center of it, the
longer it takes to fix 'unsexy' problems like trivial usability bugs.

Furthermore, another negative effect is that many times features are
implemented not in their technically best way, but in a way to keep them local
to the project that originates them. This is done to keep deployment latencies
and general contribution overhead down to a minimum. The moment you have to
work with yet another project, the overhead adds up.

So developers tend to go for the quicker (yet inferior) hack within the
sub-project they have the best access to.

Tell me this isn't happening in this space ;-)

Thanks,

Ingo

From: Ingo Molnar on 22 Mar 2010 16:00

* Joerg Roedel <joro(a)8bytes.org> wrote:

> On Mon, Mar 22, 2010 at 05:32:15PM +0100, Ingo Molnar wrote:
> > I don't know how you can consider the situation of Alpha comparable: it is a
> > legacy architecture for which no new CPU has been manufactured in the past ~10
> > years.
> >
> > The negative effects of physical obsolescence cannot be overcome even by the
> > very best of development models ...
>
> The maintainers of that architecture could at least continue to maintain it.
> But that is not the case. Most newer syscalls are not available and overall
> stability on Alpha sucks (the kernel crashed when I tried to start Xorg, for
> example), but nobody cares about it. Hardware is still around and there are
> still some users of it.

You are asking why maintainers do not act as you suggest, in the face of the
huge negative effects of physical obsolescence?

Please use common sense: they don't act because ... there are huge negative
effects due to physical obsolescence?

No amount of development model engineering can offset that negative.

Thanks,

Ingo

From: Antoine Martin on 22 Mar 2010 16:10

On 03/23/2010 02:15 AM, Anthony Liguori wrote:
> On 03/22/2010 12:55 PM, Avi Kivity wrote:
>>> Let's look at the ${HOME}/.qemu/qmp/ enumeration method suggested by
>>> Anthony.
>>> There are numerous ways that this can break:
>>
>> I don't like it either. We have libvirt for enumerating guests.
>
> We're stuck in a rut with libvirt and I think a lot of the
> dissatisfaction with qemu is rooted in that. It's not libvirt that's
> the problem, but the relationship between qemu and libvirt.
+1
The obvious reason why so many people still use shell scripts rather
than libvirt is that it just doesn't provide what they need.
Every time I've looked at it (and I've been looking for a better
solution for many years), it seemed that it would have provided most of
the things I needed, but the remaining bits were unsolvable.

Shell scripts can be ugly, but you get total control.

Antoine
> We add a feature to qemu and maybe after six months it gets exposed by
> libvirt. Release timelines of the two projects complicate the
> situation further. People that write GUIs are limited by libvirt
> because that's what they're told to use and when they need something
> simple, they're presented with first getting that feature implemented
> in qemu, then plumbed through libvirt.
>
> It wouldn't be so bad if libvirt was basically a passthrough interface
> to qemu but it tries to model everything in a generic way which is
> more or less doomed to fail when you're adding lots of new features
> (as we are).
>
> The list of things that libvirt doesn't support and won't any time
> soon is staggering.
>
> libvirt serves an important purpose, but we need to do a better job in
> qemu with respect to usability. We can't just punt to libvirt.
>
> Regards,
>
> Anthony Liguori
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo(a)vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html


From: Ingo Molnar on 22 Mar 2010 16:10

* Avi Kivity <avi(a)redhat.com> wrote:

> On 03/22/2010 09:20 PM, Ingo Molnar wrote:
> >* Avi Kivity<avi(a)redhat.com> wrote:
> >
> >>>Let's look at the ${HOME}/.qemu/qmp/ enumeration method suggested by
> >>>Anthony. There are numerous ways that this can break:
> >>I don't like it either. We have libvirt for enumerating guests.
> >Which has pretty much the same problems as the ${HOME}/.qemu/qmp/ solution,
> >obviously.
>
> It doesn't follow. The libvirt daemon could/should own guests from all
> users. I don't know if it does so now, but nothing is preventing it
> technically.

It's hard for me to argue against a hypothetical implementation, but all
user-space driven solutions for resource enumeration I've seen so far had
weaknesses that kernel-based solutions don't have.
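To make the enumeration debate concrete, here is a minimal sketch of the scan a tool would have to perform under the ${HOME}/.qemu/qmp/ proposal. It assumes, hypothetically (the thread does not spell out the layout), one QMP UNIX socket per guest in that directory; `scan_qmp_dir` is an illustrative name, not an existing QEMU or perf interface. The scan only sees one user's directory, and a socket file left behind by an exited QEMU looks like any other until a connect() is refused.

```python
import os
import socket
import stat


def scan_qmp_dir(qmp_dir):
    """Split the socket files in a per-user QMP directory into live and stale.

    Hypothetical layout: one UNIX socket per guest in qmp_dir. Only this one
    directory is scanned, so guests started by other users stay invisible.
    """
    live, stale = [], []
    try:
        names = sorted(os.listdir(qmp_dir))
    except FileNotFoundError:
        # No directory at all (e.g. no ${HOME}): nothing can be enumerated.
        return live, stale
    for name in names:
        path = os.path.join(qmp_dir, name)
        try:
            is_sock = stat.S_ISSOCK(os.lstat(path).st_mode)
        except FileNotFoundError:
            continue  # removed between listdir() and lstat()
        if not is_sock:
            continue  # ignore anything that is not a socket file
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.settimeout(1.0)
        try:
            s.connect(path)
            live.append(name)
        except ConnectionRefusedError:
            # The file exists but nothing is listening: a leftover from a
            # QEMU process that exited without cleaning up after itself.
            stale.append(name)
        except FileNotFoundError:
            pass  # removed between lstat() and connect()
        finally:
            s.close()
    return live, stale
```

Calling it on `os.path.expanduser("~/.qemu/qmp")` covers only the invoking user; root would have to know every user's home directory, which is the per-user limitation raised in this thread.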

> >>> - Those special files can get corrupted, mis-setup, get out of sync, or can
> >>> be hard to discover.
> >>>
> >>> - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
> >>> design flaw: it is per user. When I'm root I'd like to query _all_ current
> >>> guest images, not just the ones started by root. A system might not even
> >>> have a notion of '${HOME}'.
> >>>
> >>> - Apps might start KVM vcpu instances without adhering to the
> >>> ${HOME}/.qemu/qmp/ access method.
> >>- it doesn't work with nfs.
> >So out of a list of 4 disadvantages your reply is that you agree with 3?
>
> I agree with 1-3, disagree with 4, and add 5. Yes.
>
> >>> - There is no guarantee for the Qemu process to reply to a request - while
> >>> the kernel can always guarantee an enumeration result. I don't want 'perf
> >>> kvm' to hang or misbehave just because Qemu has hung.
> >>If qemu doesn't reply, your guest is dead anyway.
> >Erm, but I'm talking about a dead tool here. There's a world of difference
> >between 'kvm top' not showing new entries (because the guest is dead), and
> >'perf kvm top' hanging due to Qemu hanging.
>
> If qemu hangs, the guest hangs a few milliseconds later.

I think you didn't understand my point. I am talking about 'perf kvm top'
hanging if Qemu hangs.

With a proper in-kernel enumeration the kernel would always guarantee the
functionality, even if the vcpu does not make progress (i.e. it's "hung").

With this implemented in Qemu we lose that kind of robustness guarantee.

And it is especially during development (when developers use instrumentation
the most) that it is important to have robust instrumentation that does not
hang along with the Qemu process.
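The hang described here comes from the tool blocking on QEMU. As a minimal sketch, assuming a QMP UNIX socket at a caller-supplied path (the helper name `read_qmp_greeting` is hypothetical): on connect, a QMP server sends a one-line JSON greeting of the form {"QMP": {...}}, and a client that reads it with a timeout at least bounds how long it waits on a hung QEMU. This only limits the stall; it does not provide the always-answers guarantee an in-kernel enumeration would.

```python
import json
import socket


def read_qmp_greeting(path, timeout=0.5):
    """Return QEMU's QMP greeting as a dict, or None if it does not arrive.

    A hung QEMU never sends the greeting; the timeout keeps the calling
    tool responsive instead of blocking along with it.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # The timeout applies to connect() and to each recv() separately.
    s.settimeout(timeout)
    buf = b""
    try:
        s.connect(path)
        while b"\n" not in buf:
            chunk = s.recv(4096)
            if not chunk:
                return None  # peer closed before sending a full line
            buf += chunk
    except (socket.timeout, ConnectionRefusedError, FileNotFoundError):
        return None
    finally:
        s.close()
    try:
        greeting = json.loads(buf.split(b"\n", 1)[0])
    except ValueError:
        return None  # not valid JSON, so not a QMP greeting
    if isinstance(greeting, dict) and "QMP" in greeting:
        return greeting
    return None
```

A tool enumerating many guests would call this once per socket and report a guest as unresponsive when it gets None, rather than freezing its whole display.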

> If qemu fails, you lose your guest. If libvirt forgets about a
> guest, you can't do anything with it any more. These are more
> serious problems than 'perf kvm' not working. [...]

How on earth can you justify a bug ("perf kvm top" hanging) by pointing out
that there are other bugs as well?

Basically you are arguing the equivalent of saying that it would be fine for a
gdb session to become unresponsive if the debugged task hangs. Fortunately
ptrace is kernel-based and it never 'hangs' if the user-space process hangs
somewhere.

This is an essential property of good instrumentation.

So the enumeration method you suggested is a poor, subpar solution, simple
as that.

> [...] Qemu and libvirt have to be robust anyway, we can rely on them. Like
> we have to rely on init, X, sshd, and a zillion other critical tools.

We can still profile any of those tools without the profiler breaking if the
debugged tool breaks ...

> > By your argument it would be perfectly fine to implement /proc purely via
> > user-space, correct?
>
> I would have preferred /proc to be implemented via syscalls called directly
> from tools, and good tools written to expose the information in it. When
> computers were slower 'top' would spend tons of time opening and closing all
> those tiny files and parsing them. Of course the kernel needs to provide
> the information.

(Then you'll be pleased to hear that perf has enabled exactly that, and that we
are working towards that precise use case.)
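For reference, a minimal sketch of the /proc pattern Avi describes, roughly as a top-like tool would use it on Linux: one open, read, close and text parse of /proc/<pid>/stat per process (field layout as documented in proc(5)). `read_proc_stats` is an illustrative name, not how top itself is written, and only the command-name and state fields are extracted.

```python
import os


def read_proc_stats(proc="/proc"):
    """Map pid -> (comm, state) by opening and parsing /proc/<pid>/stat.

    Every process costs an open(), a read(), a close() and a text parse,
    which is the per-file overhead discussed above. Linux-only.
    """
    stats = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat"), "rb") as f:
                data = f.read()
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited after listdir(), or is unreadable
        # Format: "pid (comm) state ppid ...". comm may itself contain
        # spaces or ')', so locate the last ')' instead of splitting.
        lparen = data.index(b"(")
        rparen = data.rindex(b")")
        comm = data[lparen + 1:rparen].decode(errors="replace")
        state = data[rparen + 2:rparen + 3].decode()
        stats[int(entry)] = (comm, state)
    return stats
```

With thousands of processes and a refresh every second, those per-file syscalls and parses add up, which is the cost a direct syscall-based interface would avoid.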

Ingo
