Unify KVM kernel-space and user-space code into a single project [Kernel]


From: Anthony Liguori on 21 Mar 2010 20:30

On 03/21/2010 05:00 PM, Ingo Molnar wrote:
> If that is the theory then it has failed to trickle through in practice. As
> you know i have reported a long list of usability problems with hardly a look.
> That list could be created by pretty much anyone spending a few minutes of
> getting a first impression with qemu-kvm.
>

Can you transfer your list to the following wiki page:

http://wiki.qemu.org/Features/Usability

This thread is so large that I can't find your note that contained the
initial list.

I want to make sure this input doesn't die once this thread settles down.

Regards,

Anthony Liguori

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Avi Kivity on 22 Mar 2010 02:40

On 03/21/2010 11:20 PM, Ingo Molnar wrote:
> * Avi Kivity<avi(a)redhat.com> wrote:
>
>
>>> Well, for what it's worth, I rarely ever use anything else. My virtual
>>> disks are raw so I can loop mount them easily, and I can also switch my
>>> guest kernels from outside... without ever needing to mount those disks.
>>>
>> Curious, what do you use them for?
>>
>> btw, if you build your kernel outside the guest, then you already have
>> access to all its symbols, without needing anything further.
>>
> There's two errors with your argument:
>
> 1) you are assuming that it's only about kernel symbols
>
> Look at this 'perf report' output:
>
> # Samples: 7127509216
> #
> # Overhead  Command  Shared Object      Symbol
> # ........  .......  .................  ......
> #
>     19.14%  git      git                [.] lookup_object
>     15.16%  perf     git                [.] lookup_object
>      4.74%  perf     libz.so.1.2.3      [.] inflate
>      4.52%  git      libz.so.1.2.3      [.] inflate
>      4.21%  perf     libz.so.1.2.3      [.] inflate_table
>      3.94%  git      libz.so.1.2.3      [.] inflate_table
>      3.29%  git      git                [.] find_pack_entry_one
>      3.24%  git      libz.so.1.2.3      [.] inflate_fast
>      2.96%  perf     libz.so.1.2.3      [.] inflate_fast
>      2.96%  git      git                [.] decode_tree_entry
>      2.80%  perf     libc-2.11.90.so    [.] __strlen_sse42
>      2.56%  git      libc-2.11.90.so    [.] __strlen_sse42
>      1.98%  perf     libc-2.11.90.so    [.] __GI_memcpy
>      1.71%  perf     git                [.] decode_tree_entry
>      1.53%  git      libc-2.11.90.so    [.] __GI_memcpy
>      1.48%  git      git                [.] lookup_blob
>      1.30%  git      git                [.] process_tree
>      1.30%  perf     git                [.] process_tree
>      0.90%  perf     git                [.] tree_entry
>      0.82%  perf     git                [.] lookup_blob
>      0.78%  git      [kernel.kallsyms]  [k] kstat_irqs_cpu
>
> kernel symbols are only a small portion of the symbols. (a single line in this
> case)
>
> To get to those other symbols we have to read the ELF symbols of those
> binaries in the guest filesystem, in the post-processing/reporting phase. This
> is both complex to do and relatively slow so we dont want to (and cannot) do
> this at sample time from IRQ context or NMI context ...
>

Okay. So a symbol server is necessary. Still, I don't think -kernel is
a good reason for including the symbol server in the kernel itself. If
someone uses it extensively together with perf, _and_ they can't put the
symbol server in the guest for some reason, let them patch mkinitrd to
include it.
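
To illustrate the post-processing step described in the quoted text: once
symbol tables have been read from the binaries, attributing a sampled address
to a function is a sorted-table lookup that can be done lazily at report time.
The following is a minimal Python sketch under stated assumptions, not how perf
implements it. The symbol table contents, the addresses, and the helper names
build_index, resolve and report are all hypothetical.

```python
import bisect
from collections import Counter

# Hypothetical symbol table for one binary: (start_address, size, name).
# In a real tool these entries would come from the binary's ELF symbol table.
SYMBOLS = [
    (0x1000, 0x200, "lookup_object"),
    (0x1200, 0x100, "find_pack_entry_one"),
    (0x1400, 0x080, "decode_tree_entry"),
]


def build_index(symbols):
    """Sort symbols by start address and keep a parallel list of starts."""
    ordered = sorted(symbols)
    starts = [start for start, _size, _name in ordered]
    return starts, ordered


def resolve(index, addr):
    """Map a sampled address to a symbol name, or "[unknown]"."""
    starts, ordered = index
    # Find the last symbol whose start address is <= addr.
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return "[unknown]"
    start, size, name = ordered[i]
    # The address must also fall inside the symbol's extent.
    return name if addr < start + size else "[unknown]"


def report(index, samples):
    """Return (symbol, percent) pairs, most frequently sampled first."""
    counts = Counter(resolve(index, addr) for addr in samples)
    total = len(samples)
    return [(name, 100.0 * n / total) for name, n in counts.most_common()]


if __name__ == "__main__":
    index = build_index(SYMBOLS)
    for name, pct in report(index, [0x1000, 0x1001, 0x1250, 0x9999]):
        print(f"{pct:6.2f}%  {name}")
```

The lookup itself is cheap. The expensive part is obtaining the symbol table,
which is why access to the binaries at report time matters in this discussion.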

> Also, many aspects of reporting are interactive so it's done lazily or
> on-demand. So we need ready access to the guest filesystem - for those guests
> which decide to integrate with the host for this.
>
> 2) the 'SystemTap mistake'
>
> You are assuming that the symbols of the kernel when it got built got saved
> properly and are discoverable easily. In reality those symbols can be erased
> by a make clean, can be modified by a new build, can be misplaced and can
> generally be hard to find because each distro puts them in a different
> installation path.
>
> My 10+ years experience with kernel instrumentation solutions is that
> kernel-driven, self-sufficient, robust, trustable, well-enumerated sources of
> information work far better in practice.
>

What about line number information? And the source? Into the kernel
with them as well?

> The thing is, in this thread i'm forced to repeat the same basic facts again
> and again. Could you _PLEASE_, pretty please, when it comes to instrumentation
> details, at least _read the mails_ of the guys who actually ... write and
> maintain Linux instrumentation code? This is getting ridiculous really.
>

I've read every one of your emails. If I misunderstood or overlooked
something, I apologize. The thread is very long and at times
antagonistic so it's hard to keep all the details straight.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


From: Avi Kivity on 22 Mar 2010 03:00

On 03/21/2010 11:52 PM, Ingo Molnar wrote:
> * Avi Kivity<avi(a)redhat.com> wrote:
>
>
>>> I.e. you are arguing for microkernel Linux, while you see me as arguing
>>> for a monolithic kernel.
>>>
>> No. I'm arguing for reducing bloat wherever possible. Kernel code is more
>> expensive than userspace code in every metric possible.
>>
> 1)
>
> One of the primary design arguments of the micro-kernel design as well was to
> push as much into user-space as possible without impacting performance too
> much so you very much seem to be arguing for a micro-kernel design for the
> kernel.
>
> I think history has given us the answer for that fight between microkernels
> and monolithic kernels.
>

I am not arguing for a microkernel. Again: reduce bloat where possible,
kernel code is more expensive than userspace code.

> Furthermore, to not engage in hypotheticals about microkernels: by your
> argument the Oprofile design was perfect (it was minimalistic kernel-space,
> with all the complexity in user-space), while perf was over-complex (which
> does many things in the kernel that could have been done in user-space).
>
> Practical results suggest the exact opposite happened - Oprofile is being
> replaced by perf. How do you explain that?
>

I did not say that the amount of kernel and userspace code is the only
factor deciding the quality of software. If that were so, microkernels
would have won out long ago.

It may be that perf has too much kernel code, and won against
oprofile despite that because it was better in other areas. Or it may
be that perf has exactly the right user/kernel division. Or maybe perf
needs some of the code moved from userspace to the kernel. I don't
know, I haven't examined the code.

The user/kernel boundary is only one metric for code quality. Nor is it
always in favour of pushing things to userspace. Narrowing or
simplifying an interface is often an argument in favour of pushing
things into the kernel.

IMO the reason perf is more usable than oprofile has less to do with the
kernel/userspace boundary and more to do with effort and attention spent
on the userspace/user boundary.

> 2)
>
> In your analysis you again ignore the package boundary costs and artifacts as
> if they didnt exist.
>
> That was my main argument, and that is what we saw with oprofile and perf:
> while maintaining more kernel-code may be more expensive, it sure pays off for
> getting us a much better solution in the end.
>

Package costs are real. We need to bear them. I don't think the kernel
size should increase just because maintaining another package (and the
interface between two packages) is more difficult.

> And getting a 'much better solution' to users is the goal of all this, isnt
> it?
>
> I dont mind what you call 'bloat' per se if it's for a purpose that users find
> to be a good deal. I have quite a bit of RAM in most of my systems, having 50K
> more or less included in the kernel image is far less important than having a
> healthy and vibrant development model and having satisfied users ...
>

I'm not worried about 50K or so, I'm worried about a bug in those 50K
taking down the guest.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


From: Zhang, Yanmin on 22 Mar 2010 03:00

On Sun, 2010-03-21 at 22:20 +0100, Ingo Molnar wrote:
> * Avi Kivity <avi(a)redhat.com> wrote:
>
> > > Well, for what it's worth, I rarely ever use anything else. My virtual
> > > disks are raw so I can loop mount them easily, and I can also switch my
> > > guest kernels from outside... without ever needing to mount those disks.
> >
> > Curious, what do you use them for?
> >
> > btw, if you build your kernel outside the guest, then you already have
> > access to all its symbols, without needing anything further.
>
> There's two errors with your argument:
>
> 1) you are assuming that it's only about kernel symbols
>
> Look at this 'perf report' output:
>
> # Samples: 7127509216
> #
> # Overhead  Command  Shared Object      Symbol
> # ........  .......  .................  ......
> #
>     19.14%  git      git                [.] lookup_object
>     15.16%  perf     git                [.] lookup_object
>      4.74%  perf     libz.so.1.2.3      [.] inflate
>      4.52%  git      libz.so.1.2.3      [.] inflate
>      4.21%  perf     libz.so.1.2.3      [.] inflate_table
>      3.94%  git      libz.so.1.2.3      [.] inflate_table
>      3.29%  git      git                [.] find_pack_entry_one
>      3.24%  git      libz.so.1.2.3      [.] inflate_fast
>      2.96%  perf     libz.so.1.2.3      [.] inflate_fast
>      2.96%  git      git                [.] decode_tree_entry
>      2.80%  perf     libc-2.11.90.so    [.] __strlen_sse42
>      2.56%  git      libc-2.11.90.so    [.] __strlen_sse42
>      1.98%  perf     libc-2.11.90.so    [.] __GI_memcpy
>      1.71%  perf     git                [.] decode_tree_entry
>      1.53%  git      libc-2.11.90.so    [.] __GI_memcpy
>      1.48%  git      git                [.] lookup_blob
>      1.30%  git      git                [.] process_tree
>      1.30%  perf     git                [.] process_tree
>      0.90%  perf     git                [.] tree_entry
>      0.82%  perf     git                [.] lookup_blob
>      0.78%  git      [kernel.kallsyms]  [k] kstat_irqs_cpu
>
> kernel symbols are only a small portion of the symbols. (a single line in this
> case)

The example above shows that perf can summarize hot functions in both the kernel
and applications. If we collect guest OS statistics from the host side, we can't
summarize detailed guest application info, because the host cannot get the
process IDs of applications running inside the guest. So we could only get
detailed kernel info and the total utilization percentage of guest application
processes.

>
> To get to those other symbols we have to read the ELF symbols of those
> binaries in the guest filesystem, in the post-processing/reporting phase. This
> is both complex to do and relatively slow so we dont want to (and cannot) do
> this at sample time from IRQ context or NMI context ...
>
> Also, many aspects of reporting are interactive so it's done lazily or
> on-demand. So we need ready access to the guest filesystem - for those guests
> which decide to integrate with the host for this.


From: Avi Kivity on 22 Mar 2010 03:20

On 03/22/2010 12:00 AM, Ingo Molnar wrote:
> * Avi Kivity<avi(a)redhat.com> wrote:
>
>
>>> Consider the _other_ examples that are a lot more clear:
>>>
>>> ' If you expose paravirt spinlocks via KVM please also make sure the KVM
>>> tooling can make use of it, has an option for it to configure it, and
>>> that it has sufficient efficiency statistics displayed in the tool for
>>> admins to monitor.'
>>>
>>> ' If you create this new paravirt driver then please also make sure it can
>>> be configured in the tooling. '
>>>
>>> ' Please also add a testcase for this bug to tools/kvm/testcases/ so we dont
>>> repeat this same mistake in the future. '
>>>
>> All three happen quite commonly in qemu/kvm development. Of course someone
>> who develops a feature also develops a patch that exposes it in qemu. There
>> are several test cases in qemu-kvm.git/kvm/user/test.
>>
> If that is the theory then it has failed to trickle through in practice. As
> you know i have reported a long list of usability problems with hardly a look.
> That list could be created by pretty much anyone spending a few minutes of
> getting a first impression with qemu-kvm.
>

It does happen in practice, just not in the GUI areas, since no one is
working on them. I am not going to make a qcow2 reliability fix
conditional on a gtk GUI.

> So something is seriously wrong in KVM land, to pretty much anyone trying it
> for the first time. I have explained how i see the root cause of that, while
> you seem to suggest that there's nothing wrong to begin with. I guess we'll
> have to agree to disagree on that.
>

Not anyone trying it for the first time. RHEV-M users will see a
polished GUI that can be used to manage thousands of guests and hosts.
I presume users of IBM's and Siemens' products (and those of all other
contributors) will also enjoy a good user experience. Qemu is
not the only GUI for kvm.

So far only one company was interested in a qemu GUI - the makers of
virtualbox. Unfortunately they chose not to contribute that back to
qemu, and no one was sufficiently motivated to pick out the bits and try
to merge them.

Again, if you are interested in a qemu GUI, you either have to write it
yourself or convince someone else to do it. My own plate is full and my
priorities are clear.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

