From: Ingo Molnar on

* Anthony Liguori <anthony(a)codemonkey.ws> wrote:

> On 03/22/2010 10:55 AM, Ingo Molnar wrote:
> >* Anthony Liguori<anthony(a)codemonkey.ws> wrote:
> >
> >>[...]
> >>
> >>I've been trying very hard to turn this into a productive thread attempting
> >>to capture your feedback and give clear suggestions about how you can
> >>achieve your desired functionality.
> >I'm glad that we are at this more productive stage. I'm still trying to
> >achieve the very same technological capabilities that i expressed in the first
> >few mails when i reviewed the 'perf kvm' patch that was submitted by Yanmin.
> >
> >The crux of the problem is very simple. To quote my earlier mail:
> >
> > |
> > | - The inconvenience of having to type:
> > |     perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
> > |         --guestmodules=/home/ymzhang/guest/modules top
> > |
> > |
> > | is very obvious even with a single guest. Now multiply that by more guests ...
> > |
> >
> > For example we want 'perf kvm top' to do something useful by default: it
> > should find the first guest running and it should report its profile.
> >
> > The tool shouldn't have to guess about where the guests are, what their
> > namespaces are and how to talk to them. We also want easy symbolic access
> > to guests, for example:
> >
> > perf kvm -g OpenSuse-2 record sleep 1
>
> Two things are needed. The first thing needed is to be able to enumerate
> running guests and identify a symbolic name. I have a patch for this and
> it'll be posted this week or so. perf will need to have a QMP client and it
> will need to look in ${HOME}/.qemu/qmp/ for sockets to connect to.
>
> This is too much to expect from a client and we've got a GSoC idea posted to
> make a nice library for tools to use to simplify this.

Ok, that sounds interesting! I'd rather see some raw mechanism that 'perf kvm'
could use instead of having to require yet another library (which generally
dampens adoption of a tool). So i think we can work from there.

Btw., have you considered using Qemu's command name (task->comm[]) as the
symbolic name? That way we could see the guest name in 'top' on the host - a
nice touch.
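
As a rough illustration (not taken from any posted patch): on Linux the comm
name of a task can be read from /proc/<pid>/comm, and the kernel caps it at
15 visible bytes (TASK_COMM_LEN is 16 including the trailing NUL), so a long
guest name would be truncated. The helper names read_comm() and
comm_for_guest() are made up for this sketch.

```python
import os

TASK_COMM_LEN = 16  # kernel limit, including the trailing NUL byte


def read_comm(pid, proc_root="/proc"):
    """Return the comm name of a task as exposed in <proc_root>/<pid>/comm."""
    with open(os.path.join(proc_root, str(pid), "comm")) as f:
        return f.read().rstrip("\n")


def comm_for_guest(name):
    """Show what a guest name would look like after comm truncation."""
    raw = name.encode("utf-8")[:TASK_COMM_LEN - 1]
    # Drop a multi-byte character that the cut may have split in half.
    return raw.decode("utf-8", errors="ignore")
```

With this, a guest called 'OpenSuse-2' fits unchanged, while anything longer
than 15 bytes shows up shortened in 'top'.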

> The sockets are named based on UUID and you'll have to connect to a guest
> and ask it for its name. Some guests don't have names so we'll have to
> come up with a clever way to describe a nameless VM.

I think just exposing the UUID in that lazy case would be adequate? It creates
pressure for VM launchers to use better symbolic names.
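
A minimal sketch of that fallback, assuming the per-user socket directory and
UUID-named sockets described above (that layout is a proposal in this thread,
not a shipped interface). qmp_capabilities and query-name are existing QMP
commands; the function names and the one-second timeout are illustrative only.

```python
import json
import os
import socket


def qmp_command(chan, command):
    """Send one QMP command and return its "return" payload."""
    chan.write(json.dumps({"execute": command}) + "\n")
    chan.flush()
    while True:
        line = chan.readline()
        if not line:
            raise ConnectionError("QMP peer closed the connection")
        msg = json.loads(line)
        if "return" in msg:
            return msg["return"]
        if "error" in msg:
            raise RuntimeError(msg["error"])
        # Anything else is an asynchronous event; skip it.


def guest_label(sock, uuid, timeout=1.0):
    """Ask a guest for its name over QMP; fall back to the UUID."""
    sock.settimeout(timeout)  # never wait forever on a hung peer
    try:
        with sock.makefile("rw", encoding="utf-8") as chan:
            json.loads(chan.readline())          # greeting: {"QMP": ...}
            qmp_command(chan, "qmp_capabilities")
            name = qmp_command(chan, "query-name").get("name")
    except (OSError, ValueError, RuntimeError):
        name = None                              # timeout, EOF, bad reply
    return name or uuid


def list_guests(qmp_dir=os.path.expanduser("~/.qemu/qmp")):
    """Label every socket found in the (assumed) per-user QMP directory."""
    labels = []
    for uuid in sorted(os.listdir(qmp_dir)):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.settimeout(1.0)
            sock.connect(os.path.join(qmp_dir, uuid))
            labels.append(guest_label(sock, uuid))
        except OSError:
            labels.append(uuid)
        finally:
            sock.close()
    return labels
```

A nameless or unresponsive guest simply shows up under its UUID, which is the
"lazy case" behaviour suggested above.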

> > I.e.:
> >
> > - Easy default reference to guest instances, and a way for tools to
> > reference them symbolically as well in the multi-guest case. Preferably
> > something trustable and kernel-provided - not some indirect information
> > like a PID file created by libvirt-manager or so.
>
> A guest is not a KVM concept. It's a qemu concept so it needs to be
> something provided by qemu. The other caveat is that you won't see guests
> created by libvirt because we're implementing this in terms of a default QMP
> device and libvirt will disable defaults. This is desired behaviour.
> libvirt wants to be in complete control and doesn't want a tool like perf
> interacting with a guest directly.

Hm, this sucks for multiple reasons. Firstly, perf isn't a tool that
'interacts', it's an observation tool: just like 'top' is an observation tool.

We want to enable developers to see all activities on the system - regardless
of who started the VM or who started the process. Imagine if we had a way to
hide tasks from 'top'. It would be rather awful.

Secondly, it tells us that the concept is fragile if it doesn't automatically
enumerate all guests, regardless of how they were created.

Full system enumeration is generally best left to the kernel, as it can offer
coherent access.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Avi Kivity on
On 03/22/2010 06:51 PM, Ingo Molnar wrote:
> * Avi Kivity<avi(a)redhat.com> wrote:
>
>
>>> The crux of the problem is very simple. To quote my earlier mail:
>>>
>>> |
>>> | - The inconvenience of having to type:
>>> |     perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
>>> |         --guestmodules=/home/ymzhang/guest/modules top
>>> |
>>> |
>>> | is very obvious even with a single guest. Now multiply that by more guests ...
>>> |
>>>
>>> For example we want 'perf kvm top' to do something useful by default: it
>>> should find the first guest running and it should report its profile.
>>>
>>> The tool shouldn't have to guess about where the guests are, what their
>>> namespaces are and how to talk to them. We also want easy symbolic access to
>>> guests, for example:
>>>
>>> perf kvm -g OpenSuse-2 record sleep 1
>>>
> [ Sidenote: i still received no adequate suggestions about how to provide this
> category of technical features. ]
>

You need to integrate with libvirt to convert guest names into something that
can be used to obtain guest symbols.

>>> I.e.:
>>>
>>> - Easy default reference to guest instances, and a way for tools to
>>> reference them symbolically as well in the multi-guest case. Preferably
>>> something trustable and kernel-provided - not some indirect information
>>> like a PID file created by libvirt-manager or so.
>>>
>> Usually 'layering violation' is trotted out at such suggestions.
>> [...]
>>
> That's weird, how can a feature request be a 'layering violation'?
>

The 'something trustable and kernel-provided'. The kernel knows nothing
about guest names.

> If something that users find straightforward and usable is a layering
> violation to you (such as easily being able to access their own files on the
> host as well ...) then i think you need to revisit the definition of that term
> instead of trying to fix the user.
>

Here is the explanation, you left it quoted:

>> [...] I don't like using the term, because sometimes the layers are
>> incorrect and need to be violated. But it should be done explicitly, not as
>> a shortcut for a minor feature (and profiling is a minor feature, most users
>> will never use it, especially guest-from-host).
>>
>> The fact is we have well defined layers today, kvm virtualizes the cpu and
>> memory, qemu emulates devices for a single guest, libvirt manages guests.
>> We break this sometimes but there has to be a good reason. So perf needs to
>> talk to libvirt if it wants names. Could be done via linking, or can be
>> done using a plugin libvirt drops into perf.

>> You simply kept ignoring me when I said that if something can be kept out of
>> the kernel without impacting performance, it should be. I don't want
>> emergency patches closing some security hole or oops in a kernel symbol
>> server.
>>
> I never suggested an "in kernel space symbol server" which could oops, why
> would i have suggested that? Please point me to an email where i suggested
> that.
>

You insisted that it be in the kernel. Later you relaxed that and said
a daemon is fine. I'm not going to reread this thread, once is more
than enough.

>> The usability argument is a red herring. True, it takes time for things to
>> trickle down to distributions and users. Those who can't wait can download
>> the code and compile, it isn't that difficult.
>>
> It's not just "download and compile", it's also "configure correctly for
> several separate major distributions" and "configure to per guest instance
> local rules".
>

That's life in Linux-land. Either you let distributions feed you cooked
packages and relax, or you do the work yourself. If we had
tools/everything/ it wouldn't be this way, but we don't.

> It's far more fragile in practice than you make it appear to be, and since you
> yourself expressed that you are not interested much in the tooling side, how
> can you have adequate experience to judge such matters?
>

People on kvm-devel manage to build and run release tarballs and even
directly from git. I build packages from source occasionally. It isn't
fun but it doesn't take a PhD.

> In fact for instrumentation it's beyond a critical threshold of fragility -
> instrumentation above all needs to be accessible, transparent and robust.
>
> If you cannot see the advantages of a properly integrated solution then i
> suspect there's not much i can do to convince you.
>

Integration in Linux happens at the desktop or distribution level. You
want to move it to the kernel level. It works for perf, great, but that
doesn't mean it will work for everything else. Once perf grows a GUI, I
expect it will stop working for perf as well (for example, if gtk breaks
its API in a major release, which version will perf code for?)

> And you ignored not just me but you ignored several people in this thread who
> thought the current status quo was inadequate and expressed interest in both
> the VFS integration and in the guest enumeration features.
>

I'm sorry. I don't reply to every email. If you want my opinion on
something, you can ask me again.

--
error compiling committee.c: too many arguments to function

From: Ingo Molnar on

* Anthony Liguori <anthony(a)codemonkey.ws> wrote:

> > - Easy default reference to guest instances, and a way for tools to
> > reference them symbolically as well in the multi-guest case. Preferably
> > something trustable and kernel-provided - not some indirect information
> > like a PID file created by libvirt-manager or so.
>
> A guest is not a KVM concept. [...]

Well, in a sense a guest is a KVM concept too: it's in essence represented via
the 'vcpu state attached to a struct mm' abstraction that is attached to the
/dev/kvm file descriptor attached to a Linux process.

Multiple vcpus can be started by the same process to represent SMP, but the
whole guest notion is present: a Linux MM that carries KVM state.

In that sense when we type 'perf kvm list' we'd like to get a list of all
currently present guests that the developer has permission to profile: i.e.
we'd like a list of all [debuggable] Linux tasks that have a KVM instance
attached to them.

A convenient way to do that would be to use the Qemu process's ->comm[] name,
and to have a KVM ioctl that gets us a list of all vcpus that the querying
task has ptrace permission for. [the standard permission check we do for
instrumentation]

No need for communication with Qemu for that - just an ioctl, and an
always-guaranteed result that works fine on a whole-system and on a per user
basis as well.
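
To make the idea concrete, here is a user-space approximation, not the ioctl
proposed above (no such ioctl is assumed to exist). It scans /proc for tasks
that hold a KVM VM file descriptor and reports their comm names. It relies on
the VM fd showing up as an 'anon_inode:kvm-vm' link under /proc/<pid>/fd,
which is an observation about current kernels rather than a guaranteed
interface. Tasks whose fd directory the caller may not read are skipped, which
roughly mirrors the ptrace-style access check. All function names are
hypothetical.

```python
import os

KVM_VM_LINK = "anon_inode:kvm-vm"  # assumed link target of a KVM VM fd


def holds_kvm_vm(pid, proc_root="/proc"):
    """Return True if the task has at least one KVM VM fd we can see."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    try:
        fds = os.listdir(fd_dir)
    except OSError:
        return False  # no permission, or the task has already exited
    for fd in fds:
        try:
            if os.readlink(os.path.join(fd_dir, fd)).startswith(KVM_VM_LINK):
                return True
        except OSError:
            continue  # fd was closed while we were looking
    return False


def list_kvm_tasks(proc_root="/proc"):
    """Return sorted (pid, comm) pairs for every visible KVM-holding task."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit() or not holds_kvm_vm(entry, proc_root):
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().rstrip("\n")
        except OSError:
            continue
        found.append((int(entry), comm))
    return sorted(found)
```

Unlike a kernel-provided list, this scan is racy and only as reliable as the
/proc naming it depends on.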

Thanks,

Ingo
From: Ingo Molnar on

* Avi Kivity <avi(a)redhat.com> wrote:

> >>> - Easy default reference to guest instances, and a way for tools to
> >>> reference them symbolically as well in the multi-guest case. Preferably
> >>> something trustable and kernel-provided - not some indirect information
> >>> like a PID file created by libvirt-manager or so.
> >>
> >> Usually 'layering violation' is trotted out at such suggestions.
> >> [...]
> >
> > That's weird, how can a feature request be a 'layering violation'?
>
> The 'something trustable and kernel-provided'. The kernel knows nothing
> about guest names.

The kernel certainly knows about other resources such as task names or network
interface names or tracepoint names. This is kernel design 101.

> > If something that users find straightforward and usable is a layering
> > violation to you (such as easily being able to access their own files on
> > the host as well ...) then i think you need to revisit the definition of
> > that term instead of trying to fix the user.
>
> Here is the explanation, you left it quoted:
>
> >> [...] I don't like using the term, because sometimes the layers are
> >> incorrect and need to be violated. But it should be done explicitly, not
> >> as a shortcut for a minor feature (and profiling is a minor feature, most
> >> users will never use it, especially guest-from-host).
> >>
> >> The fact is we have well defined layers today, kvm virtualizes the cpu
> >> and memory, qemu emulates devices for a single guest, libvirt manages
> >> guests. We break this sometimes but there has to be a good reason. So
> >> perf needs to talk to libvirt if it wants names. Could be done via
> >> linking, or can be done using a plugin libvirt drops into perf.

This is really just the much-discredited microkernel approach for keeping
global enumeration data that should be kept by the kernel ...

Let's look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony.
There are numerous ways that this can break:

- Those special files can get corrupted, set up incorrectly, get out of sync,
  or can be hard to discover.

- The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
  design flaw: it is per user. When i'm root i'd like to query _all_ current
  guest images, not just the ones started by root. A system might not even
  have a notion of '${HOME}'.

- Apps might start KVM vcpu instances without adhering to the
  ${HOME}/.qemu/qmp/ access method.

- There is no guarantee for the Qemu process to reply to a request - while
  the kernel can always guarantee an enumeration result. I don't want 'perf
  kvm' to hang or misbehave just because Qemu has hung.

Really, for such reasons user-space is pretty poor at doing system-wide
enumeration and resource management. Microkernels lost for a reason.

You are committing several grave design mistakes here.

Thanks,

Ingo
From: Ingo Molnar on

* Pekka Enberg <penberg(a)cs.helsinki.fi> wrote:

> Hi Frank,
>
> On Mon, Mar 22, 2010 at 7:17 PM, Frank Ch. Eigler <fche(a)redhat.com> wrote:
> > In your very previous paragraphs, you enumerate two separate causes:
> > "repository structure" and "development/maintenance process" as being
> > sources of "fun". Please simply accept that the former is considered
> > by many as absolutely trivial compared to the latter, and additional
> > verbose repetition of your thesis will not change this.
>
> I can accept that many people consider it trivial but the problem is that we
> have _real data_ on kmemtrace and now perf that the number of contributors
> is significantly smaller when your code is outside the kernel repository.
> Now admittedly both of them are pretty intimate with the kernel but Ingo's
> suggestion of putting kvm-qemu in tools/ is an interesting idea
> nevertheless.

Correct.

> It's kinda funny to see people argue that having an external repository is
> not a problem and that it's not a big deal if building something from the
> repository is slightly painful as long as it doesn't require a PhD when we
> have _real world_ experience that it _does_ limit developer base in some
> cases. Whether or not that applies to kvm remains to be seen but I've yet to
> see a convincing argument why it doesn't.

Yeah.

Also, if in fact the claim that the 'repository does not matter' is true then
it doesn't matter that it's hosted in tools/kvm/ either, right?

I.e. it's a win-win situation. Worst-case nothing happens beyond a Git URI
change. Best-case the project is propelled to never-before-seen heights due to
contribution advantages not contemplated and not experienced by the KVM guys
before ...

Ingo