From: Anthony Liguori on
On 03/16/2010 10:52 AM, Ingo Molnar wrote:
> You are quite mistaken: KVM isn't really a 'random unprivileged application' in
> this context, it is clearly an extension of system/kernel services.
>
> ( Which can be seen from the simple fact that what started the discussion was
> 'how do we get /proc/kallsyms from the guest'. I.e. an extension of the
> existing host-space /proc/kallsyms was desired. )
>

Random tools (like perf) should not be able to do what you describe.
It's a security nightmare.

If it's desirable to have /proc/kallsyms available, we can expose an
interface in QEMU to provide that. That can then be plumbed through
libvirt and QMP.

Then a management tool can use libvirt or QMP to obtain that information
and interact with the kernel appropriately.

> In that sense the most natural 'extension' would be the solution I mentioned a
> week or two ago: to have a (read only) mount of all guest filesystems, plus a
> channel for profiling/tracing data. That would make symbol parsing easier and
> it's what extends the existing 'host space' abstraction in the most natural
> way.
>
> ( It doesn't even have to be done via the kernel - Qemu could implement that
> via FUSE for example. )
>

No way. The guest has sensitive data and exposing it widely on the host
is a bad thing to do. It's a bad interface. We can expose specific
information about guests but only through our existing channels which
are validated through a security infrastructure.

Ultimately, your goal is to keep perf a simple tool with few
dependencies. But practically speaking, if you want to add features to
it, it's going to have to interact with other subsystems in the
appropriate way. That means it's going to need to interact with
libvirt or QMP.

If you want all applications to expose their data via synthetic file
systems, then there's always plan9 :-)

Regards,

Anthony Liguori
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar on

* Anthony Liguori <anthony(a)codemonkey.ws> wrote:

> On 03/16/2010 08:08 AM, Ingo Molnar wrote:
> >* Avi Kivity<avi(a)redhat.com> wrote:
> >
> >>On 03/16/2010 02:29 PM, Ingo Molnar wrote:
> >>>I mean, I can trust a kernel service and I can trust /proc/kallsyms.
> >>>
> >>>Can perf trust a random process claiming to be Qemu? What's the trust
> >>>mechanism here?
> >>Obviously you can't trust anything you get from a guest, no matter how you
> >>get it.
> >I'm not talking about the symbol strings and addresses, and the object
> >contents for allocation (or debuginfo). I'm talking about the basic protocol
> >of establishing which guest is which.
> >
> >I.e. we really want to enable users to:
> >
> > 1) have it all working with a single guest, without having to specify 'which'
> > guest (qemu PID) to work with. That is the dominant usecase both for
> > developers and for a fair portion of testers.
>
> You're making too many assumptions.
>
> There is no list of guests anymore than there is a list of web browsers.
>
> You can have a multi-tenant scenario where you have distinct groups of
> virtual machines running as unprivileged users.

"multi-tenant" and groups is not a valid excuse at all for giving crappy
technology in the simplest case: when there's a single VM. Yes, eventually it
can be supported and any sane scheme will naturally support it too, but it's
by no means what we care about primarily when it comes to these tools.

I thought everyone learned the lesson behind SystemTap's failure (and to a
certain degree this was behind Oprofile's failure as well): when it comes to
tooling/instrumentation we don't want to concentrate on the fancy complex
setups and abstract requirements drawn up by CIOs, as development isn't being
done there. Concentrate on our developers today, and provide no-compromises
usability to those who contribute stuff.

If we don't help make the simplest (and most common) use-case convenient then
we are failing on a fundamental level.

> > 2) Have some reasonable symbolic identification for guests. For example a
> > usable approach would be to have 'perf kvm list', which would list all
> > currently active guests:
> >
> > $ perf kvm list
> > [1] Fedora
> > [2] OpenSuse
> > [3] Windows-XP
> > [4] Windows-7
> >
> > And from that point on 'perf kvm -g OpenSuse record' would do the obvious
> > thing. Users will be able to just use the 'OpenSuse' symbolic name for
> > that guest, even if the guest got restarted and switched its main PID.
>
> Does "perf kvm list" always run as root? What if two unprivileged users
> both have a VM named "Fedora"?

Again, the single-VM case is the most important case, by far. If you have
multiple VMs running and want to develop the kernel on multiple VMs (sounds
rather messy if you think it through ...), what would happen is similar to
what happens when we add the same probe more than once, for example:

# perf probe schedule
Added new event:
probe:schedule (on schedule+0)

You can now use it on all perf tools, such as:

perf record -e probe:schedule -a sleep 1

# perf probe -f schedule
Added new event:
probe:schedule_1 (on schedule+0)

You can now use it on all perf tools, such as:

perf record -e probe:schedule_1 -a sleep 1

# perf probe -f schedule
Added new event:
probe:schedule_2 (on schedule+0)

You can now use it on all perf tools, such as:

perf record -e probe:schedule_2 -a sleep 1

Something similar could be used for KVM/Qemu: whichever got created first is
named 'Fedora', the second is named 'Fedora-2'.
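A minimal sketch of that naming rule, written as a hypothetical shell helper (`unique_guest_name` is an illustrative name, not part of perf, QEMU or libvirt): given a base name and the names already in use, it returns the first free one, so the first guest keeps 'Fedora' and the next one becomes 'Fedora-2'.

```shell
# Hypothetical helper, not an existing perf feature: pick a display name for
# a newly started guest. $1 is the guest's base name; the remaining
# arguments are the names already taken by running guests.
unique_guest_name() {
    base=$1
    shift
    candidate=$base
    n=1
    while :; do
        taken=0
        for name in "$@"; do
            if [ "$name" = "$candidate" ]; then
                taken=1
                break
            fi
        done
        # First name nobody else uses wins.
        if [ "$taken" -eq 0 ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
        # Collision: try base-2, base-3, ... in creation order.
        n=$((n + 1))
        candidate="$base-$n"
    done
}
```

For example, `unique_guest_name Fedora Fedora Fedora-2` prints `Fedora-3`, while a name nobody uses yet is returned unchanged.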

> If we look at the use-case, it's going to be something like, a user is
> creating virtual machines and wants to get performance information about
> them.
>
> Having to run a separate tool like perf is not going to be what they would
> expect they had to do. Instead, they would either use their existing GUI
> tool (like virt-manager) or they would use their management interface
> (either QMP or libvirt).
>
> The complexity of interaction is due to the fact that perf shouldn't be a
> standalone tool. It should be a library or something with a programmatic
> interface that another tool can make use of.

But ... a GUI interface/integration is of course possible too, and it's being
worked on.

perf is mainly a kernel developer tool, and kernel developers generally don't
use GUIs to do their stuff: which is the (sole) reason why the first ~850
commits of tools/perf/ were done without a GUI. We go where our developers
are.

In any case it's not an excuse to have no proper command-line tooling. In fact
if you cannot get simpler, more atomic command-line tooling right then you'll
probably doubly suck at doing a GUI as well.

Ingo
From: Ingo Molnar on

* Anthony Liguori <aliguori(a)linux.vnet.ibm.com> wrote:

> On 03/16/2010 10:52 AM, Ingo Molnar wrote:
> >You are quite mistaken: KVM isn't really a 'random unprivileged application' in
> >this context, it is clearly an extension of system/kernel services.
> >
> >( Which can be seen from the simple fact that what started the discussion was
> > 'how do we get /proc/kallsyms from the guest'. I.e. an extension of the
> > existing host-space /proc/kallsyms was desired. )
>
> Random tools (like perf) should not be able to do what you describe. It's a
> security nightmare.

A security nightmare exactly how? Would you mind going into details? I don't
understand your point.

> If it's desirable to have /proc/kallsyms available, we can expose an
> interface in QEMU to provide that. That can then be plumbed through libvirt
> and QMP.
>
> Then a management tool can use libvirt or QMP to obtain that information and
> interact with the kernel appropriately.
>
> > In that sense the most natural 'extension' would be the solution I
> > mentioned a week or two ago: to have a (read only) mount of all guest
> > filesystems, plus a channel for profiling/tracing data. That would make
> > symbol parsing easier and it's what extends the existing 'host space'
> > abstraction in the most natural way.
> >
> > ( It doesn't even have to be done via the kernel - Qemu could implement that
> > via FUSE for example. )
>
> No way. The guest has sensitive data and exposing it widely on the host is
> a bad thing to do. [...]

Firstly, you are putting words into my mouth, as I said nothing about
'exposing it widely'. I suggest exposing it under the privileges of whoever
has access to the guest image.

Secondly, regarding confidentiality, and this is guest security 101: whoever
can access the image on the host _already_ has access to all the guest data!

A Linux image can generally be loopback mounted straight away:

losetup -o 32256 /dev/loop0 ./guest-image.img
mount -o ro /dev/loop0 /mnt-guest

(Or, if you are an unprivileged user who cannot mount, it can be read via ext2
tools.)

There's nothing the guest can do about that. The host is in total control of
guest image data for heaven's sake!
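The 32256 in the losetup command above is the classic MBR layout, where the first partition starts at sector 63: 63 × 512 = 32256 bytes. A minimal sketch that reads the offset from the image instead of hard-coding it, assuming an MBR (not GPT) image with the filesystem in partition 1, 512-byte sectors, and a little-endian host; `mbr_part1_offset` is a hypothetical helper name. It also refuses to run when the caller cannot read the image, since read access to the image is what grants access to its contents.

```shell
# Hypothetical helper: print the byte offset of partition 1 in an MBR image.
# Assumes 512-byte sectors and a little-endian host, because
# `od -t u4` decodes the 4 bytes in host byte order.
mbr_part1_offset() {
    image=$1
    # Without read permission on the image there is nothing to expose.
    if [ ! -r "$image" ]; then
        echo "cannot read $image" >&2
        return 1
    fi
    # The partition table starts at byte 446 (0x1BE); each 16-byte entry
    # stores its starting LBA as a 32-bit value at offset 8, so entry 1's
    # start sector lives at byte 454.
    lba=$(od -A n -t u4 -j 454 -N 4 "$image" | tr -d ' ')
    echo $((lba * 512))
}
```

Usage: `losetup -o "$(mbr_part1_offset ./guest-image.img)" /dev/loop0 ./guest-image.img`. Images partitioned by newer tools usually start the first partition at sector 2048, which gives 1048576 rather than 32256.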

All I'm suggesting is to make what is already possible more convenient.

Ingo
From: Anthony Liguori on
On 03/16/2010 12:52 PM, Ingo Molnar wrote:
> * Anthony Liguori<aliguori(a)linux.vnet.ibm.com> wrote:
>
>
>> On 03/16/2010 10:52 AM, Ingo Molnar wrote:
>>
> >>> You are quite mistaken: KVM isn't really a 'random unprivileged application' in
>>> this context, it is clearly an extension of system/kernel services.
>>>
>>> ( Which can be seen from the simple fact that what started the discussion was
>>> 'how do we get /proc/kallsyms from the guest'. I.e. an extension of the
>>> existing host-space /proc/kallsyms was desired. )
>>>
>> Random tools (like perf) should not be able to do what you describe. It's a
>> security nightmare.
>>
> A security nightmare exactly how? Would you mind going into details? I don't
> understand your point.
>

Assume you're using SELinux to implement mandatory access control. How
do you label this file system?

Generally speaking, we don't know the difference between /proc/kallsyms
and /dev/mem if we do generic passthrough. While it might be safe to
have a relaxed label for kallsyms (since it's read only), it's clearly
not safe to do that for /dev/mem, /etc/shadow, or any file containing
sensitive information.

Rather, we ought to expose a higher level interface that we have more
confidence in with respect to understanding the ramifications of
exposing that guest data.
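The contrast drawn above, between generic passthrough and a narrower interface, can be sketched as a per-path export policy. The function name and the allowed set are hypothetical illustrations, not a QEMU, libvirt or SELinux interface: only paths that were explicitly reviewed are exported (here /proc/kallsyms and, as an assumed extra, /proc/modules), and everything else is refused by default.

```shell
# Hypothetical policy sketch: decide whether a guest path may be exported
# to the host. Returns 0 (allowed) or 1 (refused).
guest_path_exportable() {
    case $1 in
        /proc/kallsyms|/proc/modules)
            # Illustrative allowlist of read-only symbol data profilers use.
            return 0 ;;
        *)
            # Default deny: /dev/mem, /etc/shadow and anything unreviewed.
            return 1 ;;
    esac
}
```

The default-deny `*)` branch is the point of the design: a sensitive path never becomes visible merely because a generic passthrough happened to reach it.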

>>
>> No way. The guest has sensitive data and exposing it widely on the host is
>> a bad thing to do. [...]
>>
> Firstly, you are putting words into my mouth, as I said nothing about
> 'exposing it widely'. I suggest exposing it under the privileges of whoever
> has access to the guest image.
>

That doesn't work as nicely with SELinux.

It's completely reasonable to have a user that can interact in a read
only mode with a VM via libvirt but cannot read the guest's disk images
or the guest's memory contents.

> Secondly, regarding confidentiality, and this is guest security 101: whoever
> can access the image on the host _already_ has access to all the guest data!
>
> A Linux image can generally be loopback mounted straight away:
>
> losetup -o 32256 /dev/loop0 ./guest-image.img
> mount -o ro /dev/loop0 /mnt-guest
>
> (Or, if you are an unprivileged user who cannot mount, it can be read via ext2
> tools.)
>
> There's nothing the guest can do about that. The host is in total control of
> guest image data for heaven's sake!
>

It's not that simple in a MAC environment.

Regards,

Anthony Liguori

> All I'm suggesting is to make what is already possible more convenient.
>
> Ingo
>

From: Ingo Molnar on

* Anthony Liguori <aliguori(a)linux.vnet.ibm.com> wrote:

> On 03/16/2010 12:52 PM, Ingo Molnar wrote:
> >* Anthony Liguori<aliguori(a)linux.vnet.ibm.com> wrote:
> >
> >>On 03/16/2010 10:52 AM, Ingo Molnar wrote:
> >>>You are quite mistaken: KVM isn't really a 'random unprivileged application' in
> >>>this context, it is clearly an extension of system/kernel services.
> >>>
> >>>( Which can be seen from the simple fact that what started the discussion was
> >>> 'how do we get /proc/kallsyms from the guest'. I.e. an extension of the
> >>> existing host-space /proc/kallsyms was desired. )
> >>Random tools (like perf) should not be able to do what you describe. It's a
> >>security nightmare.
> >A security nightmare exactly how? Would you mind going into details? I don't
> >understand your point.
>
> Assume you're using SELinux to implement mandatory access control.
> How do you label this file system?
>
> Generally speaking, we don't know the difference between /proc/kallsyms and
> /dev/mem if we do generic passthrough. While it might be safe to have a
> relaxed label for kallsyms (since it's read only), it's clearly not safe to
> do that for /dev/mem, /etc/shadow, or any file containing sensitive
> information.

What's your _point_? Please outline a threat model, a vector of attack,
_anything_ that substantiates your "it's a security nightmare" claim.

> Rather, we ought to expose a higher level interface that we have more
> confidence in with respect to understanding the ramifications of exposing
> that guest data.

Exactly, we want something that has a flexible namespace and works well with
Linux tools in general. Preferably that namespace should be human readable,
and it should be hierarchic, and it should have a well-known permission model.

This concept exists in Linux and is generally called a 'filesystem'.

> >> No way. The guest has sensitive data and exposing it widely on the host
> >> is a bad thing to do. [...]
> >
> > Firstly, you are putting words into my mouth, as I said nothing about
> > 'exposing it widely'. I suggest exposing it under the privileges of
> > whoever has access to the guest image.
>
> That doesn't work as nicely with SELinux.
>
> It's completely reasonable to have a user that can interact in a read only
> mode with a VM via libvirt but cannot read the guest's disk images or the
> guest's memory contents.

If a user cannot read the image file then the user has no access to its
contents via other namespaces either. That is, of course, a basic security
aspect.

( That holds under the plain non-SELinux Unix permission model, and it holds
in the SELinux case as well. )

> > Secondly, regarding confidentiality, and this is guest security 101: whoever
> > can access the image on the host _already_ has access to all the guest data!
> >
> > A Linux image can generally be loopback mounted straight away:
> >
> > losetup -o 32256 /dev/loop0 ./guest-image.img
> > mount -o ro /dev/loop0 /mnt-guest
> >
> >(Or, if you are an unprivileged user who cannot mount, it can be read via ext2
> >tools.)
> >
> > There's nothing the guest can do about that. The host is in total control of
> > guest image data for heaven's sake!
>
> It's not that simple in a MAC environment.

Erm. Please explain to me, what exactly is 'not that simple' in a MAC
environment?

Also, I'd like to note that the 'restrictive SELinux setups' usecases are
pretty secondary.

To demonstrate that, I'd like every KVM developer on this list who reads this
mail, and whose home development system (where they produce their patches) is
set up as a restrictive MAC environment in which they cannot even read the
images they are using, to chime in with an "I'm doing that" reply.

If there's just a _single_ KVM developer amongst dozens and dozens of
developers on this list who develops in an environment like that, I'd be
surprised. That result should pretty much tell you where the weight of
instrumentation focus should lie - and it isn't on restrictive MAC
environments ...

Ingo