VFIO driver: Non-privileged user level PCI drivers [Kernel]

Prev: ACPI patches for 2.6.35-merge
Next: ext4: Fix file fragmentation during large file write.

From: Avi Kivity on 2 Jun 2010 09:10

On 06/02/2010 03:50 PM, Joerg Roedel wrote:
>
>> The problem with this is that it is assymetric, dev1 is treated
>> differently from dev[234]. It's an unintuitive API.
>>
> Its by far more unintuitive that a process needs to explicitly bind a
> device to an iommu domain before it can do anything with it.

I don't really care about the iommu domain. It's a side effect. The
kernel takes care of it. I'm only worried about the API.

We have a memory map that is (often) the same for a set of devices. If
you were coding a non-kernel interface, how would you code it?

struct memory_map;
void memory_map_init(struct memory_map *mm, ...);
struct device;
void device_set_memory_map(struct device *device, struct memory_map *mm);

or

struct device;
void device_init_memory_map(struct device *dev, ...);
void device_clone_memory_map(struct device *dev, struct device *other);

I wouldn't even think of the second one personally.

> If its
> required anyway the binding can happen implicitly. We could allow to do
> a nop 'ioctl(dev1, SHARE, dev1)' to remove the asymmetry.
>

It's still special. You define the memory map only for the first
device. You have to make sure dev1 doesn't go away while sharing it.

> Note that this way of handling userspace iommu mappings is also a lot
> simpler for most use-cases outside of KVM. If a developer wants to write
> a userspace driver all it needs to do is:
>
> dev = open();
> ioctl(dev, MAP, ...);
> /* use device with mappings */
> close(dev);
>
> Which is much easier than the need to create a domain explicitly.
>

mm = open()
ioctl(mm, MAP, ...)
dev = open();
ioctl(dev, BIND, mm);
....
close(mm);
close(dev);

so yes, more work, but once you have multiple devices which come and go
dynamically things become simpler. The map object has global lifetime
(you can even construct it if you don't assign any devices), the devices
attach to it, memory hotplug updates the memory map but doesn't touch
devices.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Joerg Roedel on 2 Jun 2010 09:10

On Wed, Jun 02, 2010 at 03:34:17PM +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 02, 2010 at 02:19:28PM +0200, Joerg Roedel wrote:

> you normally need device mapped to start DMA.
> SHARE makes this bug more likely as you allow
> switching domains: mmap could be done before switching.

We need to support domain switching anyway for iommu emulation in a
guest. So if you consider this to be a problem (I don't) it will not go
away with your proposal.

> > dev1 = open();
> > ioctl(dev2, SHARE, dev1);
> > ioctl(dev3, SHARE, dev1);
> > ioctl(dev4, SHARE, dev1);
> >
> > So we actually save an ioctl.
>
> I thought we had a BIND ioctl?

I can't remember a BIND ioctl in my proposal. I remember an UNBIND, but
thats bad naming as you pointed out below. See my statement on this
below too.

> You undo SHARE with UNBIND?

Thats bad naming, agreed. Lets keep UNSHARE. Point is, we only need one
parameter to do this which removes any ambiguity:

ioctl(dev1, UNSHARE);

Joerg

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Michael S. Tsirkin on 2 Jun 2010 09:30

On Wed, Jun 02, 2010 at 02:50:50PM +0200, Joerg Roedel wrote:
> On Wed, Jun 02, 2010 at 03:25:11PM +0300, Avi Kivity wrote:
> > On 06/02/2010 03:19 PM, Joerg Roedel wrote:
> >>
> >>> Yes. so you do:
> >>> iommu = open
> >>> ioctl(dev1, BIND, iommu)
> >>> ioctl(dev2, BIND, iommu)
> >>> ioctl(dev3, BIND, iommu)
> >>> ioctl(dev4, BIND, iommu)
> >>>
> >>> No need to add a SHARE ioctl.
> >>>
> >> In my proposal this looks like:
> >>
> >>
> >> dev1 = open();
> >> ioctl(dev2, SHARE, dev1);
> >> ioctl(dev3, SHARE, dev1);
> >> ioctl(dev4, SHARE, dev1);
> >>
> >> So we actually save an ioctl.
> >>
> >
> > The problem with this is that it is assymetric, dev1 is treated
> > differently from dev[234]. It's an unintuitive API.
>
> Its by far more unintuitive that a process needs to explicitly bind a
> device to an iommu domain before it can do anything with it.

The reason it is more intuitive is because it is harder to get it
wrong. If you swap iommu and device in the call, you get BADF
so you know you made a mistake. We can even make it work
both ways if we wanted to. With ioctl(dev1, BIND, dev2)
it breaks silently.

> If its
> required anyway the binding can happen implicitly. We could allow to do
> a nop 'ioctl(dev1, SHARE, dev1)' to remove the asymmetry.

And then when we assign meaning to it we find that half the apps
are broken because they did not call this ioctl.

> Note that this way of handling userspace iommu mappings is also a lot
> simpler for most use-cases outside of KVM. If a developer wants to write
> a userspace driver all it needs to do is:
>
> dev = open();
> ioctl(dev, MAP, ...);
> /* use device with mappings */
> close(dev);
>
> Which is much easier than the need to create a domain explicitly.
>
> Joerg

This simple scenario ignores all the real-life corner cases.
For example, with an explicit iommu open and bind application
can naturally detect that:
- we have run out of iommu domains
- iommu is unsupported
- iommu is in use by another, incompatible device
- device is in bad state
because each is a separate operation, so it is easy to produce meaningful
errors.

Another interesting thing that a separate iommu device supports is when
application A controls the iommu and application B
controls the device. This might be good to e.g. improve security
(B is run by root, A is unpriveledged and passes commands to/from B
over a pipe).

This is not possible when same fd is used for iommu and device.

--
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Joerg Roedel on 2 Jun 2010 10:00

On Wed, Jun 02, 2010 at 04:06:21PM +0300, Avi Kivity wrote:
> On 06/02/2010 03:50 PM, Joerg Roedel wrote:

>> Its by far more unintuitive that a process needs to explicitly bind a
>> device to an iommu domain before it can do anything with it.
>
> I don't really care about the iommu domain. It's a side effect. The
> kernel takes care of it. I'm only worried about the API.

The proposed memory-map object is nothing else than a userspace
abstraction of an iommu-domain.

> We have a memory map that is (often) the same for a set of devices. If
> you were coding a non-kernel interface, how would you code it?
>
> struct memory_map;
> void memory_map_init(struct memory_map *mm, ...);
> struct device;
> void device_set_memory_map(struct device *device, struct memory_map *mm);
>
> or
>
> struct device;
> void device_init_memory_map(struct device *dev, ...);
> void device_clone_memory_map(struct device *dev, struct device *other);
>
> I wouldn't even think of the second one personally.

Right, a kernel-interface would be designed the first way. The IOMMU-API
is actually designed in this manner. But I still think we should keep it
simpler for userspace.

>> If its required anyway the binding can happen implicitly. We could
>> allow to do a nop 'ioctl(dev1, SHARE, dev1)' to remove the asymmetry.
>
> It's still special. You define the memory map only for the first
> device. You have to make sure dev1 doesn't go away while sharing it.

Must be a misunderstanding. In my proposal the domain is not owned by
one device. It is owned by all devices that share it and will only
vanish if all devices that use it are unbound (which happens when the file
descriptor is closed, for example).

> so yes, more work, but once you have multiple devices which come and go
> dynamically things become simpler. The map object has global lifetime
> (you can even construct it if you don't assign any devices), the devices
> attach to it, memory hotplug updates the memory map but doesn't touch
> devices.

I still think a userspace interface should be as simple as possible. But
since both ways will work I am not really opposed to Michael's proposal.
I just think its overkill for the common non-kvm usecase (a userspace
device driver).

Joerg

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Joerg Roedel on 2 Jun 2010 10:10

On Wed, Jun 02, 2010 at 04:17:19PM +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 02, 2010 at 02:50:50PM +0200, Joerg Roedel wrote:
> > On Wed, Jun 02, 2010 at 03:25:11PM +0300, Avi Kivity wrote:
> > > On 06/02/2010 03:19 PM, Joerg Roedel wrote:

>
> > If its
> > required anyway the binding can happen implicitly. We could allow to do
> > a nop 'ioctl(dev1, SHARE, dev1)' to remove the asymmetry.
>
> And then when we assign meaning to it we find that half the apps
> are broken because they did not call this ioctl.

The meaning is already assigned and chaning it means changing the
userspace-abi which is a no-go.

> This simple scenario ignores all the real-life corner cases.
> For example, with an explicit iommu open and bind application
> can naturally detect that:
> - we have run out of iommu domains

ioctl(dev, MAP, ...) will fail in this case.

> - iommu is unsupported

Is best checked by open() anyway because userspace can't do anything
with the device before it is bound to a domain.

> - iommu is in use by another, incompatible device

How should this happen?

> - device is in bad state

How is this checked with your proposal and why can this not be detected
with my one?

> because each is a separate operation, so it is easy to produce meaningful
> errors.

Ok, this is true.

> Another interesting thing that a separate iommu device supports is when
> application A controls the iommu and application B
> controls the device.

Until Linux becomes a micro-kernel the IOMMU itself will _never_ be
controlled by an application.

> This might be good to e.g. improve security (B is run by root, A is
> unpriveledged and passes commands to/from B over a pipe).

Micro-kernel arguments. I hope a userspace controlled IOMMU in Linux
will never happen ;-)

Joerg

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Prev: ACPI patches for 2.6.35-merge
Next: ext4: Fix file fragmentation during large file write.