From: Avi Kivity on
On 06/02/2010 01:04 PM, Joerg Roedel wrote:
> On Wed, Jun 02, 2010 at 12:49:28PM +0300, Avi Kivity wrote:
>
>> On 06/02/2010 12:45 PM, Joerg Roedel wrote:
>>
>>> IOMMU mapped memory can not be swapped out because we can't do demand
>>> paging on io-page-faults with current devices. We have to pin _all_
>>> userspace memory that is mapped into an IOMMU domain.
>>>
>> vhost doesn't pin memory.
>>
>> What I proposed is to describe the memory map using an object (fd), and
>> pass it around to clients that use it: kvm, vhost, vfio. That way you
>> maintain the memory map in a central location and broadcast changes to
>> clients. Only a vfio client would result in memory being pinned.
>>
> Ah ok, so its only about the database which keeps the mapping
> information.
>

Yes.

>
>> It can still work, but the interface needs to be extended to include
>> dirty bitmap logging.
>>
> Thats hard to do. I am not sure about VT-d but the AMD IOMMU has no
> dirty-bits in the page-table. And without demand-paging we can't really
> tell what pages a device has written to. The only choice is to mark all
> IOMMU-mapped pages dirty as long as they are mapped.
>
>

The interface would only work for clients which support it: kvm, vhost,
and iommu/devices with restartable dma.

Note dirty logging is not very interesting for vfio anyway, since you
can't live migrate with assigned devices.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Joerg Roedel on
On Wed, Jun 02, 2010 at 02:21:00PM +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 02, 2010 at 01:12:25PM +0200, Joerg Roedel wrote:

> > Even if it is bound to a domain the userspace driver could program the
> > device to do dma to unmapped regions causing io-page-faults. The kernel
> > can't do anything about it.
>
> It can always corrupt its own memory directly as well :)
> But that is not a reason not to detect errors if we can,
> and not to make APIs hard to misuse.

Changing the domain of a device while dma can happen is the same type of
bug as unmapping potential dma target addresses. We can't catch this
kind of misuse.

> > > With 10 devices you have 10 extra ioctls.
> >
> > And this works implicitly with your proposal?
>
> Yes. so you do:
> iommu = open
> ioctl(dev1, BIND, iommu)
> ioctl(dev2, BIND, iommu)
> ioctl(dev3, BIND, iommu)
> ioctl(dev4, BIND, iommu)
>
> No need to add a SHARE ioctl.

In my proposal this looks like:


dev1 = open();
ioctl(dev2, SHARE, dev1);
ioctl(dev3, SHARE, dev1);
ioctl(dev4, SHARE, dev1);

So we actually save an ioctl.

> > Remember that we still need to be able to provide seperate mappings
> > for each device to support IOMMU emulation for the guest.
>
> Generally not true. E.g. guest can enable iommu passthrough
> or have domain per a group of devices.

What I meant was that there may me multiple io-addresses spaces
necessary for one process. I didn't want to say that every device
_needs_ to have its own address space.

> > As I wrote the domain has a reference count and is destroyed only when
> > it goes down to zero. This does not happen as long as a device is bound
> > to it.
> >
> > Joerg
>
> We were talking about UNSHARE ioctl:
> ioctl(dev1, UNSHARE, dev2)
> Does it change the domain for dev1 or dev2?
> If you make a mistake you get a hard to debug bug.

As I already wrote we would have an UNBIND ioctl which just removes a
device from its current domain. UNBIND is better than UNSHARE for
exactly the reason you pointed out above. I thought I stated that
already.

Joerg

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Avi Kivity on
On 06/02/2010 03:19 PM, Joerg Roedel wrote:
>
>> Yes. so you do:
>> iommu = open
>> ioctl(dev1, BIND, iommu)
>> ioctl(dev2, BIND, iommu)
>> ioctl(dev3, BIND, iommu)
>> ioctl(dev4, BIND, iommu)
>>
>> No need to add a SHARE ioctl.
>>
> In my proposal this looks like:
>
>
> dev1 = open();
> ioctl(dev2, SHARE, dev1);
> ioctl(dev3, SHARE, dev1);
> ioctl(dev4, SHARE, dev1);
>
> So we actually save an ioctl.
>

The problem with this is that it is assymetric, dev1 is treated
differently from dev[234]. It's an unintuitive API.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Michael S. Tsirkin on
On Wed, Jun 02, 2010 at 02:19:28PM +0200, Joerg Roedel wrote:
> On Wed, Jun 02, 2010 at 02:21:00PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Jun 02, 2010 at 01:12:25PM +0200, Joerg Roedel wrote:
>
> > > Even if it is bound to a domain the userspace driver could program the
> > > device to do dma to unmapped regions causing io-page-faults. The kernel
> > > can't do anything about it.
> >
> > It can always corrupt its own memory directly as well :)
> > But that is not a reason not to detect errors if we can,
> > and not to make APIs hard to misuse.
>
> Changing the domain of a device while dma can happen is the same type of
> bug as unmapping potential dma target addresses. We can't catch this
> kind of misuse.

you normally need device mapped to start DMA.
SHARE makes this bug more likely as you allow
switching domains: mmap could be done before switching.

> > > > With 10 devices you have 10 extra ioctls.
> > >
> > > And this works implicitly with your proposal?
> >
> > Yes. so you do:
> > iommu = open
> > ioctl(dev1, BIND, iommu)
> > ioctl(dev2, BIND, iommu)
> > ioctl(dev3, BIND, iommu)
> > ioctl(dev4, BIND, iommu)
> >
> > No need to add a SHARE ioctl.
>
> In my proposal this looks like:
>
>
> dev1 = open();
> ioctl(dev2, SHARE, dev1);
> ioctl(dev3, SHARE, dev1);
> ioctl(dev4, SHARE, dev1);
>
> So we actually save an ioctl.

I thought we had a BIND ioctl?

> > > Remember that we still need to be able to provide seperate mappings
> > > for each device to support IOMMU emulation for the guest.
> >
> > Generally not true. E.g. guest can enable iommu passthrough
> > or have domain per a group of devices.
>
> What I meant was that there may me multiple io-addresses spaces
> necessary for one process. I didn't want to say that every device
> _needs_ to have its own address space.
>
> > > As I wrote the domain has a reference count and is destroyed only when
> > > it goes down to zero. This does not happen as long as a device is bound
> > > to it.
> > >
> > > Joerg
> >
> > We were talking about UNSHARE ioctl:
> > ioctl(dev1, UNSHARE, dev2)
> > Does it change the domain for dev1 or dev2?
> > If you make a mistake you get a hard to debug bug.
>
> As I already wrote we would have an UNBIND ioctl which just removes a
> device from its current domain. UNBIND is better than UNSHARE for
> exactly the reason you pointed out above. I thought I stated that
> already.
>
> Joerg

You undo SHARE with UNBIND?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Joerg Roedel on
On Wed, Jun 02, 2010 at 03:25:11PM +0300, Avi Kivity wrote:
> On 06/02/2010 03:19 PM, Joerg Roedel wrote:
>>
>>> Yes. so you do:
>>> iommu = open
>>> ioctl(dev1, BIND, iommu)
>>> ioctl(dev2, BIND, iommu)
>>> ioctl(dev3, BIND, iommu)
>>> ioctl(dev4, BIND, iommu)
>>>
>>> No need to add a SHARE ioctl.
>>>
>> In my proposal this looks like:
>>
>>
>> dev1 = open();
>> ioctl(dev2, SHARE, dev1);
>> ioctl(dev3, SHARE, dev1);
>> ioctl(dev4, SHARE, dev1);
>>
>> So we actually save an ioctl.
>>
>
> The problem with this is that it is assymetric, dev1 is treated
> differently from dev[234]. It's an unintuitive API.

Its by far more unintuitive that a process needs to explicitly bind a
device to an iommu domain before it can do anything with it. If its
required anyway the binding can happen implicitly. We could allow to do
a nop 'ioctl(dev1, SHARE, dev1)' to remove the asymmetry.

Note that this way of handling userspace iommu mappings is also a lot
simpler for most use-cases outside of KVM. If a developer wants to write
a userspace driver all it needs to do is:

dev = open();
ioctl(dev, MAP, ...);
/* use device with mappings */
close(dev);

Which is much easier than the need to create a domain explicitly.

Joerg

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/