From: Avi Kivity on
On 06/02/2010 12:42 PM, Joerg Roedel wrote:
> On Tue, Jun 01, 2010 at 12:55:32PM +0300, Michael S. Tsirkin wrote:
>
>
>> There seems to be some misunderstanding. The userspace interface
>> proposed forces a separate domain per device and forces userspace to
>> repeat iommu programming for each device. We are better off sharing a
>> domain between devices and programming the iommu once.
>>
>> The natural way to do this is to have an iommu driver for programming
>> iommu.
>>
> IMO a separate iommu-userspace driver is a nightmare for a userspace
> interface. It is just too complicated to use. We can solve the problem
> of multiple devices-per-domain with an ioctl which allows binding one
> uio-device to the address-space of another. That's much simpler.
>

This is non-trivial with hotplug.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Joerg Roedel on
On Wed, Jun 02, 2010 at 12:49:28PM +0300, Avi Kivity wrote:
> On 06/02/2010 12:45 PM, Joerg Roedel wrote:
>> IOMMU mapped memory can not be swapped out because we can't do demand
>> paging on io-page-faults with current devices. We have to pin _all_
>> userspace memory that is mapped into an IOMMU domain.
>
> vhost doesn't pin memory.
>
> What I proposed is to describe the memory map using an object (fd), and
> pass it around to clients that use it: kvm, vhost, vfio. That way you
> maintain the memory map in a central location and broadcast changes to
> clients. Only a vfio client would result in memory being pinned.

Ah ok, so it's only about the database which keeps the mapping
information.
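
As a rough illustration of the "central memory map with clients" idea,
here is a minimal user-space sketch in C. Everything in it (struct
memmap, memmap_attach, memmap_add_region, the pins flag) is a made-up
name for this example and not an existing kernel or KVM interface. It
only models the point made above: every client sees each change, but
only a vfio-like client accounts pinned memory.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLIENTS 8

/* One guest-physical range; field names are hypothetical. */
struct mem_region {
    unsigned long gpa;
    unsigned long size;
};

/* A consumer of the memory map (kvm, vhost, vfio in the discussion). */
struct memmap_client {
    const char *name;
    int pins;               /* nonzero: client must pin what it maps */
    size_t pinned_bytes;    /* bytes this client has pinned so far */
    unsigned updates_seen;  /* number of broadcasts received */
};

/* The shared object that would be passed around as an fd. */
struct memmap {
    struct memmap_client *clients[MAX_CLIENTS];
    size_t nclients;
};

static int memmap_attach(struct memmap *m, struct memmap_client *c)
{
    if (m->nclients == MAX_CLIENTS)
        return -1;
    m->clients[m->nclients++] = c;
    return 0;
}

/* Broadcast a newly added region to every attached client. */
static void memmap_add_region(struct memmap *m, const struct mem_region *r)
{
    assert(r->size > 0);
    for (size_t i = 0; i < m->nclients; i++) {
        struct memmap_client *c = m->clients[i];

        c->updates_seen++;
        if (c->pins)        /* only pinning clients pay for the region */
            c->pinned_bytes += r->size;
    }
}
```

Attaching three clients, of which only one has pins set, and adding a
4096-byte region leaves all three with one update seen and only the
pinning client with 4096 pinned bytes.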

> It can still work, but the interface needs to be extended to include
> dirty bitmap logging.

That's hard to do. I am not sure about VT-d but the AMD IOMMU has no
dirty-bits in the page-table. And without demand-paging we can't really
tell what pages a device has written to. The only choice is to mark all
IOMMU-mapped pages dirty as long as they are mapped.
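
To make the "mark everything dirty while it is mapped" fallback
concrete, here is a small user-space model in C. It is only a sketch:
the tracker type and the function names are invented for illustration,
and the fixed 64-page address space is an assumption to keep it short.
Because device writes cannot be observed, each logging pass simply
reports every currently mapped page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 64

/* Hypothetical tracker: one flag per page frame. */
struct iommu_dirty_log {
    bool mapped[NPAGES];
};

static void log_map(struct iommu_dirty_log *l, size_t pfn)
{
    assert(pfn < NPAGES);
    l->mapped[pfn] = true;
}

static void log_unmap(struct iommu_dirty_log *l, size_t pfn)
{
    assert(pfn < NPAGES);
    l->mapped[pfn] = false;
}

/*
 * Fill dirty[] for one logging pass and return the number of dirty
 * pages. Every mapped page is reported on every pass, since there is
 * no hardware dirty bit to consult.
 */
static size_t log_collect(const struct iommu_dirty_log *l, bool dirty[NPAGES])
{
    size_t n = 0;

    for (size_t i = 0; i < NPAGES; i++) {
        dirty[i] = l->mapped[i];
        n += dirty[i];
    }
    return n;
}
```

With pages 3 and 7 mapped, two consecutive passes both report two dirty
pages; a mapped page never becomes clean until it is unmapped.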

Joerg

From: Joerg Roedel on
On Wed, Jun 02, 2010 at 12:53:12PM +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 02, 2010 at 11:42:01AM +0200, Joerg Roedel wrote:

> > IMO a separate iommu-userspace driver is a nightmare for a userspace
> > interface. It is just too complicated to use.
>
> One advantage would be that we can reuse the uio framework
> for the devices themselves. So an existing app can just program
> an iommu for DMA and keep using uio for interrupts and access.

The driver is called UIO and not U-INTR-MMIO ;-) So I think handling
IOMMU mappings belongs there.

> > We can solve the problem
> > of multiple devices-per-domain with an ioctl which allows binding one
> > uio-device to the address-space on another.
>
> This would imply switching an iommu domain for a device while
> it could potentially be doing DMA. No idea whether this can be done
> in a safe manner.

It can. The worst thing that can happen is an io-page-fault.

> Forcing iommu assignment to be done as a first step seems much saner.

If we force it, there is no reason not to do it implicitly.

We can do something like this then:

dev1 = open();
ioctl(dev1, IOMMU_MAP, ...);    /* creates an IOMMU domain and assigns
                                   dev1 to it */

dev2 = open();
ioctl(dev2, IOMMU_MAP, ...);

/* Now dev1 and dev2 are in separate domains */

ioctl(dev2, IOMMU_SHARE, dev1); /* destroys all mappings for dev2 and
                                   assigns it to the same domain as
                                   dev1. The domain has a refcount of
                                   two now */

close(dev1); /* domain refcount goes down to one */
close(dev2); /* domain refcount is zero and the domain gets destroyed */
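
The refcount behaviour in that sequence can be modelled in plain
user-space C. This is only a sketch of the bookkeeping, not of any real
driver: dev_map, dev_share and dev_close are hypothetical stand-ins for
the proposed IOMMU_MAP ioctl, the proposed IOMMU_SHARE ioctl and
close(), and live_domains exists purely so the effect can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model types; none of these are kernel structures. */
struct domain {
    int refcount;
};

struct device {
    struct domain *dom;
};

static int live_domains;    /* domains created and not yet destroyed */

/* Drop one reference; destroy the domain when the last user is gone. */
static void domain_put(struct domain *d)
{
    if (!d)
        return;
    assert(d->refcount > 0);
    if (--d->refcount == 0) {
        free(d);
        live_domains--;
    }
}

/* Models IOMMU_MAP: the first mapping creates a domain implicitly. */
static int dev_map(struct device *dev)
{
    if (dev->dom)
        return 0;
    dev->dom = calloc(1, sizeof(*dev->dom));
    if (!dev->dom)
        return -1;
    dev->dom->refcount = 1;
    live_domains++;
    return 0;
}

/* Models IOMMU_SHARE: leave the current domain, join the other one. */
static int dev_share(struct device *dev, struct device *other)
{
    if (!other->dom)
        return -1;
    if (dev->dom == other->dom)
        return 0;
    other->dom->refcount++;     /* take the new reference first */
    domain_put(dev->dom);       /* old domain dies if dev was its last user */
    dev->dom = other->dom;
    return 0;
}

/* Models close(): the device gives up its domain reference. */
static void dev_close(struct device *dev)
{
    domain_put(dev->dom);
    dev->dom = NULL;
}
```

Running the same steps as above (map dev1, map dev2, share dev2 with
dev1, close both) takes live_domains from 2 to 1 at the share and to 0
only after the second close.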


Joerg

From: Michael S. Tsirkin on
On Wed, Jun 02, 2010 at 12:04:04PM +0200, Joerg Roedel wrote:
> On Wed, Jun 02, 2010 at 12:49:28PM +0300, Avi Kivity wrote:
> > On 06/02/2010 12:45 PM, Joerg Roedel wrote:
> >> IOMMU mapped memory can not be swapped out because we can't do demand
> >> paging on io-page-faults with current devices. We have to pin _all_
> >> userspace memory that is mapped into an IOMMU domain.
> >
> > vhost doesn't pin memory.
> >
> > What I proposed is to describe the memory map using an object (fd), and
> > pass it around to clients that use it: kvm, vhost, vfio. That way you
> > maintain the memory map in a central location and broadcast changes to
> > clients. Only a vfio client would result in memory being pinned.
>
> Ah ok, so it's only about the database which keeps the mapping
> information.
>
> > It can still work, but the interface needs to be extended to include
> > dirty bitmap logging.
>
> That's hard to do. I am not sure about VT-d but the AMD IOMMU has no
> dirty-bits in the page-table. And without demand-paging we can't really
> tell what pages a device has written to. The only choice is to mark all
> IOMMU-mapped pages dirty as long as they are mapped.
>
> Joerg

Or mark them dirty when they are unmapped.
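
A possible shape for that alternative, as a user-space C sketch: pages
are flagged dirty at unmap time and the log is handed out with
get-and-clear semantics. The type and function names are hypothetical
and the 64-page address space is an assumption to keep the example
small. Note that in this variant a page that stays mapped is never
reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 64

/* Hypothetical tracker: mapping state plus a pending dirty flag. */
struct unmap_dirty_log {
    bool mapped[NPAGES];
    bool dirty[NPAGES];
};

static void udl_map(struct unmap_dirty_log *l, size_t pfn)
{
    assert(pfn < NPAGES);
    l->mapped[pfn] = true;
}

/* The device may have written to the page, so flag it on the way out. */
static void udl_unmap(struct unmap_dirty_log *l, size_t pfn)
{
    assert(pfn < NPAGES);
    if (l->mapped[pfn]) {
        l->mapped[pfn] = false;
        l->dirty[pfn] = true;
    }
}

/* Copy the pending dirty flags to out[], clear them, return the count. */
static size_t udl_collect(struct unmap_dirty_log *l, bool out[NPAGES])
{
    size_t n = 0;

    for (size_t i = 0; i < NPAGES; i++) {
        out[i] = l->dirty[i];
        n += out[i];
        l->dirty[i] = false;
    }
    return n;
}
```

Mapping pages 3 and 7 reports nothing; unmapping page 3 makes the next
pass report exactly that page, and the pass after that is empty again.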

--
MST
From: Michael S. Tsirkin on
On Wed, Jun 02, 2010 at 11:45:27AM +0200, Joerg Roedel wrote:
> On Tue, Jun 01, 2010 at 03:41:55PM +0300, Avi Kivity wrote:
> > On 06/01/2010 01:46 PM, Michael S. Tsirkin wrote:
>
> >> Main difference is that vhost works fine with unlocked
> >> memory, paging it in on demand. iommu needs to unmap
> >> memory when it is swapped out or relocated.
> >>
> > So you'd just take the memory map and not pin anything. This way you
> > can reuse the memory map.
> >
> > But no, it doesn't handle the dirty bitmap, so no go.
>
> IOMMU mapped memory can not be swapped out because we can't do demand
> paging on io-page-faults with current devices. We have to pin _all_
> userspace memory that is mapped into an IOMMU domain.
>
> Joerg


One of the issues I see with the current patch is that
it uses the mlock rlimit to do this pinning. So this wastes the rlimit
for an app that did mlockall already, and also consumes
this resource transparently, so an app might call mlock
on a small buffer and be surprised that it fails.
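
The accounting problem can be shown with a little arithmetic in C. This
is a sketch under assumptions: memlock_fits and memlock_limit are
helper names made up for this example, and the byte counts in the note
below are arbitrary. The only real interface used is getrlimit() with
RLIMIT_MEMLOCK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Would a new lock request of `request` bytes still fit under `limit`,
 * given that `charged` bytes are already counted against it (for
 * example by a driver that pinned memory on the application's behalf)?
 */
bool memlock_fits(rlim_t limit, size_t charged, size_t request)
{
    if (limit == RLIM_INFINITY)
        return true;
    if (charged > limit)
        return false;
    return request <= limit - charged;
}

/* Read the soft RLIMIT_MEMLOCK of the calling process; 0 on success. */
int memlock_limit(rlim_t *out)
{
    struct rlimit rl;

    assert(out != NULL);
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    *out = rl.rlim_cur;
    return 0;
}
```

With a 64 KiB limit of which 60 KiB has already been charged by
pinning, memlock_fits() accepts a further 4 KiB but rejects 8 KiB,
which is the surprise described above: a small mlock from the
application fails because of memory it never locked itself.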

Using mmu notifiers might help?


--
MST