From: Joerg Roedel on
On Wed, Jun 02, 2010 at 01:15:34PM +0300, Michael S. Tsirkin wrote:
> One of the issues I see with the current patch is that
> it uses the mlock rlimit to do this pinning. So this wastes the rlimit
> for an app that did mlockall already, and also consumes
> this resource transparently, so an app might call mlock
> on a small buffer and be surprised that it fails.
>
> Using mmu notifiers might help?

MMU notifiers are problematic because they are designed for situations
where we can do demand paging. The invalidate_range_start and
invalidate_range_end functions are not only called on munmap, they also
run when mprotect is called (in which case we don't want to tear down
iommu mappings). So what may happen with mmu notifiers is that we
accidentially tear down iommu mappings. With demand-paging this is no
problem because the io-ptes could be re-faulted. But that does not work
here.

Joerg

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Joerg Roedel on
On Wed, Jun 02, 2010 at 01:21:44PM +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 02, 2010 at 12:19:40PM +0200, Joerg Roedel wrote:

> > It can. The worst thing that can happen is an io-page-fault.
>
> devices might not be able to recover from this.

With the userspace interface a process can create io-page-faults
anyway if it wants. We can't protect us from this. And the process is
also responsible to not tear down iommu-mappings that are currently in
use.

> What you describe below does 3 ioctls for what can be done with 1.

The second IOMMU_MAP ioctl is just to show that existing mappings would
be destroyed if the device is assigned to another address space. Not
strictly necessary. So we have two ioctls but save one call to create
the iommu-domain.

> > ioctl(dev2, IOMMU_SHARE, dev1); /* destroys all mapping for dev2 and
> > assigns it to the same domain as
> > dev1. Domain has a refcount of two
> > now */
>
> Or maybe it destroys mapping for dev1?
> How do you remember?

Because we express here that "dev2 shares the iommu mappings of dev1".
Thats easy to remember.

> Also, no way to unshare? That seems limiting.

Just left out for simplicity reasons. An IOMMU_UNBIND (no IOMMU_UNSHARE
because that would require a second parameter) ioctl is certainly also
required.

Joerg

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Michael S. Tsirkin on
On Wed, Jun 02, 2010 at 12:19:40PM +0200, Joerg Roedel wrote:
> On Wed, Jun 02, 2010 at 12:53:12PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Jun 02, 2010 at 11:42:01AM +0200, Joerg Roedel wrote:
>
> > > IMO a seperate iommu-userspace driver is a nightmare for a userspace
> > > interface. It is just too complicated to use.
> >
> > One advantage would be that we can reuse the uio framework
> > for the devices themselves. So an existing app can just program
> > an iommu for DMA and keep using uio for interrupts and access.
>
> The driver is called UIO and not U-INTR-MMIO ;-) So I think handling
> IOMMU mappings belongs there.

Maybe it could be put there but the patch posted did not use uio.
And one of the reasons is that uio framework provides for
device access and interrupts but not for programming memory mappings.

Solutions (besides giving up on uio completely)
could include extending the framework in some way
(which was tried, but the result was not pretty) or adding
a separate driver for iommu and binding to that.

--
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Joerg Roedel on
On Wed, Jun 02, 2010 at 01:38:28PM +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 02, 2010 at 12:35:16PM +0200, Joerg Roedel wrote:

> > With the userspace interface a process can create io-page-faults
> > anyway if it wants. We can't protect us from this.
>
> We could fail all operations until an iommu is bound. This will help
> catch bugs with access before setup. We can not do this if a domain is
> bound by default.

Even if it is bound to a domain the userspace driver could program the
device to do dma to unmapped regions causing io-page-faults. The kernel
can't do anything about it.

> > The second IOMMU_MAP ioctl is just to show that existing mappings would
> > be destroyed if the device is assigned to another address space. Not
> > strictly necessary. So we have two ioctls but save one call to create
> > the iommu-domain.
>
> With 10 devices you have 10 extra ioctls.

And this works implicitly with your proposal? Remember that we still
need to be able to provide seperate mappings for each device to support
IOMMU emulation for the guest. I think my proposal does not have any
extra costs.

> > Because we express here that "dev2 shares the iommu mappings of dev1".
> > Thats easy to remember.
>
> they both share the mappings. which one gets the iommu
> destroyed (breaking the device if it is now doing DMA)?

As I wrote the domain has a reference count and is destroyed only when
it goes down to zero. This does not happen as long as a device is bound
to it.

Joerg

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Michael S. Tsirkin on
On Wed, Jun 02, 2010 at 01:12:25PM +0200, Joerg Roedel wrote:
> On Wed, Jun 02, 2010 at 01:38:28PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Jun 02, 2010 at 12:35:16PM +0200, Joerg Roedel wrote:
>
> > > With the userspace interface a process can create io-page-faults
> > > anyway if it wants. We can't protect us from this.
> >
> > We could fail all operations until an iommu is bound. This will help
> > catch bugs with access before setup. We can not do this if a domain is
> > bound by default.
>
> Even if it is bound to a domain the userspace driver could program the
> device to do dma to unmapped regions causing io-page-faults. The kernel
> can't do anything about it.

It can always corrupt its own memory directly as well :)
But that is not a reason not to detect errors if we can,
and not to make APIs hard to misuse.

> > > The second IOMMU_MAP ioctl is just to show that existing mappings would
> > > be destroyed if the device is assigned to another address space. Not
> > > strictly necessary. So we have two ioctls but save one call to create
> > > the iommu-domain.
> >
> > With 10 devices you have 10 extra ioctls.
>
> And this works implicitly with your proposal?

Yes. so you do:
iommu = open
ioctl(dev1, BIND, iommu)
ioctl(dev2, BIND, iommu)
ioctl(dev3, BIND, iommu)
ioctl(dev4, BIND, iommu)

No need to add a SHARE ioctl.


> Remember that we still
> need to be able to provide seperate mappings for each device to support
> IOMMU emulation for the guest.

Generally not true. E.g. guest can enable iommu passthrough
or have domain per a group of devices.

> I think my proposal does not have any
> extra costs.

with my proposal we have 1 ioctl per device + 1 per domain.
with yours we have 2 ioctls per device is iommu is shared
and 1 if it is not shared.

as current apps share iommu it seems to make sense
to optimize for that.

> > > Because we express here that "dev2 shares the iommu mappings of dev1".
> > > Thats easy to remember.
> >
> > they both share the mappings. which one gets the iommu
> > destroyed (breaking the device if it is now doing DMA)?
>
> As I wrote the domain has a reference count and is destroyed only when
> it goes down to zero. This does not happen as long as a device is bound
> to it.
>
> Joerg

We were talking about UNSHARE ioctl:
ioctl(dev1, UNSHARE, dev2)
Does it change the domain for dev1 or dev2?
If you make a mistake you get a hard to debug bug.

--
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/