From: Avi Kivity on
On 06/02/2010 08:29 AM, Chris Wright wrote:
> * Avi Kivity (avi(a)redhat.com) wrote:
>
>> On 06/02/2010 12:26 AM, Tom Lyon wrote:
>>
>>> I'm not really opposed to multiple devices per domain, but let me point out how I
>>> ended up here. First, the driver has two ways of mapping pages, one based on the
>>> iommu api and one based on the dma_map_sg api. With the latter, the system
>>> already allocates a domain per device and there's no way to control it. This was
>>> presumably done to help isolation between drivers. If there are multiple drivers
> >>> in the user level, do we not want the same isolation to apply to them?
>>>
>> In the case of kvm, we don't want isolation between devices, because
>> that doesn't happen on real hardware.
>>
> Sure it does. That's exactly what happens when there's an iommu
> involved with bare metal.
>

But we are emulating a machine without an iommu.

When we emulate a machine with an iommu, then yes, we'll want to use as
many domains as the guest does.

>> So if the guest programs
>> devices to dma to each other, we want that to succeed.
>>
> And it will as long as ATS is enabled (this is a basic requirement
> for PCIe peer-to-peer traffic to succeed with an iommu involved on
> bare metal).
>
> That's how things currently are, i.e. we put all devices belonging to a
> single guest in the same domain. However, it can be useful to put each
> device belonging to a guest in a unique domain. Especially as qemu
> grows support for iommu emulation, and guest OSes begin to understand
> how to use a hw iommu.
>

Right, we need to keep flexibility.

>>> And then there's the fact that it is possible to have multiple disjoint iommus on a system,
>>> so it may not even be possible to bring 2 devices under one domain.
>>>
>> That's indeed a deficiency.
>>
> Not sure it's a deficiency. Typically, to share page table mappings
> across multiple iommus you just have to do update/invalidate to each
> hw iommu that is sharing the mapping. Alternatively, you can use more
> memory and build/maintain identical mappings (as Tom alludes to below).
>

Sharing the page tables is just an optimization; I was worried about
devices in separate domains not talking to each other. If ATS fixes
that, great.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

From: Joerg Roedel on
On Tue, Jun 01, 2010 at 12:55:32PM +0300, Michael S. Tsirkin wrote:

> There seems to be some misunderstanding. The userspace interface
> proposed forces a separate domain per device and forces userspace to
> repeat iommu programming for each device. We are better off sharing a
> domain between devices and programming the iommu once.
>
> The natural way to do this is to have an iommu driver for programming
> iommu.

IMO a separate iommu-userspace driver is a nightmare for a userspace
interface. It is just too complicated to use. We can solve the problem
of multiple devices-per-domain with an ioctl which allows binding one
uio-device to the address space of another. That's much simpler.
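
Something like the following, as a rough sketch (UIO_BIND_DOMAIN and
the device nodes are invented names for illustration, not an existing
interface):

	int dev0 = open("/dev/uio0", O_RDWR);
	int dev1 = open("/dev/uio1", O_RDWR);

	/* Make dev1 share the IOMMU address space of dev0. */
	if (ioctl(dev1, UIO_BIND_DOMAIN, dev0) < 0)
		perror("UIO_BIND_DOMAIN");

	/* Mappings programmed through either fd now apply to both. */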

Joerg

From: Michael S. Tsirkin on
On Wed, Jun 02, 2010 at 11:42:01AM +0200, Joerg Roedel wrote:
> On Tue, Jun 01, 2010 at 12:55:32PM +0300, Michael S. Tsirkin wrote:
>
> > There seems to be some misunderstanding. The userspace interface
> > proposed forces a separate domain per device and forces userspace to
> > repeat iommu programming for each device. We are better off sharing a
> > domain between devices and programming the iommu once.
> >
> > The natural way to do this is to have an iommu driver for programming
> > iommu.
>
> IMO a separate iommu-userspace driver is a nightmare for a userspace
> interface. It is just too complicated to use.

One advantage would be that we can reuse the uio framework
for the devices themselves. So an existing app can just program
an iommu for DMA and keep using uio for interrupts and access.
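
As a rough sketch of that split (the uio side is the existing
interface; /dev/iommu, the struct and the two ioctls are invented
names for illustration):

	/* Existing uio usage is unchanged. */
	int dev = open("/dev/uio0", O_RDWR);
	void *bar0 = mmap(NULL, bar0_size, PROT_READ | PROT_WRITE,
			  MAP_SHARED, dev, 0);
	uint32_t icount;
	read(dev, &icount, sizeof(icount));	/* interrupt count */

	/* New: DMA mappings go through a separate iommu driver. */
	int iommu = open("/dev/iommu", O_RDWR);
	ioctl(dev, UIO_ASSIGN_IOMMU, iommu);
	struct iommu_map map = {
		.vaddr = (unsigned long)buf,
		.iova  = iova,
		.size  = buf_size,
	};
	ioctl(iommu, IOMMU_MAP_DMA, &map);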

> We can solve the problem
> of multiple devices-per-domain with an ioctl which allows binding one
> uio-device to the address space of another.

This would imply switching an iommu domain for a device while
it could potentially be doing DMA. No idea whether this can be done
in a safe manner.
Forcing iommu assignment to be done as a first step seems much saner.


> That's much simpler.
>
> Joerg


So instead of

	dev = open();
	ioctl(dev, ASSIGN, iommu);
	mmap(...);

where, if we forget the ioctl, mmap will fail, we have

	dev = open();
	if (ndevices > 0)
		ioctl(devices[0], ASSIGN, dev);
	mmap(...);

where, if we forget the ioctl, we only get errors from the device
later. Seems more complicated to me.


There will also always be the confusion: which device's address space
are we modifying? With a separate driver for the iommu, we can safely
check that the binding is done correctly.
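
With a separate iommu driver the flow might look like this (all names
invented for illustration):

	int iommu = open("/dev/iommu", O_RDWR);	/* one address space */
	int dev0  = open("/dev/uio0", O_RDWR);
	int dev1  = open("/dev/uio1", O_RDWR);

	/* Attach both devices to the same domain before any DMA. */
	ioctl(dev0, ASSIGN_IOMMU, iommu);
	ioctl(dev1, ASSIGN_IOMMU, iommu);

	/* Program the mapping once; there is no ambiguity about which
	 * address space is being modified: the one behind iommu. */
	ioctl(iommu, IOMMU_MAP_DMA, &map);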

--
MST
From: Joerg Roedel on
On Tue, Jun 01, 2010 at 09:59:40PM -0700, Tom Lyon wrote:
> This is just what I was thinking. But rather than a get/set, just use two fds.
>
> ioctl(vfio_fd1, VFIO_SET_DOMAIN, vfio_fd2);
>
> This may fail if there are really 2 different IOMMUs, so user code must be
> prepared for failure. In addition, this is strictly upwards compatible with
> what is there now, so maybe we can add it later.
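
(A usage sketch of the proposed call, with the fallback Tom describes;
map_into() is an invented helper standing in for whatever per-device
map ioctl ends up existing:)

	if (ioctl(vfio_fd1, VFIO_SET_DOMAIN, vfio_fd2) < 0) {
		/* Devices sit behind different IOMMUs: program each
		 * domain separately with identical mappings. */
		map_into(vfio_fd1, buf, size, iova);
		map_into(vfio_fd2, buf, size, iova);
	} else {
		/* Shared domain: one set of mappings serves both. */
		map_into(vfio_fd1, buf, size, iova);
	}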

How can this fail with multiple IOMMUs? This should be handled
transparently by the IOMMU driver.

Joerg

From: Avi Kivity on
On 06/02/2010 12:45 PM, Joerg Roedel wrote:
> On Tue, Jun 01, 2010 at 03:41:55PM +0300, Avi Kivity wrote:
>
>> On 06/01/2010 01:46 PM, Michael S. Tsirkin wrote:
>>
>
>>> Main difference is that vhost works fine with unlocked
>>> memory, paging it in on demand. iommu needs to unmap
>>> memory when it is swapped out or relocated.
>>>
>>>
>> So you'd just take the memory map and not pin anything. This way you
>> can reuse the memory map.
>>
>> But no, it doesn't handle the dirty bitmap, so no go.
>>
> IOMMU-mapped memory cannot be swapped out because we can't do demand
> paging on io-page-faults with current devices. We have to pin _all_
> userspace memory that is mapped into an IOMMU domain.
>

vhost doesn't pin memory.

What I proposed is to describe the memory map using an object (fd), and
pass it around to clients that use it: kvm, vhost, vfio. That way you
maintain the memory map in a central location and broadcast changes to
clients. Only a vfio client would result in memory being pinned.

It can still work, but the interface needs to be extended to include
dirty bitmap logging.
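
Roughly, the interface could look like this (every name here is
invented to illustrate the idea, nothing more):

	/* One memory-map object, shared by kvm, vhost and vfio. */
	int mm = open("/dev/memmap", O_RDWR);
	struct memmap_region r = {
		.userspace_addr = (unsigned long)mem,
		.guest_phys     = 0,
		.size           = mem_size,
	};
	ioctl(mm, MEMMAP_ADD_REGION, &r);	/* broadcast to clients */

	/* Clients subscribe to the same map; only vfio pins pages. */
	ioctl(kvm_vm_fd, KVM_SET_MEMMAP, mm);
	ioctl(vhost_fd, VHOST_SET_MEMMAP, mm);
	ioctl(vfio_fd, VFIO_SET_MEMMAP, mm);

	/* The missing piece: dirty logging as part of the same object. */
	ioctl(mm, MEMMAP_GET_DIRTY_LOG, &bitmap);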

--
error compiling committee.c: too many arguments to function
