From: Avi Kivity on
On 06/02/2010 12:26 AM, Tom Lyon wrote:
>
> I'm not really opposed to multiple devices per domain, but let me point out how I
> ended up here. First, the driver has two ways of mapping pages, one based on the
> iommu api and one based on the dma_map_sg api. With the latter, the system
> already allocates a domain per device and there's no way to control it. This was
> presumably done to help isolation between drivers. If there are multiple drivers
> in the user level, do we not want the same isolation to apply to them?
>

In the case of kvm, we don't want isolation between devices, because
that doesn't happen on real hardware. So if the guest programs devices
to dma to each other, we want that to succeed.

> Also, domains are not a very scarce resource - my little core i5 has 256,
> and the intel architecture goes to 64K.
>

But the page tables cost roughly 0.2% of mapped memory per domain. For
the kvm use case, that could be significant, since a guest may have
large amounts of memory and large numbers of assigned devices.
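To put rough numbers on that (illustrative only): with 4 KiB pages and
8-byte page table entries the overhead is 8/4096, i.e. about 0.2% per
domain, so a 64 GB guest with four assigned devices, each in its own
domain, would spend on the order of 4 x 128 MB = 512 MB on IOMMU page
tables alone.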

> And then there's the fact that it is possible to have multiple disjoint iommus on a system,
> so it may not even be possible to bring 2 devices under one domain.
>

That's indeed a deficiency.

> Given all that, I am inclined to leave it alone until someone has a real problem.
> Note that not sharing iommu domains doesn't mean you can't share device memory,
> just that you have to do multiple mappings.
>

I think we do have a real problem (though a mild one).

The only issue I see with deferring the solution is that the API becomes
gnarly; both the kernel and userspace will have to support both APIs
forever. Perhaps we can implement the new API but defer the actual
sharing until later, though I don't know how much work that saves. Or
Alex/Chris can pitch in and help.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

From: Alex Williamson on
On Tue, 2010-06-01 at 13:28 +0300, Avi Kivity wrote:
> On 06/01/2010 12:55 PM, Michael S. Tsirkin wrote:
> >
> >> It can't program the iommu.
> >> What
> >> the patch proposes is that userspace tells vfio about the needed
> >> mappings, and vfio programs the iommu.
> >>
> > There seems to be some misunderstanding. The userspace interface
> > proposed forces a separate domain per device and forces userspace to
> > repeat iommu programming for each device. We are better off sharing a
> > domain between devices and programming the iommu once.
> >
>
> iommufd = open(/dev/iommu);
> ioctl(iommufd, IOMMUFD_ASSIGN_RANGE, ...)
> ioctl(vfiofd, VFIO_SET_IOMMU, iommufd)

It seems part of the annoyance of the current KVM device assignment is
that we have multiple files open: we mmap here, read there, write over
there (maybe, if it's not emulated). I quite like Tom's approach of
one-stop shopping with /dev/vfio<n>, including config space emulation
so each driver doesn't have to write its own. So, continuing with that,
shouldn't we be able to add GET_IOMMU/SET_IOMMU ioctls to vfio so that
after we set up one device we can bind the next to the same domain?
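
A minimal sketch of what that could look like (ioctl names and argument
types are illustrative only, not an existing interface):

int iommu;

/* device 1 was set up first; ask vfio which iommu domain it got */
ioctl(vfio_fd1, VFIO_GET_IOMMU, &iommu);

/* attach device 2 to the same domain so the mappings are shared */
ioctl(vfio_fd2, VFIO_SET_IOMMU, iommu);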

Alex

From: Tom Lyon on
On Tuesday 01 June 2010 09:29:47 pm Alex Williamson wrote:
> On Tue, 2010-06-01 at 13:28 +0300, Avi Kivity wrote:
> > On 06/01/2010 12:55 PM, Michael S. Tsirkin wrote:
> > >
> > >> It can't program the iommu.
> > >> What
> > >> the patch proposes is that userspace tells vfio about the needed
> > >> mappings, and vfio programs the iommu.
> > >>
> > > There seems to be some misunderstanding. The userspace interface
> > > proposed forces a separate domain per device and forces userspace to
> > > repeat iommu programming for each device. We are better off sharing a
> > > domain between devices and programming the iommu once.
> > >
> >
> > iommufd = open(/dev/iommu);
> > ioctl(iommufd, IOMMUFD_ASSIGN_RANGE, ...)
> > ioctl(vfiofd, VFIO_SET_IOMMU, iommufd)
>
> It seems part of the annoyance of the current KVM device assignment is
> that we have multiple files open: we mmap here, read there, write over
> there (maybe, if it's not emulated). I quite like Tom's approach of
> one-stop shopping with /dev/vfio<n>, including config space emulation
> so each driver doesn't have to write its own. So, continuing with that,
> shouldn't we be able to add GET_IOMMU/SET_IOMMU ioctls to vfio so that
> after we set up one device we can bind the next to the same domain?

This is just what I was thinking. But rather than a get/set, just use two fds.

ioctl(vfio_fd1, VFIO_SET_DOMAIN, vfio_fd2);

This may fail if there really are two different IOMMUs, so user code must
be prepared for failure. In addition, this is strictly upwards compatible
with what is there now, so maybe we can add it later.
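
For example (a sketch only; VFIO_DMA_MAP_IOVA here is just an
illustrative name for whatever per-device mapping call is in use):

if (ioctl(vfio_fd1, VFIO_SET_DOMAIN, vfio_fd2) < 0) {
	/* devices sit behind different IOMMUs: fall back to mapping
	   the shared buffer through each fd separately */
	ioctl(vfio_fd1, VFIO_DMA_MAP_IOVA, &dma_map);
	ioctl(vfio_fd2, VFIO_DMA_MAP_IOVA, &dma_map);
}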


From: Avi Kivity on
On 06/02/2010 07:59 AM, Tom Lyon wrote:
>
> This is just what I was thinking. But rather than a get/set, just use two fds.
>
> ioctl(vfio_fd1, VFIO_SET_DOMAIN, vfio_fd2);
>
> This may fail if there really are two different IOMMUs, so user code must
> be prepared for failure. In addition, this is strictly upwards compatible
> with what is there now, so maybe we can add it later.
>
>

What happens if one of the fds is later closed?

I don't like this conceptually. There is a 1:n relationship between the
memory map and the devices. Ignoring it will cause the API to have
warts. It's more straightforward to have an object to represent the
memory mapping (and talk to the iommus), and have devices bind to this
object.
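
Concretely, along the lines of the sketch I posted earlier (names
illustrative):

iommufd = open("/dev/iommu", O_RDWR);       /* the memory map object */
ioctl(iommufd, IOMMUFD_ASSIGN_RANGE, ...);  /* program mappings once */
ioctl(vfio_fd1, VFIO_SET_IOMMU, iommufd);   /* bind device 1 */
ioctl(vfio_fd2, VFIO_SET_IOMMU, iommufd);   /* bind device 2 */

With that model, closing a device fd just unbinds the device; the
mappings stay with the iommu object.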


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

From: Chris Wright on
* Avi Kivity (avi(a)redhat.com) wrote:
> On 06/02/2010 12:26 AM, Tom Lyon wrote:
> >
> >I'm not really opposed to multiple devices per domain, but let me point out how I
> >ended up here. First, the driver has two ways of mapping pages, one based on the
> >iommu api and one based on the dma_map_sg api. With the latter, the system
> >already allocates a domain per device and there's no way to control it. This was
> >presumably done to help isolation between drivers. If there are multiple drivers
> >in the user level, do we not want the same isolation to apply to them?
>
> In the case of kvm, we don't want isolation between devices, because
> that doesn't happen on real hardware.

Sure it does. That's exactly what happens when there's an iommu
involved with bare metal.

> So if the guest programs
> devices to dma to each other, we want that to succeed.

And it will as long as ATS is enabled (this is a basic requirement
for PCIe peer-to-peer traffic to succeed with an iommu involved on
bare metal).

That's how things currently are, i.e. we put all devices belonging to a
single guest in the same domain. However, it can be useful to put each
device belonging to a guest in a unique domain, especially as qemu
grows support for iommu emulation and guest OSes begin to understand
how to use a hw iommu.

> >Also, domains are not a very scarce resource - my little core i5 has 256,
> >and the intel architecture goes to 64K.
>
> But the page tables cost roughly 0.2% of mapped memory per domain.
> For the kvm use case, that could be significant, since a guest may
> have large amounts of memory and large numbers of assigned devices.
>
> >And then there's the fact that it is possible to have multiple disjoint iommus on a system,
> >so it may not even be possible to bring 2 devices under one domain.
>
> That's indeed a deficiency.

Not sure it's a deficiency. Typically, to share page table mappings
across multiple iommus you just have to issue the update/invalidate to
each hw iommu that is sharing the mapping. Alternatively, you can use
more memory and build/maintain identical mappings (as Tom alludes to
below).
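
In pseudocode (the helper names are made up for illustration, not the
actual iommu API), an update to a shared mapping just fans out to every
unit backing the domain:

for_each_hw_iommu_in_domain(domain, unit) {
	update_page_table(unit, iova, phys, len);  /* or share one table */
	invalidate_iotlb(unit, iova, len);
}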

> >Given all that, I am inclined to leave it alone until someone has a real problem.
> >Note that not sharing iommu domains doesn't mean you can't share device memory,
> >just that you have to do multiple mappings.
>
> I think we do have a real problem (though a mild one).
>
> The only issue I see with deferring the solution is that the API
> becomes gnarly; both the kernel and userspace will have to support
> both APIs forever. Perhaps we can implement the new API but defer
> the actual sharing until later, though I don't know how much work
> that saves. Or Alex/Chris can pitch in and help.

It really shouldn't be that complicated to create the API to allow for
flexible device <-> domain mappings, so I agree, it makes sense to do it
right up front.

thanks,
-chris