VFIO driver: Non-privileged user level PCI drivers [Kernel]

Prev: ACPI patches for 2.6.35-merge
Next: ext4: Fix file fragmentation during large file write.

From: Michael S. Tsirkin on 1 Jul 2010 12:30

On Thu, Jul 01, 2010 at 09:48:41AM -0600, Alex Williamson wrote:
> On Thu, 2010-07-01 at 18:31 +0300, Michael S. Tsirkin wrote:
> > On Thu, Jul 01, 2010 at 09:29:04AM -0600, Alex Williamson wrote:
> > > On Tue, 2010-06-08 at 14:21 -0700, Tom Lyon wrote:
> > > > +The VFIO_DMA_MASK ioctl is used to set the maximum permissible DMA address
> > > > +(device dependent). It takes a single unsigned 64 bit integer as an argument.
> > > > +This call also has the side effect of enabling PCI bus mastership.
> > >
> > > Hi Tom,
> > >
> > > This interface doesn't make sense for the MAP_IOVA user. Especially in
> > > qemu, we have no idea what the DMA mask is for the device we're
> > > assigning. It doesn't really matter though because the guest will use
> > > bounce buffers internally once it loads the device specific drivers and
> > > discovers the DMA mask. This only seems relevant if we're using a
> > > DMA_MAP call that gets to pick the dmaaddr, so I'd propose we only make
> > > this a required call for that interface, and create a separate ioctl for
> > > actually enabling bus master. Thanks,
> > >
> > > Alex
> >
> > I expect there's no need for a separate ioctl to do this:
> > you can do this by write to the control register.
>
> Nope, vfio only allows direct writes to the memory and io space bits of
> the command register,

I don't see why's there need to protect the control register.
As far as I can see, nothing userspace does with it
can damage the host.

> all other bits are virtualized. I wonder if
> that's necessary though since we require the device to be attached to an
> iommu domain before we allow config space access.
>
> Alex
>

I don't think it's necessary. IMHO all the virtualization
tables can just be replaced with
if (pci header type == PCI_HEADER_TYPE_NORMAL)
if (addr < PCI_BASE_ADDRESS_0 + 24 &&
addr + len >= PCI_BASE_ADDRESS_0)
return -EPERM;
else /* similar thing for the bridge and cardbus
types */

Much simpler and more readable than tables full of 0xffff.

Reason this is enough is because virt drivers like qemu already
have the code to treat interrupt disable, MSI/MSIX capabilities
and BARs registers specially. custom userspace drivers simply
have no reason to touch anything besides the interrupt disable bit.

--
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Tom Lyon on 1 Jul 2010 15:00

On Thursday 01 July 2010 08:48:41 am Alex Williamson wrote:
> On Thu, 2010-07-01 at 18:31 +0300, Michael S. Tsirkin wrote:
> > On Thu, Jul 01, 2010 at 09:29:04AM -0600, Alex Williamson wrote:
> > > On Tue, 2010-06-08 at 14:21 -0700, Tom Lyon wrote:
> > > > +The VFIO_DMA_MASK ioctl is used to set the maximum permissible DMA address
> > > > +(device dependent). It takes a single unsigned 64 bit integer as an argument.
> > > > +This call also has the side effect of enabling PCI bus mastership.
> > >
> > > Hi Tom,
> > >
> > > This interface doesn't make sense for the MAP_IOVA user. Especially in
> > > qemu, we have no idea what the DMA mask is for the device we're
> > > assigning. It doesn't really matter though because the guest will use
> > > bounce buffers internally once it loads the device specific drivers and
> > > discovers the DMA mask. This only seems relevant if we're using a
> > > DMA_MAP call that gets to pick the dmaaddr, so I'd propose we only make
> > > this a required call for that interface, and create a separate ioctl for
> > > actually enabling bus master. Thanks,
> > >
> > > Alex
> >
> > I expect there's no need for a separate ioctl to do this:
> > you can do this by write to the control register.
>
> Nope, vfio only allows direct writes to the memory and io space bits of
> the command register, all other bits are virtualized. I wonder if
> that's necessary though since we require the device to be attached to an
> iommu domain before we allow config space access.
>
I had already gotten rid of the mask setting & master mode ioctl. It was a remnant
of using the dma_map_sg API which is no longer in there. And I tweaked the
config stuff to allow writing the master bit.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Alex Williamson on 6 Jul 2010 01:00

Hi Tom.

A few MSI issues below. Thanks,

Alex

On Tue, 2010-06-08 at 14:21 -0700, Tom Lyon wrote:
> diff -uprN linux-2.6.34/drivers/vfio/vfio_pci_config.c vfio-linux-2.6.34/drivers/vfio/vfio_pci_config.c
> --- linux-2.6.34/drivers/vfio/vfio_pci_config.c 1969-12-31 16:00:00.000000000 -0800
> +++ vfio-linux-2.6.34/drivers/vfio/vfio_pci_config.c 2010-05-28 14:26:47.000000000 -0700
> +/*
> + * Lengths of PCI Config Capabilities
> + * 0 means unknown (but at least 4)
> + * FF means special/variable
> + */
> +static u8 pci_capability_length[] = {
> + [PCI_CAP_ID_BASIC] = 64, /* pci config header */
> + [PCI_CAP_ID_PM] = PCI_PM_SIZEOF,
> + [PCI_CAP_ID_AGP] = PCI_AGP_SIZEOF,
> + [PCI_CAP_ID_VPD] = 8,
> + [PCI_CAP_ID_SLOTID] = 4,
> + [PCI_CAP_ID_MSI] = 0xFF, /* 10, 14, or 24 */

I think this is actually 10, 14, 20, or 24.

> +static struct perm_bits pci_cap_msi_perm[] = {
> + { 0, 0, }, /* 0x00 MSI message control */
> + { 0xFFFFFFFF, 0xFFFFFFFF, }, /* 0x04 MSI message address */
> + { 0xFFFFFFFF, 0xFFFFFFFF, }, /* 0x08 MSI message addr/data */
> + { 0x0000FFFF, 0x0000FFFF, }, /* 0x0c MSI message data */
> + { 0, 0xFFFFFFFF, }, /* 0x10 MSI mask bits */
> + { 0, 0xFFFFFFFF, }, /* 0x14 MSI pending bits */
> +};

Is there any reason for mask bits to have virtualized writes? I don't
think we can support all 4 MSI capability sizes with this one table. We
probably need a 32bit and 64bit version, then we can drop the mask,
pending, and reserved fields via the length.

> + if (len == 0xFF) {
> + switch (cap) {
> + case PCI_CAP_ID_MSI:
> + ret = pci_read_config_word(pdev,
> + pos + PCI_MSI_FLAGS, &flags);
> + if (ret < 0)
> + return ret;
> + if (flags & PCI_MSI_FLAGS_MASKBIT)
> + /* per vec masking */
> + len = 24;
> + else if (flags & PCI_MSI_FLAGS_64BIT)

These aren't mutually exclusive features aiui.

> + /* 64 bit */
> + len = 14;
> + else
> + len = 10;
> + break;

This should probably be something like

len = 10;
if (flags & PCI_MSI_FLAGS_MASKBIT)
len += 10;
if (flags & PCI_MSI_FLAGS_64BIT) {
/* set 64bit permission table */
len += 4;
} else
/* set 32bit permission table */

> diff -uprN linux-2.6.34/include/linux/vfio.h vfio-linux-2.6.34/include/linux/vfio.h
> --- linux-2.6.34/include/linux/vfio.h 1969-12-31 16:00:00.000000000 -0800
> +++ vfio-linux-2.6.34/include/linux/vfio.h 2010-06-07 12:20:06.000000000 -0700

> +/* request MSI interrupts; use given eventfd */
> +#define VFIO_EVENTFD_MSI _IOW(';', 105, int)

Any intention of supporting MSI multiple message capability? If so,
this might turn into the same interface as MSIX.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Piotr Jaroszyński on 17 Jul 2010 04:50

On 16 July 2010 23:58, Tom Lyon <pugs(a)cisco.com> wrote:
> The VFIO "driver" is used to allow privileged AND non-privileged processes to
> implement user-level device drivers for any well-behaved PCI, PCI-X, and PCIe
> devices.

Thanks for working on that! I wonder whether it's possible to say what
are the chances of it being merged to mainline and which version we
might be talking about?

--
Best Regards
Piotr Jaroszyński
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Greg KH on 20 Jul 2010 16:30

On Sat, Jul 17, 2010 at 10:45:23AM +0200, Piotr Jaroszy??ski wrote:
> On 16 July 2010 23:58, Tom Lyon <pugs(a)cisco.com> wrote:
> > The VFIO "driver" is used to allow privileged AND non-privileged processes to
> > implement user-level device drivers for any well-behaved PCI, PCI-X, and PCIe
> > devices.
>
> Thanks for working on that! I wonder whether it's possible to say what
> are the chances of it being merged to mainline and which version we
> might be talking about?

We still have a long way to go before you need to worry about what
kernel version it's going to show up in...

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First | Prev | Next | Last
Pages: 9 10 11 12 13 14 15 16 17 18 19 20 21
Prev: ACPI patches for 2.6.35-merge
Next: ext4: Fix file fragmentation during large file write.