From: Alex Williamson on
On Wed, 2010-06-30 at 16:36 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 30, 2010 at 12:14:12AM -0600, Alex Williamson wrote:
> > On Tue, 2010-06-08 at 14:21 -0700, Tom Lyon wrote:
> > > The VFIO "driver" is used to allow privileged AND non-privileged processes to
> > > implement user-level device drivers for any well-behaved PCI, PCI-X, and PCIe
> > > devices.
> >
> > Hi Tom,
> >
> > I found a few bugs. Patch below. The first chunk clears the
> > pci_config_map on close; otherwise we end up passing virtualized state
> > from one user to the next. The second is an off-by-one in the basic
> > perms. Finally, vfio_bar_fixup() needs an overhaul: it wasn't setting
> > the lower bits right and was allowing virtual writes of bits that aren't
> > aligned to the size. This section probably needs another pass or two of
> > refinement. Thanks,
> >
> > Alex
> >
>
> I still don't see why we are sticking all this emulation in the kernel. It
> is far from the performance hotpath and can easily be emulated in userspace.
> qemu does this; you can lift code from there if you like.
> Maybe we need to protect the BARs from being manipulated by userspace,
> but that should be all. No need for tables.

The benefit I see so far is that it removes duplicate code. Should
every user of this interface try to extract qemu's PCI config space
emulation and jury-rig it into their code base? Tom is already
providing access to more capability bits than the kvm device assignment
code. If the kernel community will accept it, I think it saves vfio
userspace writers some hassle and provides a better environment by
having the emulation in a single, well-tested, hopefully well-used place.
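
For illustration, here is the sort of config-space write filtering each
userspace consumer would otherwise have to reimplement. This is a rough
sketch only; the names, the table contents, and the pread/pwrite-able
config-space fd are assumptions for the example, not qemu's or vfio's
actual interface:

/*
 * Sketch: a dword-granular permission table plus write filter. Bits marked
 * "virt" land only in a per-user shadow copy; everything else is forwarded
 * to the real device.
 */
#include <stdint.h>
#include <unistd.h>

struct cfg_perm {
	uint32_t virt;		/* bits kept only in the shadow copy */
};

/* one entry per 32-bit dword of the standard header, indexed by offset / 4 */
static const struct cfg_perm hdr_perms[16] = {
	[0x04 / 4] = { 0xFFFFFFFC },	/* cmd/status except mem/io enable */
	[0x10 / 4] = { 0xFFFFFFFF },	/* BAR0: fully virtualized */
	/* ... remaining BARs, ROM BAR, capability pointer, etc. */
};

static int cfg_write32(int cfg_fd, uint32_t *shadow, unsigned int off,
		       uint32_t val)
{
	uint32_t virt = hdr_perms[off / 4].virt;
	uint32_t dev_val;

	/* virtualized bits update the shadow copy only */
	shadow[off / 4] = (shadow[off / 4] & ~virt) | (val & virt);

	/* remaining bits are forwarded to the real device */
	if (pread(cfg_fd, &dev_val, 4, off) != 4)
		return -1;
	dev_val = (dev_val & virt) | (val & ~virt);
	return pwrite(cfg_fd, &dev_val, 4, off) == 4 ? 0 : -1;
}

The table is the same idea as the pci_cap_basic_perm table in the patch;
the question is only whether it lives once in the kernel or in every user.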
Thanks,

Alex

From: Tom Lyon on
Thanks, Alex!
Am incorporating...

On Tuesday 29 June 2010 11:14:12 pm Alex Williamson wrote:
> On Tue, 2010-06-08 at 14:21 -0700, Tom Lyon wrote:
> > The VFIO "driver" is used to allow privileged AND non-privileged processes to
> > implement user-level device drivers for any well-behaved PCI, PCI-X, and PCIe
> > devices.
>
> Hi Tom,
>
> I found a few bugs. Patch below. The first chunk clears the
> pci_config_map on close; otherwise we end up passing virtualized state
> from one user to the next. The second is an off-by-one in the basic
> perms. Finally, vfio_bar_fixup() needs an overhaul: it wasn't setting
> the lower bits right and was allowing virtual writes of bits that aren't
> aligned to the size. This section probably needs another pass or two of
> refinement. Thanks,
>
> Alex
>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 96639e5..a0e8227 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -129,6 +129,10 @@ static int vfio_release(struct inode *inode, struct file *filep)
> eventfd_ctx_put(vdev->ev_msi);
> vdev->ev_irq = NULL;
> }
> + if (vdev->pci_config_map) {
> + kfree(vdev->pci_config_map);
> + vdev->pci_config_map = NULL;
> + }
> vfio_domain_unset(vdev);
> /* reset to known state if we can */
> (void) pci_reset_function(vdev->pdev);
> diff --git a/drivers/vfio/vfio_pci_config.c b/drivers/vfio/vfio_pci_config.c
> index c821b5d..f6e26b1 100644
> --- a/drivers/vfio/vfio_pci_config.c
> +++ b/drivers/vfio/vfio_pci_config.c
> @@ -79,18 +79,18 @@ struct perm_bits {
> static struct perm_bits pci_cap_basic_perm[] = {
> { 0xFFFFFFFF, 0, }, /* 0x00 vendor & device id - RO */
> { 0, 0xFFFFFFFC, }, /* 0x04 cmd & status except mem/io */
> - { 0, 0xFF00FFFF, }, /* 0x08 bist, htype, lat, cache */
> - { 0xFFFFFFFF, 0xFFFFFFFF, }, /* 0x0c bar */
> + { 0, 0, }, /* 0x08 class code & revision id */
> + { 0, 0xFF00FFFF, }, /* 0x0c bist, htype, lat, cache */
> { 0xFFFFFFFF, 0xFFFFFFFF, }, /* 0x10 bar */
> { 0xFFFFFFFF, 0xFFFFFFFF, }, /* 0x14 bar */
> { 0xFFFFFFFF, 0xFFFFFFFF, }, /* 0x18 bar */
> { 0xFFFFFFFF, 0xFFFFFFFF, }, /* 0x1c bar */
> { 0xFFFFFFFF, 0xFFFFFFFF, }, /* 0x20 bar */
> - { 0, 0, }, /* 0x24 cardbus - not yet */
> - { 0, 0, }, /* 0x28 subsys vendor & dev */
> - { 0xFFFFFFFF, 0xFFFFFFFF, }, /* 0x2c rom bar */
> - { 0, 0, }, /* 0x30 capability ptr & resv */
> - { 0, 0, }, /* 0x34 resv */
> + { 0xFFFFFFFF, 0xFFFFFFFF, }, /* 0x24 bar */
> + { 0, 0, }, /* 0x28 cardbus - not yet */
> + { 0, 0, }, /* 0x2c subsys vendor & dev */
> + { 0xFFFFFFFF, 0xFFFFFFFF, }, /* 0x30 rom bar */
> + { 0, 0, }, /* 0x34 capability ptr & resv */
> { 0, 0, }, /* 0x38 resv */
> { 0x000000FF, 0x000000FF, }, /* 0x3c max_lat ... irq */
> };
> @@ -318,30 +318,55 @@ static void vfio_virt_init(struct vfio_dev *vdev)
> static void vfio_bar_fixup(struct vfio_dev *vdev)
> {
> struct pci_dev *pdev = vdev->pdev;
> - int bar;
> - u32 *lp;
> - u32 len;
> + int bar, mem64 = 0;
> + u32 *lp = NULL;
> + u64 len = 0;
>
> for (bar = 0; bar <= 5; bar++) {
> - len = pci_resource_len(pdev, bar);
> - lp = (u32 *)&vdev->vinfo.bar[bar * 4];
> - if (len == 0) {
> - *lp = 0;
> - } else if (pci_resource_flags(pdev, bar) & IORESOURCE_MEM) {
> - *lp &= ~0x1;
> - *lp = (*lp & ~(len-1)) |
> - (*lp & ~PCI_BASE_ADDRESS_MEM_MASK);
> - if (*lp & PCI_BASE_ADDRESS_MEM_TYPE_64)
> - bar++;
> - } else if (pci_resource_flags(pdev, bar) & IORESOURCE_IO) {
> + if (!mem64) {
> + len = pci_resource_len(pdev, bar);
> + lp = (u32 *)&vdev->vinfo.bar[bar * 4];
> + if (len == 0) {
> + *lp = 0;
> + continue;
> + }
> +
> + len = ~(len - 1);
> + } else
> + len >>= 32;
> +
> + if (*lp == ~0U)
> + *lp = (u32)len;
> + else
> + *lp &= (u32)len;
> +
> + if (mem64) {
> + mem64 = 0;
> + continue;
> + }
> +
> + if (pci_resource_flags(pdev, bar) & IORESOURCE_IO)
> *lp |= PCI_BASE_ADDRESS_SPACE_IO;
> - *lp = (*lp & ~(len-1)) |
> - (*lp & ~PCI_BASE_ADDRESS_IO_MASK);
> + else {
> + *lp |= PCI_BASE_ADDRESS_SPACE_MEMORY;
> + if (pci_resource_flags(pdev, bar) & IORESOURCE_PREFETCH)
> + *lp |= PCI_BASE_ADDRESS_MEM_PREFETCH;
> + if (pci_resource_flags(pdev, bar) & IORESOURCE_MEM_64) {
> + *lp |= PCI_BASE_ADDRESS_MEM_TYPE_64;
> + mem64 = 1;
> + }
> }
> }
> +
> lp = (u32 *)vdev->vinfo.rombar;
> len = pci_resource_len(pdev, PCI_ROM_RESOURCE);
> - *lp = *lp & PCI_ROM_ADDRESS_MASK & ~(len-1);
> + len = ~(len - 1);
> +
> + if (*lp == ~PCI_ROM_ADDRESS_ENABLE)
> + *lp = (u32)len;
> + else
> + *lp = *lp & ((u32)len | PCI_ROM_ADDRESS_ENABLE);
> +
> vdev->vinfo.bardirty = 0;
> }
>
>
>
>
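
For anyone following along, the virtual BARs above have to reproduce the
standard PCI BAR sizing handshake: software writes all 1s to the BAR, reads
back a value whose writable bits form the size mask, strips the low flag
bits, and inverts to recover the region size. The len = ~(len - 1) in the
patch builds the same mask from pci_resource_len(). A small illustration
(not part of the patch):

#include <stdint.h>

/*
 * Recover a 32-bit memory BAR's size from the value read back after
 * writing all 1s. The low four bits carry the memory type and prefetch
 * flags and are excluded from the size mask (I/O BARs reserve only the
 * low two bits).
 */
static inline uint32_t bar_size_from_readback(uint32_t readback)
{
	uint32_t size_mask = readback & ~0xFu;	/* drop flag bits */

	return size_mask ? ~size_mask + 1 : 0;	/* e.g. 0xFFFFF000 -> 0x1000 */
}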


From: Michael S. Tsirkin on
On Wed, Jun 30, 2010 at 03:17:55PM -0700, Tom Lyon wrote:
> Thanks, Alex!
> Am incorporating...

I take it there's no chance you'll drop the "virtualization"
from the driver, then?

--
MST
From: Tom Lyon on
On Wednesday 30 June 2010 03:32:56 pm Michael S. Tsirkin wrote:
> On Wed, Jun 30, 2010 at 03:17:55PM -0700, Tom Lyon wrote:
> > Thanks, Alex!
> > Am incorporating...
>
> I take it there's no chance you'll drop the "virtualization"
> from the driver, then?
>
I think it'll get a whole lot simpler by depending on IOMMU interrupt remapping,
but I think some of it is still useful.

I am now stuck trying to get access to a system with interrupt remapping.
Turns out my Intel IOMMU doesn't have it.

From: Alex Williamson on
On Tue, 2010-06-08 at 14:21 -0700, Tom Lyon wrote:
> +int vfio_dma_unmap_dm(struct vfio_listener *listener, struct vfio_dma_map *dmp)
> +{
> + unsigned long start, npage;
> + struct dma_map_page *mlp;
> + struct list_head *pos, *pos2;
> + int ret;
> +
> + start = dmp->vaddr & ~PAGE_SIZE;
> + npage = dmp->size >> PAGE_SHIFT;
> +
> + ret = -ENXIO;
> + mutex_lock(&listener->vdev->dgate);
> + list_for_each_safe(pos, pos2, &listener->dm_list) {
> + mlp = list_entry(pos, struct dma_map_page, list);
> + if (dmp->vaddr != mlp->vaddr || mlp->npage != npage)
> + continue;
> + ret = 0;
> + vfio_dma_unmap(listener, mlp);
> + break;
> + }

Hi Tom,

Shouldn't we be matching the mlp based on daddr instead of vaddr? We
can have multiple dma addresses pointing at the same virtual address, so
the dma address is the unique element. I'm also nervous about this dm_list.
For qemu device assignment, we're potentially statically mapping many GB
of iova space. It seems like this could get incredibly bloated and
slow. Thanks,

Alex
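
To make the daddr point concrete, a rough sketch of a lookup keyed on the
iova rather than the vaddr. This is illustration only, not a patch; the
posted dma_map_page only has vaddr and npage, so the daddr field is an
assumption here:

/*
 * Sketch: find a mapping by its iova (dma address) and page count. The
 * dma address uniquely identifies the mapping even when several iovas
 * point at the same vaddr.
 */
static struct dma_map_page *vfio_find_dma(struct vfio_listener *listener,
					  dma_addr_t daddr,
					  unsigned long npage)
{
	struct dma_map_page *mlp;

	list_for_each_entry(mlp, &listener->dm_list, list)
		if (mlp->daddr == daddr && mlp->npage == npage)
			return mlp;
	return NULL;
}

The linear walk is also where the bloat worry bites; for many GB of
statically mapped iova space, something like an rb-tree keyed on iova
would keep the lookups cheap.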
