From: Andi Kleen on
On Thu, Jul 01, 2010 at 11:17:34PM -0700, Zach Pfeffer wrote:
> Andi Kleen wrote:
> >> The VCMM provides a more abstract, global view with finer-grained
> >> control of each mapping a user wants to create. For instance, the
> >> semantics of iommu_map preclude its use in setting up just the IOMMU
> >> side of a mapping. With a one-sided map, two IOMMU devices can be
> >
> > Hmm? dma_map_* does not change any CPU mappings. It only sets up
> > DMA mapping(s).
>
> Sure, but I was saying that iommu_map() doesn't just set up the IOMMU
> mappings, it sets up both the IOMMU and kernel buffer mappings.

Normally the data is already mapped in the kernel, so why
would you need another CPU mapping too? Sometimes the CPU
code has to scatter-gather, but that is considered acceptable
(and if it really cannot be rewritten to support sg it's better
to have an explicit vmap operation).
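(To be concrete, the sketch below is the kind of explicit vmap operation
I mean: the driver gathers its scattered pages and asks for one extra
kernel virtual mapping only when it really needs it. The page-gathering
side is assumed to exist elsewhere.)

#include <linux/mm.h>
#include <linux/vmalloc.h>

/*
 * Sketch: give CPU code that insists on a virtually contiguous view an
 * explicit kernel mapping of an already-allocated, scattered buffer.
 */
static void *map_scattered_buffer(struct page **pages, unsigned int nr_pages)
{
        return vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
}

static void unmap_scattered_buffer(void *vaddr)
{
        vunmap(vaddr);
}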

In general on larger systems with many CPUs changing CPU mappings
also gets expensive (because you have to communicate with all cores),
and is not a good idea on frequent IO paths.

>
> >
> >> Additionally, the current IOMMU interface does not allow users to
> >> associate one page table with multiple IOMMUs unless the user explicitly
> >
> > That assumes that all the IOMMUs on the system support the same page table
> > format, right?
>
> Actually no. Since the VCMM abstracts a page-table as a Virtual
> Contiguous Region (VCM) a VCM can be associated with any device,
> regardless of their individual page table format.

But then there is no real page table sharing, is there?
The real information should be in the page tables, nowhere else.

> > The standard Linux approach to such a problem is to write
> > a library that drivers can use for common functionality, not put a middle
> > layer in between. Libraries are much more flexible than layers.
>
> That's true up to the "is this middle layer so useful that it's worth
> it" point. The VM is a middle layer, you could make the same argument
> about it, "the mapping code isn't too hard, just map in the memory
> that you need and be done with it". But the VM middle layer provides a
> clean separation between page frames and pages which turns out to be

Actually we use both PFNs and struct page *s in many layers up
and down; there's not really any layering in that.

-Andi
--
ak(a)linux.intel.com -- Speaking for myself only.
From: Zach Pfeffer on
Andi Kleen wrote:
> On Thu, Jul 01, 2010 at 11:17:34PM -0700, Zach Pfeffer wrote:
>> Andi Kleen wrote:
>>>> The VCMM provides a more abstract, global view with finer-grained
>>>> control of each mapping a user wants to create. For instance, the
>>>> semantics of iommu_map preclude its use in setting up just the IOMMU
>>>> side of a mapping. With a one-sided map, two IOMMU devices can be
>>> Hmm? dma_map_* does not change any CPU mappings. It only sets up
>>> DMA mapping(s).
>> Sure, but I was saying that iommu_map() doesn't just set up the IOMMU
>> mappings, it sets up both the IOMMU and kernel buffer mappings.
>
> Normally the data is already mapped in the kernel, so why
> would you need another CPU mapping too? Sometimes the CPU
> code has to scatter-gather, but that is considered acceptable
> (and if it really cannot be rewritten to support sg it's better
> to have an explicit vmap operation).
>
> In general on larger systems with many CPUs changing CPU mappings
> also gets expensive (because you have to communicate with all cores),
> and is not a good idea on frequent IO paths.

That's all true, but what a VCMM allows is for these trade-offs to be
made by the user for future systems. It may not be too expensive to
change the IO path around on future chips, or the user may be okay with
the performance penalty. A VCMM doesn't enforce a policy on the user;
it lets the user make their own policy.


>>>> Additionally, the current IOMMU interface does not allow users to
>>>> associate one page table with multiple IOMMUs unless the user explicitly
>>> That assumes that all the IOMMUs on the system support the same page table
>>> format, right?
>> Actually no. Since the VCMM abstracts a page-table as a Virtual
>> Contiguous Region (VCM) a VCM can be associated with any device,
>> regardless of their individual page table format.
>
> But then there is no real page table sharing, is there?
> The real information should be in the page tables, nowhere else.

Yeah, and the implementation ensures that. The VCMM just adds a few
fields like start_addr, len and the device. The device still manages
its own page tables.
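To make that concrete, the bookkeeping is roughly of this shape (the
names here are illustrative, not the actual definitions in the patch
set):

struct vcm_region {
        unsigned long   start_addr;     /* device-virtual start of the region */
        size_t          len;            /* length of the contiguous span      */
        struct device   *dev;           /* device whose IOMMU backs the region */
};

The per-device page tables themselves stay with the device's IOMMU
driver; the VCMM only records where the region lives and which device
it is bound to.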

>>> The standard Linux approach to such a problem is to write
>>> a library that drivers can use for common functionality, not put a middle
>>> layer in between. Libraries are much more flexible than layers.
>> That's true up to the "is this middle layer so useful that it's worth
>> it" point. The VM is a middle layer, you could make the same argument
>> about it, "the mapping code isn't too hard, just map in the memory
>> that you need and be done with it". But the VM middle layer provides a
>> clean separation between page frames and pages which turns out to be
>
> Actually we use both PFNs and struct page *s in many layers up
> and down; there's not really any layering in that.

Sure, but the PFNs and the struct page *s are the middle layer. It's
just that things haven't been layered on top of them. A VCMM is the
higher-level abstraction, since it allows the size of the page frames
to vary and the consumers of the VCMs to be determined at run-time.

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
From: Zach Pfeffer on
Andi Kleen wrote:
> The standard Linux approach to such a problem is to write
> a library that drivers can use for common functionality, not put a middle
> layer in between. Libraries are much more flexible than layers.

I've been thinking about this statement. It's very true. I use the
genalloc lib, which is a great piece of software, to manage VCMs
(domains in linux/iommu.h parlance?).
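As a rough sketch (the numbers are made up and the alignment handling
we layer on top is left out), one IOMMU's virtual space managed with
genalloc looks something like:

#include <linux/errno.h>
#include <linux/genalloc.h>

/* One pool per IOMMU, covering its device-virtual window. */
static struct gen_pool *iommu_va_pool;

static int iommu_va_pool_init(void)
{
        /* min_alloc_order of 12 keeps allocations 4K granular. */
        iommu_va_pool = gen_pool_create(12, -1);
        if (!iommu_va_pool)
                return -ENOMEM;

        /* Hand the pool the device-virtual range it may carve up. */
        return gen_pool_add(iommu_va_pool, 0x40000000,
                            0x10000000 /* 256M */, -1);
}

/* Carve out device-virtual space for one buffer. */
static unsigned long iommu_va_alloc(size_t len)
{
        return gen_pool_alloc(iommu_va_pool, len);
}

static void iommu_va_free(unsigned long iova, size_t len)
{
        gen_pool_free(iommu_va_pool, iova, len);
}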

On our hardware there are three things we have to do: use the minimum
set of mappings to map a buffer (because of the extremely small TLBs in
all the IOMMUs we have to support), use special virtual alignments, and
direct various multimedia flows through certain IOMMUs. To support this
we:

1. Use the genalloc lib to allocate virtual space for our IOMMUs,
allowing virtual alignment to be specified.

2. Have a maxmunch allocator that manages our own physical pool.

I think I may be able to support this using the iommu interface and
some util functions. The big thing that's lost is the unified topology
management, but as demonstrated that may fall out from a refactor.
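Something like the following is the shape I have in mind for the
single-domain case. It is only a sketch: the exact signatures of
iommu_domain_alloc() and iommu_map() have changed between kernel
versions, so treat the forms used here as assumptions.

#include <linux/errno.h>
#include <linux/iommu.h>

/* Sketch: point one device's IOMMU at a physically contiguous buffer. */
static int map_buf_for_device(struct device *dev, unsigned long iova,
                              phys_addr_t pa, size_t len, int prot)
{
        struct iommu_domain *domain;
        int ret;

        domain = iommu_domain_alloc();
        if (!domain)
                return -ENOMEM;

        ret = iommu_attach_device(domain, dev);
        if (ret) {
                iommu_domain_free(domain);
                return ret;
        }

        /* iova would come out of the genalloc pool described above. */
        return iommu_map(domain, iova, pa, len, prot);
}

The util functions would then mostly be the virtual-space and
physical-pool allocators that feed iova and pa into calls like this.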

Anyhow, sounds like a few things to try. Thanks for the feedback so
far. I'll do some refactoring and see what's missing.

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
From: Joerg Roedel on
On Fri, Jul 02, 2010 at 12:09:02AM -0700, Zach Pfeffer wrote:
> Hari Kanigeri wrote:
> >> He demonstrated the usage of his code in one of the emails he sent out
> >> initially. Did you go over that, and what (or how many) steps would you
> >> use with the current code to do the same thing?
> >
> > -- So is this patch set adding layers and abstractions to help the user?
> >
> > If the idea is to share some memory across multiple devices, I guess
> > you can achieve the same by calling the map function provided by the
> > iommu module and sharing the mapped address with the tens or hundreds
> > of devices that access the buffers. You would only need a dedicated
> > virtual pool per IOMMU device to manage its virtual memory allocations.
>
> Yeah, you can do that. My idea is to get away from explicit addressing
> and encapsulate the "device address to physical address" link into a
> mapping.

The DMA-API already does this with the help of IOMMUs if they are
present. What is the benefit of your approach over that?
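(For reference, the usual pattern, roughly sketched; whether an IOMMU,
swiotlb or a direct mapping sits behind the returned handle is invisible
to the driver:)

#include <linux/dma-mapping.h>

static int do_one_transfer(struct device *dev, void *buf, size_t len)
{
        dma_addr_t handle;

        /* The device-visible address comes back as an opaque dma_addr_t. */
        handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, handle))
                return -ENOMEM;

        /* ... program the device with 'handle' and run the transfer ... */

        dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);
        return 0;
}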

Joerg

From: Joerg Roedel on
On Fri, Jul 02, 2010 at 12:33:51AM -0700, Zach Pfeffer wrote:
> Daniel Walker wrote:

> > So if we include this code, which "map implementations" could you
> > collapse into this implementation? Generally, what currently existing
> > code can the VCMM help to eliminate?
>
> In theory, it can eliminate all the code that interoperates between
> IOMMU, CPU and non-IOMMU based devices, and all the mapping code,
> alignment, mapping-attribute and special block size support that's
> been implemented.

That's a very abstract statement. Can you point to particular code files
and give a rough sketch of how they could be improved using the VCMM?

Joerg
