mm: iommu: An API to unify IOMMU, CPU and device memory management [Kernel]

From: FUJITA Tomonori on 22 Jul 2010 00:50

On Wed, 21 Jul 2010 21:30:34 -0700
Zach Pfeffer <zpfeffer(a)codeaurora.org> wrote:

> On Wed, Jul 21, 2010 at 10:44:37AM +0900, FUJITA Tomonori wrote:
> > On Tue, 20 Jul 2010 15:20:01 -0700
> > Zach Pfeffer <zpfeffer(a)codeaurora.org> wrote:
> >
> > > > I'm not saying that it's reasonable to pass (or even allocate) a 1MB
> > > > buffer via the DMA API.
> > >
> > > But given a bunch of large chunks of memory, is there any API that can
> > > manage them (asked this on the other thread as well)?
> >
> > What is the problem about mapping a 1MB buffer with the DMA API?
> >
> > Possibly, an IOMMU can't find space for 1MB but it's not the problem
> > of the DMA API.
>
> This goes to the nub of the issue. We need a lot of 1 MB physically
> contiguous chunks. The system is going to fragment and we'll never get
> our 12 1 MB chunks that we'll need, since the DMA API allocator uses
> the system pool it will never succeed. For this reason we reserve a
> pool of 1 MB chunks (and 16 MB, 64 KB etc...) to satisfy our
> requests. This same use case is seen on most embedded "media" engines
> that are getting built today.

We don't need a new abstraction to reserve some memory.

If you want a pre-allocated memory pool per device (and to share it
with some others), the DMA API can do that for coherent memory (see
dma_alloc_from_coherent). You can extend the DMA API if necessary.
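
For readers following along, a minimal userspace model of the kind of
per-device pool referred to here may help. It is not kernel code:
dev_pool, pool_alloc and pool_free are hypothetical names, and the
4 MB pool of 4 KB pages is an assumed size. It only mimics the idea of
claiming a free, naturally aligned power-of-two run of pages from
memory set aside for one device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_PAGES 1024   /* assumed: 1024 pages of 4 KB = 4 MB */
#define MAX_ORDER  10     /* 1 << 10 == POOL_PAGES */

/* Hypothetical per-device pool: one byte per page (0 = free, 1 = used).
 * A real implementation would use a packed bitmap. */
struct dev_pool {
    unsigned char used[POOL_PAGES];
};

static void pool_init(struct dev_pool *p)
{
    memset(p->used, 0, sizeof(p->used));
}

/* Claim a free, naturally aligned run of (1 << order) pages.
 * Returns the index of the first page, or -1 if no such run exists. */
static int pool_alloc(struct dev_pool *p, unsigned order)
{
    if (order > MAX_ORDER)
        return -1;

    size_t n = (size_t)1 << order;
    for (size_t start = 0; start + n <= POOL_PAGES; start += n) {
        size_t i = 0;
        while (i < n && !p->used[start + i])
            i++;
        if (i == n) {                     /* whole aligned run is free */
            memset(&p->used[start], 1, n);
            return (int)start;
        }
    }
    return -1;
}

/* Release a run previously returned by pool_alloc() with the same order. */
static void pool_free(struct dev_pool *p, int start, unsigned order)
{
    size_t n = (size_t)1 << order;
    assert(start >= 0 && (size_t)start + n <= POOL_PAGES);
    memset(&p->used[start], 0, n);
}
```

With 4 KB pages, order 8 is a 1 MB region, so this model can hand out
at most four of them, and a single page allocated inside an otherwise
free 1 MB slot makes that whole slot unavailable to order-8 requests.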
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Zach Pfeffer on 22 Jul 2010 12:30

On Thu, Jul 22, 2010 at 08:34:55AM +0100, Russell King - ARM Linux wrote:
> On Wed, Jul 21, 2010 at 09:25:28PM -0700, Zach Pfeffer wrote:
> > Yes it is a problem, as Russell has brought up, but there's something
> > I probably haven't communicated well. I'll use the following example:
> >
> > There are 3 devices: A CPU, a decoder and a video output device. All 3
> > devices need to map the same 12 MB buffer at the same time.
>
> Why do you need the same buffer mapped by the CPU?
>
> Let's take your example of a video decoder and video output device.
> Surely the CPU doesn't want to be writing to the same memory region
> used for the output picture as the decoder is writing to. So what's
> the point of mapping that memory into the CPU's address space?

It may, especially if you're doing some software post-processing. Also,
by mapping all the buffers it's extremely fast to "pass the buffers"
around in this scenario - the buffer passing becomes a simple signal.

>
> Surely the video output device doesn't need to see the input data to
> the decoder either?

No, but other devices may (like the CPU).

>
> Surely, all you need is:
>
> 1. a mapping for the CPU for a chunk of memory to pass data to the
> decoder.
> 2. a mapping for the decoder to see the chunk of memory to receive data
> from the CPU.
> 3. a mapping for the decoder to see a chunk of memory used for the output
> video buffer.
> 4. a mapping for the output device to see the video buffer.
>
> So I don't see why everything needs to be mapped by everything else.

That's fair, but we do share buffers and we do have many, very large
mappings, and we do need to pull these from separate pools because
they need to exhibit a particular allocation profile. I agree with you
that things should work like you've listed, but with Qualcomm's ARM
multimedia engines we're seeing some different usage scenarios. It's
the giant buffers, needing to use our own buffer allocator, the need
to share and the need to swap out virtual IOMMU space (which we
haven't talked about much) which make the DMA API seem like a
mismatch. (We haven't even talked about graphics usage ;) ).
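
To make the "separate pools" point concrete, here is a rough userspace
sketch of a reserved pool split into fixed chunk sizes. It is only an
illustration under assumptions: reserved_pool, chunk_class and
reserved_pool_take are hypothetical names, the chunk counts are made
up, and only the 64 KB, 1 MB and 16 MB sizes come from this thread. It
is not the allocator from the VCMM patches.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_CLASSES = 3 };

/* One size class of pre-reserved, physically contiguous chunks. */
struct chunk_class {
    size_t   chunk_size;   /* bytes per chunk */
    unsigned free;         /* chunks still available */
};

/* Hypothetical reserved pool; classes are sorted by ascending size. */
struct reserved_pool {
    struct chunk_class cls[NUM_CLASSES];
};

/* Chunk counts below are assumptions chosen for illustration only. */
static void reserved_pool_init(struct reserved_pool *p)
{
    assert(p != NULL);
    p->cls[0] = (struct chunk_class){ 64u * 1024, 32 };
    p->cls[1] = (struct chunk_class){ 1024u * 1024, 12 };
    p->cls[2] = (struct chunk_class){ 16u * 1024 * 1024, 2 };
}

/* Take one chunk from the smallest class that can hold len bytes and
 * still has chunks left. If that class is exhausted, the request
 * falls through to the next larger class. Returns the size of the
 * chunk granted, or 0 if nothing fits. */
static size_t reserved_pool_take(struct reserved_pool *p, size_t len)
{
    if (len == 0)
        return 0;

    for (int i = 0; i < NUM_CLASSES; i++) {
        if (len <= p->cls[i].chunk_size && p->cls[i].free > 0) {
            p->cls[i].free--;
            return p->cls[i].chunk_size;
        }
    }
    return 0;
}
```

Because the chunks are set aside up front, a request for twelve 1 MB
buffers succeeds here no matter how fragmented the rest of system
memory has become.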

From: Zach Pfeffer on 22 Jul 2010 12:30

On Thu, Jul 22, 2010 at 08:39:17AM +0100, Russell King - ARM Linux wrote:
> On Wed, Jul 21, 2010 at 09:30:34PM -0700, Zach Pfeffer wrote:
> > This goes to the nub of the issue. We need a lot of 1 MB physically
> > contiguous chunks. The system is going to fragment and we'll never get
> > our 12 1 MB chunks that we'll need, since the DMA API allocator uses
> > the system pool it will never succeed.
>
> By the "DMA API allocator" I assume you mean the coherent DMA interface.
> The DMA coherent API and DMA streaming APIs are two separate sub-interfaces
> of the DMA API and are not dependent on each other.

I didn't know that, but yes. As far as I can tell they both allocate
memory from the VM. We'd need a way to hook in our own minimized
mapping allocator.

From: Zach Pfeffer on 22 Jul 2010 12:50

On Thu, Jul 22, 2010 at 01:43:26PM +0900, FUJITA Tomonori wrote:
> On Wed, 21 Jul 2010 21:30:34 -0700
> Zach Pfeffer <zpfeffer(a)codeaurora.org> wrote:
>
> > On Wed, Jul 21, 2010 at 10:44:37AM +0900, FUJITA Tomonori wrote:
> > > On Tue, 20 Jul 2010 15:20:01 -0700
> > > Zach Pfeffer <zpfeffer(a)codeaurora.org> wrote:
> > >
> > > > > I'm not saying that it's reasonable to pass (or even allocate) a 1MB
> > > > > buffer via the DMA API.
> > > >
> > > > But given a bunch of large chunks of memory, is there any API that can
> > > > manage them (asked this on the other thread as well)?
> > >
> > > What is the problem about mapping a 1MB buffer with the DMA API?
> > >
> > > Possibly, an IOMMU can't find space for 1MB but it's not the problem
> > > of the DMA API.
> >
> > This goes to the nub of the issue. We need a lot of 1 MB physically
> > contiguous chunks. The system is going to fragment and we'll never get
> > our 12 1 MB chunks that we'll need, since the DMA API allocator uses
> > the system pool it will never succeed. For this reason we reserve a
> > pool of 1 MB chunks (and 16 MB, 64 KB etc...) to satisfy our
> > requests. This same use case is seen on most embedded "media" engines
> > that are getting built today.
>
> We don't need a new abstraction to reserve some memory.
>
> If you want a pre-allocated memory pool per device (and to share it
> with some others), the DMA API can do that for coherent memory (see
> dma_alloc_from_coherent). You can extend the DMA API if necessary.

That function won't work for us. We can't use
bitmap_find_free_region(), we need to use our own allocator. If
anything we need a dma_alloc_from_custom(my_allocator). Take a look
at:

mm: iommu: A physical allocator for the VCMM
vcm_alloc_max_munch()
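
As a rough illustration of what a dma_alloc_from_custom(my_allocator)
style hook could look like, here is a userspace sketch. No such
function exists in the DMA API; custom_allocator, alloc_from_custom,
bump_ctx and bump_alloc are all hypothetical names, and the bump
allocator is just a stand-in backend, not vcm_alloc_max_munch().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook: the caller supplies the allocation policy. */
struct custom_allocator {
    /* Returns 0 and stores the region's start address in *addr,
     * or returns -1 if the request cannot be satisfied. */
    int (*alloc)(void *ctx, size_t len, uint64_t *addr);
    void *ctx;   /* backend-private state */
};

/* Generic entry point that defers to whatever backend was plugged in. */
static int alloc_from_custom(struct custom_allocator *a, size_t len,
                             uint64_t *addr)
{
    assert(addr != NULL);
    if (!a || !a->alloc || len == 0)
        return -1;
    return a->alloc(a->ctx, len, addr);
}

/* Stand-in backend: a bump allocator over a fixed address window. */
struct bump_ctx {
    uint64_t base;   /* start of the reserved window */
    uint64_t size;   /* window length in bytes */
    uint64_t next;   /* offset of the next free byte */
};

static int bump_alloc(void *ctx, size_t len, uint64_t *addr)
{
    struct bump_ctx *b = ctx;

    if (len > b->size - b->next)   /* not enough room left */
        return -1;
    *addr = b->base + b->next;
    b->next += len;
    return 0;
}
```

The point of the indirection is that the generic layer never needs to
know how regions are chosen, so a device-specific allocator can
replace the bitmap search without changing the callers.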
