From: FUJITA Tomonori on
On Thu, 21 Jan 2010 19:41:58 +0100
Jarek Poplawski <jarkao2(a)gmail.com> wrote:

> On Fri, Jan 22, 2010 at 12:22:10AM +0900, FUJITA Tomonori wrote:
> > On Wed, 20 Jan 2010 23:53:22 +0100
> > Jarek Poplawski <jarkao2(a)gmail.com> wrote:
> >
> > > On Wed, Jan 20, 2010 at 10:24:14PM +0000, Alan Cox wrote:
> > > > > > Seems like an underlying bug in the DMA api. Maybe it just can't
> > > > > > handle operations on partial mapping.
> > > > > >
> > > > > > Other drivers with same problem:
> > > > > > bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3,
> > > > >
> > > > > It seems using the same length (even without pci_unmap_len()) is
> > > > > crucial here, but I hope maintainers (added to CC) will take care.
> > > >
> > > > The API needs fixing - if you've got a large mapping and you want to sync
> > > > part of it then we need to support that. Now it might well be that the
> > > > implementation on some braindead platform has to sync the entire thing,
> > > > and some implementations entire pages or cache lines.
> > > >
> > > > You can't fix this in the drivers, they requested a service and they
> > > > don't have enough information nor is it their job to know about all the
> > > > platform specific rules.
> > >
> > > Yes, the need to repeat some other values if there is a dedicated
> > > structure/pointer could be misleading. Btw, it seems to be a trivial
> > > overlooking since there is dma_sync_single_range() ready to use.
> >
> > Yeah, dma_sync_single_range() enables you to do a partial sync. But
> > you must be really careful with a partial sync (as DMA-API.txt says).
>
> Actually, we are trying to establish here (and a few more netdev@
> threads) what exactly the author was worried about. After looking at

James added to Cc,


> some implementations it seems to me this carefulness in observing
> the cache alignment and width is needed only wrt. the 'offset'. But
> then, the way the 'size' is used (or rather not used for anything
> crucial) suggests dma_sync_single_range() with zero offset seems
> completely safe. But then it's equivalent to dma_sync_single() with

Even if 'offset' is zero, 'size' still matters, I think. If 'size' is
not a multiple of the cache line size, it's possible that driver
writers who aren't familiar with cache would be surprised (it depends
on the way their drivers use buffers though).

The easiest way to make a sync 'completely safe for any driver writer'
is to require that all the sync parameters be the same as those passed
into the single mapping API. If writers know what they are doing, they
can do a partial sync with the sync_range API. That was the author's
intention, I guess.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: David Miller on
From: FUJITA Tomonori <fujita.tomonori(a)lab.ntt.co.jp>
Date: Fri, 22 Jan 2010 14:11:29 +0900

> Even if 'offset' is zero, 'size' still matters, I think. If 'size' is
> not a multiple of the cache line size, it's possible that driver
> writers who aren't familiar with cache would be surprised (it depends
> on the way their drivers use buffers though).
>
> The easiest way to make a sync 'completely safe for any driver writer'
> is to require that all the sync parameters be the same as those passed
> into the single mapping API. If writers know what they are doing, they
> can do a partial sync with the sync_range API. That was the author's
> intention, I guess.

This is not reasonable.

You have to think about how people actually use these
interfaces.

They have a large buffer, and if they receive a small request they
want to allocate a smaller buffer, copy into that smaller buffer, and
give the larger buffer back to the hardware.

It's an optimization, it performs better this way.

If you make it so that the DMA sync has to cover the entire large
buffer, the whole point of the optimization is taken away.

That makes no sense at all.

I know that when I designed and wrote the first implementation of the
PCI DMA interfaces, I sure as hell meant to allow partial DMA sync
operations.

I know this as a fact, because the first drivers ported over
to these interfaces were network drivers. And I definitely
knew about the copy-break mechanism I describe above and how
networking drivers use such a scheme pretty much across the
board.
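
[Editor's sketch: the copy-break scheme described above can be
illustrated in plain user-space C. This is only a simulation under
stated assumptions, not driver code: RX_BUF_SIZE, COPYBREAK,
fake_sync_for_cpu() and rx_copybreak() are hypothetical names, and the
stand-in sync merely counts bytes where a real driver would call a DMA
sync routine. The point it shows is that a small packet needs only its
own length synced, not the whole large buffer.]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RX_BUF_SIZE 2048 /* hypothetical size of the mapped receive buffer */
#define COPYBREAK    256 /* hypothetical copy threshold in bytes */

static size_t bytes_synced; /* total length covered by the stand-in sync */

/* Stand-in for a partial DMA sync: it only records the requested length. */
static void fake_sync_for_cpu(const unsigned char *buf, size_t len)
{
    (void)buf;
    bytes_synced += len;
}

/*
 * Copy-break receive path (simulation).
 *
 * For a packet shorter than COPYBREAK, sync just the bytes that were
 * received, copy them into a small freshly allocated buffer, and return
 * that copy (the caller frees it). The large buffer stays untouched and
 * can be handed back to the hardware.
 *
 * Returns NULL when no copy was made (packet too large, or allocation
 * failed); a driver would then pass the large buffer up instead.
 */
static unsigned char *rx_copybreak(const unsigned char *rx_buf, size_t pkt_len)
{
    unsigned char *copy;

    assert(pkt_len <= RX_BUF_SIZE);

    if (pkt_len >= COPYBREAK)
        return NULL;

    fake_sync_for_cpu(rx_buf, pkt_len); /* partial sync: pkt_len only */

    copy = malloc(pkt_len ? pkt_len : 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, rx_buf, pkt_len);
    return copy;
}
```

[With a 60-byte packet in a 2048-byte buffer, this path syncs 60 bytes.
Requiring the sync to match the mapped length would make it 2048.]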

The DMA API documentation is wrong, it must be fixed to allow partial
syncs of arbitrary offsets and sizes.

Issues such as cache line boundaries are the domain of the DMA API
implementation, and have absolutely no business in the definition
of these interfaces. Nor should they be something driver authors have
to be knowledgeable about; that would be completely unreasonable.
--
From: David Miller on
From: FUJITA Tomonori <fujita.tomonori(a)lab.ntt.co.jp>
Date: Wed, 3 Feb 2010 10:18:39 +0900

> Can we safely assume that the arch implementations already round
> up/down to the safe boundary internally in this API (they should
> already)?

I can only speak for sparc64 and x86 directly and those are fine.

Any such improper implementations would fail with many common
ethernet drivers already.

> I don't like having two DMA docs. I'd like to make the pci_dma_* API
> obsolete. We have the generic DMA API with generic devices, so we are
> always able to use that API (as you did with sbus_map_*). The majority
> of arch implementations safely call the bus-specific DMA functions via
> the generic DMA API, so there is not much to do. We can just convert
> pci_dma_* to the dma_* API slowly.
>
> Opinions?

I have no problem with this.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: FUJITA Tomonori on
Sorry for the late reply,

On Thu, 21 Jan 2010 22:38:41 -0800 (PST)
David Miller <davem(a)davemloft.net> wrote:

> From: FUJITA Tomonori <fujita.tomonori(a)lab.ntt.co.jp>
> Date: Fri, 22 Jan 2010 14:11:29 +0900
>
> > Even if 'offset' is zero, 'size' still matters, I think. If 'size' is
> > not a multiple of the cache line size, it's possible that driver
> > writers who aren't familiar with cache would be surprised (it depends
> > on the way their drivers use buffers though).
> >
> > The easiest way to make a sync 'completely safe for any driver writer'
> > is to require that all the sync parameters be the same as those passed
> > into the single mapping API. If writers know what they are doing, they
> > can do a partial sync with the sync_range API. That was the author's
> > intention, I guess.
>
> This is not reasonable.
>
> You have to think about how people actually use these
> interfaces.
>
> They have a large buffer, and if they receive a small request they
> want to allocate a smaller buffer, copy into that smaller buffer, and
> give the larger buffer back to the hardware.
>
> It's an optimization, it performs better this way.
>
> If you make it so that the DMA sync has to cover the entire large
> buffer, the whole point of the optimization is taken away.

I talked with James. He is OK with changing (or fixing) this API to
enable users to do a partial sync. (I'm OK with that too; I just
guessed that he designed the API in such a way intentionally, not by
mistake.)

Can we safely assume that the arch implementations already round
up/down to the safe boundary internally in this API (they should
already)?
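
[Editor's sketch: "round up/down to the safe boundary" can be made
concrete with a little address arithmetic. This is a user-space
illustration under stated assumptions, not any architecture's real
implementation: CACHE_LINE (64 bytes here), struct sync_range and
widen_to_lines() are hypothetical names. It shows how a partial sync
request with an arbitrary offset and size could be widened to whole
cache lines inside the implementation, so the driver never has to know
the line size.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache line size; real platforms differ. */
#define CACHE_LINE 64u

struct sync_range {
    uintptr_t start; /* first byte covered, aligned to a cache line */
    size_t    len;   /* bytes covered, a multiple of CACHE_LINE */
};

/*
 * Widen the requested range [addr + offset, addr + offset + size)
 * to whole cache lines: round the start down and the end up.
 */
static struct sync_range widen_to_lines(uintptr_t addr, size_t offset,
                                        size_t size)
{
    const uintptr_t mask = (uintptr_t)CACHE_LINE - 1;
    uintptr_t first = (addr + offset) & ~mask;              /* round down */
    uintptr_t last = (addr + offset + size + mask) & ~mask; /* round up */
    struct sync_range r = { first, (size_t)(last - first) };

    /* The mask trick only works for a power-of-two line size. */
    assert((CACHE_LINE & (CACHE_LINE - 1)) == 0);
    return r;
}
```

[For a buffer at 0x1000, a request with offset 10 and size 100 is
widened to the two lines starting at 0x1000, i.e. 128 bytes. The extra
bytes are the ones a driver writer may not expect to be affected, which
is the surprise discussed earlier in the thread.]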

As you know, the patch to remove the description in DMA-API.txt saying
that dma_sync_single/pci_dma_sync_single/dma_sync_sg/pci_dma_sync_sg
always require a full sync is already in -mm, so what we need to do is:

- add a 'partial sync' description to PCI-DMA-mapping.txt.
- duplicate a similar description in DMA-API.txt.

I don't like having two DMA docs. I'd like to make the pci_dma_* API
obsolete. We have the generic DMA API with generic devices, so we are
always able to use that API (as you did with sbus_map_*). The majority
of arch implementations safely call the bus-specific DMA functions via
the generic DMA API, so there is not much to do. We can just convert
pci_dma_* to the dma_* API slowly.

Opinions?