From: Russell King - ARM Linux on
On Thu, Jul 29, 2010 at 10:12:05AM +0200, Arnd Bergmann wrote:
> On Wednesday 28 July 2010 23:21:56 Russell King - ARM Linux wrote:
> > On Wed, Jul 28, 2010 at 07:50:20PM +0200, Arnd Bergmann wrote:
> > > The DMA API is extremely flexible, it works just fine with all the
> > > IOMMUs that I've seen so far. Please take a look at
> > > include/asm-generic/dma-mapping-common.h and its users to see how
> > > to use multiple IOMMUs depending on the device.
> >
> > We don't yet use those DMA API interface extensions because we haven't
> > had the need. If someone who has the need wants to put the effort in
> > though...
>
> Right, it shouldn't be hard now that the groundwork for that is done.
> Also, it's only really needed if you have IOMMUs of different types in the
> same system. If msm doesn't have any swiotlb or dmabounce devices,
> it could always use the same implementation for all devices.
>
> > One of the problems with it though is the abstraction of the sync*
> > operations is the wrong way around for stuff like dmabounce - we want
> > to be passed the base address of the buffer (so we can look this up),
> > plus offset and length. We don't want to know just the region which
> > is affected.
>
> Yes, but that is an unrelated (dmabounce specific) problem that seems to
> be fixed by an existing patch.

It's not unrelated because it stands in the way of using that interface.
The patch also seems to be buggy in that it doesn't fix the for_device
case - it leaves 'off' as zero.

I'm also not sold on this idea that the sync_range API is being obsoleted.
It seems to me to be a step in the wrong direction. The range API is a
natural subset of the 'normal' sync API, yet people are trying to shoehorn
the range API into the 'normal' API. If anything it's the 'normal' API
which should be obsoleted as it provides reduced information to
implementations, which then have to start fuzzy-matching the passed
address.

If we're going to start fuzzy-matching the passed address, then I think
we also need to add detection of overlapping mappings and BUG() on such
cases - otherwise we risk the possibility of having multiple overlapping
mappings and hitting the wrong mapping with this reduced-information sync
API.

> The driver posted by Stepan doesn't even support the dma_sync_single_*
> style operations, and I don't think it can run into that specific problem.
> Are there even (hardware) IOMMUs that are connected to noncoherent
> buses?

Yes. Virtually all ARM systems have non-cache coherent DMA. Doesn't
matter if there's an IOMMU or not.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Roedel, Joerg on
On Thu, Jul 29, 2010 at 07:25:47AM -0400, Arnd Bergmann wrote:
> On Thursday 29 July 2010, Roedel, Joerg wrote:
>
> > > You designed it for what you need at the time. It should have been
> > > named appropriately to avoid confusion. Later, when we actually
> > > understand what other IOMMUs need, we can evolve the specific API for
> > > generic purposes. Then we can rename the API to more generic.
> >
> > At the time the iommu-api was written it was generic enough for what we
> > had. So it was designed as a generic API. At this point in time nobody
> > knew what the future requirements would be. So today it turns out that
> > it is not generic enough anymore for latest hardware. The logical
> > consequence is to fix this in the iommu-api.
>
> Well, I think the real question is why we have two APIs that both claim
> to work with IOMMUs in a generic way and how we could unify the two.

The DMA-API itself does not claim to be an iommu-frontend. The purpose
of the DMA-API is to convert physical memory addresses into dma handles
and do all the management of these handles. Backend implementations can
use hardware iommus for this task. But depending on the hardware in the
system the DMA-API can very well be implemented without any hardware
support. This is an important difference from the IOMMU-API, which
needs hardware because it exposes hardware iommu features to software.

> The Intel and AMD IOMMU drivers currently register at both the DMA
> API and the IOMMU API. The first one is used by everything except
> KVM and the second is only used by KVM.

Right. But there is also a mode where the AMD IOMMU driver only
registers for the IOMMU-API.

> I really think we should not extend the (KVM) IOMMU API further but
> just use the generic DMA mapping api for KVM and extend it as necessary.
> It already has the concept of cache coherency and mapping/unmapping
> that are in the IOMMU API and could be extended to support domains
> as well, through the use of dma_attrs.

If we find a nice and clean way to expose lower-level iommu
functionality through the DMA-API, that's fine. We could certainly
discuss ideas in this direction. I think this is going to be hard
because the DMA-API today does not provide enough flexibility to let the
user choose both sides of an io-virtual<->cpu-physical address mapping.
That's fine for most drivers because it makes sense for them to use the
generic io-address-allocator the DMA-API provides, but not for KVM which
needs this flexibility.

Joerg

--
Joerg Roedel - AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

From: Arnd Bergmann on
On Thursday 29 July 2010, Roedel, Joerg wrote:
> On Thu, Jul 29, 2010 at 07:25:47AM -0400, Arnd Bergmann wrote:
> > On Thursday 29 July 2010, Roedel, Joerg wrote:
> >
> > > > You designed it for what you need at the time. It should have been
> > > > named appropriately to avoid confusion. Later, when we actually
> > > > understand what other IOMMUs need, we can evolve the specific API for
> > > > generic purposes. Then we can rename the API to more generic.
> > >
> > > At the time the iommu-api was written it was generic enough for what we
> > > had. So it was designed as a generic API. At this point in time nobody
> > > knew what the future requirements would be. So today it turns out that
> > > it is not generic enough anymore for latest hardware. The logical
> > > consequence is to fix this in the iommu-api.
> >
> > Well, I think the real question is why we have two APIs that both claim
> > to work with IOMMUs in a generic way and how we could unify the two.
>
> The DMA-API itself does not claim to be an iommu-frontend. The purpose
> of the DMA-API is to convert physical memory addresses into dma handles
> and do all the management of these handles. Backend implementations can
> use hardware iommus for this task. But depending on the hardware in the
> system the DMA-API can very well be implemented without any hardware
> support. This is an important difference from the IOMMU-API, which
> needs hardware because it exposes hardware iommu features to software.

Well, you could call that a limitation in the IOMMU API ;-)

The idea behind the DMA mapping API is to allow a device driver
to work without knowing if the hardware can, cannot or must use
an IOMMU.

> > I really think we should not extend the (KVM) IOMMU API further but
> > just use the generic DMA mapping api for KVM and extend it as necessary.
> > It already has the concept of cache coherency and mapping/unmapping
> > that are in the IOMMU API and could be extended to support domains
> > as well, through the use of dma_attrs.
>
> If we find a nice and clean way to expose lower-level iommu
> functionality through the DMA-API, that's fine. We could certainly
> discuss ideas in this direction. I think this is going to be hard
> because the DMA-API today does not provide enough flexibility to let the
> user choose both sides of an io-virtual<->cpu-physical address mapping.
> That's fine for most drivers because it makes sense for them to use the
> generic io-address-allocator the DMA-API provides, but not for KVM which
> needs this flexibility.

One way to do this would be to add a new attribute, e.g.

enum dma_attr {
	DMA_ATTR_WRITE_BARRIER,
	DMA_ATTR_WEAK_ORDERING,
	DMA_ATTR_FIXED_MAPPING, /* this one is new */
	DMA_ATTR_MAX,
};

struct dma_attrs {
	unsigned long flags[__DMA_ATTRS_LONGS];
	dma_addr_t dest;
};

Nothing except for KVM would need to use that attribute, and KVM would
obviously need a way to check if this is supported by the underlying
implementation.

Arnd
From: stepanm on
Joerg,

Thanks for the information. I have been trying to adapt the MSM IOMMU
driver to use your IOMMU interface, and it looks like it might work, with
one minor modification.

Unlike a more traditional system with one IOMMU between the bus and
memory, MSM has multiple IOMMUs, with each one hard-wired to a dedicated
device. Furthermore, each IOMMU can have more than one translation
context. One of the use cases is being able to create mappings within
multiple instances of one context, and arbitrarily context-switch the
IOMMU on the fly.

It sounds like the domain abstraction and attach_device/detach_device can
encapsulate this rather nicely and I am in the process of updating my
driver to fit this framework.

My problem, however, is with iommu_domain_alloc(). This will set up a
domain and call the ops function to initialize it, but I want to be able
to pass it an "IOMMU id" that will tell the underlying driver which IOMMU
(and which "stream id") is to be associated with that domain instance.
This can be a void* parameter that gets passed through to domain_init. I
feel like this change will make it easy to deal with multiple
IOMMUs/translation contexts, and implementations that have only a single
IOMMU/translation context are free to ignore that parameter.

The alternative for me is to have a separate msm_iommu_domain_alloc(void
*context_id) function, to which I can specify which IOMMU I want to use,
but I would like to fully use your API if possible.

What are your thoughts? I can prepare a patch if you like - the
domain_alloc change looks like it will be very innocuous.

Thanks
Steve


> On Thu, Jul 29, 2010 at 04:46:59AM -0400, FUJITA Tomonori wrote:
>> On Thu, 29 Jul 2010 10:40:19 +0200
>> "Roedel, Joerg" <Joerg.Roedel(a)amd.com> wrote:
>>
>> > The IOMMU-API is not about SR-IOV.
>>
>> That's true. However, the point is that include/iommu.h is far from
>> the IOMMU-API.
>>
>> You could still insist that include/iommu.h is designed for the
>> generic IOMMU-API. But the fact is that it's designed for very
>> specific purposes. No intention to make it for generic purposes.
>
> I have no clue about the ARM iommus on the omap-platform. From a quick
> look into the header file I see some similarities to the IOMMU-API. I am
> also very open for discussions about how the IOMMU-API could be extended
> to fit the needs of other platforms. Only because nobody has tried to
> discuss about such an effort is no reason to push the IOMMU-API back.
>
>>
>> Since you added it two years ago, nobody has tried to extend
>> it. Instead, we have something like
>> arch/arm/plat-omap/include/plat/iommu.h.
>
> And I think we should try to merge this platform-specific functionality
> into the IOMMU-API.
>
From: FUJITA Tomonori on
On Thu, 29 Jul 2010 12:47:26 +0100
Russell King - ARM Linux <linux(a)arm.linux.org.uk> wrote:

> On Thu, Jul 29, 2010 at 10:12:05AM +0200, Arnd Bergmann wrote:
> > On Wednesday 28 July 2010 23:21:56 Russell King - ARM Linux wrote:
> > > On Wed, Jul 28, 2010 at 07:50:20PM +0200, Arnd Bergmann wrote:
> > > > The DMA API is extremely flexible, it works just fine with all the
> > > > IOMMUs that I've seen so far. Please take a look at
> > > > include/asm-generic/dma-mapping-common.h and its users to see how
> > > > to use multiple IOMMUs depending on the device.
> > >
> > > We don't yet use those DMA API interface extensions because we haven't
> > > had the need. If someone who has the need wants to put the effort in
> > > though...
> >
> > Right, it shouldn't be hard now that the groundwork for that is done.
> > Also, it's only really needed if you have IOMMUs of different types in the
> > same system. If msm doesn't have any swiotlb or dmabounce devices,
> > it could always use the same implementation for all devices.
> >
> > > One of the problems with it though is the abstraction of the sync*
> > > operations is the wrong way around for stuff like dmabounce - we want
> > > to be passed the base address of the buffer (so we can look this up),
> > > plus offset and length. We don't want to know just the region which
> > > is affected.
> >
> > Yes, but that is an unrelated (dmabounce specific) problem that seems to
> > be fixed by an existing patch.
>
> It's not unrelated because it stands in the way of using that interface.
> The patch also seems to be buggy in that it doesn't fix the for_device
> case - it leaves 'off' as zero.

Ah, sorry about the bug. Indeed, the for_device case needs to do the
same as the for_cpu case. I've attached the updated patch.

We need to fix dmabounce.c anyway (even if we keep the sync_range API)
because drivers use the sync API to do a partial sync.


> I'm also not sold on this idea that the sync_range API is being obsoleted.
> It seems to me to be a step in the wrong direction. The range API is a
> natural subset of the 'normal' sync API, yet people are trying to shoehorn
> the range API into the 'normal' API. If anything it's the 'normal' API
> which should be obsoleted as it provides reduced information to
> implementations, which then have to start fuzzy-matching the passed
> address.

It would have been nice if you had opposed when this issue was
discussed...

commit 8127bfc5645db0e050468e0ff971b4081f73ddcf
Author: FUJITA Tomonori <fujita.tomonori(a)lab.ntt.co.jp>
Date: Wed Mar 10 15:23:18 2010 -0800

DMA-API.txt: remove dma_sync_single_range description


As you said, the range API might be safer (since it requires more
information). However, there were already drivers using the
dma_sync_single_for API to do a partial sync (i.e. a sync on a
range).

Inspecting all the users of the dma_sync_single_for API to see which
drivers do a partial sync looks unrealistic. So keeping the
dma_sync_single_range_for API is pointless, since drivers keep using
the dma_sync_single_for API.

And the majority of implementations don't use the 'range' information,
i.e., the implementations of the dma_sync_single_for and
dma_sync_single_range_for APIs are identical.


> If we're going to start fuzzy-matching the passed address, then I think
> we also need to add detection of overlapping mappings and BUG() on such
> cases - otherwise we risk the possibility of having multiple overlapping
> mappings and hitting the wrong mapping with this reduced-information sync
> API.

Strict checking would be nice. If architectures can do it easily, we
had better do so.

However, I'm not sure we need to take special care for the
dma_sync_single_for API. In general, misuse of the majority of the DMA
functions is deadly.

=
From: FUJITA Tomonori <fujita.tomonori(a)lab.ntt.co.jp>
Subject: [PATCH] ARM: dmabounce: fix partial sync in dma_sync_single_* API

Some network drivers do a partial sync with
dma_sync_single_for_{device|cpu}. The dma_addr argument might not be
the same as one as passed into the mapping API.

This adds some tricks to find_safe_buffer() for
dma_sync_single_for_{device|cpu}.

Signed-off-by: FUJITA Tomonori <fujita.tomonori(a)lab.ntt.co.jp>
---
arch/arm/common/dmabounce.c | 32 +++++++++++++++++++++++---------
1 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index cc0a932..dbd30dc 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -163,7 +163,8 @@ alloc_safe_buffer(struct dmabounce_device_info *device_info, void *ptr,

/* determine if a buffer is from our "safe" pool */
static inline struct safe_buffer *
-find_safe_buffer(struct dmabounce_device_info *device_info, dma_addr_t safe_dma_addr)
+find_safe_buffer(struct dmabounce_device_info *device_info, dma_addr_t safe_dma_addr,
+ int for_sync)
{
struct safe_buffer *b, *rb = NULL;
unsigned long flags;
@@ -171,9 +172,17 @@ find_safe_buffer(struct dmabounce_device_info *device_info, dma_addr_t safe_dma_
read_lock_irqsave(&device_info->lock, flags);

list_for_each_entry(b, &device_info->safe_buffers, node)
- if (b->safe_dma_addr == safe_dma_addr) {
- rb = b;
- break;
+ if (for_sync) {
+ if (b->safe_dma_addr <= safe_dma_addr &&
+ safe_dma_addr < b->safe_dma_addr + b->size) {
+ rb = b;
+ break;
+ }
+ } else {
+ if (b->safe_dma_addr == safe_dma_addr) {
+ rb = b;
+ break;
+ }
}

read_unlock_irqrestore(&device_info->lock, flags);
@@ -205,7 +214,8 @@ free_safe_buffer(struct dmabounce_device_info *device_info, struct safe_buffer *
/* ************************************************** */

static struct safe_buffer *find_safe_buffer_dev(struct device *dev,
- dma_addr_t dma_addr, const char *where)
+ dma_addr_t dma_addr, const char *where,
+ int for_sync)
{
if (!dev || !dev->archdata.dmabounce)
return NULL;
@@ -216,7 +226,7 @@ static struct safe_buffer *find_safe_buffer_dev(struct device *dev,
pr_err("unknown device: Trying to %s invalid mapping\n", where);
return NULL;
}
- return find_safe_buffer(dev->archdata.dmabounce, dma_addr);
+ return find_safe_buffer(dev->archdata.dmabounce, dma_addr, for_sync);
}

static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size,
@@ -286,7 +296,7 @@ static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size,
static inline void unmap_single(struct device *dev, dma_addr_t dma_addr,
size_t size, enum dma_data_direction dir)
{
- struct safe_buffer *buf = find_safe_buffer_dev(dev, dma_addr, "unmap");
+ struct safe_buffer *buf = find_safe_buffer_dev(dev, dma_addr, "unmap", 0);

if (buf) {
BUG_ON(buf->size != size);
@@ -398,7 +408,7 @@ int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
dev_dbg(dev, "%s(dma=%#x,off=%#lx,sz=%zx,dir=%x)\n",
__func__, addr, off, sz, dir);

- buf = find_safe_buffer_dev(dev, addr, __func__);
+ buf = find_safe_buffer_dev(dev, addr, __func__, 1);
if (!buf)
return 1;

@@ -411,6 +421,8 @@ int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
DO_STATS(dev->archdata.dmabounce->bounce_count++);

if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL) {
+ if (addr != buf->safe_dma_addr)
+ off = addr - buf->safe_dma_addr;
dev_dbg(dev, "%s: copy back safe %p to unsafe %p size %d\n",
__func__, buf->safe + off, buf->ptr + off, sz);
memcpy(buf->ptr + off, buf->safe + off, sz);
@@ -427,7 +439,7 @@ int dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
dev_dbg(dev, "%s(dma=%#x,off=%#lx,sz=%zx,dir=%x)\n",
__func__, addr, off, sz, dir);

- buf = find_safe_buffer_dev(dev, addr, __func__);
+ buf = find_safe_buffer_dev(dev, addr, __func__, 1);
if (!buf)
return 1;

@@ -440,6 +452,8 @@ int dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
DO_STATS(dev->archdata.dmabounce->bounce_count++);

if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL) {
+ if (addr != buf->safe_dma_addr)
+ off = addr - buf->safe_dma_addr;
dev_dbg(dev, "%s: copy out unsafe %p to safe %p, size %d\n",
__func__,buf->ptr + off, buf->safe + off, sz);
memcpy(buf->safe + off, buf->ptr + off, sz);
--
1.6.5

