From: Catalin Marinas on
On Tue, 2010-03-02 at 17:05 +0000, Catalin Marinas wrote:
> On Tue, 2010-03-02 at 21:11 +0900, FUJITA Tomonori wrote:
> > On Sun, 28 Feb 2010 10:31:03 +0530
> > James Bottomley <James.Bottomley(a)HansenPartnership.com> wrote:
> > > But the point of all of this is that I cache invalidation doesn't appear
> > > anywhere in the I/O path ... so if we're getting I/D incoherency,
> > > there's some problem in the mm code (or there's a missing arch
> > > assumption ... like I cache gets moved in more aggressively than we
> > > expect). Parisc is very sensitive to I/D incoherency, so we'd notice if
> > > there were a serious generic problem here.
> >
> > I'm not sure that there are some problems in the mm or common code. Is
> > this ARM's implementation issue? (Of course, the usb stack and the
> > driver's misuse of the DMA API needs to be fixed too).
>
> Just to summarise - on ARM (PIPT / non-aliasing VIPT) there is I-cache
> invalidation for user pages in update_mmu_cache() (it could actually be
> in set_pte_at on SMP to avoid a race but that's for another thread). The
> D-cache is flushed by this function only if the PG_arch_1 bit is set.
> This bit is set in the ARM case by flush_dcache_page(), following the
> advice in Documentation/cachetlb.txt.
>
> With some drivers (those doing PIO) or subsystems (SCSI mass storage
> over USB HCD), there is no call to flush_dcache_page() for page cache
> pages, hence the ARM implementation of update_mmu_cache() doesn't flush
> the D-cache (and only invalidating the I-cache doesn't help).
>
> The viable solutions so far:
>
> 1. Implement a PIO mapping API similar to the DMA API which takes
> care of the D-cache flushing. This means that PIO drivers would
> need to be modified to use an API like pio_kmap()/pio_kunmap()
> before writing to a page cache page.
> 2. Invert the meaning of PG_arch_1 to denote a clean page. This
> means that by default newly allocated page cache pages are
> considered dirty and even if there isn't a call to
> flush_dcache_page(), update_mmu_cache() would flush the D-cache.
> This is the PowerPC approach.
>
> Option 2 above looks pretty appealing to me since it can be done in the
> ARM code exclusively. I've done some tests and it indeed solves the
> cache coherency with a rootfs on a USB stick. As Russell suggested, it
> can be optimised to mark a page as clean when the DMA API is involved to
> avoid duplicate flushing.

Actually, option 2 still has an issue - does not easily work on SMP
systems where cache maintenance operations aren't broadcast in hardware.
In this case (ARM11MPCore), flush_dcache_page() is implemented
non-lazily so that the flushing happens on the same processor that
dirtied the cache. But since with some drivers there is no call to this
function, it wouldn't make any difference.

A solution is to do something like read-for-ownership before flushing
the D-cache in update_mmu_cache() (or set_pte_at()).

--
Catalin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Benjamin Herrenschmidt on
On Tue, 2010-03-02 at 17:47 +0000, Catalin Marinas wrote:
>
> Actually, option 2 still has an issue - does not easily work on SMP
> systems where cache maintenance operations aren't broadcast in hardware.
> In this case (ARM11MPCore), flush_dcache_page() is implemented
> non-lazily so that the flushing happens on the same processor that
> dirtied the cache. But since with some drivers there is no call to this
> function, it wouldn't make any difference.

Also, option 1 would not solve the icache issue which has the same
problem related to IPIs. You -really- need to spank some HW folks
here :-)

> A solution is to do something like read-for-ownership before flushing
> the D-cache in update_mmu_cache() (or set_pte_at()).

You might also want to experiment with not clearing PG_arch_1 in
flush_dcache_page(). I'm not 100% convinced it is necessary and that may
reduce the amount of flushing needed.

Another thing is, on powerpc, we only do the cleaning when we try to
execute from the pages. IE. We basically "filter out" exec permission
when pages are not clean. At least on processors that support per-page
exec permission. You may want to consider something like that as well.

Cheers,
Ben.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Benjamin Herrenschmidt on
On Tue, 2010-03-02 at 17:05 +0000, Catalin Marinas wrote:

> The viable solutions so far:
>
> 1. Implement a PIO mapping API similar to the DMA API which takes
> care of the D-cache flushing. This means that PIO drivers would
> need to be modified to use an API like pio_kmap()/pio_kunmap()
> before writing to a page cache page.
> 2. Invert the meaning of PG_arch_1 to denote a clean page. This
> means that by default newly allocated page cache pages are
> considered dirty and even if there isn't a call to
> flush_dcache_page(), update_mmu_cache() would flush the D-cache.
> This is the PowerPC approach.

I don't see the point of a "PIO" API. I would thus vote for 2 :-) Note
that flushing the D-cache isn't enough, you also need to invalidate the
I-cache as we discussed earlier, though you mostly get away if you don't
by luck.

There's also a question as to whether clearing PG_arch_1 is
flush_dcache_page() is really necessary or not.

> Option 2 above looks pretty appealing to me since it can be done in the
> ARM code exclusively. I've done some tests and it indeed solves the
> cache coherency with a rootfs on a USB stick. As Russell suggested, it
> can be optimised to mark a page as clean when the DMA API is involved to
> avoid duplicate flushing.

That wouldn't solve the need for invalidating the I-cache... Unless we
use another bit.

> It was also suggested to add a PG_arch_2 flag which would keep track of
> the I-cache status as well.
>
> I can post a proposal to modify the cachetlb.txt document to reflect the
> issues we currently have on ARM.

Cheers,
Ben.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Benjamin Herrenschmidt on
On Tue, 2010-03-02 at 21:11 +0900, FUJITA Tomonori wrote:
>
> > Sorry to be a bit late to the party (on holiday), but I/D coherency
> is
> > supposed to be taken care of using flush_cache_page in the memory
> > mapping routines.
>
> powerpc does that? To be exact, powerpc doesn't need
> flush_cache_page() and handles I/D coherency in the pte modification
> code. powerpc uses PG_arch_1 to avoid unnecessarily handling I/D
> coherency. Seems that IA64 does the same trick with PG_arch_1.

Right. We set PG_arch_1 to avoid doing it again of a given physical
page. We assume that it's always cleared when a page is recycled by the
page cache and we also clear it in flush_dcache_page() though the need
for that later thing is dubious...

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: FUJITA Tomonori on
On Wed, 03 Mar 2010 10:29:54 +1100
Benjamin Herrenschmidt <benh(a)kernel.crashing.org> wrote:

> On Tue, 2010-03-02 at 17:05 +0000, Catalin Marinas wrote:
>
> > The viable solutions so far:
> >
> > 1. Implement a PIO mapping API similar to the DMA API which takes
> > care of the D-cache flushing. This means that PIO drivers would
> > need to be modified to use an API like pio_kmap()/pio_kunmap()
> > before writing to a page cache page.
> > 2. Invert the meaning of PG_arch_1 to denote a clean page. This
> > means that by default newly allocated page cache pages are
> > considered dirty and even if there isn't a call to
> > flush_dcache_page(), update_mmu_cache() would flush the D-cache.
> > This is the PowerPC approach.
>
> I don't see the point of a "PIO" API. I would thus vote for 2 :-) Note

Yeah, as powerpc and ia64 do, arm can flush D cache and invalidate I
cache when inserting a executable page to pte, IIUC. No need for the
new API for I/D consistency.

The ways to improve the approach (introducing PG_arch_2 or marking a
page clean on dma_unmap_* with DMA_FROM_DEVICE like ia64 does) is up
to architectures.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/