From: Catalin Marinas on
On Sun, 2010-02-28 at 00:24 +0000, Benjamin Herrenschmidt wrote:
> On Fri, 2010-02-26 at 21:49 +0000, Russell King - ARM Linux wrote:
> > On Sat, Feb 27, 2010 at 08:40:29AM +1100, Benjamin Herrenschmidt wrote:
> > > Hrm, the DMA API certainly doesn't handle the I$/D$ coherency on
> > > powerpc.. I'm afraid that whole cache handling stuff is totally
> > > inconsistent since different archs have different expectations here.
> >
> > It doesn't on ARM either.
>
> Ok, pfiew :-)
>
> So far, my understanding with I$/D$ is that we only care in a few cases
> which is executing of an mmap'ed piece of executable that is -not- being
> written to, and swap.
>
> I -think- that in both cases, the page cache always pops up a new page
> with PG_arch_1 clear before the driver gets to either DMA or PIO to it
> when faulted the first time around, before any PTE is inserted.

That's my understanding too.

> So the current approach on powerpc with I$/D$ should work fine, and it
> -might- make sense to use a similar one on PIPT ARM, provided we don't
> have expectations of the I$/D$ coherency being maintained on
> -subsequent- writes (PIO or DMA either) to such a page by the same
> program transparently by the kernel.

Are these subsequent writes likely to happen?

> There's two potential problems with the approach, and maybe more that I
> have missed though. One is the case of a networked filesystem where the
> executable pages are modified remotely. However, I would expect such a
> program to invalidate the PTE mappings before making the change visible,
> so we -do- get a chance to re-flush provided something clears PG_arch_1.

I think the NFS code in Linux calls flush_dcache_page(). This function
can check whether the page is already mapped and do the cache flushing
rather than deferring it to set_pte_at().

> Then, there's In the case of a multithread app, where one thread does
> the cache flush and another thread then executes, the earlier ARMs
> without broadcast ops have a potential problem there. In fact, some
> variant of PowerPC 440 have the same problem and some people are
> (ab)using those for SMP setups I'm being told.

Yes. That could be solved at set_pte_at() level using IPIs.

> For that case, I see two options. One is a big hammer but would make
> existing code work to "most" extent: Don't allow a page to be both
> writable and executable. Ping-pong the page permission lazily and flush
> when transitioning from write to exec.

Are you referring to the SMP and non-broadcasting cache maintenance
issue? The same pte could be shared between multiple CPUs, so once you
make it executable on one it becomes executable on the others.

--
Catalin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Russell King - ARM Linux on
On Mon, Mar 01, 2010 at 10:39:14AM +0000, Catalin Marinas wrote:
> On Sun, 2010-02-28 at 05:01 +0000, James Bottomley wrote:
> > But the point of all of this is that I cache invalidation doesn't appear
> > anywhere in the I/O path ... so if we're getting I/D incoherency,
> > there's some problem in the mm code (or there's a missing arch
> > assumption ... like I cache gets moved in more aggressively than we
> > expect). Parisc is very sensitive to I/D incoherency, so we'd notice if
> > there were a serious generic problem here.
>
> On ARM PIPT, it's probably because flush_cache_page isn't implemented.
> But as I said above, given the speculative fetches I don't think it
> would help much (well, it would work a bit better but not a complete
> fix).

Not quite. flush_cache_page() is called when we unmap or replace a page
in userspace, which is completely the wrong place to do I-cache coherency
when you have speculatively loaded caches - or even D-cache coherency if
your cache behaves as a speculatively loaded PIPT or non-aliasing VIPT.

Flushing the I-cache after a page has been in userspace does nothing to
ensure that there aren't any I-cache lines associated with that page
when you next come to map it into userspace.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Benjamin Herrenschmidt on
On Mon, 2010-03-01 at 11:10 +0000, Catalin Marinas wrote:
>
>
> Yes. That could be solved at set_pte_at() level using IPIs.

Well, set_pte_at() itself is called with the PTE lock held, so you have
to be careful with IPIs at that point. You need the flush to happen
-before- the PTE is visible and you cannot synchronously send an IPI.

> > For that case, I see two options. One is a big hammer but would make
> > existing code work to "most" extent: Don't allow a page to be both
> > writable and executable. Ping-pong the page permission lazily and
> flush
> > when transitioning from write to exec.
>
> Are you referring to the SMP and non-broadcasting cache maintenance
> issue? The same pte could be shared between multiple CPUs, so once you
> make it executable on one it becomes executable on the others.

Right, you would have to play the ping-pong trick globally. That's what
I do on ppc 440 for bluegene though that code isn't upstream.

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: FUJITA Tomonori on
On Sun, 28 Feb 2010 10:31:03 +0530
James Bottomley <James.Bottomley(a)HansenPartnership.com> wrote:

> On Sun, 2010-02-28 at 11:14 +1100, Benjamin Herrenschmidt wrote:
> > On Fri, 2010-02-26 at 21:00 +0000, Russell King - ARM Linux wrote:
> > > On Fri, Feb 26, 2010 at 04:25:21PM +0000, Catalin Marinas wrote:
> > > > For mmap'ed pages (and present in the page cache), is it guaranteed that
> > > > the HCD driver won't write to it once it has been mapped into user
> > > > space? If that's the case, it may solve the problem by just reversing
> > > > the meaning of PG_arch_1 on ARM and assume that a newly allocated page
> > > > has dirty D-cache by default.
> > >
> > > I guess we could also set PG_arch_1 in the DMA API as well, to avoid the
> > > unnecessary D cache flushing when clean pages get mapped into userspace.
> >
> > That's an interesting thought for us too. When doing I$/D$ coherency, we
> > have to fist flush the D$ and then invalidate the I$. If we could keep
> > track of D$ and I$ separately, we could avoid the first step in many
> > cases, including the DMA API trick you mentioned.
> >
> > I wonder if it's time to get a PG_arch_2 :-)
>
> Sorry to be a bit late to the party (on holiday), but I/D coherency is
> supposed to be taken care of using flush_cache_page in the memory
> mapping routines.

powerpc does that? To be exact, powerpc doesn't need
flush_cache_page() and handles I/D coherency in the pte modification
code. powerpc uses PG_arch_1 to avoid unnecessarily handling I/D
coherency. Seems that IA64 does the same trick with PG_arch_1.


> But the point of all of this is that I cache invalidation doesn't appear
> anywhere in the I/O path ... so if we're getting I/D incoherency,
> there's some problem in the mm code (or there's a missing arch
> assumption ... like I cache gets moved in more aggressively than we
> expect). Parisc is very sensitive to I/D incoherency, so we'd notice if
> there were a serious generic problem here.

I'm not sure that there are some problems in the mm or common code. Is
this ARM's implementation issue? (Of course, the usb stack and the
driver's misuse of the DMA API needs to be fixed too).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Catalin Marinas on
On Tue, 2010-03-02 at 21:11 +0900, FUJITA Tomonori wrote:
> On Sun, 28 Feb 2010 10:31:03 +0530
> James Bottomley <James.Bottomley(a)HansenPartnership.com> wrote:
> > But the point of all of this is that I cache invalidation doesn't appear
> > anywhere in the I/O path ... so if we're getting I/D incoherency,
> > there's some problem in the mm code (or there's a missing arch
> > assumption ... like I cache gets moved in more aggressively than we
> > expect). Parisc is very sensitive to I/D incoherency, so we'd notice if
> > there were a serious generic problem here.
>
> I'm not sure that there are some problems in the mm or common code. Is
> this ARM's implementation issue? (Of course, the usb stack and the
> driver's misuse of the DMA API needs to be fixed too).

Just to summarise - on ARM (PIPT / non-aliasing VIPT) there is I-cache
invalidation for user pages in update_mmu_cache() (it could actually be
in set_pte_at on SMP to avoid a race but that's for another thread). The
D-cache is flushed by this function only if the PG_arch_1 bit is set.
This bit is set in the ARM case by flush_dcache_page(), following the
advice in Documentation/cachetlb.txt.

With some drivers (those doing PIO) or subsystems (SCSI mass storage
over USB HCD), there is no call to flush_dcache_page() for page cache
pages, hence the ARM implementation of update_mmu_cache() doesn't flush
the D-cache (and only invalidating the I-cache doesn't help).

The viable solutions so far:

1. Implement a PIO mapping API similar to the DMA API which takes
care of the D-cache flushing. This means that PIO drivers would
need to be modified to use an API like pio_kmap()/pio_kunmap()
before writing to a page cache page.
2. Invert the meaning of PG_arch_1 to denote a clean page. This
means that by default newly allocated page cache pages are
considered dirty and even if there isn't a call to
flush_dcache_page(), update_mmu_cache() would flush the D-cache.
This is the PowerPC approach.

Option 2 above looks pretty appealing to me since it can be done in the
ARM code exclusively. I've done some tests and it indeed solves the
cache coherency with a rootfs on a USB stick. As Russell suggested, it
can be optimised to mark a page as clean when the DMA API is involved to
avoid duplicate flushing.

It was also suggested to add a PG_arch_2 flag which would keep track of
the I-cache status as well.

I can post a proposal to modify the cachetlb.txt document to reflect the
issues we currently have on ARM.

--
Catalin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/