From: Russell King - ARM Linux on
On Fri, Mar 05, 2010 at 08:28:34AM +1100, Benjamin Herrenschmidt wrote:
> I don't think there's a core or driver problem in this specific case. As
> we discussed earlier, I believe the problem is that ARM considers a
> fresh page out of the page cache as "clean" instead of "dirty", and
> inverting that like we do on powerpc will fix their problem too.

The only concern is that it means we treat anonymous pages as dirty
by default.

That's quite sub-optimal since we take care (eg) on write faults to
copy the page and take care of the cache issues while we do that -
whether that be remapping the page to be coherent with the user
address, or cleaning each cache line as we copy the data.

Of course, the simple solution is to also arrange for PG_arch_1 to be
set in this case.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Catalin Marinas on
On Thu, 2010-03-04 at 21:37 +0000, Benjamin Herrenschmidt wrote:
> On Thu, 2010-03-04 at 18:07 +0000, Catalin Marinas wrote:
> > I'm not familiar with SH but for PIO devices the flushing shouldn't be
> > more aggressive. For the DMA devices, Russell suggested that we mark
> > the page as clean (set PG_dcache_clean) in the DMA API to avoid the
> > default flushing.
>
> I really like that idea, as I said earlier, but I'm worried about the I$
> side of things. IE. What I'm trying to say is that I can't see how to do
> that optimisation without ending up with missing I$ invalidations or
> doing way too many of them, unless we have a separate bit to track I$
> state.

But does this optimisation really matter? I think with careful checking
in set_pte_at(), you are not going to invalidate the I-cache more than
necessary. If the original page wasn't pte_present() you would need to
do the I-cache invalidation. The other cases where set_pte_at() is
called for LRU (pte_young) or COW (pte_write) we can avoid the extra
invalidation.

--
Catalin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andreas Mohr on
> Here is a program which Lothar sent me some time ago (the timestamp on
> the .c is June 2004 - I can't find the original email though.) I've
> just checked with Lothar, who has given me permission to reproduce it.
>
> I can't guarantee that this program still shows a problem - since I
> believe I've never been able to reproduce it myself. It might be worth
> checking how other architectures behave.

Tried this on my BCM4710 MIPSEL 2.6.31.9 OpenWrt/Debian (problematic
cache-suspected history due to possibly related USB-audio lockups),
/dev/sda3 ext2 on an USB stick, no errors here, even when increasing tenfold
to 2560 and adding a sleep(2) in between.

Will investigate these things for real sometime later.

Andreas Mohr

P.S.: KARO (Lothar...) makes very nice boards :-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Paul Mundt on
On Fri, Mar 05, 2010 at 08:37:40AM +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2010-03-04 at 18:07 +0000, Catalin Marinas wrote:
> > Are you more in favour if a PIO kmap API than inverting the meaning of
> > PG_arch_1?
>
> My main worry with this approach is the sheer amount of drivers that
> need fixing. I believe inverting PG_arch_1 is a better solution and I
> somewhat fail to see how we end up doing too much flushing if we have
> per-page execute permission (but maybe SH doesn't ?)
>
Basically we have two different MMUs on VIPT parts, the older one on all
SH-4 parts were all read-implies-exec with no ability to differentiate
between read or exec access. For these parts the PG_dcache_dirty approach
saves us from a lot of flushing, and the corner cases were isolated
enough that we could tolerate fixups at the driver level, even on a
write-allocate D-cache.

For second generation SH-4A (SH-X2) and up parts, read and exec are split
out and we could reasonably adopt the PG_dcache_clean approach there
while adopting the same sort of flushing semantics as PPC to avoid
flushing constantly. The current generation of parts far outnumber their
legacy counterparts, so it's certainly something I plan to experiment
with.

We have an additional level of complexity on some of the SMP parts with a
non-coherent I-cache, some of the early CPUs have broken broadcasting of
the cacheops in hardware and so need to rely on IPIs, while the later
parts broadcast properly. We also need to deal with D-cache IPIs when
using mixed coherency protocols on different CPUs.

For older PIPT parts we've never used the deferred flush, since the only
time we ever had to bother with cache maintenance was in the DMA ops, as
anything closer to the CPU than the PCI DMAC had no opportunity to be
snooped.

> > I'm not familiar with SH but for PIO devices the flushing shouldn't be
> > more aggressive. For the DMA devices, Russell suggested that we mark
> > the page as clean (set PG_dcache_clean) in the DMA API to avoid the
> > default flushing.
>
> I really like that idea, as I said earlier, but I'm worried about the I$
> side of things. IE. What I'm trying to say is that I can't see how to do
> that optimisation without ending up with missing I$ invalidations or
> doing way too many of them, unless we have a separate bit to track I$
> state.
>
Using PG_dcache_clean from the DMA API sounds like a pretty good idea,
and certainly worth experimenting with. I don't know how we would do the
I-cache optimization without a PG_arch_2, though.

In any event, if there's going to be a mass exodus to PG_dcache_clean,
Documentation/cachetlb.txt could use a considerable amount of expanding.
The read/exec and I-cache optimizations are something that would be
valuable to document, as opposed to simply being pointed at the sparc64
approach with the regular PG_dcache_dirty caveats.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Benjamin Herrenschmidt on
On Thu, 2010-03-04 at 21:40 +0000, Russell King - ARM Linux wrote:
> On Fri, Mar 05, 2010 at 08:28:34AM +1100, Benjamin Herrenschmidt wrote:
> > I don't think there's a core or driver problem in this specific case. As
> > we discussed earlier, I believe the problem is that ARM considers a
> > fresh page out of the page cache as "clean" instead of "dirty", and
> > inverting that like we do on powerpc will fix their problem too.
>
> The only concern is that it means we treat anonymous pages as dirty
> by default.
>
> That's quite sub-optimal since we take care (eg) on write faults to
> copy the page and take care of the cache issues while we do that -

If you do the cache handling inside your copy_user_highpage() then you
can just set PG_arch_1 stuff there.

> whether that be remapping the page to be coherent with the user
> address, or cleaning each cache line as we copy the data.
>
> Of course, the simple solution is to also arrange for PG_arch_1 to be
> set in this case.

Right.

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/