From: Pavel Machek
Hi!

> > I'm not sure that there are problems in the mm or common code. Is
> > this an ARM implementation issue? (Of course, the usb stack and the
> > driver's misuse of the DMA API need to be fixed too.)
>
> Just to summarise - on ARM (PIPT / non-aliasing VIPT) there is I-cache
> invalidation for user pages in update_mmu_cache() (it could actually be
> in set_pte_at on SMP to avoid a race but that's for another thread). The
> D-cache is flushed by this function only if the PG_arch_1 bit is set.
> This bit is set in the ARM case by flush_dcache_page(), following the
> advice in Documentation/cachetlb.txt.
>
> With some drivers (those doing PIO) or subsystems (SCSI mass storage
> over USB HCD), there is no call to flush_dcache_page() for page cache
> pages, hence the ARM implementation of update_mmu_cache() doesn't flush
> the D-cache (and only invalidating the I-cache doesn't help).
>
> The viable solutions so far:
>
> 1. Implement a PIO mapping API similar to the DMA API which takes
> care of the D-cache flushing. This means that PIO drivers would
> need to be modified to use an API like pio_kmap()/pio_kunmap()
> before writing to a page cache page.
> 2. Invert the meaning of PG_arch_1 to denote a clean page. This
> means that by default newly allocated page cache pages are
> considered dirty and even if there isn't a call to
> flush_dcache_page(), update_mmu_cache() would flush the D-cache.
> This is the PowerPC approach.

What about option

3. Forget about PG_arch_1 and always do the flush?

How big is the performance impact? Note that the current code does not
even *work*, so working but 10% slower code would be an improvement.
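
Roughly, I mean something like this in the arch code (only a sketch,
simplified from my reading of arch/arm/mm/fault-armv.c, so details such
as the VIVT aliasing handling are left out):

void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
                      pte_t *ptep)
{
        unsigned long pfn = pte_pfn(*ptep);
        struct page *page;

        if (!pfn_valid(pfn))
                return;

        page = pfn_to_page(pfn);
        if (page_mapping(page)) {
                /*
                 * Option 3: instead of guarding this with
                 * test_and_clear_bit(PG_dcache_dirty, &page->flags),
                 * write back the D-cache unconditionally.
                 */
                __flush_dcache_page(page_mapping(page), page);
                if (vma->vm_flags & VM_EXEC)
                        __flush_icache_all();
        }
}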

Pavel

(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
From: Benjamin Herrenschmidt
On Wed, 2010-03-03 at 11:10 +0530, James Bottomley wrote:
> On Wed, 2010-03-03 at 16:10 +1100, Benjamin Herrenschmidt wrote:
> > On Wed, 2010-03-03 at 12:47 +0900, FUJITA Tomonori wrote:
> > > The ways to improve the approach (introducing PG_arch_2 or marking a
> > > page clean on dma_unmap_* with DMA_FROM_DEVICE like ia64 does) are up
> > > to the architectures.
> >
> > How does the above work? I.e., the dma unmap will flush the D side but
> > not the I side ... or is the ia64 flush primitive magic enough to do
> > both?
>
> The point is that in a well regulated system, the I cache shouldn't need
> extra flushing in the kernel. We should only be faulting in R-X pages.
> If we're operating on RWX pages (i.e. self modifying code), it's the job
> of userspace to keep I/D coherency.
>
> So the only case the kernel needs to worry about is the R-X fault case
> for executable text code.

Still, you do need to flush I when a page cache page is recycled.

Cheers,
Ben.

From: James Bottomley
On Thu, 2010-03-04 at 13:00 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2010-03-03 at 11:10 +0530, James Bottomley wrote:
> > On Wed, 2010-03-03 at 16:10 +1100, Benjamin Herrenschmidt wrote:
> > > On Wed, 2010-03-03 at 12:47 +0900, FUJITA Tomonori wrote:
> > > > The ways to improve the approach (introducing PG_arch_2 or marking a
> > > > page clean on dma_unmap_* with DMA_FROM_DEVICE like ia64 does) are up
> > > > to the architectures.
> > >
> > > How does the above work? I.e., the dma unmap will flush the D side but
> > > not the I side ... or is the ia64 flush primitive magic enough to do
> > > both?
> >
> > The point is that in a well regulated system, the I cache shouldn't need
> > extra flushing in the kernel. We should only be faulting in R-X pages.
> > If we're operating on RWX pages (i.e. self modifying code), it's the job
> > of userspace to keep I/D coherency.
> >
> > So the only case the kernel needs to worry about is the R-X fault case
> > for executable text code.
>
> Still, you do need to flush I when a page cache page is recycled.

Technically not, if we've got all the I flushing when pages are mapped
executable sorted out. This is one of the dangers of over-flushing ... if
we start flushing where we don't need it "just to be sure", we end up
papering over holes in the operating system and making actual bugs in
those operations a lot harder to catch.

The other thing you might not appreciate in ppc land is that for a lot
of other systems (well, like parisc) flushing a dirty cache line is
incredibly expensive (because we halt the CPU to wait for the memory
eviction), so ideally we want to flush as late as possible to give the
natural operations a chance to clean most of the cache lines. On parisc,
flushing a clean cache line and invalidating a line are both fast
operations. That's why the kmap makes the most sense to us for
implementing PIO ops ... it's the farthest point we can flush the cache
at (because beyond it we've lost the mapping the VIPT cache requires to
flush).
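
To make that concrete, the kind of thing I have in mind is little more
than the sketch below (pio_kmap()/pio_kunmap() are made-up names, layered
on the existing kmap()/kunmap() and flush_kernel_dcache_page()):

#include <linux/highmem.h>

/* map a page cache page for a PIO transfer */
static inline void *pio_kmap(struct page *page)
{
        return kmap(page);
}

/* unmap it again, flushing while the kernel mapping still exists */
static inline void pio_kunmap(struct page *page)
{
        /*
         * This is the last point at which a VIPT cache still has a
         * mapping for the page, so write back any dirty lines here.
         */
        flush_kernel_dcache_page(page);
        kunmap(page);
}

A PIO driver would then bracket its FIFO copy with pio_kmap()/pio_kunmap()
instead of a bare kmap()/kunmap() pair.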

James


From: Catalin Marinas
On Wed, 2010-03-03 at 21:54 +0000, Pavel Machek wrote:
> > With some drivers (those doing PIO) or subsystems (SCSI mass storage
> > over USB HCD), there is no call to flush_dcache_page() for page cache
> > pages, hence the ARM implementation of update_mmu_cache() doesn't flush
> > the D-cache (and only invalidating the I-cache doesn't help).
> >
> > The viable solutions so far:
> >
> > 1. Implement a PIO mapping API similar to the DMA API which takes
> > care of the D-cache flushing. This means that PIO drivers would
> > need to be modified to use an API like pio_kmap()/pio_kunmap()
> > before writing to a page cache page.
> > 2. Invert the meaning of PG_arch_1 to denote a clean page. This
> > means that by default newly allocated page cache pages are
> > considered dirty and even if there isn't a call to
> > flush_dcache_page(), update_mmu_cache() would flush the D-cache.
> > This is the PowerPC approach.
>
> What about option
>
> 3. Forget about PG_arch_1 and always do the flush?
>
> > How big is the performance impact? Note that the current code does not
> > even *work*, so working but 10% slower code would be an improvement.

The driver fix is as simple as calling flush_dcache_page() and I've been
carrying such patches in my tree for some time now. The question is
whether we need to do it in the driver or not (we would need to update
Documentation/cachetlb.txt as well).
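
For illustration, the shape of such a fix in a PIO receive path is roughly
the following (a sketch only; the helper and its arguments are made up,
not taken from a real driver):

#include <linux/highmem.h>
#include <linux/string.h>

/* copy data read from the device into a page cache page */
static void pio_copy_to_page(struct page *page, unsigned int offset,
                             const void *fifo_data, size_t len)
{
        char *buf = kmap_atomic(page, KM_USER0);

        memcpy(buf + offset, fifo_data, len);
        kunmap_atomic(buf, KM_USER0);

        /* the kernel has written to a page cache page: let the arch know */
        flush_dcache_page(page);
}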

The reason I'm not in favour of always doing the flush is that we would
penalise DMA drivers where there is no need for extra D-cache flushing
(that is already handled by the DMA API; option 1 above is similar, just
that it is meant for PIO usage). An ARM patch I proposed for inverting the
meaning of PG_arch_1 also marks a page as clean in the dma_map_* functions.
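
Roughly, the inversion amounts to the following (a sketch of the idea
rather than the actual patch; PG_dcache_clean here is just PG_arch_1 with
the opposite meaning, and the last two fragments are meant to live inside
update_mmu_cache() and the dma_map_*() paths respectively):

/* PG_arch_1 now means "the D-cache is clean for this page" */
#define PG_dcache_clean PG_arch_1

/* flush_dcache_page(): flush, then record that the page is clean */
void flush_dcache_page(struct page *page)
{
        struct address_space *mapping = page_mapping(page);

        __flush_dcache_page(mapping, page);
        set_bit(PG_dcache_clean, &page->flags);
}

/* in update_mmu_cache(): flush unless the page is already known clean */
        if (!test_and_set_bit(PG_dcache_clean, &page->flags))
                __flush_dcache_page(mapping, page);

/* in dma_map_page()/dma_map_sg(): the DMA API has already done the
 * cache maintenance, so the page can be marked clean here as well */
        set_bit(PG_dcache_clean, &page->flags);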

--
Catalin

From: Pavel Machek
> On Wed, 2010-03-03 at 21:54 +0000, Pavel Machek wrote:
> > > With some drivers (those doing PIO) or subsystems (SCSI mass storage
> > > over USB HCD), there is no call to flush_dcache_page() for page cache
> > > pages, hence the ARM implementation of update_mmu_cache() doesn't flush
> > > the D-cache (and only invalidating the I-cache doesn't help).
> > >
> > > The viable solutions so far:
> > >
> > > 1. Implement a PIO mapping API similar to the DMA API which takes
> > > care of the D-cache flushing. This means that PIO drivers would
> > > need to be modified to use an API like pio_kmap()/pio_kunmap()
> > > before writing to a page cache page.
> > > 2. Invert the meaning of PG_arch_1 to denote a clean page. This
> > > means that by default newly allocated page cache pages are
> > > considered dirty and even if there isn't a call to
> > > flush_dcache_page(), update_mmu_cache() would flush the D-cache.
> > > This is the PowerPC approach.
> >
> > What about option
> >
> > 3. Forget about PG_arch_1 and always do the flush?
> >
> > How big is the performance impact? Note that the current code does not
> > even *work*, so working but 10% slower code would be an improvement.
>
> The driver fix is as simple as calling flush_dcache_page() and I've been
> carrying such patches in my tree for some time now. The question is
> whether we need to do it in the driver or not (we would need to update
> Documentation/cachetlb.txt as well).
>
> The reason I'm not in favour of always doing the flush is that we would
> penalise DMA drivers where there is no need for extra D-cache flushing
> (that is already handled by the DMA API; option 1 above is similar, just
> that it is meant for PIO usage). An ARM patch I proposed for inverting the
> meaning of PG_arch_1 also marks a page as clean in the dma_map_* functions.

But you are not fixing a driver bug, are you?

It seems like ARM has a requirement that other architectures do not, one that is
a) not documented anywhere
b) causing problems

You could argue that the performance improvement (how big is it, anyway?)
is worth it, but this should be agreed to by the wider community...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html