From: Catalin Marinas on
On Thu, 2010-03-04 at 15:41 +0000, Paul Mundt wrote:
> On Thu, Mar 04, 2010 at 03:29:38PM +0000, Catalin Marinas wrote:
> > On Thu, 2010-03-04 at 14:21 +0000, James Bottomley wrote:
> > > The thing which was discovered in this thread is basically that ARM is
> > > handling deferred flushing (for D/I coherency) in a slightly different
> > > way from everyone else ...
> >
> > Doing a grep for PG_dcache_dirty defined in terms of PG_arch_1 reveals
> > that MIPS, Parisc, Score, SH and SPARC do similar things to ARM. PowerPC
> > and IA-64 use PG_arch_1 as a clean rather than dirty bit.
>
> SH used to use it as a PG_mapped which was roughly similar to the
> PG_dcache_clean approach, at which point things like flushing for the PIO
> case in the HCD weren't necessary. It did result in rather aggressive over
> flushing though, which is one of the reasons we elected to switch to
> PG_dcache_dirty.

Are you more in favour of a PIO kmap API than of inverting the meaning of
PG_arch_1?

I'm not familiar with SH but for PIO devices the flushing shouldn't be
more aggressive. For the DMA devices, Russell suggested that we mark the
page as clean (set PG_dcache_clean) in the DMA API to avoid the default
flushing.
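For illustration only, here is a small user-space model (not kernel code) of Russell's suggestion: a freshly allocated page-cache page starts out "not clean", the DMA API marks it clean because it has already done the cache maintenance for the transfer, and the lazy flush at user-mapping time is skipped for such pages. `struct model_page`, the `model_*` helpers and the flush counter are hypothetical names; the real hooks would live in the arch DMA mapping code and in `set_pte_at()`/`update_mmu_cache()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one page; dcache_clean stands in for
 * PG_dcache_clean (PG_arch_1 with the inverted meaning). */
struct model_page {
	bool dcache_clean;
};

static int dcache_flushes;	/* number of simulated D-cache flushes */

/* New page-cache page: assume the D-cache may hold dirty lines. */
static void model_page_cache_alloc(struct model_page *p)
{
	p->dcache_clean = false;
}

/* DMA transfer finished: the DMA API already did the cache
 * maintenance, so mark the page clean to skip the later flush. */
static void model_dma_complete(struct model_page *p)
{
	p->dcache_clean = true;
}

/* Page mapped into user space: flush only if not known to be clean. */
static void model_map_to_user(struct model_page *p)
{
	if (!p->dcache_clean) {
		dcache_flushes++;
		p->dcache_clean = true;
	}
}
```

Under this scheme a PIO-filled page is flushed on its first user mapping without any driver changes, while DMA targets avoid that flush.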

> Note that the PG_dcache_dirty semantics are also outlined in
> Documentation/cachetlb.txt for PG_arch_1 usage, so it's hardly esoteric.

Yes, but the flush_dcache_page() semantics outlined in the same file
aren't followed by all the PIO drivers in the kernel.
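A rough user-space model of the deferred `flush_dcache_page()` scheme that Documentation/cachetlb.txt describes for PG_dcache_dirty: if the page has no user mappings yet, the flush is deferred by setting the bit, and `update_mmu_cache()` performs it when the page is first mapped. The weak spot is that the deferral only works if the PIO driver calls `flush_dcache_page()` at all. All `model_*` names are hypothetical stand-ins, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page model: mapped_in_user approximates
 * mapping_mapped(), dcache_dirty approximates PG_dcache_dirty. */
struct model_page {
	bool mapped_in_user;
	bool dcache_dirty;
};

static int dcache_flushes;	/* simulated D-cache flushes */

/* flush_dcache_page(): flush now if user mappings exist,
 * otherwise just record that a flush is owed. */
static void model_flush_dcache_page(struct model_page *p)
{
	if (p->mapped_in_user)
		dcache_flushes++;
	else
		p->dcache_dirty = true;
}

/* update_mmu_cache() on first user mapping: pay any owed flush. */
static void model_update_mmu_cache(struct model_page *p)
{
	p->mapped_in_user = true;
	if (p->dcache_dirty) {
		dcache_flushes++;
		p->dcache_dirty = false;
	}
}

/* PIO read into a page-cache page; a well-behaved driver follows
 * the CPU copy with flush_dcache_page(), a broken one does not. */
static void model_pio_read(struct model_page *p, bool driver_flushes)
{
	/* ... CPU copies data from the device FIFO into the page ... */
	if (driver_flushes)
		model_flush_dcache_page(p);
}
```

In the second case the page is mapped with no flush ever issued, which is exactly the stale-data problem seen with PIO drivers that skip `flush_dcache_page()`.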

--
Catalin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Benjamin Herrenschmidt on

> > Still, you do need to flush I when a page cache page is recycled.
>
> Technically not if we've got all the I flushing when mapped executable
> sorted out. This is one of the dangers of over flushing ... if we start
> flushing where we don't need it "just to be sure" we end up papering
> over holes in the operating system and make catching actual bugs in
> operations a lot harder.

Well, ok so we are talking past each other here :-) So let me try to
summarize what we do, and then write up what I'd like to be able to do
but can't quite see how to get there just yet.

On PPC, we keep track of whether a page is "cache clean" with PG_arch_1.

We only bother with flushing it when mapping it and yes, it's an
expensive operation.

We do it from within set_pte_at() and/or ptep_set_access_flags(), at
which point we test PG_arch_1 and, if it is clear, do the flush and set it.
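The scheme just described (PG_arch_1 as a "clean" bit, tested when the PTE is installed) can be modeled roughly as follows. `model_set_pte_at()` and the counters are hypothetical stand-ins for `set_pte_at()` and the real combined D$/I$ flush; this is not the actual powerpc code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: arch_1 set means "D$ and I$ known coherent". */
struct model_page {
	bool arch_1;
};

static int dcache_flushes, icache_flushes;

/* Stand-in for the combined flush: write back D$, then invalidate I$. */
static void model_flush_dcache_icache(struct model_page *p)
{
	dcache_flushes++;
	icache_flushes++;
	p->arch_1 = true;
}

/* Stand-in for set_pte_at(): a page whose bit is clear is flushed
 * once; later mappings of the same page find it clean. */
static void model_set_pte_at(struct model_page *p)
{
	if (!p->arch_1)
		model_flush_dcache_icache(p);
}
```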

On systems that support per-page exec permission, we optimize things a
bit: if this is not an exec fault (a read access, for example), we "skip"
the flush when mapping the page and filter out the exec permission. We
later do the flush when exec is attempted.

On systems that don't (earlier 32-bit powerpc), we -have- to flush any
mapped page sadly as one could be mapped for read and actually executed
from. This is -not- a case of "let userspace shoot themselves in the
foot", letting stale icache leak through to userspace here is actually a
security hole in theory (granted, unlikely but we got barked at enough
when we tried to optimize that out).

Now, when we do the flush as described above, we do both D$ and I$
passes at once.

It would be indeed nice to be able to avoid the D$ flush when the page
was the target of a DMA operation, since the D$ flush is the most
expensive part of the process.

However, I don't see how to do that without having a separate page bit
to keep track of the D$ vs. I$ state. For example, if we use PG_arch_1
exclusively for D$, and always flush I$ on mapping to userspace, we end
up with a lot of spurious I$ flushes any time glibc text, for example, is
mapped into a new process.
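To make the trade-off concrete, here is a hypothetical model with two tracking bits, one for D$ and one for I$. The second bit would be a new struct page flag that does not exist today. A DMA completion can then set only the D$ bit, and mapping for exec pays just the I$ invalidate, while pages already clean in both (such as shared glibc text) pay nothing. Names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page state with separate D$ and I$ tracking bits. */
struct model_page {
	bool dcache_clean;	/* e.g. PG_arch_1 */
	bool icache_clean;	/* hypothetical extra bit */
};

static int dcache_flushes, icache_flushes;

/* DMA wrote the page: D$ maintenance already done, but I$ may be stale. */
static void model_dma_complete(struct model_page *p)
{
	p->dcache_clean = true;
	p->icache_clean = false;
}

/* Map for execution: do only the passes that are still needed,
 * D$ write-back first so the I$ refill sees the new data. */
static void model_map_exec(struct model_page *p)
{
	if (!p->dcache_clean) {
		dcache_flushes++;
		p->dcache_clean = true;
	}
	if (!p->icache_clean) {
		icache_flushes++;
		p->icache_clean = true;
	}
}
```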

> The other thing you might not appreciate in ppc land is that for a lot
> of other systems (well, like parisc) flushing a dirty cache line is
> incredibly expensive (because we halt the CPU to wait for the memory
> eviction),

Same here. High-end server PPCs have the I$ snoop the D$ but on all the
other ones, we pay a dear price for those flushes, which is why I'm
trying to see how I could exploit the trick of not doing the D$ side
flush at least for targets of DMA ops, but as I said, I can't see how it
can be done properly without another tracking bit in struct page.

> so ideally we want to flush as late as possible to give the
> natural operations a chance to clean most of the cache lines. Flushing
> a clean cache line on parisc as well as invalidations are fast
> operations. That's why the kmap makes the most sense to us for
> implementing PIO ops ... it's the farthest point we can flush the cache
> at (because beyond it we've lost the mapping the VIPT cache requires to

Cheers,
Ben.

>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel(a)lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


From: Benjamin Herrenschmidt on
On Thu, 2010-03-04 at 18:07 +0000, Catalin Marinas wrote:
>
> Are you more in favour of a PIO kmap API than of inverting the meaning of
> PG_arch_1?

My main worry with this approach is the sheer number of drivers that
need fixing. I believe inverting PG_arch_1 is a better solution and I
somewhat fail to see how we end up doing too much flushing if we have
per-page execute permission (but maybe SH doesn't?)

> I'm not familiar with SH but for PIO devices the flushing shouldn't be
> more aggressive. For the DMA devices, Russell suggested that we mark the
> page as clean (set PG_dcache_clean) in the DMA API to avoid the default
> flushing.

I really like that idea, as I said earlier, but I'm worried about the I$
side of things. I.e., what I'm trying to say is that I can't see how to do
that optimisation without ending up with missing I$ invalidations or
doing way too many of them, unless we have a separate bit to track I$
state.

> > Note that the PG_dcache_dirty semantics are also outlined in
> > Documentation/cachetlb.txt for PG_arch_1 usage, so it's hardly esoteric.
>
> Yes, but the flush_dcache_page() semantics outlined in the same file
> aren't followed by all the PIO drivers in the kernel.
>

Cheers,
Ben.


From: Benjamin Herrenschmidt on
On Thu, 2010-03-04 at 15:25 +0000, Catalin Marinas wrote:
> My understanding from this long discussion is that we cannot get the
> kernel modifying a page cache page which is already mapped in user space
> (well, ptrace does this but we flush the cache there already).

Well, we -can- but it appears that we don't have to provide coherency
in that case since the modification is always done as the result of
userspace explicitly requesting that change (aka read() syscall) and
thus userspace is responsible for the flushing.
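From the userspace side, that responsibility looks roughly like the following sketch, assuming GCC or Clang: after read() has filled a buffer the program intends to execute (a JIT or a loader, say), it calls `__builtin___clear_cache()` over the written range before jumping to it. The helper names `alloc_code_buffer` and `read_code_into` are hypothetical, and a real program would also make the buffer executable (e.g. with mprotect()); this sketch does not execute anything.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate a page-aligned, writable buffer intended for code
 * (to be made executable later with mprotect(), not shown). */
static char *alloc_code_buffer(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* Read up to @len bytes from @fd into @buf, then synchronise the
 * I-cache with the freshly written bytes, since the kernel does not
 * do it for us on read(). Returns read()'s result. */
static ssize_t read_code_into(int fd, char *buf, size_t len)
{
	ssize_t n = read(fd, buf, len);

	if (n > 0)
		__builtin___clear_cache(buf, buf + n);
	return n;
}
```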

Cheers,
Ben.




From: Benjamin Herrenschmidt on
On Thu, 2010-03-04 at 19:51 +0530, James Bottomley wrote:
>
> Technically, he is. In the old days, most VI architectures were high-end
> enough not to require PIO transfers. The only exception was an
> IDE driver used by sparc, which led to the arch-specific IDE in/out
> string instructions, in which sparc actually did all the necessary
> flushing.

Actually, Catalin's problem is with newer PIPT ARM :-)

> So no drivers other than old IDE grew up with cache flushing in the
> PIO case (and almost no high-end VI hardware had an IDE interface, so
> they rarely got implemented in the arch layer). However, recently,
> with the transition from old IDE to libata and the prevalence of ARM
> with more commodity hardware, the deficiency is becoming exposed.
> Even the PA8000 workstations now come with an IDE CD, which means
> we're starting to have problems with them as well.

I don't think there's a core or driver problem in this specific case. As
we discussed earlier, I believe the problem is that ARM considers a
fresh page out of the page cache as "clean" instead of "dirty", and
inverting that like we do on powerpc will fix their problem too.
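The difference between the two conventions can be shown with one more hypothetical model: a fresh page-cache page either starts with no "dirty" mark (the ARM behaviour described here, which relies on the driver calling `flush_dcache_page()`) or starts "not clean" (the powerpc-style convention), in which case the first user mapping flushes even if a PIO driver forgot to. The enum and helpers are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

enum model_convention {
	FRESH_PAGE_CLEAN,	/* PG_dcache_dirty semantics: clear = no flush owed */
	FRESH_PAGE_DIRTY,	/* PG_dcache_clean semantics: clear = flush owed */
};

struct model_page {
	bool needs_flush;	/* derived from PG_arch_1 under either convention */
};

static int dcache_flushes;

/* A page fresh out of the page cache, before any driver touches it. */
static struct model_page model_new_page(enum model_convention c)
{
	struct model_page p = { .needs_flush = (c == FRESH_PAGE_DIRTY) };
	return p;
}

/* PIO driver fills the page with CPU stores and never calls
 * flush_dcache_page(); the data sits in the D-cache, no flag changes. */
static void model_pio_fill_no_flush(struct model_page *p)
{
	(void)p;
}

/* First mapping into user space: flush only if a flush is owed. */
static void model_map_to_user(struct model_page *p)
{
	if (p->needs_flush) {
		dcache_flushes++;
		p->needs_flush = false;
	}
}
```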

> > Seems like ARM has a requirement other architectures do not have, one that
> > a) is not documented anywhere, and
> > b) causes problems.
> >
> > You could argue that the performance improvement (how big is it, anyway?)
> > is worth it, but this should be agreed to by the wider community...
>
> Performance is always worth it provided we don't sacrifice correctness.
> The thing which was discovered in this thread is basically that ARM is
> handling deferred flushing (for D/I coherency) in a slightly different
> way from everyone else ... once that's fixed, ARM will likely not have
> the D/I problem, but we'll still have the libata (and other PIO
> systems) D flushing issue.

You mean older VIVT ARM will grow a new issue there?

Cheers,
Ben.
