From: Catalin Marinas on
On Tue, 2010-03-02 at 23:33 +0000, Benjamin Herrenschmidt wrote:
> On Tue, 2010-03-02 at 17:47 +0000, Catalin Marinas wrote:
> >
> > Actually, option 2 still has an issue - does not easily work on SMP
> > systems where cache maintenance operations aren't broadcast in hardware.
> > In this case (ARM11MPCore), flush_dcache_page() is implemented
> > non-lazily so that the flushing happens on the same processor that
> > dirtied the cache. But since with some drivers there is no call to this
> > function, it wouldn't make any difference.
>
> Also, option 1 would not solve the icache issue which has the same
> problem related to IPIs.

Correct. But that's true for both options.

It would have been simpler if we had software TLBs.

> You -really- need to spank some HW folks here :-)

I think they got the message :). Cortex-A9 does it properly.

> > A solution is to do something like read-for-ownership before flushing
> > the D-cache in update_mmu_cache() (or set_pte_at()).
>
> You might also want to experiment with not clearing PG_arch_1 in
> flush_dcache_page(). I'm not 100% convinced it is necessary and that may
> reduce the amount of flushing needed.

Could a file map page be swapped out (and the mapping removed), then the
page cache page modified (i.e. NFS filesystem) and flush_dcache_page()
called?

> Another thing is, on powerpc, we only do the cleaning when we try to
> execute from the pages. IE. We basically "filter out" exec permission
> when pages are not clean. At least on processors that support per-page
> exec permission. You may want to consider something like that as well.

For non-aliasing VIPT, I think that's a fair optimisation.

Thanks.

--
Catalin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Catalin Marinas on
On Wed, 2010-03-03 at 03:47 +0000, FUJITA Tomonori wrote:
> On Wed, 03 Mar 2010 10:29:54 +1100
> Benjamin Herrenschmidt <benh(a)kernel.crashing.org> wrote:
>
> > On Tue, 2010-03-02 at 17:05 +0000, Catalin Marinas wrote:
> >
> > > The viable solutions so far:
> > >
> > > 1. Implement a PIO mapping API similar to the DMA API which takes
> > > care of the D-cache flushing. This means that PIO drivers would
> > > need to be modified to use an API like pio_kmap()/pio_kunmap()
> > > before writing to a page cache page.
> > > 2. Invert the meaning of PG_arch_1 to denote a clean page. This
> > > means that by default newly allocated page cache pages are
> > > considered dirty and even if there isn't a call to
> > > flush_dcache_page(), update_mmu_cache() would flush the D-cache.
> > > This is the PowerPC approach.
> >
> > I don't see the point of a "PIO" API. I would thus vote for 2 :-) Note
>
> Yeah, as powerpc and ia64 do, arm can flush D cache and invalidate I
> cache when inserting a executable page to pte, IIUC. No need for the
> new API for I/D consistency.

I can see that IA-64 uses the PG_arch_1 bit to mark a clean page rather
than dirty (as we did for ARM). The Documentation/cachetlb.txt needs
updating.

Thanks.

--
Catalin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Catalin Marinas on
On Tue, 2010-03-02 at 23:29 +0000, Benjamin Herrenschmidt wrote:
> On Tue, 2010-03-02 at 17:05 +0000, Catalin Marinas wrote:
>
> > The viable solutions so far:
> >
> > 1. Implement a PIO mapping API similar to the DMA API which takes
> > care of the D-cache flushing. This means that PIO drivers would
> > need to be modified to use an API like pio_kmap()/pio_kunmap()
> > before writing to a page cache page.
> > 2. Invert the meaning of PG_arch_1 to denote a clean page. This
> > means that by default newly allocated page cache pages are
> > considered dirty and even if there isn't a call to
> > flush_dcache_page(), update_mmu_cache() would flush the D-cache.
> > This is the PowerPC approach.
[...]
> > Option 2 above looks pretty appealing to me since it can be done in the
> > ARM code exclusively. I've done some tests and it indeed solves the
> > cache coherency with a rootfs on a USB stick. As Russell suggested, it
> > can be optimised to mark a page as clean when the DMA API is involved to
> > avoid duplicate flushing.
>
> That wouldn't solve the need for invalidating the I-cache... Unless we
> use another bit.

Indeed. We currently always invalidate the I-cache when the page is
mapped. With PG_arch_2, we could optimise this but I'm not sure it is
worth since I think we only get an update_mmu_cache() call for a page
(unless it is unmapped and re-mapped again).

--
Catalin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Russell King - ARM Linux on
On Wed, Mar 03, 2010 at 03:54:37PM +0530, James Bottomley wrote:
> On Wed, 2010-03-03 at 09:36 +0000, Russell King - ARM Linux wrote:
> > James, that's a pipedream. If you have a processor which doesn't support
> > NX, then the kernel marks all regions executable, even if the app only
> > asks for RW protection.
>
> I'm not talking about what the processor supports ... I'm talking about
> what the user sets on the VMA. My point is that the kernel only has
> responsibility in specific situations ... it's those paths we do the I/D
> coherency on.

You may not be talking about what the processor supports, but it is
directly relevant.

> > You end up with the protection masks always having VM_EXEC set in them,
> > so there's no way to distinguish from the kernel POV which pages are
> > going to be executed and those which aren't.
>
> I think you're talking about the pte page flags, I'm talking about the
> VMA ones above.

No, I'm talking about the VMA ones.

if ((prot & PROT_READ) && (current->personality & READ_IMPLIES_EXEC))
if (!(file && (file->f_path.mnt->mnt_flags & MNT_NOEXEC)))
prot |= PROT_EXEC;
....
/* Do simple checking here so the lower-level routines won't have
* to. we assume access permissions have been handled by the open
* of the memory object, so we don't do any here.
*/
vm_flags = calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) |
mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;

calc_vm_prot_bits(unsigned long prot)
{
return _calc_vm_trans(prot, PROT_READ, VM_READ ) |
_calc_vm_trans(prot, PROT_WRITE, VM_WRITE) |
_calc_vm_trans(prot, PROT_EXEC, VM_EXEC) |
arch_calc_vm_prot_bits(prot);
}

So, if you have a CPU which does not support NX, then READ_IMPLIES_EXEC
is set in the personality. That forces PROT_EXEC for anything with
PROT_READ, which in turn forces VM_EXEC.

> I'm not saying the common path (faulting in text sections) is the
> responsibility of user space. I'm saying the uncommon path, write
> modification of binaries, is. So the kernel only needs to worry about
> the ordinary text fault path.

What I'm saying is that you can't always tell the difference between
what's an executable page and what isn't in the kernel. On NX-incapable
CPUs, the kernel treats *all* readable pages as executable, and there's
no way to tell from the VMA or page protection flags that this isn't
the case.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jamie Lokier on
Catalin Marinas wrote:
> > I wonder if it's time to get a PG_arch_2 :-)
>
> As an optimisation, I think this would help (rather than always
> invalidating the I-cache in update_mmu_cache or set_pte_at).

If PG_arch_{1,2} are used in the same way on all architectures, when
they are used at all, perhaps they should be renamed :-)

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/