From: Benjamin Herrenschmidt on
On Tue, 2010-04-06 at 14:52 +0900, KOSAKI Motohiro wrote:

(Adding linux-arch)

> This check was introduced by the following commit. Yes, right now we
> don't consider arch specific PROT_xx flags, but I don't think that is odd.
>
> Yeah, I can imagine at least embedded people certainly need arch
> specific PROT_xx flags and hope to change this, but I don't think
> mprotect() fits your usage. I mean, mprotect() is widely used
> internally by glibc, so if mprotect() can change such flags, glibc
> might turn them off implicitly.
>
> So, why can't we add a proper new syscall? It has no regression risk.

I don't care much personally whether we use mprotect() or a new syscall,
but at this stage we already have PROT_SAO going that way for powerpc so
that would be an ABI change.
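
For reference, the existing usage that would be affected looks roughly
like the sketch below from user space. map_sao() is a made-up helper
name, and the fallback that defines PROT_SAO as 0 is purely an assumption
so the snippet builds on non-powerpc machines, where it then requests no
extra attribute at all.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on strict compilers */
#include <stddef.h>
#include <sys/mman.h>

/*
 * PROT_SAO is only defined on powerpc. Falling back to 0 is an assumption
 * made for this sketch so that it compiles elsewhere.
 */
#ifndef PROT_SAO
#define PROT_SAO 0
#endif

/* Hypothetical helper: map anonymous memory, asking for SAO via the prot argument. */
void *map_sao(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_SAO,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

Moving the attribute out of the prot argument would mean callers like
this one need to change, which is the ABI concern.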

However, the main issue isn't really there. The main issue is that right
now, everything we do in mmap.c, mprotect.c, ... revolves around having
everything translated into the single vm_flags field. VMA merging
decisions, construction of vm_page_prot, etc... everything is there.

However, this is a 32-bit field on 32-bit archs, and we already use all
possible bits in there. It's also a field entirely defined in generic
code with no provision for arch specific bits.

The question here thus boils down to what direction do we want to go to
if we want to untangle that and provide the ability to expose mapping
"attributes" basically. In fact, I suspect even x86 might have good use
of that to create things like relaxed ordering mappings no ?

This boils down, so far to a few facts/questions to be resolved:

- Do we want to use the existing PROT_ argument to mmap, mprotect,... ?
There's plenty of bit space, and we already have at least one example of
an arch adding something to it (powerpc with PROT_SAO - aka Strong
Access Ordering - aka Make It Look Like An x86 :-)

- If not, while a separate syscall would be fine with me for setting
attributes after the fact, it makes it harder to pass them via mmap. Is
that a big deal ? Ie. it means one -always- has to call it after mmap
to change the attributes. That means for example that mmap will
potentially create a VMA merged with another one, just to be re-split
due to the attribute change. A bit gross...

- Do we want to keep the current "Funnel everything into vm_flags"
approach ? That leaves no option that I can see but to extend it into a
u64 so it grows on 32-bit archs.

- If not, I see two approaches here: Either having a separate / new
"attribute" field in the VMA or going straight for the vm_page_prot (ie.
the pgprot). In both cases, things like vma_merge() need to grow a new
argument since obviously we can't merge things with different
attributes.

- ... Unless we just replace VM_SAO with VM_CANT_MERGE and set that
whenever a VMA has non-zero attributes. Sad but simpler.
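
To make the last two options concrete, here is a toy sketch of the two
merge rules. Everything in it (struct toy_vma, the TOY_ constants, the
function names) is hypothetical and only models the flag/attribute
comparison. The real vma_merge() checks a lot more (adjacency, file,
offset, policy...) which is left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in the kernel as written. */
#define TOY_VM_READ        0x1u
#define TOY_VM_WRITE       0x2u
#define TOY_VM_CANT_MERGE  0x80000000u

struct toy_vma {
    uint32_t flags;   /* plays the role of vm_flags */
    uint32_t attrs;   /* the separate "attribute" field under discussion */
};

/* Option A: merge only when both the flags and the attributes match. */
bool toy_can_merge_attrs(const struct toy_vma *a, const struct toy_vma *b)
{
    return a->flags == b->flags && a->attrs == b->attrs;
}

/* Option B: setting any non-zero attribute marks the VMA as unmergeable. */
void toy_set_attrs(struct toy_vma *v, uint32_t attrs)
{
    v->attrs = attrs;
    if (attrs)
        v->flags |= TOY_VM_CANT_MERGE;
    else
        v->flags &= ~TOY_VM_CANT_MERGE;
}

/* Option B merge rule: refuse as soon as either side carries the flag. */
bool toy_can_merge_flag(const struct toy_vma *a, const struct toy_vma *b)
{
    if ((a->flags | b->flags) & TOY_VM_CANT_MERGE)
        return false;
    return a->flags == b->flags;
}
```

With option B, two neighbours carrying identical attributes still never
merge, which is the "sad" part. In exchange the merge code would not need
a new argument.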

Any other / better idea ?

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Benjamin Herrenschmidt on
On Tue, 2010-04-06 at 15:24 +0900, KOSAKI Motohiro wrote:

> I guess you haven't caught my intention. I didn't say we have to remove
> PROT_SAO and VM_SAO.
> I mean mmap(PROT_SAO) is ok, it only appends a new flag and doesn't change
> the meaning of existing flags. I'm only against mprotect(PROT_NONE) turning
> off PROT_SAO implicitly.
>
> IOW I recommend we use three syscalls:
>   mmap():       create new mappings
>   mprotect():   change the protection of a mapping (as the name says)
>   mattribute(): (or similar name) change an attribute of a mapping
>                 (e.g. PROT_SAO or other arch specific flags)
>
> I'm not against changing mm/mprotect.c for PROT_SAO.

Ok, I see. No biggie. The main deal remains how we want to do that
inside the kernel :-) I think the less horrible options here are
to either extend vm_flags to always be 64-bit, or add a separate
vm_map_attributes flag, and add the necessary bits and pieces to
prevent merging across VMAs with different attributes.

The more I try to hack it into vm_page_prot, the more I hate that
option.
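
As for the user-visible split proposed above, the semantics would be
something like the toy model below: the mprotect-like call only touches
the R/W/X bits, the mattribute-like call only touches the attribute bits.
All names and bit values here are invented for illustration and match
neither the real PROT_ values nor any existing syscall.

```c
#include <assert.h>
#include <stdint.h>

/* Invented bit layout: low bits are protection, the next nibble is attributes. */
#define TOY_PROT_READ   0x01u
#define TOY_PROT_WRITE  0x02u
#define TOY_PROT_EXEC   0x04u
#define TOY_PROT_MASK   (TOY_PROT_READ | TOY_PROT_WRITE | TOY_PROT_EXEC)
#define TOY_ATTR_SAO    0x10u   /* stand-in for an arch specific attribute */
#define TOY_ATTR_MASK   0xf0u

static_assert((TOY_PROT_MASK & TOY_ATTR_MASK) == 0, "masks must not overlap");

/* mprotect-like: replace the protection bits, keep the attributes. */
uint32_t toy_mprotect(uint32_t cur, uint32_t prot)
{
    return (cur & ~TOY_PROT_MASK) | (prot & TOY_PROT_MASK);
}

/* mattribute-like: replace the attribute bits, keep the protection. */
uint32_t toy_mattribute(uint32_t cur, uint32_t attr)
{
    return (cur & ~TOY_ATTR_MASK) | (attr & TOY_ATTR_MASK);
}
```

In this model a PROT_NONE style call from glibc (prot == 0) leaves
TOY_ATTR_SAO set, which is the behaviour being asked for.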

Cheers
Ben.


From: Benjamin Herrenschmidt on
On Tue, 2010-04-06 at 19:26 +0900, KOSAKI Motohiro wrote:
> > Ok, I see. No biggie. The main deal remains how we want to do that
> > inside the kernel :-) I think the less horrible options here are
> > to either extend vm_flags to always be 64-bit, or add a separate
> > vm_map_attributes flag, and add the necessary bits and pieces to
> > prevent merge across different attribute vma's.
>
> vma->vm_flags already has VM_SAO. Why do we need more flags?
> At least, I dislike adding a separate flags member into vma.
> It might introduce an unnecessary mess into the vma merge code.

Well, we did shove SAO in there, and used up the very last vm_flag for
it a while back. Now I need another one, for little endian mappings. So
I'm stuck.

But the problem goes further I believe. Archs do nowadays have quite an
interesting set of MMU attributes that it would be useful to expose to
some extent.

Some powerpc's also provide storage keys for example, and I think ARM
has something along those lines. There are interesting cacheability
attributes too, on x86 as well. Being able to use such attributes to
request for example a relaxed ordering mapping on x86 might be useful.

I think it basically boils down to either extend vm_flags to always be
64-bit, which seems to be Nick's preferred approach, or introduce a
vm_attributes with all the necessary changes to the merge code to take
it into account (not -that- hard tho, there's only half a page of
results in grep for these things :-)
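
For the 64-bit route, a rough sketch of what changes and of one hazard I
would expect: any leftover 32-bit local or argument silently drops the new
bits. The type name, the flag names and the bit positions below are made
up for the example and are not the real vm_flags layout.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_vm_flags_t;          /* hypothetical widened flags type */

static_assert(sizeof(toy_vm_flags_t) == 8, "flags must be 64-bit on every arch");

/* Bit positions are arbitrary, chosen to sit on each side of the 32-bit limit. */
#define TOY_VM_SAO            ((toy_vm_flags_t)1 << 31)  /* last slot of a 32-bit word */
#define TOY_VM_LITTLE_ENDIAN  ((toy_vm_flags_t)1 << 32)  /* first slot that needs 64 bits */

/* Models what a not-yet-converted 32-bit variable would do to the flags. */
toy_vm_flags_t toy_through_32bit(toy_vm_flags_t flags)
{
    uint32_t narrow = (uint32_t)flags;    /* the high 32 bits are dropped here */

    return narrow;
}
```

Passing TOY_VM_SAO | TOY_VM_LITTLE_ENDIAN through toy_through_32bit()
returns only TOY_VM_SAO, so every place that holds the flags in a 32-bit
type would have to be converted.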

Cheers,
Ben.

