From: Jeremy Fitzhardinge on
I just noticed that the original mmu notifier change (cddb8a5c14a) adds
calls to mmu_notifier_invalidate_range_start/end to
apply_to_page_range(). This doesn't seem correct to me, since
apply_to_page_range can perform arbitrary operations to the range of
pages, not just invalidation of the pages. It seems to me that the
appropriate mmu notifiers should be called either around the call to
apply_to_page_range(), or from within the callback function.

Andrea, what's the rationale for mmu_notifier_invalidate_range_start/end
here?

Thanks,
J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andrea Arcangeli on
On Fri, Jul 09, 2010 at 08:06:20AM -0700, Jeremy Fitzhardinge wrote:
> I just noticed that the original mmu notifier change (cddb8a5c14a) adds
> calls to mmu_notifier_invalidate_range_start/end to
> apply_to_page_range(). This doesn't seem correct to me, since
> apply_to_page_range can perform arbitrary operations to the range of
> pages, not just invalidation of the pages. It seems to me that the
> appropriate mmu notifiers should be called either around the call to
> apply_to_page_range(), or from within the callback function.
>
> Andrea, what's the rationale for mmu_notifier_invalidate_range_start/end
> here?

As long as the secondary mappings are teardown in range_start and
allowed to be established again only after range_end, all
modifications will be picked up by the secondary mmu. Imagine
secondary mmu like a tlb, that you only invalidate, then it'll be
refilled later (after range_end).

The exception is set_pte_at_notify that is called by ksm to establish
a readonly secondary pte in KVM, KVM only calls
get_user_pages(write=1) (never write=0 even for reads) so until that
is optimized set_pte_at_notify allows guest to access readonly data
without breaking the cow. set_pte_at_notify invokes a change_pte
method, if not implemented it'll just fallback to the invalidate_page
method that is backwards compatible, so no mmu notifier user is
required to call change_pte (especially if the secondary page fault -
kind of secondary tlb-miss software handler invokes get_user_pages
with write=0 for reads, ->change_pte can only eliminate one minor
fault so no big deal).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jeremy Fitzhardinge on
On 07/09/2010 08:12 AM, Andrea Arcangeli wrote:
> On Fri, Jul 09, 2010 at 08:06:20AM -0700, Jeremy Fitzhardinge wrote:
>
>> I just noticed that the original mmu notifier change (cddb8a5c14a) adds
>> calls to mmu_notifier_invalidate_range_start/end to
>> apply_to_page_range(). This doesn't seem correct to me, since
>> apply_to_page_range can perform arbitrary operations to the range of
>> pages, not just invalidation of the pages. It seems to me that the
>> appropriate mmu notifiers should be called either around the call to
>> apply_to_page_range(), or from within the callback function.
>>
>> Andrea, what's the rationale for mmu_notifier_invalidate_range_start/end
>> here?
>>
> As long as the secondary mappings are teardown in range_start and
> allowed to be established again only after range_end, all
> modifications will be picked up by the secondary mmu. Imagine
> secondary mmu like a tlb, that you only invalidate, then it'll be
> refilled later (after range_end).
>

Yes, but apply_to_page_range isn't necessarily making changes which
requires that teardown/refill. The most common user is vmalloc, which
is just using a side-effect of apply_to_page_range to make sure that all
the intermediate levels of the pagetable are allocated over the
vmalloced range. I think all the other users of it are within Xen code.
In particular, we're using it in the gntdev driver, which also uses mmu
notifiers to keep grant mappings in sync with process mappings, so the
recursive mmu notifier call is causing problems.

It seems to me that apply_to_page_range should be agnostic to its use,
and its up to its callers to do any appropriate mmu notifier work.
Would you be happy with a patch to remove those calls?

Thanks,
J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andrea Arcangeli on
On Fri, Jul 09, 2010 at 08:51:39AM -0700, Jeremy Fitzhardinge wrote:
> Yes, but apply_to_page_range isn't necessarily making changes which
> requires that teardown/refill. The most common user is vmalloc, which
> is just using a side-effect of apply_to_page_range to make sure that all
> the intermediate levels of the pagetable are allocated over the
> vmalloced range. I think all the other users of it are within Xen code.
> In particular, we're using it in the gntdev driver, which also uses mmu
> notifiers to keep grant mappings in sync with process mappings, so the
> recursive mmu notifier call is causing problems.
>
> It seems to me that apply_to_page_range should be agnostic to its use,
> and its up to its callers to do any appropriate mmu notifier work.
> Would you be happy with a patch to remove those calls?

mmu notifier only relevant for userland mappings, not kernel
mappings. I don't know about the xen use, but for vmalloc certainly it
can't be a problem to remove those two mmu notifier invalidates.

Only bit that is worrysome is the mm == &init_mm
pte_alloc_kernel|pte_alloc_map_lock. That seems to imply it may also
be used to mangle over userland. But apparently all users are passing
&init_mm as expected. I guess if you remove the mm parameter and you
default to &init_mm definitely there will be no risk in removing the
mmu notifier range_start/end invalidates.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andrea Arcangeli on
On Fri, Jul 09, 2010 at 10:30:48AM -0700, Jeremy Fitzhardinge wrote:
> On 07/09/2010 09:22 AM, Andrea Arcangeli wrote:
> > mmu notifier only relevant for userland mappings, not kernel
> > mappings. I don't know about the xen use, but for vmalloc certainly it
> > can't be a problem to remove those two mmu notifier invalidates.
> >
> > Only bit that is worrysome is the mm == &init_mm
> > pte_alloc_kernel|pte_alloc_map_lock. That seems to imply it may also
> > be used to mangle over userland. But apparently all users are passing
> > &init_mm as expected. I guess if you remove the mm parameter and you
> > default to &init_mm definitely there will be no risk in removing the
> > mmu notifier range_start/end invalidates.
> >
>
> No, we do have some users which use it on user memory. But those users
> are using it as part of their own mmu notifier backend, so the recursive
> calls are causing a problem. My point is that anyone using
> apply_to_page_range should be making their own calls to mmu notifiers as
> appropriate to whatever they're doing.

Makes sense, it was hard in fact to see how it would cause any problem
for you considering nobody registers mmu notifiers into mm_init...

I've to say it's next to trivial for them to detect recursion and skip
the inner superflous call if they run it under a lock. But be careful
that pte_alloc/pmd_alloc and friends can block and break the mmu
notifier locking. I'm not really sure if it apply_to_page_range is a
sane interface to use inside mmu notifier methods considering it's
supposedly a blocking operation, caller must be careful to use that
inside a mmu notifier callback anyway...

I'm not opposed to removing it, I've been wondering if it made any
sense in the first place but then there was no point not to add
it. Just calling apply_to_page_range in non blocking context doesn't
look so good.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/