From: Peter Zijlstra on
On Mon, 2010-04-12 at 21:30 +0200, Peter Zijlstra wrote:
>
> We could here return a non-valid and already freed anon_vma.
>
OK, so non of the users of page_lock_anon_vma() with exception of the
memory-failure.c one could really care. And all of them seem to be safe
enough wrt dealing with a dead one.

So unless people care, I'm going to not spend more time on trying to
make page_lock_anon_vma() behave.

Instead I'll try and see wth it is that migrate.c and rmap_walk_anon are
doing.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Borislav Petkov on
From: Linus Torvalds <torvalds(a)linux-foundation.org>
Date: Mon, Apr 12, 2010 at 03:18:20PM -0700

> Oh, btw, I like your email gateway. Only noticed now:
>
> mail.skyhub.de (SuperMail on ZX Spectrum 128k)
>
> that's a tough little machine.

Yeah, and it can handle all that mail traffic just fine :)

--
Regards/Gruss,
Boris.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Borislav Petkov on
From: Linus Torvalds <torvalds(a)linux-foundation.org>
Date: Mon, Apr 12, 2010 at 03:11:53PM -0700

> Ok. That does sound very positive. Of course, last time you sounded
> positive, I had an email from you half an hour later that said "oh no, it
> oopsed again". So I'll take it with a bit of salt, but on the whole I'll
> be optimistic about it.

Ok, just finished testing -rc4 - no problems so far. Let's just go out
on a limb here and say with a greater certainty that this really got
fixed but be smart about it and keep an eye open if it happens again -
you never know.

Where is the champagne?

--
Regards/Gruss,
Boris.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Tue, 2010-04-13 at 19:53 +0900, KOSAKI Motohiro wrote:
> > struct anon_vma *page_lock_anon_vma(struct page *page)
> > {
> > @@ -294,14 +309,24 @@ struct anon_vma *page_lock_anon_vma(struct page *page)
> > unsigned long anon_mapping;
> >
> > rcu_read_lock();
> > - anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping);
> > + anon_mapping = (unsigned long)rcu_dereference(page->mapping);
> > if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
> > goto out;
> > - if (!page_mapped(page))
> > - goto out;
> >
> > anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
> > spin_lock(&anon_vma->lock);
>
> Does anon->lock dereference is guranteed if page->_mapcount==-1?
> It can be freed miliseconds ago, rcu_read_lock() doesn't provide such
> gurantee.
>
> perhaps, I'm missing your point.

No you're right, I got my head hopelessly twisted up trying to make
page_lock_anon_vma() do something reliable, but there really isn't much
that can be done.

Luckily most users (with exception of the memory-failure.c one) don't
really care and all take steps to verify the page is indeed in any of
the vmas it might find.

So I've given up on this and will only submit a patch like the below,
which hopefully does still make sense...

I do think there's a missing barrier in there as well, but I've made
enough of a fool of myself.

[ with the preemptible mmu_gather patches I introduce a refcount to
the anon_vma, and then with atomic_inc_not_zero() we can add a
guarantee that the returned anon_vma is alive ]

---
mm/rmap.c | 18 ++++++++++++++++--
1 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index eaa7a09..49a2533 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -285,8 +285,22 @@ void __init anon_vma_init(void)
}

/*
- * Getting a lock on a stable anon_vma from a page off the LRU is
- * tricky: page_lock_anon_vma rely on RCU to guard against the races.
+ * Getting a lock on a stable anon_vma from a page off the LRU is tricky!
+ *
+ * Since there is no serialization what so ever against page_remove_rmap()
+ * the best this function can do is return a locked anon_vma that might
+ * have been relevant to this page.
+ *
+ * The page might have been remapped to a different anon_vma or the anon_vma
+ * returned may already be freed (and even reused).
+ *
+ * All users of this function must be very careful when walking the anon_vma
+ * chain and verify that the page in question is indeed mapped in it
+ * [ something equivalent to page_mapped_in_vma() ].
+ *
+ * Since anon_vma's slab is DESTROY_BY_RCU and we know from page_remove_rmap()
+ * that the anon_vma pointer from page->mapping is valid if there is a
+ * mapcount, we can dereference the anon_vma after observing those.
*/
struct anon_vma *page_lock_anon_vma(struct page *page)
{

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Tue, 2010-04-13 at 21:00 +0900, KOSAKI Motohiro wrote:
> > [ with the preemptible mmu_gather patches I introduce a refcount to
> > the anon_vma, and then with atomic_inc_not_zero() we can add a
> > guarantee that the returned anon_vma is alive ]
>
> Indeed. refcount is best way. anon_vma DESTROY_BY_RCU stuff seems
> overengineering, I think. this is fastest, but anon_vma allocation is not
> (and was not) fork/exit bottleneck point. So, I guess most simply way is
> best.

Well, that refcount stuff still relies on DESTROY_BY_RCU :-)

Anyway, it also looks like a lot of races are avoided by ordering the
rmap_add/remove calls wrt to adding/removing the page to/from the LRU.

Rmap calls come from LRU pages, and it looks like rmap state is only
changed for pages that are not on the LRU.

I still have to go through all that code again to make sure, but I
couldn't find a race between page_add_anon_rmap() and
page_lock_anon_vma() due to that.

If there is, we need to look at page_mapped() before page->mapping
because page_add_anon_rmap() first increments the mapcount and only then
adjusts the mapping, so the existing order in page_anon_lock_vma() can
end up dereferencing a long dead anon_vma.






--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/