From: Linus Torvalds on


On Fri, 9 Apr 2010, Borislav Petkov wrote:
>
> Yep, looks good: its mmap_region()...

Can you double-check your current diffs - maybe something got corrupted.

mmap_region installs the vma with vma_link(), and the last thing
vma_link() does with my patch is that "anon_vma_prepare()".

Maybe with all the patches flying around, you had a reject or something,
and you lost that one anon_vma_prepare()?

Or maybe I screwed up somewhere and sent you the wrong patch. Here it is
again, just in case.

[ I have a horrible cold, and can hardly think straight. So who knows,
maybe I'm missing something. But if you have lost one of the
'anon_vma_prepare()' call sites, that would certainly explain why you
get NULL anon_vma's ]

Linus

---
mm/memory.c | 12 +++---------
mm/mmap.c | 17 ++++-------------
2 files changed, 7 insertions(+), 22 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 833952d..08d4423 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2223,9 +2223,6 @@ reuse:
gotten:
pte_unmap_unlock(page_table, ptl);

- if (unlikely(anon_vma_prepare(vma)))
- goto oom;
-
if (is_zero_pfn(pte_pfn(orig_pte))) {
new_page = alloc_zeroed_user_highpage_movable(vma, address);
if (!new_page)
@@ -2766,8 +2763,6 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
/* Allocate our own private page. */
pte_unmap(page_table);

- if (unlikely(anon_vma_prepare(vma)))
- goto oom;
page = alloc_zeroed_user_highpage_movable(vma, address);
if (!page)
goto oom;
@@ -2863,10 +2858,6 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
if (flags & FAULT_FLAG_WRITE) {
if (!(vma->vm_flags & VM_SHARED)) {
anon = 1;
- if (unlikely(anon_vma_prepare(vma))) {
- ret = VM_FAULT_OOM;
- goto out;
- }
page = alloc_page_vma(GFP_HIGHUSER_MOVABLE,
vma, address);
if (!page) {
@@ -3115,6 +3106,9 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
pmd_t *pmd;
pte_t *pte;

+ if (WARN_ONCE(!vma->anon_vma, "Mapping with no anon_vma"))
+ return VM_FAULT_SIGBUS;
+
__set_current_state(TASK_RUNNING);

count_vm_event(PGFAULT);
diff --git a/mm/mmap.c b/mm/mmap.c
index 75557c6..82392c2 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -463,6 +463,8 @@ static void vma_link(struct mm_struct *mm, struct vm_area_struct *vma,

mm->map_count++;
validate_mm(mm);
+
+ anon_vma_prepare(vma);
}

/*
@@ -628,6 +630,8 @@ again: remove_next = 1 + (end > next->vm_end);
if (mapping)
spin_unlock(&mapping->i_mmap_lock);

+ anon_vma_prepare(vma);
+
if (remove_next) {
if (file) {
fput(file);
@@ -1674,12 +1678,6 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
if (!(vma->vm_flags & VM_GROWSUP))
return -EFAULT;

- /*
- * We must make sure the anon_vma is allocated
- * so that the anon_vma locking is not a noop.
- */
- if (unlikely(anon_vma_prepare(vma)))
- return -ENOMEM;
anon_vma_lock(vma);

/*
@@ -1720,13 +1718,6 @@ static int expand_downwards(struct vm_area_struct *vma,
{
int error;

- /*
- * We must make sure the anon_vma is allocated
- * so that the anon_vma locking is not a noop.
- */
- if (unlikely(anon_vma_prepare(vma)))
- return -ENOMEM;
-
address &= PAGE_MASK;
error = security_file_mmap(NULL, 0, 0, 0, address, 1);
if (error)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Fri, 9 Apr 2010, Borislav Petkov wrote:
>
> From: Linus Torvalds <torvalds(a)linux-foundation.org>
> Date: Fri, Apr 09, 2010 at 09:35:15AM -0700
>
> > Can you try with _just_ my patch?
>
> Yep, yours along with the SLUB debugging piece just survived one
> hibernation cycle without a problem. Also, no SIGBUS-killed processes,
> all seems fine. Will continue stressing it though...
>
> Let me know what you want me to do next.

Continue stress-testing it. I don't think my patch on its own should fix
the original problem, but at least we now know why you got those NULL
anon_vma's.

So what I _think_ will happen is that you'll be able to re-create the
problem that started this all. But I'd like to verify that, just because
I'm anal and I'd like these things to be tested independently.

So assuming that the original problem happens again, if you can then apply
Rik's patch, but add a

dst->anon_vma = src->anon_vma;

to just before the success case (the "return 0") in anon_vma_clone(),
that would be good.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Fri, 9 Apr 2010, Borislav Petkov wrote:
> >
> > So what I _think_ will happen is that you'll be able to re-create the
> > problem that started this all. But I'd like to verify that, just because
> > I'm anal and I'd like these things to be tested independently.
>
> Heh, that was easy. Third hibernate cycle is a charm^Wboom :)

Ok, good to know that I'm still tracking ok on the issue.

> > So assuming that the original problem happens again, if you can then apply
> > Rik's patch, but add a
> >
> > dst->anon_vma = src->anon_vma;
> >
> > to just before the success case (the "return 0") in anon_vma_clone(),
> > that would be good.
>
> It looks like this way we mangle the anon_vma chains somehow. From
> what I can see and if I'm not mistaken, we save the anon_vmas alright
> but end up in what seems like an endless list_for_each_entry()
> loop having grabbed anon_vma->lock in page_lock_anon_vma() and we
> can't seem to yield it through page_unlock_anon_vma() at the end of
> page_referenced_anon() so it has to be that code in between iterating
> over each list entry...

Ok. So scratch Rik's patch. It doesn't work even with the anon_vma set up.

Rik? I think it's back to you. I'm not going to bother committing the
change to the anon_vma locking unless you actually need the locking
guarantees for anon_vma_prepare().

And I've got the feeling that the proper fix is in the vma_adjust()
handling if your original idea was right.

Anybody?

We're at the point where I've already delayed -rc4 several days because
it's pointless cutting it without fixing this. One option is to just say
"f*ck it, we'll revert it all and try again later". But it feels so
close..

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rik van Riel on
On 04/09/2010 03:32 PM, Linus Torvalds wrote:

> Rik? I think it's back to you. I'm not going to bother committing the
> change to the anon_vma locking unless you actually need the locking
> guarantees for anon_vma_prepare().

> And I've got the feeling that the proper fix is in the vma_adjust()
> handling if your original idea was right.

We can fix it on the other side, by changing anon_vma_merge
to actually link all the anon_vma structs into the VMA.

An added benefit is that we are already holding the required
lock (mmap_sem) exclusively in that code path.

I'll cook up a patch and I'll mail it out after a little
testing.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rik van Riel on
On 04/09/2010 04:43 PM, Johannes Weiner wrote:

> Okay, I think I got it working. I first thought we would need an
> m^n loop to properly merge the anon_vma_chains, but we can actually
> be cleverer than that:

I've looked it over 5 times, can't find anything wrong
with it. Your approach looks like it should work just
fine.

Certainly easier than the things Linus and I tried :)

> Signed-off-by: Johannes Weiner<hannes(a)cmpxchg.org>

Reviewed-by: Rik van Riel <riel(a)redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/