From: Rik van Riel on
On 04/10/2010 04:05 PM, Linus Torvalds wrote:

> This patch is scary and untested, but the more I look at that code, the
> more convinced I am that vma_adjust was _really_ badly screwed up. The
> patch below may make things worse. I'll test it myself too, but I'm
> sending it out first, since I was writing the email as I was looking at
> the piece of cr*p.

Your patch looks correct. Gotta love how before,
"vma" could be either exporter or importer!

I'm guessing that it did not break before my
changes, because of plain old luck...

Acked-by: Rik van Riel <riel(a)redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Sat, 10 Apr 2010, Rik van Riel wrote:

> On 04/10/2010 04:05 PM, Linus Torvalds wrote:
>
> > And vma_adjust is the one place that does that anon_vma_merge(), which is
> > apart from the actual unmapping sequence the only other place that
> > actually free's anon_vmas. So there are reasons to be very suspicious of
> > that code.
>
> It frees anon_vma_chain structures, but not actual anon_vmas.

Rik, I think you're ignoring the fact that the anon_vma_chain is also the
implicit refcount.

So when you don't create the chains, you implicitly end up freeing the
anon_vma too early. In fact, it might well happen at that
'anon_vma_merge()': when it does the unlink_anon_vmas(), it may be
unlinking the last remaining anon_vma ref, and then anon_vma_unlink
_will_ in fact free the anon_vma.

Even though we have a 'vma->anon_vma' pointer that points to it - because
the chains weren't set up correctly.

> Walking the anon_vma (from rmap) requires the anon_vma->lock,
> which is taken in anon_vma_merge whenever a chain is unlinked.

None of that matters. If the dang thing got free'd, the lock isn't
reliable any more.

> A few lines up from that code, we have:
>
> if (vma->anon_vma && (insert || importer || start != vma->vm_start))
> anon_vma = vma->anon_vma;
>
> So anon_vma should always be vma->anon_vma.

No. vma->anon_vma is NULL, so the above lines are total no-ops. We're
trying to _fill_ it. But we're doing it wrong.

So we end up with:

anon_vma = next->anon-vma
importer = vma

and we do:

if (anon_vma_clone(importer, vma)) {
return -ENOMEM;
}
importer->anon_vma = anon_vma;

do you see?

The "anon_vma_clone(importer, vma)" does NOTHING, because it is cloning
from the wrong source (from 'vma', rather than from 'next', so it leaves
the vma chains empty.

And then, despite having empty chains, we do that

importer->anon_vma = anon_vma;

which sets the anon_vma to the (non-NULL) next->anon_vma.

And then, a bit later, we'll do

anon_vma_merge(vma, next);

which will happily notice that the anon_vma's of both vma and next match
(because we just _set_ them to match), and then frees the ONLY REMAINING
CHAIN - the one in next. The one we DID NOT CORRECTLY COPY, because we got
our sources completely screwed up.

> What am I overlooking?

Can you see it now?

> If we import a chain, from vma to importer, importer->anon_vma
> will be equal to vma->anon_vma.

The thing you seem to miss is that we aren't supposed to import the chain
from 'vma' AT ALL. The anon_vma came from _next_, not from 'vma'!

> I do not see how 'importer' could get a state different from 'vma'.

Stop worrying about 'vma'. Start worrying about 'next'.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Sat, 10 Apr 2010, Borislav Petkov wrote:
> From: Linus Torvalds <torvalds(a)linux-foundation.org>
> Date: Sat, Apr 10, 2010 at 01:12:46PM -0700
>
> > So I'm actually pretty optimistic that this really is it.
>
> Ok, let me verify what/in which order should be tested before I test
> something wrongly. The RCU-safe fix for the TLB flush can stay for
> correctness reasons, this last patch, obviosly, what happens with the
> find_mergeable_anon_vma() changes to use only singleton lists for
> merging? Should I keep those too?

Yes. So the patches I actually think are important are:

- the RCU fix is real, although admittedly the race window is probably
too small to ever really hit.

- the simplification rule to find_mergeable_anon_vma's is required,
because otherwise our anon_vma_merge() will do the wrong thing (maybe
Johannes' patch would be an alternative, but quite frankly, I think we
want the simpler code, and I don't think we even _want_ to share
anon_vma's that are complex due to forking)

I like my "cleanup" version (the bigger one with lots of comments) more
than the two-liner version, but they should be equivalent.

- the vma_adjust() fix is the one that I think may actually end up fixing
your problems for good. Knock wood.

So I think they are all required, but I suspect that the vma_adjust() one
is finally the most direct explanation of the problem you've seen.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rik van Riel on
On 04/10/2010 04:34 PM, Linus Torvalds wrote:

>> What am I overlooking?
>
> Can you see it now?

Yeah, after reading through your patch it became obvious.
It's the code above this code that sets up the problem.

It's a small miracle it worked before...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Sat, 10 Apr 2010, Borislav Petkov wrote:
>
> Damn, nope, still no joy :(. It looked like it was fixed but one of the
> test was to hibernate right after the 3 kvm guests were shut down and I
> guess the mem freeing pattern kinda hits it where it most hurts.

Damn, I really hoped that was it. Three independent bugs found and fixed,
and still no joy? Oh well.

> By the way, do we want to create a new thread - the mailchain is off the
> screen limits of my netbook :)

I prefer to keep it in one thread so that they all show up together if I
need to, but feel free to start a new one. Not a biggie.

> [ 647.492781] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 647.493001] IP: [<ffffffff810c60a0>] page_referenced+0xee/0x1dc

Well, it sure is consistent. I'll start to think about what else could go
wrong..

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/