From: Rik van Riel on
On 04/12/2010 12:26 PM, Linus Torvalds wrote:

> But there is a _much_ more subtle case that involved swapping.
>
> So guys, here's my fairly simple theory on what happens:

That bug looks entirely possible. Given that Borislav
has heavy swapping going on, it is quite possible that
this is the bug he has been triggering.

> The patch below is my largely mindless try at fixing this. It's untested.
> I'm not entirely sure that it actually works. But it makes some amount of
> conceptual sense. No?

The patch would help avoid the bug you described.

It does have the drawback of moving all the pages of
child processes back into the anon_vma of the parent
process after swapin, even if they are privately owned
pages by the child process.

I am guessing it may need a check to see whether the
page and swap slot are exclusively owned by the current
process.

Page or swap slot shared? => oldest anon_vma
Page and swap slot exclusive? => newest anon_vma

I suspect the easiest way to achieve this would be to
pass a flag in from do_swap_page, where we already
check this, a few lines above calling page_add_anon_rmap:

if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
pte = maybe_mkwrite(pte_mkdirty(pte), vma);
flags &= ~FAULT_FLAG_WRITE;
}



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Mon, 12 Apr 2010, Borislav Petkov wrote:
>
> But I'll give the patch a run anyway in an hour or so anyway.

Thanks. I suspect you will find that even if there is no actual disk IO
swapping going on during any of the normal loads, the shrink_all_memory()
thing in your hibernation event will cause swap to happen. Or at least
swap-cache entries to be done.

Oh, and I've decided that my rcu_read_lock() patch for the tlb_gather()
thing for unmapping is bogus. Exactly because the critical issue isn't
when the page is free'd (and page->mapping is cleared), but when the page
is unmapped (and page_mapped() clears).

And that is done correctly even with the delayed frees in tlb_gather. So
addign the rcu_read_lock/rcu_read_unlock around it all doesn't actually
matter or help.

So the patches that I think fix real bugs are

- the anon_vma_prepare() fix to only share anon_vma's if they are
singletons.

- the vma_adjust() fix to copy the right anon_vma chains

- the anon_vma_clone() fix to traverse the avc's in reverse order, so
that the resulting cloned chain is the same as the original chain

You got this patch as part of the "verify_vma()" patch, but the only
part of that patch that matters is the one-liner that changes a
"for_each_list_entry" to use the "_reverse()" version..

- and that last patch to pick the right anon_vma when mapping a page
(which could still be improved: the "insert new page" case does _not_
have to take the oldest anon_vma, and Rik is correct that if we have an
exclusive swap cache entry we could also take the top one)

I think I'll re-post all four patches with real commit messages, to get
ack's for them. I'd like to finally get the much delayed -rc4 out the
door.

Oh, and if that "pick the right anon_vma" patch doesn't fix it, I suspect
we'll have to revert the whole anon_vma changes for 2.6.34. It's getting
pretty late in the -rc series to fix this bug. I'm _hoping_ that I really
nailed it this time, and that we're ok, but if Borislav reports it still
happening, and people not having any other ideas, I think I'll just have
to do an -rc4 with it all reverted, and then we can try again for 35 if
somebody figures out the bug.

Hmm? I'd hate to revert it all now because of the hours I've put in
looking at the code (to the point that I feel I understand it), but at the
same time, if it was somebody else who was chasing this bug and not being
able to fix it, I'd tell them "revert it, it's too late". Amount of effort
spent doesn't matter if the bug still happens ;^(

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


Oh, btw, I like your email gateway. Only noticed now:

mail.skyhub.de (SuperMail on ZX Spectrum 128k)

that's a tough little machine.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/