Fix migration races in rmap_walk() V2 [Kernel]

Prev: netpoll: add generic support for bridge and bonding devices
Next: Hello

From: Christoph Lameter on 27 Apr 2010 18:30

On Tue, 27 Apr 2010, Mel Gorman wrote:

> The problem is that there are some races that either allow migration PTEs to
> be copied or a migration PTE to be left behind. Migration still completes and
> the page is unlocked but later a fault will call migration_entry_to_page()
> and BUG() because the page is not locked. This series aims to close some
> of these races.

In general migration ptes were designed to cause the code encountering
them to go to sleep until migration is finished and a regular pte is
available. Looks like we are tolerating the handling of migration entries.

Never imagined copying page table with this. There could be recursion
issues of various kinds because page migration requires operations on the
page tables while performing migration. A simple fix would be to *not*
migrate page table tables.

> Patch 1 alters fork() to restart page table copying when a migration PTE is
> encountered.

Can we simply wait like in the fault path?

> Patch 3 notes that while a VMA is moved under the anon_vma lock, the page
> tables are not similarly protected. Where migration PTEs are
> encountered, they are cleaned up.

This means they are copied / moved etc and "cleaned" up in a state when
the page was unlocked. Migration entries are not supposed to exist when
a page is not locked.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Andrea Arcangeli on 27 Apr 2010 18:40

On Tue, Apr 27, 2010 at 05:27:36PM -0500, Christoph Lameter wrote:
> Can we simply wait like in the fault path?

There is no bug there, no need to wait either. I already audited it
before, and I didn't see any bug. Unless you can show a bug with CPU A
running the rmap_walk on process1 before process2, there is no bug to
fix there.

>
> > Patch 3 notes that while a VMA is moved under the anon_vma lock, the page
> > tables are not similarly protected. Where migration PTEs are
> > encountered, they are cleaned up.
>
> This means they are copied / moved etc and "cleaned" up in a state when
> the page was unlocked. Migration entries are not supposed to exist when
> a page is not locked.

patch 3 is real, and the first thought I had was to lock down the page
before running vma_adjust and unlock after move_page_tables. But these
are virtual addresses. Maybe there's a simpler way to keep migration
away while we run those two operations.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: KAMEZAWA Hiroyuki on 27 Apr 2010 20:20

On Wed, 28 Apr 2010 00:32:42 +0200
Andrea Arcangeli <aarcange(a)redhat.com> wrote:

> On Tue, Apr 27, 2010 at 05:27:36PM -0500, Christoph Lameter wrote:
> > Can we simply wait like in the fault path?
>
> There is no bug there, no need to wait either. I already audited it
> before, and I didn't see any bug. Unless you can show a bug with CPU A
> running the rmap_walk on process1 before process2, there is no bug to
> fix there.
>
I think there is no bug, either. But that safety is fragile.

> >
> > > Patch 3 notes that while a VMA is moved under the anon_vma lock, the page
> > > tables are not similarly protected. Where migration PTEs are
> > > encountered, they are cleaned up.
> >
> > This means they are copied / moved etc and "cleaned" up in a state when
> > the page was unlocked. Migration entries are not supposed to exist when
> > a page is not locked.
>
> patch 3 is real, and the first thought I had was to lock down the page
> before running vma_adjust and unlock after move_page_tables. But these
> are virtual addresses. Maybe there's a simpler way to keep migration
> away while we run those two operations.
>

Doing some check in move_ptes() after vma_adjust() is not safe.
IOW, when vma's information and information in page-table is incosistent...objrmap
is broken and migartion will cause panic.

Then...I think there are 2 ways.
1. use seqcounter in "mm_struct" as previous patch and lock it at mremap.
or
2. get_user_pages_fast() when do remap.

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Mel Gorman on 28 Apr 2010 05:20

On Wed, Apr 28, 2010 at 12:32:42AM +0200, Andrea Arcangeli wrote:
> On Tue, Apr 27, 2010 at 05:27:36PM -0500, Christoph Lameter wrote:
> > Can we simply wait like in the fault path?
>
> There is no bug there, no need to wait either. I already audited it
> before, and I didn't see any bug. Unless you can show a bug with CPU A
> running the rmap_walk on process1 before process2, there is no bug to
> fix there.
>

Yes, this patch is now dropped.

> >
> > > Patch 3 notes that while a VMA is moved under the anon_vma lock, the page
> > > tables are not similarly protected. Where migration PTEs are
> > > encountered, they are cleaned up.
> >
> > This means they are copied / moved etc and "cleaned" up in a state when
> > the page was unlocked. Migration entries are not supposed to exist when
> > a page is not locked.
>
> patch 3 is real, and the first thought I had was to lock down the page
> before running vma_adjust and unlock after move_page_tables. But these
> are virtual addresses. Maybe there's a simpler way to keep migration
> away while we run those two operations.
>

I see there is a large discussion on that patch so I'll read that rather
than commenting here.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Andrea Arcangeli on 28 Apr 2010 09:30

On Wed, Apr 28, 2010 at 09:13:45AM +0900, KAMEZAWA Hiroyuki wrote:
> Doing some check in move_ptes() after vma_adjust() is not safe.
> IOW, when vma's information and information in page-table is incosistent...objrmap
> is broken and migartion will cause panic.
>
> Then...I think there are 2 ways.
> 1. use seqcounter in "mm_struct" as previous patch and lock it at mremap.
> or
> 2. get_user_pages_fast() when do remap.

3 take the anon_vma->lock
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

| Next | Last
Pages: 1 2 3
Prev: netpoll: add generic support for bridge and bonding devices
Next: Hello