From: Andrea Arcangeli on
Hi Mel,

did you see my proposed fix? I'm running with it applied, I'd be
interested if you can test it. Surely it will also work for new
anon-vma code in upstream, because at that point there's just 1
anon-vma and nothing else attached to the vma.

http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=commit;h=6efa1dfa5152ef8d7f26beb188d6877525a9dd03

I think it's wrong to try to handle the race in rmap walk by making
magic checks on vm_flags VM_GROWSDOWN|GROWSUP and
vma->vm_mm->map_count == 1, when we can fix it fully and simply in
exec.c by indexing two vmas in the same anon-vma with a different
vm_start so the pages will be found at all times by the rmap_walk.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andrea Arcangeli on
I'm building a mergeable THP+memory compaction tree ready for mainline
merging based on new anon-vma code, so I'm integrating your patch1 and
this below should be the port of my alternate fix to your patch2 to
fix the longstanding crash in migrate (not a bug in new anon-vma code
but longstanding). patch1 is instead about the bugs introduced by the
new anon-vma code that might crash migrate (even without memory
compaction and/or THP) the same way as the bug fixed by the below.

==
Subject: fix race between shift_arg_pages and rmap_walk

From: Andrea Arcangeli <aarcange(a)redhat.com>

migrate.c requires rmap to be able to find all ptes mapping a page at
all times, otherwise the migration entry can be instantiated, but it
can't be removed if the second rmap_walk fails to find the page.

And split_huge_page() will have the same requirements as migrate.c
already has.

Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
---

diff --git a/fs/exec.c b/fs/exec.c
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -55,6 +55,7 @@
#include <linux/fsnotify.h>
#include <linux/fs_struct.h>
#include <linux/pipe_fs_i.h>
+#include <linux/rmap.h>

#include <asm/uaccess.h>
#include <asm/mmu_context.h>
@@ -503,7 +504,9 @@ static int shift_arg_pages(struct vm_are
unsigned long length = old_end - old_start;
unsigned long new_start = old_start - shift;
unsigned long new_end = old_end - shift;
+ unsigned long moved_length;
struct mmu_gather *tlb;
+ struct vm_area_struct *tmp_vma;

BUG_ON(new_start > new_end);

@@ -515,17 +518,46 @@ static int shift_arg_pages(struct vm_are
return -EFAULT;

/*
+ * We need to create a fake temporary vma and index it in the
+ * anon_vma list in order to allow the pages to be reachable
+ * at all times by the rmap walk for migrate, while
+ * move_page_tables() is running.
+ */
+ tmp_vma = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
+ if (!tmp_vma)
+ return -ENOMEM;
+ *tmp_vma = *vma;
+
+ if (unlikely(anon_vma_clone(tmp_vma, vma))) {
+ kmem_cache_free(vm_area_cachep, tmp_vma);
+ return -ENOMEM;
+ }
+
+ /*
* cover the whole range: [new_start, old_end)
+ *
+ * The vma is attached only to vma->anon_vma so one lock is
+ * enough. Even this lock might be removed and we could run it
+ * out of order but it's nicer to make atomic updates to
+ * vm_start and so we won't need a smb_wmb() before calling
+ * move_page_tables.
*/
- if (vma_adjust(vma, new_start, old_end, vma->vm_pgoff, NULL))
- return -ENOMEM;
+ spin_lock(&vma->anon_vma->lock);
+ vma->vm_start = new_start;
+ spin_unlock(&vma->anon_vma->lock);

/*
* move the page tables downwards, on failure we rely on
* process cleanup to remove whatever mess we made.
*/
- if (length != move_page_tables(vma, old_start,
- vma, new_start, length))
+ moved_length = move_page_tables(vma, old_start,
+ vma, new_start, length);
+
+ /* rmap walk will already find all pages using the new_start */
+ unlink_anon_vmas(tmp_vma);
+ kmem_cache_free(vm_area_cachep, tmp_vma);
+
+ if (length != moved_length)
return -ENOMEM;

lru_add_drain();
@@ -551,7 +583,7 @@ static int shift_arg_pages(struct vm_are
/*
* Shrink the vma to just the new range. Always succeeds.
*/
- vma_adjust(vma, new_start, new_end, vma->vm_pgoff, NULL);
+ vma->vm_end = new_end;

return 0;
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rik van Riel on
On 04/30/2010 03:22 PM, Andrea Arcangeli wrote:
> I'm building a mergeable THP+memory compaction tree ready for mainline
> merging based on new anon-vma code, so I'm integrating your patch1 and
> this below should be the port of my alternate fix to your patch2 to
> fix the longstanding crash in migrate (not a bug in new anon-vma code
> but longstanding). patch1 is instead about the bugs introduced by the
> new anon-vma code that might crash migrate (even without memory
> compaction and/or THP) the same way as the bug fixed by the below.
>
> ==
> Subject: fix race between shift_arg_pages and rmap_walk
>
> From: Andrea Arcangeli<aarcange(a)redhat.com>

Reviewed-by: Rik van Riel <riel(a)redhat.com>

--
All rights reversed
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andrea Arcangeli on
On Fri, Apr 30, 2010 at 09:22:35PM +0200, Andrea Arcangeli wrote:
> I'm building a mergeable THP+memory compaction tree ready for mainline
> merging based on new anon-vma code, so I'm integrating your patch1 and
> this below should be the port of my alternate fix to your patch2 to
> fix the longstanding crash in migrate (not a bug in new anon-vma code
> but longstanding). patch1 is instead about the bugs introduced by the
> new anon-vma code that might crash migrate (even without memory
> compaction and/or THP) the same way as the bug fixed by the below.

I noticed I didn't qrefresh to sync the patch to the working dir and I
had some change in working dir not picked up in the patch. A
INIT_LIST_HEAD was missing and this was the patch I actually tested.

===
Subject: fix race between shift_arg_pages and rmap_walk

From: Andrea Arcangeli <aarcange(a)redhat.com>

migrate.c requires rmap to be able to find all ptes mapping a page at
all times, otherwise the migration entry can be instantiated, but it
can't be removed if the second rmap_walk fails to find the page.

And split_huge_page() will have the same requirements as migrate.c
already has.

Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
---

diff --git a/fs/exec.c b/fs/exec.c
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -55,6 +55,7 @@
#include <linux/fsnotify.h>
#include <linux/fs_struct.h>
#include <linux/pipe_fs_i.h>
+#include <linux/rmap.h>

#include <asm/uaccess.h>
#include <asm/mmu_context.h>
@@ -503,7 +504,9 @@ static int shift_arg_pages(struct vm_are
unsigned long length = old_end - old_start;
unsigned long new_start = old_start - shift;
unsigned long new_end = old_end - shift;
+ unsigned long moved_length;
struct mmu_gather *tlb;
+ struct vm_area_struct *tmp_vma;

BUG_ON(new_start > new_end);

@@ -515,17 +518,46 @@ static int shift_arg_pages(struct vm_are
return -EFAULT;

/*
+ * We need to create a fake temporary vma and index it in the
+ * anon_vma list in order to allow the pages to be reachable
+ * at all times by the rmap walk for migrate, while
+ * move_page_tables() is running.
+ */
+ tmp_vma = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
+ if (!tmp_vma)
+ return -ENOMEM;
+ *tmp_vma = *vma;
+ INIT_LIST_HEAD(&tmp_vma->anon_vma_chain);
+ if (unlikely(anon_vma_clone(tmp_vma, vma))) {
+ kmem_cache_free(vm_area_cachep, tmp_vma);
+ return -ENOMEM;
+ }
+
+ /*
* cover the whole range: [new_start, old_end)
+ *
+ * The vma is attached only to vma->anon_vma so one lock is
+ * enough. Even this lock might be removed and we could run it
+ * out of order but it's nicer to make atomic updates to
+ * vm_start and so we won't need a smb_wmb() before calling
+ * move_page_tables.
*/
- if (vma_adjust(vma, new_start, old_end, vma->vm_pgoff, NULL))
- return -ENOMEM;
+ spin_lock(&vma->anon_vma->lock);
+ vma->vm_start = new_start;
+ spin_unlock(&vma->anon_vma->lock);

/*
* move the page tables downwards, on failure we rely on
* process cleanup to remove whatever mess we made.
*/
- if (length != move_page_tables(vma, old_start,
- vma, new_start, length))
+ moved_length = move_page_tables(vma, old_start,
+ vma, new_start, length);
+
+ /* rmap walk will already find all pages using the new_start */
+ unlink_anon_vmas(tmp_vma);
+ kmem_cache_free(vm_area_cachep, tmp_vma);
+
+ if (length != moved_length)
return -ENOMEM;

lru_add_drain();
@@ -551,7 +583,7 @@ static int shift_arg_pages(struct vm_are
/*
* Shrink the vma to just the new range. Always succeeds.
*/
- vma_adjust(vma, new_start, new_end, vma->vm_pgoff, NULL);
+ vma->vm_end = new_end;

return 0;
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rik van Riel on
On 05/01/2010 05:39 AM, Andrea Arcangeli wrote:

> ===
> Subject: fix race between shift_arg_pages and rmap_walk
>
> From: Andrea Arcangeli<aarcange(a)redhat.com>
>
> migrate.c requires rmap to be able to find all ptes mapping a page at
> all times, otherwise the migration entry can be instantiated, but it
> can't be removed if the second rmap_walk fails to find the page.
>
> And split_huge_page() will have the same requirements as migrate.c
> already has.
>
> Signed-off-by: Andrea Arcangeli<aarcange(a)redhat.com>

Reviewed-by: Rik van Riel <riel(a)redhat.com>

--
All rights reversed
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/