From: Linus Torvalds on


On Thu, 8 Apr 2010, KOSAKI Motohiro wrote:
>
> Now pagefault don't insert anon_vma anymore, right? if so, SIGBUS is better.
> Now SIGBUS and VM_FAULT_OOM make different result.
>
> SIGBUS -> kill current task
> VM_FAULT_OOM -> invoke oom-killer (see pagefault_out_of_memory())

Yeah, maybe VM_FAULT_SIGBUS works ok instead of VM_FAULT_OOM. But the
cause of it is the system having been oom when themappign was created, so
I think either is fine.

> If current task can't recover proper anon_vma. we should just kill current
> instead random highest badness process. otherwise !anon_vma process continue
> to randomly invoke oom-killer.

Yes, that is a good point.

Anyway, I think it might be interesting to test my anon_vma_prepare()
locking change patch together with Rik's _first_ version of his "fix
anon_vma_prepare" thing (the one without the spinlock). They should apply
independently of each other, and maybe it all even works together.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Thu, 8 Apr 2010, Borislav Petkov wrote:
>
> There are still issues: vma_adjust() grabs mapping->i_mmap_lock for file
> mappings while we might sleep in anon_vma_prepare():

Ahh. Good catch. So I can't actually do that anon_vma_prepare() thing in
__insert_vm_struct.

It should be simple enough to just move it into the caller, just after it
releases that lock. There's only one user of that __insert_vm_struct()
anyway. You can do it yourself, or you can replace my previous patch with
this..

[ The patch below also makes it warn once and return SIGBUS for the case
where there is no anon_vma. I decided I still want to hear about it if
there might be some path that tries to insert a vma on its own ]

Linus

---
mm/memory.c | 12 +++---------
mm/mmap.c | 17 ++++-------------
2 files changed, 7 insertions(+), 22 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 833952d..08d4423 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2223,9 +2223,6 @@ reuse:
gotten:
pte_unmap_unlock(page_table, ptl);

- if (unlikely(anon_vma_prepare(vma)))
- goto oom;
-
if (is_zero_pfn(pte_pfn(orig_pte))) {
new_page = alloc_zeroed_user_highpage_movable(vma, address);
if (!new_page)
@@ -2766,8 +2763,6 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
/* Allocate our own private page. */
pte_unmap(page_table);

- if (unlikely(anon_vma_prepare(vma)))
- goto oom;
page = alloc_zeroed_user_highpage_movable(vma, address);
if (!page)
goto oom;
@@ -2863,10 +2858,6 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
if (flags & FAULT_FLAG_WRITE) {
if (!(vma->vm_flags & VM_SHARED)) {
anon = 1;
- if (unlikely(anon_vma_prepare(vma))) {
- ret = VM_FAULT_OOM;
- goto out;
- }
page = alloc_page_vma(GFP_HIGHUSER_MOVABLE,
vma, address);
if (!page) {
@@ -3115,6 +3106,9 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
pmd_t *pmd;
pte_t *pte;

+ if (WARN_ONCE(!vma->anon_vma, "Mapping with no anon_vma"))
+ return VM_FAULT_SIGBUS;
+
__set_current_state(TASK_RUNNING);

count_vm_event(PGFAULT);
diff --git a/mm/mmap.c b/mm/mmap.c
index 75557c6..82392c2 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -463,6 +463,8 @@ static void vma_link(struct mm_struct *mm, struct vm_area_struct *vma,

mm->map_count++;
validate_mm(mm);
+
+ anon_vma_prepare(vma);
}

/*
@@ -628,6 +630,8 @@ again: remove_next = 1 + (end > next->vm_end);
if (mapping)
spin_unlock(&mapping->i_mmap_lock);

+ anon_vma_prepare(vma);
+
if (remove_next) {
if (file) {
fput(file);
@@ -1674,12 +1678,6 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
if (!(vma->vm_flags & VM_GROWSUP))
return -EFAULT;

- /*
- * We must make sure the anon_vma is allocated
- * so that the anon_vma locking is not a noop.
- */
- if (unlikely(anon_vma_prepare(vma)))
- return -ENOMEM;
anon_vma_lock(vma);

/*
@@ -1720,13 +1718,6 @@ static int expand_downwards(struct vm_area_struct *vma,
{
int error;

- /*
- * We must make sure the anon_vma is allocated
- * so that the anon_vma locking is not a noop.
- */
- if (unlikely(anon_vma_prepare(vma)))
- return -ENOMEM;
-
address &= PAGE_MASK;
error = security_file_mmap(NULL, 0, 0, 0, address, 1);
if (error)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rik van Riel on
On 04/08/2010 10:11 AM, Linus Torvalds wrote:
>
>
> On Thu, 8 Apr 2010, Borislav Petkov wrote:
>>
>> There are still issues: vma_adjust() grabs mapping->i_mmap_lock for file
>> mappings while we might sleep in anon_vma_prepare():
>
> Ahh. Good catch. So I can't actually do that anon_vma_prepare() thing in
> __insert_vm_struct.
>
> It should be simple enough to just move it into the caller, just after it
> releases that lock. There's only one user of that __insert_vm_struct()
> anyway. You can do it yourself, or you can replace my previous patch with
> this..
>
> [ The patch below also makes it warn once and return SIGBUS for the case
> where there is no anon_vma. I decided I still want to hear about it if
> there might be some path that tries to insert a vma on its own ]

Reviewed-by: Rik van Riel <riel(a)redhat.com>

I haven't seen any places that insert VMAs by itself.
Several strange places that allocate them, but they
all appear to use the standard functions to insert them.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Thu, 8 Apr 2010, Rik van Riel wrote:
>
> Reviewed-by: Rik van Riel <riel(a)redhat.com>

Yeah, I think I'll commit it as-is, assuming we get confirmation that it
(along with your patch) actually ends up fixing the original problem.

I had actually had lockdep etc on with that patch, but for some reason I'd
overlooked the SPINLOCK_SLEEP debugging, so I hadn't seen the stupid issue
that Borislav pointed out. I wonder if LOCKDEP or spinlock debugging hould
just select it. Small detail, but I should have caught that obvious bug
myself.

> I haven't seen any places that insert VMAs by itself.
> Several strange places that allocate them, but they
> all appear to use the standard functions to insert them.

Yeah, it's complicated enough to add a vma with all the rbtree etc stuff
that I hope nobody actually cooks their own. But I too grepped for vma
allocations, and there were more of them than I expected, so...

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Thu, 8 Apr 2010, Borislav Petkov wrote:
>
> And this happens quite often - I changed the WARN_ONCE to WARN and can't
> start kvm, iceowl (mozilla calendar) and the console-kit-daemon craps up
> upon boot too:

Hmm. I tried console-kit-daemon, which I had installed, but didn't get
anything like that. Probably some setup difference.

I also went through every user of 'vm_area_cachep', and saw nothing
suspicious at least for the mmu case (I didn't check the nommu.c code). I
must have missed something.

One thing you could do is to add some more debugging info when that "no
anon_vma" warning happens. In particular, if you still have the SLUB
debugging on, you could try to do that

page = virt_to_head_page(vma);
object_err(vm_area_cachep, page, (void *)vma, "NULL anon_vma");

and it should give you _which_ routine did the kmem_cache_alloc() for the
vma that doesn't have an anon_vma.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/