From: Minchan Kim on
Hi, Mel.
On Sat, Mar 13, 2010 at 1:41 AM, Mel Gorman <mel(a)csn.ul.ie> wrote:
> rmap_walk_anon() was triggering errors in memory compaction that looks like
> use-after-free errors in anon_vma. The problem appears to be that between
> the page being isolated from the LRU and rcu_read_lock() being taken, the
> mapcount of the page dropped to 0 and the anon_vma was freed. This patch
> skips the migration of anon pages that are not mapped by anyone.
>
> Signed-off-by: Mel Gorman <mel(a)csn.ul.ie>
> Acked-by: Rik van Riel <riel(a)redhat.com>
> ---
>  mm/migrate.c |   10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 98eaaf2..3c491e3 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -602,6 +602,16 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
>         * just care Anon page here.
>         */
>        if (PageAnon(page)) {
> +               /*
> +                * If the page has no mappings any more, just bail. An
> +                * unmapped anon page is likely to be freed soon but worse,
> +                * it's possible its anon_vma disappeared between when
> +                * the page was isolated and when we reached here while
> +                * the RCU lock was not held
> +                */
> +               if (!page_mapcount(page))

As looking code about mapcount of page, I got confused.
I think mapcount of page is protected by pte lock.
But I can't find pte lock in unmap_and_move.
If I am right, what protects race between this condition check and
rcu_read_lock?
This patch makes race window very small but It can't remove race totally.

I think I am missing something.
Pz, point me out. :)


> +                       goto uncharge;
> +
>                rcu_read_lock();
>                rcu_locked = 1;
>                anon_vma = page_anon_vma(page);
> --
> 1.6.5
>




--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on
On Mon, Mar 15, 2010 at 2:34 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
> On Mon, 15 Mar 2010 09:28:08 +0900
> Minchan Kim <minchan.kim(a)gmail.com> wrote:
>
>> Hi, Mel.
>> On Sat, Mar 13, 2010 at 1:41 AM, Mel Gorman <mel(a)csn.ul.ie> wrote:
>> > rmap_walk_anon() was triggering errors in memory compaction that looks like
>> > use-after-free errors in anon_vma. The problem appears to be that between
>> > the page being isolated from the LRU and rcu_read_lock() being taken, the
>> > mapcount of the page dropped to 0 and the anon_vma was freed. This patch
>> > skips the migration of anon pages that are not mapped by anyone.
>> >
>> > Signed-off-by: Mel Gorman <mel(a)csn.ul.ie>
>> > Acked-by: Rik van Riel <riel(a)redhat.com>
>> > ---
>> >  mm/migrate.c |   10 ++++++++++
>> >  1 files changed, 10 insertions(+), 0 deletions(-)
>> >
>> > diff --git a/mm/migrate.c b/mm/migrate.c
>> > index 98eaaf2..3c491e3 100644
>> > --- a/mm/migrate.c
>> > +++ b/mm/migrate.c
>> > @@ -602,6 +602,16 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
>> >         * just care Anon page here.
>> >         */
>> >        if (PageAnon(page)) {
>> > +               /*
>> > +                * If the page has no mappings any more, just bail. An
>> > +                * unmapped anon page is likely to be freed soon but worse,
>> > +                * it's possible its anon_vma disappeared between when
>> > +                * the page was isolated and when we reached here while
>> > +                * the RCU lock was not held
>> > +                */
>> > +               if (!page_mapcount(page))
>>
>> As looking code about mapcount of page, I got confused.
>> I think mapcount of page is protected by pte lock.
>> But I can't find pte lock in unmap_and_move.
> There is no pte_lock.
>
>> If I am right, what protects race between this condition check and
>> rcu_read_lock?
>> This patch makes race window very small but It can't remove race totally.
>>
>> I think I am missing something.
>> Pz, point me out. :)
>>
>
> Hmm. This is my understanding of old story.
>
> At migration.
>  1. we increase page_count().
>  2. isolate it from LRU.
>  3. call try_to_unmap() under rcu_read_lock(). Then,
>  4. replace pte with swp_entry_t made by PFN. under pte_lock.
>  5. do migarate
>  6. remap new pages. under pte_lock()>
>  7. release rcu_read_lock().
>
> Here, we don't care whether page->mapping holds valid anon_vma or not.
>
> Assume a racy threads which calls zap_pte_range() (or some other)
>
> a) When the thread finds valid pte under pte_lock and successfully call
>   page_remove_rmap().
>   In this case, migration thread finds try_to_unmap doesn't unmap any pte.
>   Then, at 6, remap pte will not work.
> b) When the thread finds migrateion PTE(as swap entry) in zap_page_range().
>   In this case, migration doesn't find migrateion PTE and remap fails.
>
> Why rcu_read_lock() is necessary..
>  - When page_mapcount() goes to 0, we shouldn't trust page->mapping is valid.
>  - Possible cases are
>        i) anon_vma (= page->mapping) is freed and used for other object.
>        ii) anon_vma (= page->mapping) is freed
>        iii) anon_vma (= page->mapping) is freed and used as anon_vma again.
>
> Here, anon_vma_cachep is created  by SLAB_DESTROY_BY_RCU. Then, possible cases
> are only ii) and iii). While anon_vma is anon_vma, try_to_unmap and remap_page
> can work well because of the list of vmas and address check. IOW, remap routine
> just do nothing if anon_vma is freed.
>
> I'm not sure by what logic "use-after-free anon_vma" is caught. But yes,
> there will be case, "anon_vma is touched after freed.", I think.
>
> Thanks,
> -Kame
>

Thanks for detail explanation, Kame.
But it can't understand me enough, Sorry.

Mel said he met "use-after-free errors in anon_vma".
So added the check in unmap_and_move.

if (PageAnon(page)) {
....
if (!page_mapcount(page))
goto uncharge;
rcu_read_lock();

My concern what protects racy mapcount of the page?
For example,

CPU A CPU B
unmap_and_move
page_mapcount check pass zap_pte_range
<-- some stall --> pte_lock
<-- some stall --> page_remove_rmap(map_count is zero!)
<-- some stall --> pte_unlock
<-- some stall --> anon_vma_unlink
<-- some stall --> anon_vma free !!!!
rcu_read_lock
anon_vma has gone!!

I think above scenario make error "use-after-free", again.
What prevent above scenario?


--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on
On Mon, Mar 15, 2010 at 3:44 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu(a)jp.fujitsu.com>
>> Thanks for detail explanation, Kame.
>> But it can't understand me enough, Sorry.
>>
>> Mel said he met "use-after-free errors in anon_vma".
>> So added the check in unmap_and_move.
>>
>> if (PageAnon(page)) {
>>  ....
>>  if (!page_mapcount(page))
>>    goto uncharge;
>>  rcu_read_lock();
>>
>> My concern what protects racy mapcount of the page?
>> For example,
>>
>> CPU A                                 CPU B
>> unmap_and_move
>> page_mapcount check pass    zap_pte_range
>> <-- some stall -->                   pte_lock
>> <-- some stall -->                   page_remove_rmap(map_count is zero!)
>> <-- some stall -->                   pte_unlock
>> <-- some stall -->                   anon_vma_unlink
>> <-- some stall -->                   anon_vma free !!!!
>> rcu_read_lock
>> anon_vma has gone!!
>>
>> I think above scenario make error "use-after-free", again.
>> What prevent above scenario?
>>
> I think this patch is not complete.
> I guess this patch in [1/11] is trigger for the race.
> ==
> +
> +       /* Drop an anon_vma reference if we took one */
> +       if (anon_vma && atomic_dec_and_lock(&anon_vma->migrate_refcount, &anon_vma->lock)) {
> +               int empty = list_empty(&anon_vma->head);
> +               spin_unlock(&anon_vma->lock);
> +               if (empty)
> +                       anon_vma_free(anon_vma);
> +       }
> ==
> If my understainding in above is correct, this "modify" freed anon_vma.
> Then, use-after-free happens. (In old implementation, there are no refcnt,
> so, there is no use-after-free ops.)
>

I agree.
Let's wait Mel's response.

>
> So, what I can think of now is a patch like following is necessary.
>
> ==
> static inline struct anon_vma *anon_vma_alloc(void)
> {
>        struct anon_vma *anon_vma;
>        anon_vma = kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
>        atomic_set(&anon_vma->refcnt, 1);
> }
>
> void anon_vma_free(struct anon_vma *anon_vma)
> {
>        /*
>         * This called when anon_vma is..
>         * - anon_vma->vma_list becomes empty.
>         * - incremetned refcnt while migration, ksm etc.. is dropped.
>         * - allocated but unused.
>         */
>        if (atomic_dec_and_test(&anon_vma->refcnt))
>                kmem_cache_free(anon_vma_cachep, anon_vma);
> }
> ==
> Then all things will go simple.
> Overhead is concern but list_empty() helps us much.

When they made things complicated without atomic_op,
there was reasonable reason, I think. :)

My opinion depends on you and server guys(Hugh, Rik, Andrea Arcangeli and so on)


>
> Thanks,
> -Kame
>
>
>
>
>



--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on
On Mon, Mar 15, 2010 at 4:09 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
> On Mon, 15 Mar 2010 15:44:59 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
>
>> On Mon, 15 Mar 2010 15:28:15 +0900
>> Minchan Kim <minchan.kim(a)gmail.com> wrote:
>>
>> > On Mon, Mar 15, 2010 at 2:34 PM, KAMEZAWA Hiroyuki
>> > <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
>> > > On Mon, 15 Mar 2010 09:28:08 +0900
>> > > Minchan Kim <minchan.kim(a)gmail.com> wrote:
>
>> > I think above scenario make error "use-after-free", again.
>> > What prevent above scenario?
>> >
>> I think this patch is not complete.
>> I guess this patch in [1/11] is trigger for the race.
>> ==
>> +
>> +     /* Drop an anon_vma reference if we took one */
>> +     if (anon_vma && atomic_dec_and_lock(&anon_vma->migrate_refcount, &anon_vma->lock)) {
>> +             int empty = list_empty(&anon_vma->head);
>> +             spin_unlock(&anon_vma->lock);
>> +             if (empty)
>> +                     anon_vma_free(anon_vma);
>> +     }
>> ==
>> If my understainding in above is correct, this "modify" freed anon_vma.
>> Then, use-after-free happens. (In old implementation, there are no refcnt,
>> so, there is no use-after-free ops.)
>>
> Sorry, about above, my understanding was wrong. anon_vma->lock is modifed even
> in old code. Sorry for noise.

Nope. Such your kindness always helps and cheer up others people.
In addition, give others good time to consider seriously something.

Thanks, Kame.


--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on
On Mon, Mar 15, 2010 at 11:21 PM, Mel Gorman <mel(a)csn.ul.ie> wrote:
> On Mon, Mar 15, 2010 at 09:48:49PM +0900, Minchan Kim wrote:
>> On Mon, 2010-03-15 at 11:28 +0000, Mel Gorman wrote:
>> > The use after free looks like
>> >
>> > 1. page_mapcount(page) was zero so anon_vma was no longer reliable
>> > 2. rcu lock taken but the anon_vma at this point can already be garbage because the
>> >    process exited
>> > 3. call try_to_unmap, looks up tha anon_vma and locks it. This causes problems
>> >
>> > I thought the race would be closed but there is still a very tiny window there all
>> > right. The following alternative should close it. What do you think?
>> >
>> >         if (PageAnon(page)) {
>> >             rcu_read_lock();
>> >
>> >                 /*
>> >                  * If the page has no mappings any more, just bail. An
>> >                  * unmapped anon page is likely to be freed soon but worse,
>> >                  * it's possible its anon_vma disappeared between when
>> >                  * the page was isolated and when we reached here while
>> >                  * the RCU lock was not held
>> >                  */
>> >                 if (!page_mapcount(page)) {
>> >                     rcu_read_unlock();
>> >                         goto uncharge;
>> >             }
>> >
>> >                 rcu_locked = 1;
>> >                 anon_vma = page_anon_vma(page);
>> >                 atomic_inc(&anon_vma->external_refcount);
>> >         }
>> >
>> > The rcu_unlock label is not used here because the reference counts were not taken in
>> > the case where page_mapcount == 0.
>> >
>>
>> Please, repost above code with your use-after-free scenario comment.
>>
>
> This will be the replacement patch so.
>
> ==== CUT HERE ====
> mm,migration: Do not try to migrate unmapped anonymous pages
>
> rmap_walk_anon() was triggering errors in memory compaction that look like
> use-after-free errors. The problem is that between the page being isolated
> from the LRU and rcu_read_lock() being taken, the mapcount of the page
> dropped to 0 and the anon_vma gets freed. This can happen during memory
> compaction if pages being migrated belong to a process that exits before
> migration completes. Hence, the use-after-free race looks like
>
>  1. Page isolated for migration
>  2. Process exits
>  3. page_mapcount(page) drops to zero so anon_vma was no longer reliable
>  4. unmap_and_move() takes the rcu_lock but the anon_vma is already garbage
>  4. call try_to_unmap, looks up tha anon_vma and "locks" it but the lock
>    is garbage.
>
> This patch checks the mapcount after the rcu lock is taken. If the
> mapcount is zero, the anon_vma is assumed to be freed and no further
> action is taken.
>
> Signed-off-by: Mel Gorman <mel(a)csn.ul.ie>
> Acked-by: Rik van Riel <riel(a)redhat.com>
Reviewed-by: Minchan Kim <minchan.kim(a)gmail.com>


--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/