From: Minchan Kim on
On Wed, Mar 17, 2010 at 11:12 AM, KAMEZAWA Hiroyuki
> BTW, I doubt freeing anon_vma can happen even when we check mapcount.
>
> "unmap" is 2-stage operation.
>        1. unmap_vmas() => modify ptes, free pages, etc.
>        2. free_pgtables() => free pgtables, unlink vma and free it.
>
> Then, if migration is enough slow.
>
>        Migration():                            Exit():
>        check mapcount
>        rcu_read_lock
>        pte_lock
>        replace pte with migration pte
>        pte_unlock
>                                                pte_lock
>        copy page etc...                        zap pte (clear pte)
>                                                pte_unlock
>                                                free_pgtables
>                                                ->free vma
>                                                ->free anon_vma
>        pte_lock
>        remap pte with new pfn(fail)
>        pte_unlock
>
>        lock anon_vma->lock             # modification after free.
>        check list is empty

check list is empty?
Do you mean anon_vma->head?

If it is, is it possible that that list isn't empty since anon_vma is
used by others due to
SLAB_DESTROY_BY_RCU?

but such case is handled by page_check_address, vma_address, I think.

>        unlock anon_vma->lock
>        free anon_vma
>        rcu_read_unlock
>
>
> Hmm. IIUC, anon_vma is allocated as SLAB_DESTROY_BY_RCU. Then, while
> rcu_read_lock() is taken, anon_vma is anon_vma even if freed. But it
> may reused as anon_vma for someone else.
> (IOW, it may be reused but never pushed back to general purpose memory
>  until RCU grace period.)
> Then, touching anon_vma->lock never cause any corruption.
>
> Does use-after-free check for SLAB_DESTROY_BY_RCU correct behavior ?

Could you elaborate your point?

> Above case is not use-after-free. It's safe and expected sequence.
>
> Thanks,
> -Kame
>
>
>
>> > ---
>> >  mm/migrate.c |   13 +++++++++++++
>> >  1 files changed, 13 insertions(+), 0 deletions(-)
>> >
>> > diff --git a/mm/migrate.c b/mm/migrate.c
>> > index 98eaaf2..6eb1efe 100644
>> > --- a/mm/migrate.c
>> > +++ b/mm/migrate.c
>> > @@ -603,6 +603,19 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
>> >      */
>> >     if (PageAnon(page)) {
>> >             rcu_read_lock();
>> > +
>> > +           /*
>> > +            * If the page has no mappings any more, just bail. An
>> > +            * unmapped anon page is likely to be freed soon but worse,
>> > +            * it's possible its anon_vma disappeared between when
>> > +            * the page was isolated and when we reached here while
>> > +            * the RCU lock was not held
>> > +            */
>> > +           if (!page_mapcount(page)) {
>> > +                   rcu_read_unlock();
>> > +                   goto uncharge;
>> > +           }
>> > +
>> >             rcu_locked = 1;
>> >             anon_vma = page_anon_vma(page);
>> >             atomic_inc(&anon_vma->migrate_refcount);
>> >
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo(a)kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont(a)kvack.org"> email(a)kvack.org </a>
>>
>
>



--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on
On Wed, Mar 17, 2010 at 12:15 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
> On Wed, 17 Mar 2010 12:00:15 +0900
> Minchan Kim <minchan.kim(a)gmail.com> wrote:
>
>> On Wed, Mar 17, 2010 at 11:12 AM, KAMEZAWA Hiroyuki
>> > BTW, I doubt freeing anon_vma can happen even when we check mapcount.
>> >
>> > "unmap" is 2-stage operation.
>> >        1. unmap_vmas() => modify ptes, free pages, etc.
>> >        2. free_pgtables() => free pgtables, unlink vma and free it.
>> >
>> > Then, if migration is enough slow.
>> >
>> >        Migration():                            Exit():
>> >        check mapcount
>> >        rcu_read_lock
>> >        pte_lock
>> >        replace pte with migration pte
>> >        pte_unlock
>> >                                                pte_lock
>> >        copy page etc...                        zap pte (clear pte)
>> >                                                pte_unlock
>> >                                                free_pgtables
>> >                                                ->free vma
>> >                                                ->free anon_vma
>> >        pte_lock
>> >        remap pte with new pfn(fail)
>> >        pte_unlock
>> >
>> >        lock anon_vma->lock             # modification after free.
>> >        check list is empty
>>
>> check list is empty?
>> Do you mean anon_vma->head?
>>
> yes.
>
>> If it is, is it possible that that list isn't empty since anon_vma is
>> used by others due to
>> SLAB_DESTROY_BY_RCU?
>>
> There are 4 cases.
>        A) anon_vma->list is not empty because anon_vma is not freed.
>        B) anon_vma->list is empty because it's freed.
>        C) anon_vma->list is empty but it's reused.
>        D) anon_vma->list is not empty but it's reused.

E) anon_vma is used for other object.

That's because we don't hold rcu_read_lock.
I think Mel met this E) situation.

AFAIU, even slab page of SLAB_BY_RCU can be freed after grace period.
Do I miss something?

>
>> but such case is handled by page_check_address, vma_address, I think.
>>
> yes. Then, this corrupt nothing, as I wrote. We just modify anon_vma->lock
> and it's safe because of SLAB_DESTROY_BY_RCU.
>
>
>> >        unlock anon_vma->lock
>> >        free anon_vma
>> >        rcu_read_unlock
>> >
>> >
>> > Hmm. IIUC, anon_vma is allocated as SLAB_DESTROY_BY_RCU. Then, while
>> > rcu_read_lock() is taken, anon_vma is anon_vma even if freed. But it
>> > may reused as anon_vma for someone else.
>> > (IOW, it may be reused but never pushed back to general purpose memory
>> >  until RCU grace period.)
>> > Then, touching anon_vma->lock never cause any corruption.
>> >
>> > Does use-after-free check for SLAB_DESTROY_BY_RCU correct behavior ?
>>
>> Could you elaborate your point?
>>
>
> Ah, my point is "how use-after-free is detected ?"
>
> If use-after-free is detected by free_pages() (DEBUG_PGALLOC), it seems
> strange because DESTROY_BY_RCU guarantee that never happens.
>
> So, I assume use-after-free is detected in SLAB layer. If so,
> in above B), C), D) case, it seems there is use-after free in slab's point
> of view but it works as expected, no corruption.
>
> Then, my question is
> "Does use-after-free check for SLAB_DESTROY_BY_RCU work correctly ?"
>

I am not sure Mel found that by DEBUG_PGALLOC.
But, E) case can be founded by DEBUG_PGALLOC.

> and implies we need this patch ?
> (But this will prevent unnecessary page copy etc. by easy check.)
>
> Thanks,
> -Kame
>
>
>
>



--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on
On Fri, Mar 26, 2010 at 9:58 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
> On Fri, 26 Mar 2010 00:29:01 +0900
> Minchan Kim <minchan.kim(a)gmail.com> wrote:
>
>> Hi, Kame.
> <snip>
>
>> Which case do we have PageAnon && (page_mapcount == 0) && PageSwapCache ?
>> With looking over code which add_to_swap_cache, I found somewhere.
>>
>> 1) shrink_page_list
>> I think this case doesn't matter by isolate_lru_xxx.
>>
>> 2) shmem_swapin
>> It seems to be !PageAnon
>>
>> 3) shmem_writepage
>> It seems to be !PageAnon.
>>
>> 4) do_swap_page
>> page_add_anon_rmap increases _mapcount before setting page->mapping to anon_vma.
>> So It doesn't matter.
>
>>
>>
>> I think following codes in unmap_and_move seems to handle 3) case.
>>
>> ---
>>          * Corner case handling:
>>          * 1. When a new swap-cache page is read into, it is added to the LRU
>>          * and treated as swapcache but it has no rmap yet.
>>         ...
>>         if (!page->mapping) {
>>                 if (!PageAnon(page) && page_has_private(page)) {
>>                 ....
>>                 }
>>                 goto skip_unmap;
>>         }
>>
>> ---
>>
>> Do we really check PageSwapCache in there?
>> Do I miss any case?
>>
>
> When a page is fully unmapped, page->mapping is not cleared.
>
> from rmap.c
> ==
>  734 void page_remove_rmap(struct page *page)
>  735 {
>        ....
>  758         /*
>  759          * It would be tidy to reset the PageAnon mapping here,
>  760          * but that might overwrite a racing page_add_anon_rmap
>  761          * which increments mapcount after us but sets mapping
>  762          * before us: so leave the reset to free_hot_cold_page,
>  763          * and remember that it's only reliable while mapped.
>  764          * Leaving it set also helps swapoff to reinstate ptes
>  765          * faster for those pages still in swapcache.
>  766          */
>  767 }
> ==
>
> What happens at memory reclaim is...
>
>        the first vmscan
>        1. isolate a page from LRU.
>        2. add_to_swap_cache it.
>        3. try_to_unmap it
>        4. pageout it (PG_reclaim && PG_writeback)
>        5. move page to the tail of LRU.
>        .....<after some time>
>        6. I/O ends and PG_writeback is cleared.
>
> Here, in above cycle, the page is not freed. Still in LRU list.
>        next vmscan
>        7. isolate a page from LRU.
>        8. finds a unmapped clean SwapCache
>        9. drop it.
>
> So, to _free_ unmapped SwapCache, sequence 7-9 should happen.
> If enough memory is freed by the first itelation of vmscan before I/O end,
> next vmscan doesn't happen. Then, we have "unmmaped clean Swapcache which has
> anon_vma pointer on page->mapping" on LRU.

Thanks for open my eye. Kame. :)



--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/