From: Minchan Kim on
On Thu, Apr 1, 2010 at 8:57 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu(a)jp.fujitsu.com>
>
> rmap_walk_anon() is called against unmapped pages.
> Then, !page_mapped() is always true. So, the behavior will not be different from
> the last one.
>

rmap_walk_anon can be also called in case of failing try_to_unmap.
Then, In unmap_and_move, page_mapped is true and
remove_migration_ptes can be called.

But I am not sure this Mel's patch about this part is really needed.

> Thanks,
> -Kame
>
>



--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on
On Wed, Mar 31, 2010 at 2:26 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
> On Tue, 30 Mar 2010 10:14:49 +0100
> Mel Gorman <mel(a)csn.ul.ie> wrote:
>
>> PageAnon pages that are unmapped may or may not have an anon_vma so
>> are not currently migrated. However, a swap cache page can be migrated
>> and fits this description. This patch identifies page swap caches and
>> allows them to be migrated.
>>
>
> Some comments.
>
>> Signed-off-by: Mel Gorman <mel(a)csn.ul.ie>
>> ---
>>  mm/migrate.c |   15 ++++++++++-----
>>  mm/rmap.c    |    6 ++++--
>>  2 files changed, 14 insertions(+), 7 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 35aad2a..f9bf37e 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -203,6 +203,9 @@ static int migrate_page_move_mapping(struct address_space *mapping,
>>       void **pslot;
>>
>>       if (!mapping) {
>> +             if (PageSwapCache(page))
>> +                     SetPageSwapCache(newpage);
>> +
>
> Migration of SwapCache requires radix-tree replacement, IOW,
>  mapping == NULL && PageSwapCache is BUG.
>
> So, this never happens.
>
>
>>               /* Anonymous page without mapping */
>>               if (page_count(page) != 1)
>>                       return -EAGAIN;
>> @@ -607,11 +610,13 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
>>                * the page was isolated and when we reached here while
>>                * the RCU lock was not held
>>                */
>> -             if (!page_mapped(page))
>> -                     goto rcu_unlock;
>> -
>> -             anon_vma = page_anon_vma(page);
>> -             atomic_inc(&anon_vma->external_refcount);
>> +             if (!page_mapped(page)) {
>> +                     if (!PageSwapCache(page))
>> +                             goto rcu_unlock;
>> +             } else {
>> +                     anon_vma = page_anon_vma(page);
>> +                     atomic_inc(&anon_vma->external_refcount);
>> +             }
>>       }
>>
>>       /*
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index af35b75..d5ea1f2 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1394,9 +1394,11 @@ int rmap_walk(struct page *page, int (*rmap_one)(struct page *,
>>
>>       if (unlikely(PageKsm(page)))
>>               return rmap_walk_ksm(page, rmap_one, arg);
>> -     else if (PageAnon(page))
>> +     else if (PageAnon(page)) {
>> +             if (PageSwapCache(page))
>> +                     return SWAP_AGAIN;
>>               return rmap_walk_anon(page, rmap_one, arg);
>
> SwapCache has a condition as (PageSwapCache(page) && page_mapped(page) == true.
>

In case of tmpfs, page has swapcache but not mapped.

> Please see do_swap_page(), PageSwapCache bit is cleared only when
>
> do_swap_page()...
>       swap_free(entry);
>        if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
>                try_to_free_swap(page);
>
> Then, PageSwapCache is cleared only when swap is freeable even if mapped.
>
> rmap_walk_anon() should be called and the check is not necessary.

Frankly speaking, I don't understand what is Mel's problem, why he added
Swapcache check in rmap_walk, and why do you said we don't need it.

Could you explain more detail if you don't mind?

>
> Thanks,
> -Kame
>
>



--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on
On Thu, Apr 1, 2010 at 12:01 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
> On Thu, 1 Apr 2010 11:43:18 +0900
> Minchan Kim <minchan.kim(a)gmail.com> wrote:
>
>> On Wed, Mar 31, 2010 at 2:26 PM, KAMEZAWA Hiroyuki       /*
>> >> diff --git a/mm/rmap.c b/mm/rmap.c
>> >> index af35b75..d5ea1f2 100644
>> >> --- a/mm/rmap.c
>> >> +++ b/mm/rmap.c
>> >> @@ -1394,9 +1394,11 @@ int rmap_walk(struct page *page, int (*rmap_one)(struct page *,
>> >>
>> >>       if (unlikely(PageKsm(page)))
>> >>               return rmap_walk_ksm(page, rmap_one, arg);
>> >> -     else if (PageAnon(page))
>> >> +     else if (PageAnon(page)) {
>> >> +             if (PageSwapCache(page))
>> >> +                     return SWAP_AGAIN;
>> >>               return rmap_walk_anon(page, rmap_one, arg);
>> >
>> > SwapCache has a condition as (PageSwapCache(page) && page_mapped(page) == true.
>> >
>>
>> In case of tmpfs, page has swapcache but not mapped.
>>
>> > Please see do_swap_page(), PageSwapCache bit is cleared only when
>> >
>> > do_swap_page()...
>> >       swap_free(entry);
>> >        if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
>> >                try_to_free_swap(page);
>> >
>> > Then, PageSwapCache is cleared only when swap is freeable even if mapped.
>> >
>> > rmap_walk_anon() should be called and the check is not necessary.
>>
>> Frankly speaking, I don't understand what is Mel's problem, why he added
>> Swapcache check in rmap_walk, and why do you said we don't need it.
>>
>> Could you explain more detail if you don't mind?
>>
> I may miss something.
>
> unmap_and_move()
>  1. try_to_unmap(TTU_MIGRATION)
>  2. move_to_newpage
>  3. remove_migration_ptes
>        -> rmap_walk()
>
> Then, to map a page back we unmapped we call rmap_walk().
>
> Assume a SwapCache which is mapped, then, PageAnon(page) == true.
>
>  At 1. try_to_unmap() will rewrite pte with swp_entry of SwapCache.
>       mapcount goes to 0.
>  At 2. SwapCache is copied to a new page.
>  At 3. The new page is mapped back to the place. Now, newpage's mapcount is 0.
>       Before patch, the new page is mapped back to all ptes.
>       After patch, the new page is not mapped back because its mapcount is 0.
>
> I don't think shared SwapCache of anon is not an usual behavior, so, the logic
> before patch is more attractive.
>
> If SwapCache is not mapped before "1", we skip "1" and rmap_walk will do nothing
> because page->mapping is NULL.
>

Thanks. I agree. We don't need the check.
Then, my question is why Mel added the check in rmap_walk.
He mentioned some BUG trigger and fixed things after this patch.
What's it?
Is it really related to this logic?
I don't think so or we are missing something.

--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on
On Thu, Apr 1, 2010 at 6:30 PM, Mel Gorman <mel(a)csn.ul.ie> wrote:
> On Thu, Apr 01, 2010 at 01:44:29PM +0900, Minchan Kim wrote:
>> On Thu, Apr 1, 2010 at 12:01 PM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
>> > On Thu, 1 Apr 2010 11:43:18 +0900
>> > Minchan Kim <minchan.kim(a)gmail.com> wrote:
>> >
>> >> On Wed, Mar 31, 2010 at 2:26 PM, KAMEZAWA Hiroyuki       /*
>> >> >> diff --git a/mm/rmap.c b/mm/rmap.c
>> >> >> index af35b75..d5ea1f2 100644
>> >> >> --- a/mm/rmap.c
>> >> >> +++ b/mm/rmap.c
>> >> >> @@ -1394,9 +1394,11 @@ int rmap_walk(struct page *page, int (*rmap_one)(struct page *,
>> >> >>
>> >> >>       if (unlikely(PageKsm(page)))
>> >> >>               return rmap_walk_ksm(page, rmap_one, arg);
>> >> >> -     else if (PageAnon(page))
>> >> >> +     else if (PageAnon(page)) {
>> >> >> +             if (PageSwapCache(page))
>> >> >> +                     return SWAP_AGAIN;
>> >> >>               return rmap_walk_anon(page, rmap_one, arg);
>> >> >
>> >> > SwapCache has a condition as (PageSwapCache(page) && page_mapped(page) == true.
>> >> >
>> >>
>> >> In case of tmpfs, page has swapcache but not mapped.
>> >>
>> >> > Please see do_swap_page(), PageSwapCache bit is cleared only when
>> >> >
>> >> > do_swap_page()...
>> >> >       swap_free(entry);
>> >> >        if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
>> >> >                try_to_free_swap(page);
>> >> >
>> >> > Then, PageSwapCache is cleared only when swap is freeable even if mapped.
>> >> >
>> >> > rmap_walk_anon() should be called and the check is not necessary.
>> >>
>> >> Frankly speaking, I don't understand what is Mel's problem, why he added
>> >> Swapcache check in rmap_walk, and why do you said we don't need it.
>> >>
>> >> Could you explain more detail if you don't mind?
>> >>
>> > I may miss something.
>> >
>> > unmap_and_move()
>> >  1. try_to_unmap(TTU_MIGRATION)
>> >  2. move_to_newpage
>> >  3. remove_migration_ptes
>> >        -> rmap_walk()
>> >
>> > Then, to map a page back we unmapped we call rmap_walk().
>> >
>> > Assume a SwapCache which is mapped, then, PageAnon(page) == true.
>> >
>> >  At 1. try_to_unmap() will rewrite pte with swp_entry of SwapCache.
>> >       mapcount goes to 0.
>> >  At 2. SwapCache is copied to a new page.
>> >  At 3. The new page is mapped back to the place. Now, newpage's mapcount is 0.
>> >       Before patch, the new page is mapped back to all ptes.
>> >       After patch, the new page is not mapped back because its mapcount is 0.
>> >
>> > I don't think shared SwapCache of anon is not an usual behavior, so, the logic
>> > before patch is more attractive.
>> >
>> > If SwapCache is not mapped before "1", we skip "1" and rmap_walk will do nothing
>> > because page->mapping is NULL.
>> >
>>
>> Thanks. I agree. We don't need the check.
>> Then, my question is why Mel added the check in rmap_walk.
>> He mentioned some BUG trigger and fixed things after this patch.
>> What's it?
>
> If I remove the check for (PageSwapCache(page) && !page_mapped(page))
> in rmap_walk(), then the bug below occurs. The first one is lockdep going
> bad because it's accessing a bad lock implying that anon_vma->lock is
> already invalid. The bug that triggers after it is the list walk.

Thanks. I think it's possible. It's subtle problem.
Assume !page_mapped && PageAnon(page) && PageSwapCache

0. PageAnon check
1. race window <---- anon_vma free!!!!
2. rcu_read_lock()
3. skip_unmap
4. move_to_new_page
5. newpage->mapping = page->mapping <--- !!!! It's invalid
6. mapping->a_ops->migratepage
7. radix tree change, copy page (still new page anon is NULL)
8. remove_migrate_ptes
9. rmap_walk
10. PageAnon is true --> we are deceived.
11. rmap_walk_anon -> go bomb!

Does it make sense?
--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on
On Thu, Apr 1, 2010 at 2:42 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
> On Thu, 1 Apr 2010 13:44:29 +0900
> Minchan Kim <minchan.kim(a)gmail.com> wrote:
>
>> On Thu, Apr 1, 2010 at 12:01 PM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
>> > On Thu, 1 Apr 2010 11:43:18 +0900
>> > Minchan Kim <minchan.kim(a)gmail.com> wrote:
>> >
>> >> On Wed, Mar 31, 2010 at 2:26 PM, KAMEZAWA Hiroyuki       /*
>> >> >> diff --git a/mm/rmap.c b/mm/rmap.c
>> >> >> index af35b75..d5ea1f2 100644
>> >> >> --- a/mm/rmap.c
>> >> >> +++ b/mm/rmap.c
>> >> >> @@ -1394,9 +1394,11 @@ int rmap_walk(struct page *page, int (*rmap_one)(struct page *,
>> >> >>
>> >> >>       if (unlikely(PageKsm(page)))
>> >> >>               return rmap_walk_ksm(page, rmap_one, arg);
>> >> >> -     else if (PageAnon(page))
>> >> >> +     else if (PageAnon(page)) {
>> >> >> +             if (PageSwapCache(page))
>> >> >> +                     return SWAP_AGAIN;
>> >> >>               return rmap_walk_anon(page, rmap_one, arg);
>> >> >
>> >> > SwapCache has a condition as (PageSwapCache(page) && page_mapped(page) == true.
>> >> >
>> >>
>> >> In case of tmpfs, page has swapcache but not mapped.
>> >>
>> >> > Please see do_swap_page(), PageSwapCache bit is cleared only when
>> >> >
>> >> > do_swap_page()...
>> >> >       swap_free(entry);
>> >> >        if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
>> >> >                try_to_free_swap(page);
>> >> >
>> >> > Then, PageSwapCache is cleared only when swap is freeable even if mapped.
>> >> >
>> >> > rmap_walk_anon() should be called and the check is not necessary.
>> >>
>> >> Frankly speaking, I don't understand what is Mel's problem, why he added
>> >> Swapcache check in rmap_walk, and why do you said we don't need it.
>> >>
>> >> Could you explain more detail if you don't mind?
>> >>
>> > I may miss something.
>> >
>> > unmap_and_move()
>> >  1. try_to_unmap(TTU_MIGRATION)
>> >  2. move_to_newpage
>> >  3. remove_migration_ptes
>> >        -> rmap_walk()
>> >
>> > Then, to map a page back we unmapped we call rmap_walk().
>> >
>> > Assume a SwapCache which is mapped, then, PageAnon(page) == true.
>> >
>> >  At 1. try_to_unmap() will rewrite pte with swp_entry of SwapCache.
>> >       mapcount goes to 0.
>> >  At 2. SwapCache is copied to a new page.
>> >  At 3. The new page is mapped back to the place. Now, newpage's mapcount is 0.
>> >       Before patch, the new page is mapped back to all ptes.
>> >       After patch, the new page is not mapped back because its mapcount is 0.
>> >
>> > I don't think shared SwapCache of anon is not an usual behavior, so, the logic
>> > before patch is more attractive.
>> >
>> > If SwapCache is not mapped before "1", we skip "1" and rmap_walk will do nothing
>> > because page->mapping is NULL.
>> >
>>
>> Thanks. I agree. We don't need the check.
>> Then, my question is why Mel added the check in rmap_walk.
>> He mentioned some BUG trigger and fixed things after this patch.
>> What's it?
>> Is it really related to this logic?
>> I don't think so or we are missing something.
>>
> Hmm. Consiering again.
>
> Now.
>        if (PageAnon(page)) {
>                rcu_locked = 1;
>                rcu_read_lock();
>                if (!page_mapped(page)) {
>                        if (!PageSwapCache(page))
>                                goto rcu_unlock;
>                } else {
>                        anon_vma = page_anon_vma(page);
>                        atomic_inc(&anon_vma->external_refcount);
>                }
>
>
> Maybe this is a fix.
>
> ==
>        skip_remap = 0;
>        if (PageAnon(page)) {
>                rcu_read_lock();
>                if (!page_mapped(page)) {
>                        if (!PageSwapCache(page))
>                                goto rcu_unlock;
>                        /*
>                         * We can't convice this anon_vma is valid or not because
>                         * !page_mapped(page). Then, we do migration(radix-tree replacement)
>                         * but don't remap it which touches anon_vma in page->mapping.
>                         */
>                        skip_remap = 1;
>                        goto skip_unmap;
>                } else {
>                        anon_vma = page_anon_vma(page);
>                        atomic_inc(&anon_vma->external_refcount);
>                }
>        }
>        .....copy page, radix-tree replacement,....
>

It's not enough.
we uses remove_migration_ptes in move_to_new_page, too.
We have to prevent it.
We can check PageSwapCache(page) in move_to_new_page and then
skip remove_migration_ptes.

ex)
static int move_to_new_page(....)
{
int swapcache = PageSwapCache(page);
...
if (!swapcache)
if(!rc)
remove_migration_ptes
else
newpage->mapping = NULL;
}

And we have to close race between PageAnon(page) and rcu_read_lock.
If we don't do it, anon_vma could be free in the middle of operation.
I means

* of migration. File cache pages are no problem because of page_lock()
* File Caches may use write_page() or lock_page() in migration, then,
* just care Anon page here.
*/
if (PageAnon(page)) {
!!! RACE !!!!
rcu_read_lock();
rcu_locked = 1;

+
+ /*
+ * If the page has no mappings any more, just bail. An
+ * unmapped anon page is likely to be freed soon but worse,


--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/