From: Andi Kleen on
On Fri, Jul 02, 2010 at 02:47:25PM +0900, Naoya Horiguchi wrote:
> diff --git v2.6.35-rc3-hwpoison/mm/migrate.c v2.6.35-rc3-hwpoison/mm/migrate.c
> index e4a381c..e7af148 100644
> --- v2.6.35-rc3-hwpoison/mm/migrate.c
> +++ v2.6.35-rc3-hwpoison/mm/migrate.c
> @@ -32,6 +32,7 @@
> #include <linux/security.h>
> #include <linux/memcontrol.h>
> #include <linux/syscalls.h>
> +#include <linux/hugetlb.h>
> #include <linux/gfp.h>
>
> #include "internal.h"
> @@ -74,6 +75,8 @@ void putback_lru_pages(struct list_head *l)
> struct page *page2;
>
> list_for_each_entry_safe(page, page2, l, lru) {
> + if (PageHuge(page))
> + break;

Why is this a break and not a continue? Couldn't you have small and large
pages in the same list?
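
To make the difference concrete, here is a minimal userspace sketch, not
kernel code. struct mock_page and both putback_with_* functions are
hypothetical stand-ins for struct page and the loop in putback_lru_pages();
the "huge" flag plays the role of PageHuge(). It assumes a list that mixes
small and huge pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only what this sketch needs. */
struct mock_page {
	bool huge;      /* plays the role of PageHuge(page) */
	bool put_back;  /* set once the page was "returned to the LRU" */
};

/* Mirrors the patch: stop at the first huge page. Returns pages put back. */
static size_t putback_with_break(struct mock_page *pages, size_t n)
{
	size_t done = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (pages[i].huge)
			break;          /* every later page is never visited */
		pages[i].put_back = true;
		done++;
	}
	return done;
}

/* Alternative: skip huge pages but keep processing the rest of the list. */
static size_t putback_with_continue(struct mock_page *pages, size_t n)
{
	size_t done = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (pages[i].huge)
			continue;       /* only the huge page is skipped */
		pages[i].put_back = true;
		done++;
	}
	return done;
}
```

With a list of small, huge, small, small, the break variant puts back one
page and leaves two small pages untouched; the continue variant puts back
all three small pages.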

There's more code that handles LRU in this file. Do they all handle huge pages
correctly?

I also noticed we do not always lock all sub pages in the huge page. Now if
IO happens it will lock on subpages, not the head page. But this code
handles all subpages as a unit. Could this cause locking problems?
Perhaps it would be safer to lock all sub pages always? Or would
need to audit other page users to make sure they always lock on the head
and do the same here.
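
A rough userspace sketch of the "lock all sub pages" option, under the
assumption that each subpage carries its own lock bit. All names here
(mock_page, mock_trylock_page, mock_trylock_all_subpages) are hypothetical
and only model the idea; they are not existing kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page with just a lock bit. */
struct mock_page {
	bool locked;
};

/* Models a non-blocking page lock attempt. */
static bool mock_trylock_page(struct mock_page *p)
{
	if (p->locked)
		return false;
	p->locked = true;
	return true;
}

static void mock_unlock_page(struct mock_page *p)
{
	assert(p->locked);
	p->locked = false;
}

/*
 * Try to lock every subpage of a huge page, head first. If any subpage
 * is already locked (for example by IO on that subpage), release the
 * locks taken so far and report failure so the caller can back off.
 */
static bool mock_trylock_all_subpages(struct mock_page *head, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (!mock_trylock_page(head + i)) {
			while (i-- > 0)
				mock_unlock_page(head + i);
			return false;
		}
	}
	return true;
}
```

The other option mentioned above, always locking only the head page, needs
no such loop but depends on every other page user doing the same.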

Hmm page reference counts may have the same issue?

> @@ -95,6 +98,12 @@ static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
> pte_t *ptep, pte;
> spinlock_t *ptl;
>
> + if (unlikely(PageHuge(new))) {
> + ptep = huge_pte_offset(mm, addr);
> + ptl = &mm->page_table_lock;
> + goto check;
> + }
> +
> pgd = pgd_offset(mm, addr);
> if (!pgd_present(*pgd))
> goto out;
> @@ -115,6 +124,7 @@ static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
> }
>
> ptl = pte_lockptr(mm, pmd);
> +check:

I think I would prefer a proper if else over a goto here.

The lookup should probably just call a helper to make this function more readable
(like lookup_address(), unfortunately that's x86 specific right now)
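
As a shape for that, a minimal userspace sketch of the if/else form with
the lookup moved into a helper. Everything here is a hypothetical model:
struct mock_mm, struct pte_lookup and lookup_migration_pte() do not exist
in the kernel, and the int fields merely stand in for page table entries
and spinlocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state the real walk would consult. */
struct mock_mm {
	int huge_pte;         /* entry the hugepage lookup would find */
	int normal_pte;       /* entry the regular pgd/pud/pmd walk would find */
	bool normal_present;  /* false models a missing page table level */
	int page_table_lock;  /* stands in for mm->page_table_lock */
	int split_ptl;        /* stands in for the lock from pte_lockptr() */
};

/* Result of the lookup: which entry to touch and which lock guards it. */
struct pte_lookup {
	int *ptep;
	int *ptl;
};

/*
 * One helper, two branches, no goto: the huge and the normal case each
 * fill in the entry and its lock. Returns false when nothing is mapped,
 * which corresponds to the "goto out" paths in the original function.
 */
static bool lookup_migration_pte(struct mock_mm *mm, bool is_huge,
				 struct pte_lookup *out)
{
	assert(mm != NULL && out != NULL);

	if (is_huge) {
		out->ptep = &mm->huge_pte;
		out->ptl = &mm->page_table_lock;
	} else {
		if (!mm->normal_present)
			return false;
		out->ptep = &mm->normal_pte;
		out->ptl = &mm->split_ptl;
	}
	return true;
}
```

The caller then takes out->ptl and works on out->ptep the same way for
both page sizes.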


> @@ -284,7 +308,17 @@ static int migrate_page_move_mapping(struct address_space *mapping,
> */
> static void migrate_page_copy(struct page *newpage, struct page *page)
> {
> - copy_highpage(newpage, page);
> + int i;
> + struct hstate *h;
> + if (!PageHuge(newpage))
> + copy_highpage(newpage, page);
> + else {
> + h = page_hstate(newpage);
> + for (i = 0; i < pages_per_huge_page(h); i++) {
> + cond_resched();
> + copy_highpage(newpage + i, page + i);

Better reuse copy_huge_page() instead of open coding.
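
Roughly, the copy loop would live in one helper and migrate_page_copy()
would only choose between the two cases. The sketch below is a userspace
model: the mock_* names are hypothetical, memcpy() stands in for
copy_highpage(), and the helper's signature is an assumption, not the
signature of the real copy_huge_page().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_PAGE_SIZE 4096u

/* Hypothetical stand-in for one base page worth of data. */
struct mock_page {
	unsigned char data[MOCK_PAGE_SIZE];
};

/* Stands in for copy_highpage(): copy exactly one base page. */
static void mock_copy_highpage(struct mock_page *dst,
			       const struct mock_page *src)
{
	memcpy(dst->data, src->data, MOCK_PAGE_SIZE);
}

/* Shared helper: copy all subpages of a huge page, one at a time. */
static void mock_copy_huge_page(struct mock_page *dst,
				const struct mock_page *src,
				size_t nr_subpages)
{
	assert(dst != NULL && src != NULL);
	for (size_t i = 0; i < nr_subpages; i++)
		mock_copy_highpage(dst + i, src + i);
}

/* Caller only decides which case applies; no open-coded loop here. */
static void mock_migrate_page_copy(struct mock_page *dst,
				   const struct mock_page *src,
				   size_t nr_subpages)
{
	if (nr_subpages <= 1)
		mock_copy_highpage(dst, src);
	else
		mock_copy_huge_page(dst, src, nr_subpages);
}
```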


-Andi
--
ak(a)linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Naoya Horiguchi on
On Mon, Jul 05, 2010 at 11:59:28AM +0200, Andi Kleen wrote:
> On Fri, Jul 02, 2010 at 02:47:25PM +0900, Naoya Horiguchi wrote:
> > diff --git v2.6.35-rc3-hwpoison/mm/migrate.c v2.6.35-rc3-hwpoison/mm/migrate.c
> > index e4a381c..e7af148 100644
> > --- v2.6.35-rc3-hwpoison/mm/migrate.c
> > +++ v2.6.35-rc3-hwpoison/mm/migrate.c
> > @@ -32,6 +32,7 @@
> > #include <linux/security.h>
> > #include <linux/memcontrol.h>
> > #include <linux/syscalls.h>
> > +#include <linux/hugetlb.h>
> > #include <linux/gfp.h>
> >
> > #include "internal.h"
> > @@ -74,6 +75,8 @@ void putback_lru_pages(struct list_head *l)
> > struct page *page2;
> >
> > list_for_each_entry_safe(page, page2, l, lru) {
> > + if (PageHuge(page))
> > + break;
>
> Why is this a break and not a continue? Couldn't you have small and large
> pages in the same list?

Hmm, this chunk needs to be fixed because my assumption was too specific.
The list passed to migrate_pages() has only one page or one hugepage when
page migration is kicked off by soft offline, but that is not true in the
general case. Since a hugepage is not linked to the LRU list, we had better
simply skip putback_lru_pages().

> There's more code that handles LRU in this file. Do they all handle huge pages
> correctly?
>
> I also noticed we do not always lock all sub pages in the huge page. Now if
> IO happens it will lock on subpages, not the head page. But this code
> handles all subpages as a unit. Could this cause locking problems?
> Perhaps it would be safer to lock all sub pages always? Or would
> need to audit other page users to make sure they always lock on the head
> and do the same here.
>
> Hmm page reference counts may have the same issue?

If we try to implement paging out of hugepage in the future, we need to
solve all these problems straightforwardly. But at least for now we can
skirt them by not touching LRU code for hugepage extension.

> > @@ -95,6 +98,12 @@ static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
> > pte_t *ptep, pte;
> > spinlock_t *ptl;
> >
> > + if (unlikely(PageHuge(new))) {
> > + ptep = huge_pte_offset(mm, addr);
> > + ptl = &mm->page_table_lock;
> > + goto check;
> > + }
> > +
> > pgd = pgd_offset(mm, addr);
> > if (!pgd_present(*pgd))
> > goto out;
> > @@ -115,6 +124,7 @@ static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
> > }
> >
> > ptl = pte_lockptr(mm, pmd);
> > +check:
>
> I think I would prefer a proper if else over a goto here.
>
> The lookup should probably just call a helper to make this function more readable
> (like lookup_address(), unfortunately that's x86 specific right now)

OK.
I'll move common code to helper function.

> > @@ -284,7 +308,17 @@ static int migrate_page_move_mapping(struct address_space *mapping,
> > */
> > static void migrate_page_copy(struct page *newpage, struct page *page)
> > {
> > - copy_highpage(newpage, page);
> > + int i;
> > + struct hstate *h;
> > + if (!PageHuge(newpage))
> > + copy_highpage(newpage, page);
> > + else {
> > + h = page_hstate(newpage);
> > + for (i = 0; i < pages_per_huge_page(h); i++) {
> > + cond_resched();
> > + copy_highpage(newpage + i, page + i);
>
> Better reuse copy_huge_page() instead of open coding.

Agreed.

Thanks,
Naoya Horiguchi
--
From: Andi Kleen on
On Tue, Jul 06, 2010 at 12:33:42PM +0900, Naoya Horiguchi wrote:
> > There's more code that handles LRU in this file. Do they all handle huge pages
> > correctly?
> >
> > I also noticed we do not always lock all sub pages in the huge page. Now if
> > IO happens it will lock on subpages, not the head page. But this code
> > handles all subpages as a unit. Could this cause locking problems?
> > Perhaps it would be safer to lock all sub pages always? Or would
> > need to audit other page users to make sure they always lock on the head
> > and do the same here.
> >
> > Hmm page reference counts may have the same issue?
>
> If we try to implement paging out of hugepage in the future, we need to
> solve all these problems straightforwardly. But at least for now we can
> skirt them by not touching LRU code for hugepage extension.

We need the page lock to avoid migrating pages that are currently
under IO. This can happen even without swapping when the process
manually starts IO.

-Andi
--
ak(a)linux.intel.com -- Speaking for myself only.
--
From: Christoph Lameter on
On Tue, 6 Jul 2010, Naoya Horiguchi wrote:

> Hmm, this chunk needs to be fixed because my assumption was too specific.
> The list passed to migrate_pages() has only one page or one hugepage when
> page migration is kicked off by soft offline, but that is not true in the
> general case. Since a hugepage is not linked to the LRU list, we had better
> simply skip putback_lru_pages().

Maybe write a migrate_huge_page() function instead? The functionality is
materially different since we are not juggling things with the lru.

--
From: Naoya Horiguchi on
On Tue, Jul 06, 2010 at 09:13:37AM +0200, Andi Kleen wrote:
> On Tue, Jul 06, 2010 at 12:33:42PM +0900, Naoya Horiguchi wrote:
> > > There's more code that handles LRU in this file. Do they all handle huge pages
> > > correctly?
> > >
> > > I also noticed we do not always lock all sub pages in the huge page. Now if
> > > IO happens it will lock on subpages, not the head page. But this code
> > > handles all subpages as a unit. Could this cause locking problems?
> > > Perhaps it would be safer to lock all sub pages always? Or would
> > > need to audit other page users to make sure they always lock on the head
> > > and do the same here.
> > >
> > > Hmm page reference counts may have the same issue?
> >
> > If we try to implement paging out of hugepage in the future, we need to
> > solve all these problems straightforwardly. But at least for now we can
> > skirt them by not touching LRU code for hugepage extension.
>
> We need the page lock to avoid migrating pages that are currently
> under IO. This can happen even without swapping when the process
> manually starts IO.

I see. I understand that we should work on the locking problem now.
I dug into it and learned that hugepage IO can happen in direct IO from/to
a hugepage or in a coredump of a hugepage user.

We can resolve the race between memory failure and IO by checking
the page lock and the writeback flag, right?
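
In sketch form, the check I have in mind looks like the following. This is
a userspace model with hypothetical names (mock_page, mock_try_migrate);
the two bool fields stand in for the page lock and the writeback flag, and
whether this is sufficient is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page with the two flags of interest. */
struct mock_page {
	bool locked;     /* models the page lock */
	bool writeback;  /* models the writeback flag */
	bool migrated;   /* set when the mock migration actually ran */
};

/* Models a non-blocking page lock attempt. */
static bool mock_trylock_page(struct mock_page *p)
{
	if (p->locked)
		return false;
	p->locked = true;
	return true;
}

static void mock_unlock_page(struct mock_page *p)
{
	assert(p->locked);
	p->locked = false;
}

/*
 * Migrate only if we can take the page lock and no writeback is in
 * flight. Returns 0 on success, -1 if the page is busy and the caller
 * should retry or give up.
 */
static int mock_try_migrate(struct mock_page *p)
{
	if (!mock_trylock_page(p))
		return -1;              /* someone else holds the lock */

	if (p->writeback) {
		mock_unlock_page(p);
		return -1;              /* IO still in progress */
	}

	p->migrated = true;             /* the real copy/remap would go here */
	mock_unlock_page(p);
	return 0;
}
```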

BTW I surveyed the direct IO code, but the page lock does not seem to be
taken there. Am I missing something?
(Before deciding whether we lock all subpages or only the head page,
I want to clarify how the current code for non-hugepages resolves this problem.)

Thanks,
Naoya Horiguchi
--