From: Balbir Singh on
* Hugh Dickins <hughd(a)google.com> [2010-07-29 11:44:31]:

> On Thu, 29 Jul 2010, KAMEZAWA Hiroyuki wrote:
> > On Thu, 29 Jul 2010 14:23:33 +0900 (JST)
> > KOSAKI Motohiro <kosaki.motohiro(a)jp.fujitsu.com> wrote:
> >
> > > Can you please add explicit commenting in the code?
> > >
> > How about this ?
> > ==
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com>
> >
> > At hibernation, all pages-should-be-saved are written into a image (here, swap).
> > Then, swap_map[], memmap etcs are also saved into disks.
> >
> > But, swap allocation happens one by one. So, the final image of swap_map[] is
> > different from saved one and the commit c9e444103b5e7a5a3519f9913f59767f92e33baf
> > changes page's state while assiging swap. Because memory can be modified in
> > hibernation is only not-to-be-save memory. it's a breakage.
> >
> > This patch fixes it by disabling swap entry reuse at hibernation.
> >
> >
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com>
> > ---
> > mm/swapfile.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > Index: linux-2.6.34.org/mm/swapfile.c
> > ===================================================================
> > --- linux-2.6.34.org.orig/mm/swapfile.c
> > +++ linux-2.6.34.org/mm/swapfile.c
> > @@ -315,8 +315,15 @@ checks:
> > if (offset > si->highest_bit)
> > scan_base = offset = si->lowest_bit;
> >
> > - /* reuse swap entry of cache-only swap if not busy. */
> > - if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
> > + /*
> > + * reuse swap entry of cache-only swap if not busy &&
> > + * when we're called via pageout(). At hibernation, swap-reuse
> > + * is harmful because it changes memory status...which may
> > + * be saved already.
> > + */
> > + if (vm_swap_full()
> > + && usage == SWAP_HAS_CACHE
> > + && si->swap_map[offset] == SWAP_HAS_CACHE) {
> > int swap_was_freed;
> > spin_unlock(&swap_lock);
> > swap_was_freed = __try_to_reclaim_swap(si, offset);
> >
> > --
>
> KAMEZAWA-San, that is a brilliant realization, I salute you.
>
> So, in between snapshotting the image and actually hibernating, we have
> two parallel universes, one frozen in the image, the other writing that
> out to swap: with the danger that the latter (which is about to die)
> will introduce fatal inconsistencies in the former by placing pages in
> swap locations innocently reallocated from it. (Excuse me while I go
> write the movie script.)
>
> I'd never got to think about that before.
>
> Your fix is neat though hacky (it's somewhat of a coincidence that the
> usage arg happens to distinguish the hibernation case), and should be
> enough to fix "your" regression, but is it really enough?
>
> I'm worrying about the try_to_free_swap() calls in shrink_page_list():
> can't those get called from direct reclaim (even if GFP_NOIO), and can't
> direct reclaim get invoked from somewhere in the I/O path below
> swap_writepage(), used for writing out the hibernation image?
>
> Direct reclaim because kswapd does set_freezable(), so should itself
> be out of the picture. But we cannot freeze writing the hibernation
> image, and its occasional need for memory, so maybe a different approach
> is required.
>
> I've CC'ed Andrea because we were having an offline conversation about
> whether ksmd (and his khugepaged) need to set_freezable(); and I wonder
> if this swap bug underlies his interest, though he was mainly worrying
> about I/O in progress.
>
> Despite reading Documentation/power/freezing-of-tasks.txt, I have no
> clear idea of what really needs freezing, and whether freezing can
> fully handle the issues. Rafael, please can you advise?
>

Couldn't we reuse PF_* flags to differentiate between the paths, if
that is what it eventually boils down to? On an unrelated note, I was
looking at shrink_all_memory() and wondering if swappiness really
mattered there.

--
Three Cheers,
Balbir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Hugh Dickins on
On Thu, Jul 29, 2010 at 9:18 PM, Balbir Singh <balbir(a)linux.vnet.ibm.com> wrote:
>
> Couldn't we reuse PF_* flags to differentiate between the paths, if
> that is what it eventually boils down to? On an unrelated note, I was
> looking at shrink_all_memory() and wondering if swappiness really
> mattered there.

So far as the swap-reuse issue goes, I don't see that a PF_ flag
would help: the threads that already worry about such issues do
the set_freezable()/try_to_freeze() thing, and won't get into
trouble anyway; we don't want to force every other thread to
have to do something special now, better just check global state
in the very few places its needed.

On the unrelated note: better in an unrelated thread!

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Hugh Dickins on
On Thu, Jul 29, 2010 at 9:14 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
> I'll prepare a routine not-quick-fix.

Thank you!

> BTW, after reading mail between you and Andrea, kernel threads are not guaranteed
> to be stopped when memory-dump is running ?

Depends on what you mean by memory-dump, I think.

The hibernation sequence is quite confusing to follow, but Rafael can
correct me if I'm wrong. create_image() in kernel/power/snapshot.c
does disable_nonboot_cpus() and local_irq_disable() and
sysdev_suspend() and save_processor_state() and dives into
swsusp_arch_suspend() to create the image in memory with everything
else stopped; and it's from there that the system will later resume.
But it also emerges from there in the parallel universe that goes on
to allocate swap and copy the memory image to disk - at that time,
you'll not be surprised to learn that interrupts are enabled, and I
suppose other threads may be running, in a rather futile way.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Balbir Singh on
* Hugh Dickins <hughd(a)google.com> [2010-07-29 21:32:35]:

> On Thu, Jul 29, 2010 at 9:18 PM, Balbir Singh <balbir(a)linux.vnet.ibm.com> wrote:
> >
> > Couldn't we reuse PF_* flags to differentiate between the paths, if
> > that is what it eventually boils down to? On an unrelated note, I was
> > looking at shrink_all_memory() and wondering if swappiness really
> > mattered there.
>
> So far as the swap-reuse issue goes, I don't see that a PF_ flag
> would help: the threads that already worry about such issues do
> the set_freezable()/try_to_freeze() thing, and won't get into
> trouble anyway; we don't want to force every other thread to
> have to do something special now, better just check global state
> in the very few places its needed.
>

We already do that with PF_MEMALLOC in several places. If the goal is
to avoid resuing the swap entry at all times after hibernate, the
solution needs to be a global state solution like you suggest.

> On the unrelated note: better in an unrelated thread!
>

Sure :)

--
Three Cheers,
Balbir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: KAMEZAWA Hiroyuki on
On Thu, 29 Jul 2010 21:46:10 -0700
Hugh Dickins <hughd(a)google.com> wrote:

> On Thu, Jul 29, 2010 at 9:14 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
> > I'll prepare a routine not-quick-fix.
>
> Thank you!
>

plz wait until the next week..I found I can't post today ;(

Maybe trying my quick hack patch (posted) will be enough small for
backporting etc.

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/