Memory corruption during hibernation since 2.6.31 [Kernel]

Prev: [PATCH 0/5] use idr kernel library instead of handles
Next: [PATCH v2] cpuidle: extend cpuidle and menu governor to handle dynamic states

From: KAMEZAWA Hiroyuki on 29 Jul 2010 23:50

On Fri, 30 Jul 2010 01:29:33 +0200
"Rafael J. Wysocki" <rjw(a)sisk.pl> wrote:

> Well, the rule of thumb (if it can be called this way) is that the contents of
> the image has to be consistent with whatever is stored in permanent storage.
> So, for example, filesystems that were mounted before creating the image
> cannot be modified until the image is restored. Consequently, if there are
> any kernel threads that might cause that to happen, they need to be frozen.
>
> Now, if I understand it correctly, the failure mode is that certain page had
> been swapped out before the image was created and then it was swapped in
> while we were writing the image out and the slot occupied by it was re-used.
> In that case the image would contain wrong information on the state of the
> swap and that would result in wrong data being loaded into memory on an attempt
> to swap that page in after resume.
>
> So, generally, we have to avoid doing things that would result in swapping
> memory pages out after we have created a hibernation image. If that can be
> achieved by freezing certain kernel threads, that probably is the simplest
> approach.
>

BTW, I can't understand what happens in following situation.

1. start saving image.
2. at saving image, swap entry is used and swap_map[off] += 1.
3. At some point, swap_map[] itself is saved.
....

At restore, IIUC, hibernation don't call any swap_free().
Where does the +1 goes ?
I'm sorry if I misunderstand.

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Hugh Dickins on 30 Jul 2010 00:00

On Thu, Jul 29, 2010 at 4:29 PM, Rafael J. Wysocki <rjw(a)sisk.pl> wrote:
> On Thursday, July 29, 2010, Hugh Dickins wrote:
>>
>> Despite reading Documentation/power/freezing-of-tasks.txt, I have no
>> clear idea of what really needs freezing, and whether freezing can
>> fully handle the issues. Rafael, please can you advise?
>
> Well, the rule of thumb (if it can be called this way) is that the contents of
> the image has to be consistent with whatever is stored in permanent storage.
> So, for example, filesystems that were mounted before creating the image
> cannot be modified until the image is restored. Consequently, if there are
> any kernel threads that might cause that to happen, they need to be frozen.

Right, the filesystem part of it is easy to understand and to handle, I think.
But now we're worrying about potential for I/O to be interrupted by suspend
to RAM (or is that well handled by driver suspend methods?), and swap
getting misallocated during hibernation: what measures do we have
to prevent those?

In particular, is there or should there be some state global or test function
endangered code could use to for protection, without having to freeze?
For many threads, freezing would be easiest, but not possible for all.

>
> Now, if I understand it correctly, the failure mode is that certain page had
> been swapped out before the image was created and then it was swapped in
> while we were writing the image out and the slot occupied by it was re-used.

Not quite. At some point in the past that certain page had been swapped out
and later swapped back in: it's correctly there in the image as swapped in,
but there's some code coming into play when allocating swap to write the
image, that might free its swap and reuse it for an unrelated page of image,
leaving a danger after resume that the original owner page might get freed
then wrong data swapped back in for it later.

> In that case the image would contain wrong information on the state of the
> swap and that would result in wrong data being loaded into memory on an attempt
> to swap that page in after resume.

Yes.

>
> So, generally, we have to avoid doing things that would result in swapping
> memory pages out after we have created a hibernation image. If that can be
> achieved by freezing certain kernel threads, that probably is the simplest
> approach.

The vulnerable page isn't swapped out or in during or after creating the image,
yet it can still be vulnerable. KAMEZAWA-san's patch should fix the recent
regression here, but I believe there remains a vulnerability, from swap cleanup
code in vmscan.c which page reclaim might pass through. If there's some
"heading for hibernation" state we can test there, we can avoid it in those
cases.

I realize that snapshot.c does a lot of preparatory memory freeing e.g.
shrink_all_memory(), and that should make the chance of mis-reuse of
swap very tiny; but nonetheless your swap.c is doing memory allocations,
with the __GFP_WAIT flag, so could conceivably enter page reclaim.

We cannot freeze the hibernation, but we ought to make it swap-safe.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Hugh Dickins on 30 Jul 2010 00:10

On Thu, Jul 29, 2010 at 4:40 PM, Rafael J. Wysocki <rjw(a)sisk.pl> wrote:
> On Thursday, July 29, 2010, Andrea Arcangeli wrote:
>
> I'm afraid that would be difficult to achieve in general. Besides, there's
> no reason why kernel threads that need not be frozen should care about the
> freezing thing at all. It's much simpler to require the ones that need to be
> frozen to cooperate.

But it's looking as if any thread which might have to allocate some memory
(and ksmd is on balance a freer of memory, but nonetheless has to do
slab allocations to get there) would need to be frozen.

Except that does not go far enough, because freezing the hibernating
thread (which itself makes memory allocations) won't work out well!

>
>> But for now I'm afraid the below is needed (only ksm.c part applies to
>> upstream).
>
> Looks good to me.
>
> Can you please prepare a patch against mainline for Ondrej to try?

Andrea wasn't proposing his patch for Ondrej's corruption, and I don't
suppose Ondrej even has ksmd running. Any confusion there is my
fault, for linking Andrea's concerns about suspend and hibernation
with this particular thread on Ondrej's bug.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Hugh Dickins on 30 Jul 2010 00:20

On Thu, Jul 29, 2010 at 5:01 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu(a)jp.fujitsu.com> wrote:

> I think the best way is kexec(). But maybe rollback from hibernation failure
> will be difficult. Considering how crash-dump works well and under maintainance
> by many enterprise guys, hibernation-by-kexec is a choice. I think. It can make
> reuse of kdump code, ...or, hibernation-resume code can eat kdump image
> directly. Maybe the problem will be the speed of dump.

I've no appetite for a total rework of hibernation, and I don't see
how that would
address the issue: I'm just looking for some protection against swap
reuse danger.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: KAMEZAWA Hiroyuki on 30 Jul 2010 00:20

On Thu, 29 Jul 2010 21:10:10 -0700
Hugh Dickins <hughd(a)google.com> wrote:

> On Thu, Jul 29, 2010 at 5:01 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
>
> > I think the best way is kexec(). But maybe rollback from hibernation failure
> > will be difficult. Considering how crash-dump works well and under maintainance
> > by many enterprise guys, hibernation-by-kexec is a choice. I think. It can make
> > reuse of kdump code, ...or, hibernation-resume code can eat kdump image
> > directly. Maybe the problem will be the speed of dump.
>
> I've no appetite for a total rework of hibernation, and I don't see
> how that would
> address the issue: I'm just looking for some protection against swap
> reuse danger.
>
Okay ;) (And I forget that kexec has to prepare memory at boot time for 2nd kernel.
It will be harmful for small device guys.)

I'll prepare a routine not-quick-fix.
BTW, after reading mail between you and Andrea, kernel threads are not guaranteed
to be stopped when memory-dump is running ?

-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7
Prev: [PATCH 0/5] use idr kernel library instead of handles
Next: [PATCH v2] cpuidle: extend cpuidle and menu governor to handle dynamic states