oom: avoid oom killer for lowmem allocations [Kernel]

From: Rik van Riel on 10 Feb 2010 23:20

On 02/10/2010 11:32 AM, David Rientjes wrote:

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1914,6 +1914,9 @@ rebalance:
> * running out of options and have to consider going OOM
> */
> if (!did_some_progress) {
> + /* The oom killer won't necessarily free lowmem */
> + if (high_zoneidx < ZONE_NORMAL)
> + goto nopage;
> if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
> if (oom_killer_disabled)
> goto nopage;

Are there architectures that only have one memory zone?

s390 or one of the other virtualized-only architectures perhaps?

--
All rights reversed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: David Rientjes on 11 Feb 2010 04:30

On Wed, 10 Feb 2010, Rik van Riel wrote:

> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1914,6 +1914,9 @@ rebalance:
> > * running out of options and have to consider going OOM
> > */
> > if (!did_some_progress) {
> > + /* The oom killer won't necessarily free lowmem */
> > + if (high_zoneidx < ZONE_NORMAL)
> > + goto nopage;
> > if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
> > if (oom_killer_disabled)
> > goto nopage;
>
> Are there architectures that only have one memory zone?
>

It actually ends up not mattering because of how gfp_zone() is implemented
(and you can do it with mem= on architectures with larger ZONE_DMA zones
such as ia64). ZONE_NORMAL is always guaranteed to be defined regardless
of architecture or configuration because it's the default zone for memory
allocation unless another zone is specified by a bit in GFP_ZONEMASK; it
doesn't matter whether it actually has memory or not. high_zoneidx in this
case is just gfp_zone(gfp_flags), which always defaults to ZONE_NORMAL when
none of the GFP_ZONEMASK bits is set. Thus, the only way for the conditional
in this patch to be true is when __GFP_DMA, or __GFP_DMA32 for x86_64, is
passed to the page allocator and CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 is
enabled, respectively.
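
The defaulting behaviour described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the SKETCH_GFP_* flag values, sketch_gfp_zone() and skips_oom_killer() are hypothetical names invented for illustration, and the real gfp_zone() depends on the kernel configuration. It shows only that a request with no zone bit set resolves to ZONE_NORMAL, so the patch's `high_zoneidx < ZONE_NORMAL` test can be true only for DMA or DMA32 requests.

```c
#include <stdio.h>

/* Simplified zone ordering for this sketch; the real enum depends on
 * CONFIG_ZONE_DMA, CONFIG_ZONE_DMA32 and CONFIG_HIGHMEM. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM };

/* Hypothetical flag bits used only in this model, not the kernel's values. */
#define SKETCH_GFP_DMA     0x1u
#define SKETCH_GFP_DMA32   0x2u
#define SKETCH_GFP_HIGHMEM 0x4u

/* Model of the zone lookup: fall back to ZONE_NORMAL when no zone bit is set. */
static enum zone_type sketch_gfp_zone(unsigned int flags)
{
    if (flags & SKETCH_GFP_DMA)
        return ZONE_DMA;
    if (flags & SKETCH_GFP_DMA32)
        return ZONE_DMA32;
    if (flags & SKETCH_GFP_HIGHMEM)
        return ZONE_HIGHMEM;
    return ZONE_NORMAL;
}

/* Mirrors the conditional added by the patch: nonzero means "goto nopage". */
static int skips_oom_killer(unsigned int flags)
{
    return sketch_gfp_zone(flags) < ZONE_NORMAL;
}
```

In this model, skips_oom_killer(0) and skips_oom_killer(SKETCH_GFP_HIGHMEM) are both 0, while the DMA and DMA32 variants return 1, regardless of whether any zone actually contains memory.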

From: Rik van Riel on 11 Feb 2010 09:10

On 02/11/2010 04:19 AM, David Rientjes wrote:
> On Wed, 10 Feb 2010, Rik van Riel wrote:
>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -1914,6 +1914,9 @@ rebalance:
>>> * running out of options and have to consider going OOM
>>> */
>>> if (!did_some_progress) {
>>> + /* The oom killer won't necessarily free lowmem */
>>> + if (high_zoneidx < ZONE_NORMAL)
>>> + goto nopage;
>>> if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
>>> if (oom_killer_disabled)
>>> goto nopage;
>>
>> Are there architectures that only have one memory zone?
>>
>
> It actually ends up not mattering because of how gfp_zone() is implemented
> (and you can do it with mem= on architectures with larger ZONE_DMA zones
> such as ia64). ZONE_NORMAL is always guaranteed to be defined regardless
> of architecture or configuration because it's the default zone for memory
> allocation unless another zone is specified by a bit in GFP_ZONEMASK; it
> doesn't matter whether it actually has memory or not. high_zoneidx in this
> case is just gfp_zone(gfp_flags), which always defaults to ZONE_NORMAL when
> none of the GFP_ZONEMASK bits is set. Thus, the only way for the conditional
> in this patch to be true is when __GFP_DMA, or __GFP_DMA32 for x86_64, is
> passed to the page allocator and CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 is
> enabled, respectively.

Fair enough.

Acked-by: Rik van Riel <riel(a)redhat.com>

--
All rights reversed.

From: KAMEZAWA Hiroyuki on 11 Feb 2010 20:40

On Wed, 10 Feb 2010 08:32:21 -0800 (PST)
David Rientjes <rientjes(a)google.com> wrote:

> If memory has been depleted in lowmem zones even with the protection
> afforded to it by /proc/sys/vm/lowmem_reserve_ratio, it is unlikely that
> killing current users will help. The memory is either reclaimable (or
> migratable) already, in which case we should not invoke the oom killer at
> all, or it is pinned by an application for I/O. Killing such an
> application may leave the hardware in an unspecified state and there is
> no guarantee that it will be able to make a timely exit.
>
> Lowmem allocations are now failed in oom conditions so that the task can
> perhaps recover or try again later. Killing current is an unnecessary
> result for simply making a GFP_DMA or GFP_DMA32 page allocation and no
> lowmem allocations use the now-deprecated __GFP_NOFAIL bit so retrying is
> unnecessary.
>
> Previously, the heuristic provided some protection for those tasks with
> CAP_SYS_RAWIO, but this is no longer necessary since we will not be
> killing tasks for the purposes of ISA allocations.
>
> Signed-off-by: David Rientjes <rientjes(a)google.com>

From the viewpoint of a panic-on-oom lover, this patch seems to cause a
regression. Please do this check after the sysctl_panic_on_oom == 2 test.
I think it's easy. So, a temporary Nack to this patch itself.

And I think calling the notifier is not very bad in this situation.
==
void out_of_memory()
..snip..
blocking_notifier_call_chain(&oom_notify_list, 0, &freed);

So,

if (sysctl_panic_on_oom == 2) {
        dump_header(NULL, gfp_mask, order, NULL);
        panic("out of memory. Compulsory panic_on_oom is selected.\n");
}

/* oom-kill is useless if lowmem is exhausted. */
if (gfp_zone(gfp_mask) < ZONE_NORMAL)
        return;

is better, I think.
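
The difference between the two orderings under discussion can be seen in a small user-space model. This is a sketch under stated assumptions, not kernel code: the enum names and the functions patch_order() and suggested_order() are hypothetical, and the model ignores the oom notifier chain, panic_on_oom == 1, and the __GFP_FS / __GFP_NORETRY checks.

```c
#include <stdio.h>

/* Hypothetical, simplified zone indices for this model only. */
enum model_zone { MODEL_ZONE_DMA, MODEL_ZONE_DMA32, MODEL_ZONE_NORMAL };

/* Possible outcomes once reclaim has made no progress. */
enum oom_action { OOM_FAIL_ALLOC, OOM_PANIC, OOM_KILL };

/* Ordering in the posted patch: the lowmem test runs in the page
 * allocator, before the oom killer (and its panic test) is reached. */
static enum oom_action patch_order(enum model_zone high_zoneidx, int panic_on_oom)
{
    if (high_zoneidx < MODEL_ZONE_NORMAL)
        return OOM_FAIL_ALLOC;
    if (panic_on_oom == 2)
        return OOM_PANIC;
    return OOM_KILL;
}

/* Ordering suggested in this reply: honour panic_on_oom == 2 first,
 * then skip the kill for lowmem allocations. */
static enum oom_action suggested_order(enum model_zone high_zoneidx, int panic_on_oom)
{
    if (panic_on_oom == 2)
        return OOM_PANIC;
    if (high_zoneidx < MODEL_ZONE_NORMAL)
        return OOM_FAIL_ALLOC;
    return OOM_KILL;
}
```

The two functions differ only for a lowmem request with panic_on_oom set to 2: the first fails the allocation, the second panics. For every other input in this model they agree.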

Thanks,
-Kame

> ---
> mm/page_alloc.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1914,6 +1914,9 @@ rebalance:
> * running out of options and have to consider going OOM
> */
> if (!did_some_progress) {
> + /* The oom killer won't necessarily free lowmem */
> + if (high_zoneidx < ZONE_NORMAL)
> + goto nopage;
> if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
> if (oom_killer_disabled)
> goto nopage;

From: David Rientjes on 12 Feb 2010 05:10

On Fri, 12 Feb 2010, KAMEZAWA Hiroyuki wrote:

> From the viewpoint of a panic-on-oom lover, this patch seems to cause a
> regression. Please do this check after the sysctl_panic_on_oom == 2 test.
> I think it's easy. So, a temporary Nack to this patch itself.
>
> And I think calling the notifier is not very bad in this situation.
> ==
> void out_of_memory()
> ..snip..
> blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
>
>
> So,
>
> if (sysctl_panic_on_oom == 2) {
>         dump_header(NULL, gfp_mask, order, NULL);
>         panic("out of memory. Compulsory panic_on_oom is selected.\n");
> }
>
> /* oom-kill is useless if lowmem is exhausted. */
> if (gfp_zone(gfp_mask) < ZONE_NORMAL)
>         return;
>
> is better. I think.
>

I can't agree with that assessment. I don't think it's a desired result to
ever panic the machine because a lowmem page allocation fails, regardless
of what /proc/sys/vm/panic_on_oom is set to, especially considering that,
as mentioned in the changelog, these allocations are never __GFP_NOFAIL
and returning NULL is acceptable.

I've always disliked panicking the machine when a cpuset or mempolicy
allocation fails and panic_on_oom is set to 2. Since both such
constraints now force an iteration of the tasklist when oom_kill_quick is
not enabled and we strictly prohibit the consideration of tasks with
disjoint cpuset mems or mempolicy nodes, I think I'll take this
opportunity to get rid of the panic_on_oom == 2 behavior and ask that
users who really do want to panic the entire machine for cpuset or
mempolicy constrained ooms to simply set all such tasks to OOM_DISABLE.