oom: avoid oom killer for lowmem allocations [Kernel]


From: KAMEZAWA Hiroyuki on 14 Feb 2010 19:20

On Fri, 12 Feb 2010 02:06:49 -0800 (PST)
David Rientjes <rientjes(a)google.com> wrote:

> On Fri, 12 Feb 2010, KAMEZAWA Hiroyuki wrote:
>
> > From the viewpoint of a panic-on-oom user, this patch seems to cause a regression.
> > Please do this check after the sysctl_panic_on_oom == 2 test.
> > I think it's easy. So, a temporary Nack to this patch itself.
> >
> >
> > And I think calling the notifier is not very bad in this situation.
> > ==
> > void out_of_memory()
> > ..snip..
> > blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
> >
> >
> > So,
> >
> > if (sysctl_panic_on_oom == 2) {
> >     dump_header(NULL, gfp_mask, order, NULL);
> >     panic("out of memory. Compulsory panic_on_oom is selected.\n");
> > }
> >
> > /* oom-kill is useless if lowmem is exhausted. */
> > if (gfp_zone(gfp_mask) < ZONE_NORMAL)
> >     return;
> >
> > is better, I think.
> >
>
> I can't agree with that assessment, I don't think it's a desired result to
> ever panic the machine regardless of what /proc/sys/vm/panic_on_oom is set
> to because a lowmem page allocation fails especially considering, as
> mentioned in the changelog, these allocations are never __GFP_NOFAIL and
> returning NULL is acceptable.
>
please add
WARN_ON((high_zoneidx < ZONE_NORMAL) && (gfp_mask & __GFP_NOFAIL))
somewhere. Then, it seems your patch makes sense.

I don't like the "possibility" of infinite loops.

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
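
The disagreement above is about ordering: whether the compulsory panic test
runs before or after the lowmem bail-out. A minimal sketch of the two
orderings follows, as a userspace model rather than kernel code. The enum
oom_outcome and both function names are hypothetical, the zone enum assumes a
kernel with all zones configured, and only the panic_on_oom == 2 case from the
thread is modeled.

```c
/* Zone indices in ascending order, modeled on the kernel's enum zone_type. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE };

/* Hypothetical outcome labels for this sketch; not kernel identifiers. */
enum oom_outcome { OOM_PANIC, OOM_FAIL_ALLOCATION, OOM_KILL_TASK };

/*
 * Ordering proposed in the reply: the compulsory-panic test runs first,
 * so panic_on_oom == 2 panics even when a lowmem allocation triggered it.
 */
static enum oom_outcome oom_outcome_panic_first(int panic_on_oom,
                                                enum zone_type high_zoneidx)
{
    if (panic_on_oom == 2)
        return OOM_PANIC;
    if (high_zoneidx < ZONE_NORMAL)
        return OOM_FAIL_ALLOCATION;
    return OOM_KILL_TASK;
}

/*
 * Effect of the posted patch: a lowmem allocation fails before any oom
 * handling is reached, so the panic setting is never consulted for it.
 */
static enum oom_outcome oom_outcome_lowmem_first(int panic_on_oom,
                                                 enum zone_type high_zoneidx)
{
    if (high_zoneidx < ZONE_NORMAL)
        return OOM_FAIL_ALLOCATION;
    if (panic_on_oom == 2)
        return OOM_PANIC;
    return OOM_KILL_TASK;
}
```

The two functions differ only for a lowmem request with panic_on_oom set to 2,
which is exactly the case the Nack is about.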

From: KOSAKI Motohiro on 15 Feb 2010 03:30

> If memory has been depleted in lowmem zones even with the protection
> afforded to it by /proc/sys/vm/lowmem_reserve_ratio, it is unlikely that
> killing current users will help. The memory is either reclaimable (or
> migratable) already, in which case we should not invoke the oom killer at
> all, or it is pinned by an application for I/O. Killing such an
> application may leave the hardware in an unspecified state and there is
> no guarantee that it will be able to make a timely exit.
>
> Lowmem allocations are now failed in oom conditions so that the task can
> perhaps recover or try again later. Killing current is an unnecessary
> result for simply making a GFP_DMA or GFP_DMA32 page allocation and no
> lowmem allocations use the now-deprecated __GFP_NOFAIL bit so retrying is
> unnecessary.
>
> Previously, the heuristic provided some protection for those tasks with
> CAP_SYS_RAWIO, but this is no longer necessary since we will not be
> killing tasks for the purposes of ISA allocations.

The main difference from Kamezawa-san's patch is that his patch also handled
the case where the DMA zone is filled with mlocked pages.
But I personally think that case should be solved by an automatic page
migration mechanism (Mel's memory compaction patch probably provides the
base infrastructure for it). So this patch seems sufficient and proper.

Reviewed-by: KOSAKI Motohiro <kosaki.motohiro(a)jp.fujitsu.com>

>
> Signed-off-by: David Rientjes <rientjes(a)google.com>
> ---
> mm/page_alloc.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1914,6 +1914,9 @@ rebalance:
> * running out of options and have to consider going OOM
> */
> if (!did_some_progress) {
> + /* The oom killer won't necessarily free lowmem */
> + if (high_zoneidx < ZONE_NORMAL)
> + goto nopage;
> if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
> if (oom_killer_disabled)
> goto nopage;
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo(a)kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .


From: David Rientjes on 15 Feb 2010 17:10

On Mon, 15 Feb 2010, KAMEZAWA Hiroyuki wrote:

> > I can't agree with that assessment, I don't think it's a desired result to
> > ever panic the machine regardless of what /proc/sys/vm/panic_on_oom is set
> > to because a lowmem page allocation fails especially considering, as
> > mentioned in the changelog, these allocations are never __GFP_NOFAIL and
> > returning NULL is acceptable.
> >
> please add
> WARN_ON((high_zoneidx < ZONE_NORMAL) && (gfp_mask & __GFP_NOFAIL))
> somewhere. Then, it seems your patch makes sense.
>

high_zoneidx < ZONE_NORMAL is not the only case where this exists: it also
exists for __GFP_NOFAIL allocations that are not __GFP_FS, and has for
years, so no special handling is needed now.

By my audit of the kernel code, there should be no cases of either
GFP_DMA | __GFP_NOFAIL or GFP_NOFS | __GFP_NOFAIL. And since
__GFP_NOFAIL is not to be added anymore (see Andrew's dab48dab), there's
no real reason to add a WARN_ON() here.

> I don't like the "possibility" of infinite loops.
>

The possibility of infinite loops has always existed in the page allocator
for __GFP_NOFAIL allocations, that's precisely why it's deprecated and
eventually we seek to remove it.
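
The point above is that a __GFP_NOFAIL request can already miss the oom
killer for a reason other than its zone. A rough userspace model of the
patched no-progress branch illustrates this. The SK_ flag values and both
function names are assumptions made for this sketch, not values from gfp.h,
and the oom_killer_disabled and allocation-order checks are left out.

```c
/* Illustrative flag bits; the values are assumptions, not taken from gfp.h. */
#define SK_GFP_FS       0x1u
#define SK_GFP_NORETRY  0x2u
#define SK_GFP_NOFAIL   0x4u

/* Zone indices in ascending order, modeled on the kernel's enum zone_type. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE };

/*
 * Returns 1 if this model would reach the oom killer after reclaim made
 * no progress, 0 otherwise.
 */
static int would_invoke_oom_killer(unsigned int gfp_mask,
                                   enum zone_type high_zoneidx)
{
    if (high_zoneidx < ZONE_NORMAL)     /* check added by the patch */
        return 0;
    if (!(gfp_mask & SK_GFP_FS))        /* e.g. GFP_NOFS: pre-existing case */
        return 0;
    if (gfp_mask & SK_GFP_NORETRY)      /* caller asked not to retry */
        return 0;
    return 1;
}

/*
 * Returns 1 for the combination debated in the thread: a no-fail request
 * for which the oom killer is never reached.
 */
static int nofail_skips_oom_killer(unsigned int gfp_mask,
                                   enum zone_type high_zoneidx)
{
    return (gfp_mask & SK_GFP_NOFAIL) &&
           !would_invoke_oom_killer(gfp_mask, high_zoneidx);
}
```

In this model a no-fail request without the FS bit is flagged even in
ZONE_NORMAL, which is why a warning keyed only on the zone would cover just
one of the two cases.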

From: KAMEZAWA Hiroyuki on 15 Feb 2010 19:10

On Mon, 15 Feb 2010 14:20:21 -0800 (PST)
David Rientjes <rientjes(a)google.com> wrote:

> If memory has been depleted in lowmem zones even with the protection
> afforded to it by /proc/sys/vm/lowmem_reserve_ratio, it is unlikely that
> killing current users will help. The memory is either reclaimable (or
> migratable) already, in which case we should not invoke the oom killer at
> all, or it is pinned by an application for I/O. Killing such an
> application may leave the hardware in an unspecified state and there is
> no guarantee that it will be able to make a timely exit.
>
> Lowmem allocations are now failed in oom conditions so that the task can
> perhaps recover or try again later. Killing current is an unnecessary
> result for simply making a GFP_DMA or GFP_DMA32 page allocation and no
> lowmem allocations use the now-deprecated __GFP_NOFAIL bit so retrying is
> unnecessary.
>
> Previously, the heuristic provided some protection for those tasks with
> CAP_SYS_RAWIO, but this is no longer necessary since we will not be
> killing tasks for the purposes of ISA allocations.
>
> high_zoneidx is gfp_zone(gfp_flags), meaning that ZONE_NORMAL will be the
> default for all allocations that are not __GFP_DMA, __GFP_DMA32,
> __GFP_HIGHMEM, and __GFP_MOVABLE on kernels configured to support those
> flags. Testing for high_zoneidx being less than ZONE_NORMAL will only
> return true for allocations that have either __GFP_DMA or __GFP_DMA32.
>
> Acked-by: Rik van Riel <riel(a)redhat.com>
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro(a)jp.fujitsu.com>
> Signed-off-by: David Rientjes <rientjes(a)google.com>
> ---
> mm/page_alloc.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1914,6 +1914,9 @@ rebalance:
> * running out of options and have to consider going OOM
> */
> if (!did_some_progress) {
> + /* The oom killer won't necessarily free lowmem */
> + if (high_zoneidx < ZONE_NORMAL)
> + goto nopage;
> if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
> if (oom_killer_disabled)
> goto nopage;

WARN_ON((high_zoneidx < ZONE_NORMAL) && (gfp_mask & __GFP_NOFAIL))
plz.

Thanks,
-Kame
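
The changelog quoted above describes how high_zoneidx follows from the gfp
flags. A simplified model of that mapping is sketched below, assuming a
kernel with every zone configured. The function sketch_gfp_zone() is a
hypothetical stand-in that uses plain conditionals and does not reproduce how
the real gfp_zone() is implemented; the SK_ flag values are illustrative
assumptions.

```c
/* Illustrative zone-modifier bits; values are assumptions for this sketch. */
#define SK_GFP_DMA      0x01u
#define SK_GFP_HIGHMEM  0x02u
#define SK_GFP_DMA32    0x04u
#define SK_GFP_MOVABLE  0x08u

/* Zone indices in ascending order, modeled on the kernel's enum zone_type. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE };

/* Map zone-modifier flags to the highest zone the request may use. */
static enum zone_type sketch_gfp_zone(unsigned int flags)
{
    if (flags & SK_GFP_DMA)
        return ZONE_DMA;
    if (flags & SK_GFP_DMA32)
        return ZONE_DMA32;
    if ((flags & SK_GFP_HIGHMEM) && (flags & SK_GFP_MOVABLE))
        return ZONE_MOVABLE;
    if (flags & SK_GFP_HIGHMEM)
        return ZONE_HIGHMEM;
    return ZONE_NORMAL;     /* default when no zone modifier selects another */
}

/* The patch's test: true only for requests limited to DMA or DMA32. */
static int is_lowmem_request(unsigned int flags)
{
    return sketch_gfp_zone(flags) < ZONE_NORMAL;
}
```

Under these assumptions only the two DMA modifiers produce an index below
ZONE_NORMAL, matching the last paragraph of the changelog.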


From: KAMEZAWA Hiroyuki on 15 Feb 2010 19:30

On Mon, 15 Feb 2010 16:10:15 -0800 (PST)
David Rientjes <rientjes(a)google.com> wrote:

> On Tue, 16 Feb 2010, KAMEZAWA Hiroyuki wrote:
>
> > > If memory has been depleted in lowmem zones even with the protection
> > > afforded to it by /proc/sys/vm/lowmem_reserve_ratio, it is unlikely that
> > > killing current users will help. The memory is either reclaimable (or
> > > migratable) already, in which case we should not invoke the oom killer at
> > > all, or it is pinned by an application for I/O. Killing such an
> > > application may leave the hardware in an unspecified state and there is
> > > no guarantee that it will be able to make a timely exit.
> > >
> > > Lowmem allocations are now failed in oom conditions so that the task can
> > > perhaps recover or try again later. Killing current is an unnecessary
> > > result for simply making a GFP_DMA or GFP_DMA32 page allocation and no
> > > lowmem allocations use the now-deprecated __GFP_NOFAIL bit so retrying is
> > > unnecessary.
> > >
> > > Previously, the heuristic provided some protection for those tasks with
> > > CAP_SYS_RAWIO, but this is no longer necessary since we will not be
> > > killing tasks for the purposes of ISA allocations.
> > >
> > > high_zoneidx is gfp_zone(gfp_flags), meaning that ZONE_NORMAL will be the
> > > default for all allocations that are not __GFP_DMA, __GFP_DMA32,
> > > __GFP_HIGHMEM, and __GFP_MOVABLE on kernels configured to support those
> > > flags. Testing for high_zoneidx being less than ZONE_NORMAL will only
> > > return true for allocations that have either __GFP_DMA or __GFP_DMA32.
> > >
> > > Acked-by: Rik van Riel <riel(a)redhat.com>
> > > Reviewed-by: KOSAKI Motohiro <kosaki.motohiro(a)jp.fujitsu.com>
> > > Signed-off-by: David Rientjes <rientjes(a)google.com>
> > > ---
> > > mm/page_alloc.c | 3 +++
> > > 1 files changed, 3 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1914,6 +1914,9 @@ rebalance:
> > > * running out of options and have to consider going OOM
> > > */
> > > if (!did_some_progress) {
> > > + /* The oom killer won't necessarily free lowmem */
> > > + if (high_zoneidx < ZONE_NORMAL)
> > > + goto nopage;
> > > if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
> > > if (oom_killer_disabled)
> > > goto nopage;
> >
> > WARN_ON((high_zoneidx < ZONE_NORMAL) && (gfp_mask & __GFP_NOFAIL))
> > plz.
> >
>
> As I already explained when you first brought this up, the possibility of
> not invoking the oom killer is not unique to GFP_DMA, it is also possible
> for GFP_NOFS. Since __GFP_NOFAIL is deprecated and there are no current
> users of GFP_DMA | __GFP_NOFAIL, that warning is completely unnecessary.
> We're not adding any additional __GFP_NOFAIL allocations.
>

Please add documentation about that to gfp.h before doing this.
Doing this without writing any documentation is laziness.
(A WARNING is a form of documentation.)

Thanks,
-Kame
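
The idea of a warning acting as documentation can be sketched in userspace.
SKETCH_WARN_ON() below is a hypothetical stand-in for the kernel's WARN_ON():
it reports the failed expression, counts it, and yields the condition's truth
value. The SK_GFP_NOFAIL value and check_lowmem_nofail() are assumptions made
for illustration and are not part of the posted patch.

```c
#include <stdio.h>

/* Illustrative flag bit; the value is an assumption, not taken from gfp.h. */
#define SK_GFP_NOFAIL 0x800u

/* Zone indices in ascending order, modeled on the kernel's enum zone_type. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE };

/* Number of warnings emitted so far, so callers can observe them. */
static int sketch_warn_count;

/* Report a true condition on stderr, count it, and return its truth value. */
static int sketch_warn_on(int cond, const char *expr, const char *file, int line)
{
    if (cond) {
        sketch_warn_count++;
        fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
    }
    return cond;
}

#define SKETCH_WARN_ON(cond) \
    sketch_warn_on(!!(cond), #cond, __FILE__, __LINE__)

/* The check requested in the thread, expressed with the stand-in macro. */
static int check_lowmem_nofail(unsigned int gfp_mask, enum zone_type high_zoneidx)
{
    return SKETCH_WARN_ON((high_zoneidx < ZONE_NORMAL) &&
                          (gfp_mask & SK_GFP_NOFAIL));
}
```

A caller that passes the no-fail bit together with a DMA zone gets a message
naming the offending expression, which is the self-documenting effect argued
for above.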

