From: Minchan Kim on 16 Jun 2010 11:40

On Wed, Jun 16, 2010 at 08:36:29PM +0900, KOSAKI Motohiro wrote:
> From: Luis Claudio R. Goncalves <lclaudio(a)uudg.org>
>
> In a system under heavy load it was observed that even after the
> oom-killer selects a task to die, the task may take a long time to die.
>
> Right after sending a SIGKILL to the task selected by the oom-killer,
> this task has its priority increased so that it can exit() soon,
> freeing memory. That is accomplished by:
>
> 	/*
> 	 * We give our sacrificial lamb high priority and access to
> 	 * all the memory it needs. That way it should be able to
> 	 * exit() and clear out its resources quickly...
> 	 */
> 	p->rt.time_slice = HZ;
> 	set_tsk_thread_flag(p, TIF_MEMDIE);
>
> It sounds plausible to give the dying task an even higher priority, to
> be sure it will be scheduled sooner and free the desired memory. It was
> suggested on LKML to use SCHED_FIFO:1, the lowest RT priority, so that
> this task won't interfere with any running RT task.
>
> If the dying task is already an RT task, leave it untouched.
> Another good suggestion, implemented here, was to avoid boosting the
> dying task's priority in case of a mem_cgroup OOM.
>
> Signed-off-by: Luis Claudio R. Goncalves <lclaudio(a)uudg.org>
> Cc: Minchan Kim <minchan.kim(a)gmail.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro(a)jp.fujitsu.com>
> ---
>  mm/oom_kill.c |   38 +++++++++++++++++++++++++++++++++++---
>  1 files changed, 35 insertions(+), 3 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 7e9942d..1ecfc7a 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -82,6 +82,28 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
>  #endif /* CONFIG_NUMA */
>
>  /*
> + * If this is a system OOM (not a memcg OOM) and the task selected to be
> + * killed is not already running at high (RT) priorities, speed up the
> + * recovery by boosting the dying task to the lowest FIFO priority.
> + * That helps with the recovery and avoids interfering with RT tasks.
> + */
> +static void boost_dying_task_prio(struct task_struct *p,
> +				  struct mem_cgroup *mem)
> +{
> +	struct sched_param param = { .sched_priority = 1 };
> +
> +	if (mem)
> +		return;
> +
> +	if (rt_task(p)) {
> +		p->rt.time_slice = HZ;
> +		return;

I have a question from a long time ago: if we change rt.time_slice
_without_ calling setscheduler, is it effective? I mean, does the
scheduler pick the task up faster than other normal tasks?

> +	}
> +
> +	sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
> +}
> +
> +/*

--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
From: Luis Claudio R. Goncalves on 16 Jun 2010 16:00

On Thu, Jun 17, 2010 at 12:31:20AM +0900, Minchan Kim wrote:
| > 	/*
| > 	 * We give our sacrificial lamb high priority and access to
| > 	 * all the memory it needs. That way it should be able to
| > 	 * exit() and clear out its resources quickly...
| > 	 */
| > 	p->rt.time_slice = HZ;
| > 	set_tsk_thread_flag(p, TIF_MEMDIE);
...
| > +	if (rt_task(p)) {
| > +		p->rt.time_slice = HZ;
| > +		return;

I am not sure the code above will have any real effect for an RT task.
Kosaki-san, was this change motivated by test results or was it just a
code cleanup? I ask that out of curiosity.

| I have a question from a long time ago: if we change rt.time_slice
| _without_ calling setscheduler, is it effective? I mean, does the
| scheduler pick the task up faster than other normal tasks?

$ git log --pretty=oneline -Stime_slice mm/oom_kill.c
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2

This code ("time_slice = HZ;") has been around for quite a while and
probably comes from a time when having a big time slice was enough to be
sure you would be next in line. I would say sched_setscheduler is indeed
necessary.

Regards,
Luis
--
[ Luis Claudio R. Goncalves                    Red Hat - Realtime Team ]
[ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9  2696 7203 D980 A448 C8F8 ]
From: KOSAKI Motohiro on 16 Jun 2010 22:00

> On Thu, Jun 17, 2010 at 12:31:20AM +0900, Minchan Kim wrote:
> | > 	/*
> | > 	 * We give our sacrificial lamb high priority and access to
> | > 	 * all the memory it needs. That way it should be able to
> | > 	 * exit() and clear out its resources quickly...
> | > 	 */
> | > 	p->rt.time_slice = HZ;
> | > 	set_tsk_thread_flag(p, TIF_MEMDIE);
> ...
> | > +	if (rt_task(p)) {
> | > +		p->rt.time_slice = HZ;
> | > +		return;
>
> I am not sure the code above will have any real effect for an RT task.
> Kosaki-san, was this change motivated by test results or was it just a
> code cleanup? I ask that out of curiosity.

Just a cleanup. OK, I'll remove this dubious code.

> | I have a question from a long time ago: if we change rt.time_slice
> | _without_ calling setscheduler, is it effective? I mean, does the
> | scheduler pick the task up faster than other normal tasks?
>
> $ git log --pretty=oneline -Stime_slice mm/oom_kill.c
> 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2
>
> This code ("time_slice = HZ;") has been around for quite a while and
> probably comes from a time when having a big time slice was enough to
> be sure you would be next in line. I would say sched_setscheduler is
> indeed necessary.

OK.
From: KOSAKI Motohiro on 16 Jun 2010 22:00

> > +	struct sched_param param = { .sched_priority = 1 };
> > +
> > +	if (mem)
> > +		return;
> > +
> > +	if (rt_task(p)) {
> > +		p->rt.time_slice = HZ;
> > +		return;
>
> I have a question from a long time ago: if we change rt.time_slice
> _without_ calling setscheduler, is it effective? I mean, does the
> scheduler pick the task up faster than other normal tasks?

If p is SCHED_OTHER, it has no effect. If my understanding is correct,
it is only meaningful when p is SCHED_RR; that is the reason I moved
this assignment inside "if (rt_task())". But honestly, I have not
observed it working effectively, so I agree it can be removed, as Luis
mentioned.
From: KOSAKI Motohiro on 30 Jun 2010 05:40
Sorry, I forgot to cc Luis. Resend. (intentional full quote)

> From: Luis Claudio R. Goncalves <lclaudio(a)uudg.org>
>
> In a system under heavy load it was observed that even after the
> oom-killer selects a task to die, the task may take a long time to die.
>
> Right after sending a SIGKILL to the task selected by the oom-killer,
> this task has its priority increased so that it can exit() soon,
> freeing memory. That is accomplished by:
>
> 	/*
> 	 * We give our sacrificial lamb high priority and access to
> 	 * all the memory it needs. That way it should be able to
> 	 * exit() and clear out its resources quickly...
> 	 */
> 	p->rt.time_slice = HZ;
> 	set_tsk_thread_flag(p, TIF_MEMDIE);
>
> It sounds plausible to give the dying task an even higher priority, to
> be sure it will be scheduled sooner and free the desired memory. It was
> suggested on LKML to use SCHED_FIFO:1, the lowest RT priority, so that
> this task won't interfere with any running RT task.
>
> If the dying task is already an RT task, leave it untouched.
> Another good suggestion, implemented here, was to avoid boosting the
> dying task's priority in case of a mem_cgroup OOM.
>
> Signed-off-by: Luis Claudio R. Goncalves <lclaudio(a)uudg.org>
> Cc: Minchan Kim <minchan.kim(a)gmail.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro(a)jp.fujitsu.com>
> ---
>  mm/oom_kill.c |   34 +++++++++++++++++++++++++++++++---
>  1 files changed, 31 insertions(+), 3 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index b5678bf..0858b18 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -82,6 +82,24 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
>  #endif /* CONFIG_NUMA */
>
>  /*
> + * If this is a system OOM (not a memcg OOM) and the task selected to be
> + * killed is not already running at high (RT) priorities, speed up the
> + * recovery by boosting the dying task to the lowest FIFO priority.
> + * That helps with the recovery and avoids interfering with RT tasks.
> + */
> +static void boost_dying_task_prio(struct task_struct *p,
> +				  struct mem_cgroup *mem)
> +{
> +	struct sched_param param = { .sched_priority = 1 };
> +
> +	if (mem)
> +		return;
> +
> +	if (!rt_task(p))
> +		sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
> +}
> +
> +/*
>   * The process p may have detached its own ->mm while exiting or through
>   * use_mm(), but one or more of its subthreads may still have a valid
>   * pointer. Return p, or any of its subthreads with a valid ->mm, with
> @@ -421,7 +439,7 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
>  }
>
>  #define K(x) ((x) << (PAGE_SHIFT-10))
> -static int oom_kill_task(struct task_struct *p)
> +static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
>  {
>  	p = find_lock_task_mm(p);
>  	if (!p) {
> @@ -434,9 +452,17 @@ static int oom_kill_task(struct task_struct *p)
>  		       K(get_mm_counter(p->mm, MM_FILEPAGES)));
>  	task_unlock(p);
>
> -	p->rt.time_slice = HZ;
> +
>  	set_tsk_thread_flag(p, TIF_MEMDIE);
>  	force_sig(SIGKILL, p);
> +
> +	/*
> +	 * We give our sacrificial lamb high priority and access to
> +	 * all the memory it needs. That way it should be able to
> +	 * exit() and clear out its resources quickly...
> +	 */
> +	boost_dying_task_prio(p, mem);
> +
>  	return 0;
>  }
>  #undef K
> @@ -460,6 +486,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  	 */
>  	if (p->flags & PF_EXITING) {
>  		set_tsk_thread_flag(p, TIF_MEMDIE);
> +		boost_dying_task_prio(p, mem);
>  		return 0;
>  	}
>
> @@ -489,7 +516,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  		}
>  	} while_each_thread(p, t);
>
> -	return oom_kill_task(victim, mem);
> +	return oom_kill_task(victim, mem);
>  }
>
>  /*
> @@ -670,6 +697,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
>  	 */
>  	if (fatal_signal_pending(current)) {
>  		set_thread_flag(TIF_MEMDIE);
> +		boost_dying_task_prio(current, NULL);
>  		return;
>  	}
>
> --
> 1.6.5.2