From: Minchan Kim on
On Fri, May 28, 2010 at 01:48:26PM -0300, Luis Claudio R. Goncalves wrote:
> On Sat, May 29, 2010 at 12:45:49AM +0900, Minchan Kim wrote:
> | On Fri, May 28, 2010 at 12:28:42PM -0300, Luis Claudio R. Goncalves wrote:
> | > On Sat, May 29, 2010 at 12:12:49AM +0900, Minchan Kim wrote:
> ...
> | > | I think the highest RT priority isn't a good solution.
> | > | As I mentioned, some RT functions don't want to be preempted by other processes
> | > | which cause memory pressure. It breaks the RT task.
> | >
> | > For the RT case, if you reached a system OOM situation, your determinism has
> | > already been hurt. If the memcg OOM happens on the same memcg your RT task
> | > is - what will probably be the case most of time - again, the determinism
> | > has deteriorated. For both these cases, giving the dying task SCHED_FIFO
> | > MAX_RT_PRIO-1 means a faster recovery.
> |
> | What I want to say is that determinism has no relation to OOM.
> | Why is some RT task affected by another process's OOM?
> |
> | Of course, if the system has no memory, it is likely to slow down the RT task.
> | But it's just only thought. If some task scheduled just is exit, we don't need
> | to raise OOMed task's priority.
> |
> | But raising min rt priority on your patch was what I want.
> | It doesn't preempt any RT task.
> |
> | So until now, I have made noise about your patch.
> | Really, sorry for that.
> | I don't have any objection on raising priority part from now on.
>
> This is the third version of the patch, factoring in your input along with
> Peter's comment. Basically the same patch, but using the lowest RT priority
> to boost the dying task.
>
> Thanks again for reviewing and commenting.
> Luis
>
> oom-killer: give the dying task rt priority (v3)
>
> Give the dying task RT priority so that it can be scheduled quickly and die,
> freeing needed memory.
>
> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv(a)redhat.com>
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 84bbba2..2b0204f 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -266,6 +266,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
> */
> static void __oom_kill_task(struct task_struct *p, int verbose)
> {
> + struct sched_param param;
> +
> if (is_global_init(p)) {
> WARN_ON(1);
> printk(KERN_WARNING "tried to kill init!\n");
> @@ -288,6 +290,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
> * exit() and clear out its resources quickly...
> */
> p->time_slice = HZ;
> + param.sched_priority = MAX_RT_PRIO-10;

I still can't understand your point.
Why did you set the priority to "MAX_RT_PRIO - 10"?
What Peter and I mentioned was "1", which is the lowest RT priority.

> + sched_setscheduler(p, SCHED_FIFO, &param);

Why did you change sched_setscheduler_nocheck to sched_setscheduler?
It means you can't boost the priority if the current context doesn't have permission.
Is that your intention?
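
To make the two questions above concrete, here is a minimal userspace sketch, assuming Linux and glibc. The kernel's sched_param.sched_priority uses the same 1..99 scale that sched_setscheduler(2) exposes, so MAX_RT_PRIO-10 is close to the top of the FIFO range while 1 is the bottom. MAX_RT_PRIO is redefined locally for illustration only, and lowest_fifo_prio(), highest_fifo_prio(), fifo_levels_below() and try_set_fifo() are hypothetical helper names, not kernel functions. The userspace call is also subject to a permission check, which is roughly what the _nocheck variant skips inside the kernel.

```c
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/* Kernel-internal constant, mirrored here for illustration only. */
#define MAX_RT_PRIO 100

/* Lowest SCHED_FIFO priority accepted by sched_setscheduler(2). */
static int lowest_fifo_prio(void)
{
	return sched_get_priority_min(SCHED_FIFO);
}

/* Highest SCHED_FIFO priority accepted by sched_setscheduler(2). */
static int highest_fifo_prio(void)
{
	return sched_get_priority_max(SCHED_FIFO);
}

/* Number of FIFO priority levels that a task running at 'prio' outranks. */
static int fifo_levels_below(int prio)
{
	return prio - lowest_fifo_prio();
}

/*
 * Try to switch 'pid' (0 means the caller) to SCHED_FIFO at 'prio'.
 * Returns 0 on success or the errno value on failure. An unprivileged
 * caller typically gets EPERM.
 */
static int try_set_fifo(pid_t pid, int prio)
{
	struct sched_param param = { .sched_priority = prio };

	if (sched_setscheduler(pid, SCHED_FIFO, &param) == -1)
		return errno;
	return 0;
}
```

With these helpers, fifo_levels_below(MAX_RT_PRIO - 10) is large while fifo_levels_below(lowest_fifo_prio()) is zero, which is the difference between outranking most RT tasks and outranking none of them.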

> set_tsk_thread_flag(p, TIF_MEMDIE);
>
> force_sig(SIGKILL, p);
> --
> [ Luis Claudio R. Goncalves Bass - Gospel - RT ]
> [ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9 2696 7203 D980 A448 C8F8 ]
>
--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: KAMEZAWA Hiroyuki on
On Fri, 28 May 2010 13:48:26 -0300
"Luis Claudio R. Goncalves" <lclaudio(a)uudg.org> wrote:

> On Sat, May 29, 2010 at 12:45:49AM +0900, Minchan Kim wrote:
> | On Fri, May 28, 2010 at 12:28:42PM -0300, Luis Claudio R. Goncalves wrote:
> | > On Sat, May 29, 2010 at 12:12:49AM +0900, Minchan Kim wrote:
> ...
> | > | I think the highest RT priority isn't a good solution.
> | > | As I mentioned, some RT functions don't want to be preempted by other processes
> | > | which cause memory pressure. It breaks the RT task.
> | >
> | > For the RT case, if you reached a system OOM situation, your determinism has
> | > already been hurt. If the memcg OOM happens on the same memcg your RT task
> | > is - what will probably be the case most of time - again, the determinism
> | > has deteriorated. For both these cases, giving the dying task SCHED_FIFO
> | > MAX_RT_PRIO-1 means a faster recovery.
> |
> | What I want to say is that determinism has no relation to OOM.
> | Why is some RT task affected by another process's OOM?
> |
> | Of course, if the system has no memory, it is likely to slow down the RT task.
> | But it's just only thought. If some task scheduled just is exit, we don't need
> | to raise OOMed task's priority.
> |
> | But raising min rt priority on your patch was what I want.
> | It doesn't preempt any RT task.
> |
> | So until now, I have made noise about your patch.
> | Really, sorry for that.
> | I don't have any objection on raising priority part from now on.
>
> This is the third version of the patch, factoring in your input along with
> Peter's comment. Basically the same patch, but using the lowest RT priority
> to boost the dying task.
>
> Thanks again for reviewing and commenting.
> Luis
>
> oom-killer: give the dying task rt priority (v3)
>
> Give the dying task RT priority so that it can be scheduled quickly and die,
> freeing needed memory.
>
> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv(a)redhat.com>
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 84bbba2..2b0204f 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -266,6 +266,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
> */
> static void __oom_kill_task(struct task_struct *p, int verbose)
> {
> + struct sched_param param;
> +
> if (is_global_init(p)) {
> WARN_ON(1);
> printk(KERN_WARNING "tried to kill init!\n");
> @@ -288,6 +290,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
> * exit() and clear out its resources quickly...
> */
> p->time_slice = HZ;
> + param.sched_priority = MAX_RT_PRIO-10;
> + sched_setscheduler(p, SCHED_FIFO, &param);
> set_tsk_thread_flag(p, TIF_MEMDIE);
>

BTW, how about the other threads which share mm_struct ?

Thanks,
-Kame

From: Luis Claudio R. Goncalves on
On Sat, May 29, 2010 at 12:59:09PM +0900, KOSAKI Motohiro wrote:
| Hi
|
| > oom-killer: give the dying task rt priority (v3)
| >
| > Give the dying task RT priority so that it can be scheduled quickly and die,
| > freeing needed memory.
| >
| > Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv(a)redhat.com>
|
| Almost acceptable to me, but I have two requests:
|
| - call 1) force_sig() and then 2) sched_setscheduler(), in that order, as Oleg mentioned
| - don't boost priority if it's in mem_cgroup_out_of_memory()
|
| Can you accept this? If not, can you please explain the reason?
|
| Thanks.

The last patch I posted was the wrong patch from my queue. Sorry for the
confusion. Here is the last version of the patch, including the suggestions
from Oleg, Peter and Kosaki Motohiro:


oom-kill: give the dying task a higher priority (v4)

In a system under heavy load it was observed that even after the
oom-killer selects a task to die, the task may take a long time to die.

Right before sending a SIGKILL to the task selected by the oom-killer,
this task has its priority increased so that it can exit() soon,
freeing memory. That is accomplished by:

/*
* We give our sacrificial lamb high priority and access to
* all the memory it needs. That way it should be able to
* exit() and clear out its resources quickly...
*/
p->rt.time_slice = HZ;
set_tsk_thread_flag(p, TIF_MEMDIE);

It sounds plausible to give the dying task an even higher priority so that
it is scheduled sooner and frees the desired memory. It was suggested on
LKML to use SCHED_FIFO:1, the lowest RT priority, so that this task won't
interfere with any running RT task.

Another good suggestion, implemented here, was to avoid boosting the dying
task priority in case of mem_cgroup OOM.

Signed-off-by: Luis Claudio R. Gonçalves <lclaudio(a)uudg.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro(a)jp.fujitsu.com>

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 709aedf..6a25293 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -380,7 +380,8 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
* flag though it's unlikely that we select a process with CAP_SYS_RAW_IO
* set.
*/
-static void __oom_kill_task(struct task_struct *p, int verbose)
+static void __oom_kill_task(struct task_struct *p, struct mem_cgroup *mem,
+ int verbose)
{
if (is_global_init(p)) {
WARN_ON(1);
@@ -413,11 +414,20 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
*/
p->rt.time_slice = HZ;
set_tsk_thread_flag(p, TIF_MEMDIE);
-
force_sig(SIGKILL, p);
+ /*
+ * If this is a system OOM (not a memcg OOM), speed up the recovery
+ * by boosting the dying task priority to the lowest FIFO priority.
+ * That helps with the recovery and avoids interfering with RT tasks.
+ */
+ if (mem == NULL) {
+ struct sched_param param;
+ param.sched_priority = 1;
+ sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
+ }
}

-static int oom_kill_task(struct task_struct *p)
+static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
{
/* WARNING: mm may not be dereferenced since we did not obtain its
* value from get_task_mm(p). This is OK since all we need to do is
@@ -430,7 +440,7 @@ static int oom_kill_task(struct task_struct *p)
if (!p->mm || p->signal->oom_adj == OOM_DISABLE)
return 1;

- __oom_kill_task(p, 1);
+ __oom_kill_task(p, mem, 1);

return 0;
}
@@ -449,7 +459,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
* its children or threads, just set TIF_MEMDIE so it can die quickly
*/
if (p->flags & PF_EXITING) {
- __oom_kill_task(p, 0);
+ __oom_kill_task(p, mem, 0);
return 0;
}

@@ -462,10 +472,10 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
continue;
if (mem && !task_in_mem_cgroup(c, mem))
continue;
- if (!oom_kill_task(c))
+ if (!oom_kill_task(c, mem))
return 0;
}
- return oom_kill_task(p);
+ return oom_kill_task(p, mem);
}

#ifdef CONFIG_CGROUP_MEM_RES_CTLR
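
The ordering used in the patch (send the signal first, boost afterwards) can be tried from userspace. This is only a rough analogue under stated assumptions: Linux, a child process standing in for the OOM victim, and kill(2) plus sched_setscheduler(2) standing in for force_sig() and sched_setscheduler_nocheck(). The helper names spawn_sleeper() and kill_then_boost() are hypothetical. Unlike the in-kernel _nocheck call, the boost here can fail with EPERM for an unprivileged caller, so the sketch treats that failure as non-fatal.

```c
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that just sleeps until it is signalled. Returns its pid. */
static pid_t spawn_sleeper(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			pause();
	}
	return pid;
}

/*
 * Send SIGKILL to 'child' first, then try to move it to SCHED_FIFO
 * priority 1 (the lowest RT priority), and finally reap it.
 *
 * *boost_errno receives 0 if the boost succeeded, or the errno value
 * if it did not. Returns the signal that terminated the child, or -1
 * on error.
 */
static int kill_then_boost(pid_t child, int *boost_errno)
{
	struct sched_param param = { .sched_priority = 1 };
	int status;

	if (kill(child, SIGKILL) == -1)
		return -1;

	/* The boost is best effort: the task dies either way. */
	if (sched_setscheduler(child, SCHED_FIFO, &param) == -1)
		*boost_errno = errno;
	else
		*boost_errno = 0;

	if (waitpid(child, &status, 0) == -1)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The child is terminated by SIGKILL whether or not the boost is permitted, so the boost only changes how soon the dying task gets to run.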

--
[ Luis Claudio R. Goncalves Bass - Gospel - RT ]
[ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9 2696 7203 D980 A448 C8F8 ]

From: Minchan Kim on
Hi, Kame.

On Mon, May 31, 2010 at 9:21 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
> On Fri, 28 May 2010 13:48:26 -0300
> "Luis Claudio R. Goncalves" <lclaudio(a)uudg.org> wrote:
>>
>> oom-killer: give the dying task rt priority (v3)
>>
>> Give the dying task RT priority so that it can be scheduled quickly and die,
>> freeing needed memory.
>>
>> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv(a)redhat.com>
>>
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 84bbba2..2b0204f 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -266,6 +266,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
>>   */
>>  static void __oom_kill_task(struct task_struct *p, int verbose)
>>  {
>> +     struct sched_param param;
>> +
>>       if (is_global_init(p)) {
>>               WARN_ON(1);
>>               printk(KERN_WARNING "tried to kill init!\n");
>> @@ -288,6 +290,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
>>        * exit() and clear out its resources quickly...
>>        */
>>       p->time_slice = HZ;
>> +     param.sched_priority = MAX_RT_PRIO-10;
>> +     sched_setscheduler(p, SCHED_FIFO, &param);
>>       set_tsk_thread_flag(p, TIF_MEMDIE);
>>
>
> BTW, how about the other threads which share mm_struct ?

Could you elaborate on your intention? :)

>
> Thanks,
> -Kame
>
>



--
Kind regards,
Minchan Kim
From: KAMEZAWA Hiroyuki on
On Mon, 31 May 2010 14:01:03 +0900
Minchan Kim <minchan.kim(a)gmail.com> wrote:

> Hi, Kame.
>
> On Mon, May 31, 2010 at 9:21 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
> > On Fri, 28 May 2010 13:48:26 -0300
> > "Luis Claudio R. Goncalves" <lclaudio(a)uudg.org> wrote:
> >>
> >> oom-killer: give the dying task rt priority (v3)
> >>
> >> Give the dying task RT priority so that it can be scheduled quickly and die,
> >> freeing needed memory.
> >>
> >> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv(a)redhat.com>
> >>
> >> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> >> index 84bbba2..2b0204f 100644
> >> --- a/mm/oom_kill.c
> >> +++ b/mm/oom_kill.c
> >> @@ -266,6 +266,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
> >>   */
> >>  static void __oom_kill_task(struct task_struct *p, int verbose)
> >>  {
> >> +     struct sched_param param;
> >> +
> >>       if (is_global_init(p)) {
> >>               WARN_ON(1);
> >>               printk(KERN_WARNING "tried to kill init!\n");
> >> @@ -288,6 +290,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
> >>        * exit() and clear out its resources quickly...
> >>        */
> >>       p->time_slice = HZ;
> >> +     param.sched_priority = MAX_RT_PRIO-10;
> >> +     sched_setscheduler(p, SCHED_FIFO, &param);
> >>       set_tsk_thread_flag(p, TIF_MEMDIE);
> >>
> >
> > BTW, how about the other threads which share mm_struct ?
>
> Could you elaborate your intention? :)
>

IIUC, the purpose of raising the priority is to accelerate the dying thread's
exit() so that memory is freed ASAP. But for the memory to be freed, all threads
which share the mm_struct must exit, too. I'm sorry if I'm missing something.
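
As a rough userspace illustration of that concern (a sketch assuming Linux with /proc mounted, not a proposal for the kernel patch), the boost would have to be applied to every thread rather than to a single task. The helper name boost_thread_group() is hypothetical. Note that /proc/<pid>/task lists only the thread group, so tasks created with CLONE_VM but without CLONE_THREAD, which also share the mm, are not covered here.

```c
#include <dirent.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try to move every thread in the thread group of 'pid' to SCHED_FIFO
 * priority 1. On Linux, sched_setscheduler(2) acts on a single thread
 * when given a thread ID.
 *
 * Returns the number of threads visited, or -1 if /proc/<pid>/task
 * cannot be opened. *boosted receives the number of threads whose
 * policy was actually changed (0 is expected without CAP_SYS_NICE).
 */
static int boost_thread_group(pid_t pid, int *boosted)
{
	struct sched_param param = { .sched_priority = 1 };
	struct dirent *de;
	char path[64];
	int seen = 0;
	DIR *dir;

	*boosted = 0;
	snprintf(path, sizeof(path), "/proc/%d/task", (int)pid);
	dir = opendir(path);
	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		char *end;
		long tid = strtol(de->d_name, &end, 10);

		/* Skip ".", ".." and anything that is not a numeric tid. */
		if (end == de->d_name || *end != '\0')
			continue;
		seen++;
		if (sched_setscheduler((pid_t)tid, SCHED_FIFO, &param) == 0)
			(*boosted)++;
	}
	closedir(dir);
	return seen;
}
```

For example, boost_thread_group(getpid(), &n) visits at least the calling thread when /proc is available.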

Thanks,
-Kame


