From: Minchan Kim on
On Sat, May 29, 2010 at 12:21 AM, Peter Zijlstra <peterz(a)infradead.org> wrote:
> On Sat, 2010-05-29 at 00:12 +0900, Minchan Kim wrote:
>> I think the highest RT priority isn't a good solution.
>> As I mentioned, some RT functions don't want to be preempted by other processes
>> which cause memory pressure. It breaks the RT task.
>
> All the patches I've seen use MAX_RT_PRIO-1, which is actually FIFO-1,
> which is the lowest RT priority.

Stupid me. I had that confused until now.
That's exactly what I want.


--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on
On Fri, May 28, 2010 at 12:28:42PM -0300, Luis Claudio R. Goncalves wrote:
> On Sat, May 29, 2010 at 12:12:49AM +0900, Minchan Kim wrote:
> | On Fri, May 28, 2010 at 11:36:17AM -0300, Luis Claudio R. Goncalves wrote:
> | > On Fri, May 28, 2010 at 11:06:23PM +0900, Minchan Kim wrote:
> | > | On Fri, May 28, 2010 at 09:53:05AM -0300, Luis Claudio R. Goncalves wrote:
> | > | > On Fri, May 28, 2010 at 02:59:02PM +0900, KOSAKI Motohiro wrote:
> | > ...
> | > | > | As far as I have observed, RT functions always make some syscall, because pure
> | > | > | calculation doesn't need a deterministic guarantee. But _if_ you are really
> | > | > | using such a priority design, I'm OK with the maximum non-RT priority instead
> | > | > | of the maximum RT priority too.
> | > | >
> | > | > I confess I failed to distinguish memcg OOM and system OOM and used "in
> | > | > case of OOM, kill the selected task as fast as you can" as the guideline.
> | > | > If the exit code path is short, that shouldn't be a problem.
> | > | >
> | > | > Maybe the right way to go would be giving the dying task the highest
> | > | > priority inside that memcg to be sure that it will be the next process from
> | > | > that memcg to be scheduled. Would that be reasonable?
> | > |
> | > | Hmm. I can't understand your point.
> | > | What do you mean by failing to distinguish memcg and system OOM?
> | > |
> | > | We already distinguish them via mem_cgroup_out_of_memory
> | > | (but CONFIG_CGROUP_MEM_RES_CTLR has to be enabled).
> | > | So when a memcg is under memory pressure, the task selected in
> | > | select_bad_process is one of that memcg's tasks.
> | >
> | > The approach of giving the highest priority to the dying task makes sense
> | > in a system-wide OOM situation. I thought that would also be good for the
> | > memcg OOM case.
> | >
> | > After Balbir Singh's comment, I understand that in a memcg OOM the dying
> | > task should have a priority just above the priority of the main task of
> | > that memcg, in order to avoid interfering in the rest of the system.
> | >
> | > That is the point where I failed to distinguish between memcg and system OOM.
> | >
> | > Should I pursue that new idea of looking for the right priority inside the
> | > memcg or is it overkill? I really don't have a clear view of the impact of
> | > a memcg OOM on system performance - don't know if it is better to solve the
> | > issue sooner (highest RT priority) or leave it to be solved later (highest
> | > prio on the memcg). I have the impression the general case points to the
> | > simpler solution.
> |
> | I think the highest RT priority isn't a good solution.
> | As I mentioned, some RT functions don't want to be preempted by other processes
> | which cause memory pressure. It breaks the RT task.
>
> For the RT case, if you reached a system OOM situation, your determinism has
> already been hurt. If the memcg OOM happens in the same memcg as your RT task
> (which will probably be the case most of the time), again, the determinism
> has deteriorated. For both these cases, giving the dying task SCHED_FIFO
> MAX_RT_PRIO-1 means a faster recovery.
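Luis's alternative of boosting the dying task to just above the main task of its memcg could look roughly like the sketch below. It assumes "main task" means the task with the highest RT priority in that memcg, models tasks with a hypothetical struct model_task, and uses userspace-style priorities (0 for non-RT, 1..99 for SCHED_FIFO). It illustrates the idea only; it is not code from the thread or from mm/oom_kill.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a task, for this model only. */
struct model_task {
    int memcg_id;     /* memory cgroup the task belongs to */
    int rt_priority;  /* userspace-style: 0 = non-RT, 1..99 = SCHED_FIFO */
};

/* Highest valid SCHED_FIFO sched_priority on Linux. */
#define MODEL_MAX_USER_RT_PRIO 99

/*
 * Sketch of "just above the memcg's top task": scan only the tasks in the
 * OOMing memcg, take their highest RT priority and add one. With no RT
 * tasks in the memcg this yields 1. The result is clamped at 99, where
 * "just above" is no longer possible.
 */
static int memcg_boost_priority(const struct model_task *tasks, size_t n,
                                int memcg_id)
{
    int highest = 0;

    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].memcg_id != memcg_id)
            continue;  /* tasks outside the memcg are ignored */
        if (tasks[i].rt_priority > highest)
            highest = tasks[i].rt_priority;
    }
    if (highest >= MODEL_MAX_USER_RT_PRIO)
        return MODEL_MAX_USER_RT_PRIO;
    return highest + 1;
}
```

If the memcg contains no RT tasks, the result is 1, the lowest SCHED_FIFO priority. The scan over every task is the extra work that may make this approach overkill, as Luis suspects.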

What I want to say is that determinism has nothing to do with OOM.
Why should an RT task be affected by another process's OOM?

Of course, if the system has no free memory, the RT task is likely to slow down.
But that is only a guess. If the task that gets scheduled just exits, we don't need
to raise the OOM-killed task's priority.

But raising it to the minimum RT priority, as your patch does, is what I want.
It doesn't preempt any RT task.
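Minchan's point rests on the SCHED_FIFO preemption rule: a newly runnable task preempts the running one only if its priority is strictly higher, and any SCHED_FIFO task outranks SCHED_OTHER tasks. Below is a minimal model under those assumptions (ignoring RT throttling, SCHED_RR time slices and priority inheritance), with SCHED_OTHER represented as priority 0. fifo_would_preempt() and lowest_fifo_prio() are hypothetical helpers; sched_get_priority_min() is the standard POSIX call.

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/*
 * Hypothetical model of the SCHED_FIFO rule: the waking task preempts the
 * running one only if its priority is strictly higher. SCHED_OTHER tasks
 * are modelled as priority 0, below every RT task.
 */
static bool fifo_would_preempt(int waking_prio, int running_prio)
{
    return waking_prio > running_prio;
}

/* Lowest priority accepted for SCHED_FIFO (1 on Linux). */
static int lowest_fifo_prio(void)
{
    int p = sched_get_priority_min(SCHED_FIFO);

    assert(p != -1);  /* -1 would mean the policy is not supported */
    return p;
}
```

With the lowest FIFO priority, the boosted task preempts only non-RT tasks; an RT task already running at priority 1 or above keeps the CPU.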

So I have been making noise about your patch until now.
I'm really sorry about that.
From now on, I have no objection to the priority-raising part.

Thanks, Luis.
--
Kind regards,
Minchan Kim
From: Peter Zijlstra on
On Fri, 2010-05-28 at 00:51 -0300, Luis Claudio R. Goncalves wrote:
> + param.sched_priority = MAX_RT_PRIO-1;
> + sched_setscheduler_nocheck(p, SCHED_FIFO, &param);


Argh, so you got me confused as well.

the sched_param ones are userspace values, so you should be using 1.
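Peter's correction can be made concrete with a small model. In struct sched_param, sched_priority is the userspace value (1..99 for SCHED_FIFO on Linux, larger means more important), while the scheduler's internal prio runs the other way (smaller means more important). The sketch assumes the RT mapping that normal_prio() used in kernels of this period, prio = MAX_RT_PRIO-1 - rt_priority, with MAX_RT_PRIO copied locally as 100; the helper user_to_kernel_prio() is hypothetical and is not kernel code.

```c
#include <assert.h>

/* Local copy of the kernel constant of this period. */
#define MAX_RT_PRIO 100

/*
 * Hypothetical helper modelling normal_prio() for an RT task: convert a
 * userspace sched_param.sched_priority (1..99, higher = more important)
 * into the kernel's internal prio (lower = more important).
 */
static int user_to_kernel_prio(int sched_priority)
{
    assert(sched_priority >= 1 && sched_priority <= MAX_RT_PRIO - 1);
    return MAX_RT_PRIO - 1 - sched_priority;
}
```

Under this mapping, sched_priority 1 becomes internal prio 98, the weakest valid RT slot; MAX_RT_PRIO-10 (90) becomes 9; and MAX_RT_PRIO-1 (99) becomes 0, the strongest RT slot. So MAX_RT_PRIO-1 is the lowest RT priority only when read as an internal prio, not when passed through sched_param.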


From: Luis Claudio R. Goncalves on
On Sat, May 29, 2010 at 12:45:49AM +0900, Minchan Kim wrote:
| On Fri, May 28, 2010 at 12:28:42PM -0300, Luis Claudio R. Goncalves wrote:
| > On Sat, May 29, 2010 at 12:12:49AM +0900, Minchan Kim wrote:
....
| > | I think the highest RT priority isn't a good solution.
| > | As I mentioned, some RT functions don't want to be preempted by other processes
| > | which cause memory pressure. It breaks the RT task.
| >
| > For the RT case, if you reached a system OOM situation, your determinism has
| > already been hurt. If the memcg OOM happens in the same memcg as your RT task
| > (which will probably be the case most of the time), again, the determinism
| > has deteriorated. For both these cases, giving the dying task SCHED_FIFO
| > MAX_RT_PRIO-1 means a faster recovery.
|
| What I want to say is that determinism has nothing to do with OOM.
| Why should an RT task be affected by another process's OOM?
|
| Of course, if the system has no free memory, the RT task is likely to slow down.
| But that is only a guess. If the task that gets scheduled just exits, we don't need
| to raise the OOM-killed task's priority.
|
| But raising it to the minimum RT priority, as your patch does, is what I want.
| It doesn't preempt any RT task.
|
| So I have been making noise about your patch until now.
| I'm really sorry about that.
| From now on, I have no objection to the priority-raising part.

This is the third version of the patch, factoring in your input along with
Peter's comment. Basically the same patch, but using the lowest RT priority
to boost the dying task.

Thanks again for reviewing and commenting.
Luis

oom-killer: give the dying task rt priority (v3)

Give the dying task RT priority so that it can be scheduled quickly and die,
freeing needed memory.

Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv(a)redhat.com>

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 84bbba2..2b0204f 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -266,6 +266,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
*/
static void __oom_kill_task(struct task_struct *p, int verbose)
{
+ struct sched_param param;
+
if (is_global_init(p)) {
WARN_ON(1);
printk(KERN_WARNING "tried to kill init!\n");
@@ -288,6 +290,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
* exit() and clear out its resources quickly...
*/
p->time_slice = HZ;
+ param.sched_priority = MAX_RT_PRIO-10;
+ sched_setscheduler(p, SCHED_FIFO, &param);
set_tsk_thread_flag(p, TIF_MEMDIE);

force_sig(SIGKILL, p);
--
[ Luis Claudio R. Goncalves Bass - Gospel - RT ]
[ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9 2696 7203 D980 A448 C8F8 ]

From: KOSAKI Motohiro on
Hi

> oom-killer: give the dying task rt priority (v3)
>
> Give the dying task RT priority so that it can be scheduled quickly and die,
> freeing needed memory.
>
> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv(a)redhat.com>

This is almost acceptable to me, but I have two requests:

- call force_sig() first and sched_setscheduler() second, as Oleg mentioned
- don't boost priority if it's in mem_cgroup_out_of_memory()
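KOSAKI's two requests can be illustrated with a userspace model of __oom_kill_task(). Everything here is hypothetical: struct model_victim, enum oom_source and model_oom_kill_task() stand in for the kernel objects, and it is assumed that the caller can tell whether it came from mem_cgroup_out_of_memory(). The model keeps the v3 order of setting TIF_MEMDIE before force_sig(), moves the boost after force_sig(), and skips the boost for a memcg OOM, using priority 1 as Peter suggested.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for this model; not the mm/oom_kill.c API. */
enum oom_source { OOM_SYSTEM, OOM_MEMCG };

struct model_victim {
    bool memdie;      /* models set_tsk_thread_flag(p, TIF_MEMDIE) */
    bool sigkill;     /* models force_sig(SIGKILL, p) */
    int rt_priority;  /* 0 = not boosted, else SCHED_FIFO priority */
    char trace[4];    /* action order: M = memdie, K = kill, B = boost */
};

/* Append one action letter to the trace, keeping it NUL-terminated. */
static void trace_step(struct model_victim *p, char step)
{
    size_t len = strlen(p->trace);

    assert(len + 1 < sizeof(p->trace));
    p->trace[len] = step;
    p->trace[len + 1] = '\0';
}

static void model_oom_kill_task(struct model_victim *p, enum oom_source src)
{
    p->memdie = true;
    trace_step(p, 'M');

    /* Request 1: send SIGKILL before changing the scheduling class. */
    p->sigkill = true;
    trace_step(p, 'K');

    /* Request 2: no boost when called for a memcg OOM. */
    if (src == OOM_SYSTEM) {
        p->rt_priority = 1;  /* lowest SCHED_FIFO priority */
        trace_step(p, 'B');
    }
}
```

After a system-wide OOM the trace reads "MKB"; after a memcg OOM it reads "MK" and the priority is left untouched.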

Can you accept this? If not, can you please explain the reason?

Thanks.

>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 84bbba2..2b0204f 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -266,6 +266,8 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
> */
> static void __oom_kill_task(struct task_struct *p, int verbose)
> {
> + struct sched_param param;
> +
> if (is_global_init(p)) {
> WARN_ON(1);
> printk(KERN_WARNING "tried to kill init!\n");
> @@ -288,6 +290,8 @@ static void __oom_kill_task(struct task_struct *p, int verbose)
> * exit() and clear out its resources quickly...
> */
> p->time_slice = HZ;
> + param.sched_priority = MAX_RT_PRIO-10;
> + sched_setscheduler(p, SCHED_FIFO, &param);
> set_tsk_thread_flag(p, TIF_MEMDIE);
>
> force_sig(SIGKILL, p);
> --
> [ Luis Claudio R. Goncalves Bass - Gospel - RT ]
> [ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9 2696 7203 D980 A448 C8F8 ]
>


