From: Balbir Singh on
* MinChan Kim <minchan.kim(a)gmail.com> [2010-05-28 23:06:23]:

> > I confess I failed to distinguish between memcg OOM and system OOM and used "in
> > case of OOM, kill the selected task as fast as you can" as the guideline.
> > If the exit code path is short, that shouldn't be a problem.
> >
> > Maybe the right way to go would be giving the dying task the highest
> > priority inside that memcg, to be sure that it will be the next process from
> > that memcg to be scheduled. Would that be reasonable?
>
> Hmm. I can't understand your point.
> What do you mean by failing to distinguish memcg and system OOM?
>
> We already distinguish them via mem_cgroup_out_of_memory
> (but we have to enable CONFIG_CGROUP_MEM_RES_CTLR).
> So the task selected in select_bad_process is one out of the memcg's tasks when
> the memcg is under memory pressure.
>

We have a routine to help figure out if the task belongs to the memory
cgroup that caused the OOM. The OOM entry from the memory cgroup is
different from a regular one.
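To make the distinction concrete, here is a toy userspace model of memcg-scoped victim selection. It is a sketch under stated assumptions, not the kernel code: the struct and the helpers task_in_oom_memcg() and select_victim() are hypothetical, memcg id 0 is an assumed convention for "system-wide OOM", and a single badness number stands in for the kernel's scoring. It only shows the idea described above: for a memcg OOM, tasks outside the offending cgroup are skipped before the highest-scoring task is picked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of OOM victim selection.
 * By assumption, oom_memcg == 0 means "system-wide OOM, no memcg filter". */
struct task_model {
    int pid;
    int memcg_id;          /* memory cgroup the task is charged to */
    unsigned long badness; /* higher = better OOM-kill candidate */
};

/* Stand-in for the kernel's "does this task belong to the memcg that
 * caused the OOM?" check. */
static int task_in_oom_memcg(const struct task_model *t, int oom_memcg)
{
    return oom_memcg == 0 || t->memcg_id == oom_memcg;
}

/* Returns the index of the chosen victim, or -1 if no task qualifies. */
static int select_victim(const struct task_model *tasks, size_t n, int oom_memcg)
{
    int best = -1;

    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!task_in_oom_memcg(&tasks[i], oom_memcg))
            continue; /* memcg OOM: tasks outside the cgroup are never picked */
        if (best < 0 || tasks[i].badness > tasks[best].badness)
            best = (int)i;
    }
    return best;
}
```

With tasks charged to memcgs 1 and 2, a system-wide call may pick the worst task anywhere, while a call for memcg 1 can only return a task charged to memcg 1.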

--
Three Cheers,
Balbir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Luis Claudio R. Goncalves on
On Fri, May 28, 2010 at 11:06:23PM +0900, Minchan Kim wrote:
| On Fri, May 28, 2010 at 09:53:05AM -0300, Luis Claudio R. Goncalves wrote:
| > On Fri, May 28, 2010 at 02:59:02PM +0900, KOSAKI Motohiro wrote:
....
| > | As far as I have observed, RT functions always make some syscall, because pure
| > | calculation doesn't need a deterministic guarantee. But _if_ you are really
| > | using such a priority design, I'm OK with the maximum non-RT priority instead
| > | of the maximum RT priority too.
| >
| > I confess I failed to distinguish between memcg OOM and system OOM and used "in
| > case of OOM, kill the selected task as fast as you can" as the guideline.
| > If the exit code path is short, that shouldn't be a problem.
| >
| > Maybe the right way to go would be giving the dying task the highest
| > priority inside that memcg, to be sure that it will be the next process from
| > that memcg to be scheduled. Would that be reasonable?
|
| Hmm. I can't understand your point.
| What do you mean by failing to distinguish memcg and system OOM?
|
| We already distinguish them via mem_cgroup_out_of_memory
| (but we have to enable CONFIG_CGROUP_MEM_RES_CTLR).
| So the task selected in select_bad_process is one out of the memcg's tasks when
| the memcg is under memory pressure.

The approach of giving the highest priority to the dying task makes sense
in a system-wide OOM situation. I thought that would also be good for the
memcg OOM case.

After Balbir Singh's comment, I understand that in a memcg OOM the dying
task should have a priority just above the priority of the main task of
that memcg, in order to avoid interfering with the rest of the system.

That is the point where I failed to distinguish between memcg and system OOM.

Should I pursue that new idea of looking for the right priority inside the
memcg, or is it overkill? I really don't have a clear view of the impact of
a memcg OOM on system performance; I don't know if it is better to solve the
issue sooner (highest RT priority) or leave it to be solved later (highest
priority in the memcg). I have the impression the general case points to the
simpler solution.
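The "right priority inside the memcg" idea could be computed roughly as below. This is a hypothetical sketch, not code from any posted patch: memcg_boost_prio() is an invented name, it assumes the caller has already collected the user-visible SCHED_FIFO priorities (1 to 99 on Linux, 0 for non-RT tasks) of every task in the memcg, and it ignores non-RT scheduling classes entirely.

```c
#include <assert.h>
#include <stddef.h>

#define RT_PRIO_MAX 99 /* highest user-visible SCHED_FIFO priority on Linux */

/* Hypothetical helper: pick a SCHED_FIFO priority for the dying task that is
 * one step above every RT task in its memcg, rather than the global maximum.
 * prios[] holds user-visible RT priorities of the memcg's tasks (0 = not RT). */
static int memcg_boost_prio(const int *prios, size_t n)
{
    int highest = 0;

    assert(prios != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (prios[i] > highest)
            highest = prios[i];
    if (highest >= RT_PRIO_MAX)
        return RT_PRIO_MAX; /* cannot go above the top, so tie with it */
    return highest + 1;     /* no RT tasks in the memcg -> 1, the lowest RT prio */
}
```

Note the design consequence: a memcg with no RT tasks yields priority 1, and even that lowest SCHED_FIFO level outranks every SCHED_NORMAL task on the system, so the boost is never purely memcg-local.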

Luis
--
[ Luis Claudio R. Goncalves Bass - Gospel - RT ]
[ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9 2696 7203 D980 A448 C8F8 ]

From: Minchan Kim on
On Fri, May 28, 2010 at 07:50:48PM +0530, Balbir Singh wrote:
> * MinChan Kim <minchan.kim(a)gmail.com> [2010-05-28 23:06:23]:
>
> > > I confess I failed to distinguish between memcg OOM and system OOM and used "in
> > > case of OOM, kill the selected task as fast as you can" as the guideline.
> > > If the exit code path is short, that shouldn't be a problem.
> > >
> > > Maybe the right way to go would be giving the dying task the highest
> > > priority inside that memcg, to be sure that it will be the next process from
> > > that memcg to be scheduled. Would that be reasonable?
> >
> > Hmm. I can't understand your point.
> > What do you mean by failing to distinguish memcg and system OOM?
> >
> > We already distinguish them via mem_cgroup_out_of_memory
> > (but we have to enable CONFIG_CGROUP_MEM_RES_CTLR).
> > So the task selected in select_bad_process is one out of the memcg's tasks when
> > the memcg is under memory pressure.
> >
>
> We have a routine to help figure out if the task belongs to the memory
> cgroup that caused the OOM. The OOM entry from the memory cgroup is
> different from a regular one.

That's what I meant.
My English is poor; "out of" wasn't the right phrase.

--
Kind regards,
Minchan Kim
From: Peter Zijlstra on
On Sat, 2010-05-29 at 00:12 +0900, Minchan Kim wrote:
> I think the highest RT priority isn't a good solution.
> As I mentioned, some RT functions don't want to be preempted by other processes
> which cause memory pressure. That would break the RT task.

All the patches I've seen use MAX_RT_PRIO-1, which is actually FIFO-1,
which is the lowest RT priority.
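Whether MAX_RT_PRIO-1 is the top or the bottom of the RT range depends on which scale it is interpreted on. Inside the scheduler, an RT task's internal prio is MAX_RT_PRIO-1 minus its user-visible rt_priority, and a lower internal value runs first; sched_param.sched_priority runs in the opposite direction. The sketch below only reproduces that arithmetic (MAX_RT_PRIO is 100 in kernels of this era); the helper names are hypothetical.

```c
#include <assert.h>

#define MAX_RT_PRIO 100 /* RT tasks occupy internal prio 0..MAX_RT_PRIO-1 */

/* Internal scheduler prio of an RT task whose user-visible priority
 * (sched_param.sched_priority, stored as rt_priority) is rt_priority.
 * A LOWER internal prio runs first. */
static int rt_to_internal_prio(int rt_priority)
{
    assert(rt_priority >= 0 && rt_priority <= MAX_RT_PRIO - 1);
    return MAX_RT_PRIO - 1 - rt_priority;
}

/* Inverse: the rt_priority that corresponds to an internal RT prio. */
static int internal_to_rt_prio(int prio)
{
    assert(prio >= 0 && prio <= MAX_RT_PRIO - 1);
    return MAX_RT_PRIO - 1 - prio;
}
```

Passing MAX_RT_PRIO-1 as sched_priority therefore lands at internal prio 0, the most urgent RT level, whereas an internal prio of MAX_RT_PRIO-1 corresponds to rt_priority 0, below even SCHED_FIFO priority 1.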


From: Luis Claudio R. Goncalves on
On Sat, May 29, 2010 at 12:12:49AM +0900, Minchan Kim wrote:
| On Fri, May 28, 2010 at 11:36:17AM -0300, Luis Claudio R. Goncalves wrote:
| > On Fri, May 28, 2010 at 11:06:23PM +0900, Minchan Kim wrote:
| > | On Fri, May 28, 2010 at 09:53:05AM -0300, Luis Claudio R. Goncalves wrote:
| > | > On Fri, May 28, 2010 at 02:59:02PM +0900, KOSAKI Motohiro wrote:
| > ...
| > | > | As far as I have observed, RT functions always make some syscall, because pure
| > | > | calculation doesn't need a deterministic guarantee. But _if_ you are really
| > | > | using such a priority design, I'm OK with the maximum non-RT priority instead
| > | > | of the maximum RT priority too.
| > | >
| > | > I confess I failed to distinguish between memcg OOM and system OOM and used "in
| > | > case of OOM, kill the selected task as fast as you can" as the guideline.
| > | > If the exit code path is short, that shouldn't be a problem.
| > | >
| > | > Maybe the right way to go would be giving the dying task the highest
| > | > priority inside that memcg, to be sure that it will be the next process from
| > | > that memcg to be scheduled. Would that be reasonable?
| > |
| > | Hmm. I can't understand your point.
| > | What do you mean by failing to distinguish memcg and system OOM?
| > |
| > | We already distinguish them via mem_cgroup_out_of_memory
| > | (but we have to enable CONFIG_CGROUP_MEM_RES_CTLR).
| > | So the task selected in select_bad_process is one out of the memcg's tasks when
| > | the memcg is under memory pressure.
| >
| > The approach of giving the highest priority to the dying task makes sense
| > in a system-wide OOM situation. I thought that would also be good for the
| > memcg OOM case.
| >
| > After Balbir Singh's comment, I understand that in a memcg OOM the dying
| > task should have a priority just above the priority of the main task of
| > that memcg, in order to avoid interfering with the rest of the system.
| >
| > That is the point where I failed to distinguish between memcg and system OOM.
| >
| > Should I pursue that new idea of looking for the right priority inside the
| > memcg, or is it overkill? I really don't have a clear view of the impact of
| > a memcg OOM on system performance; I don't know if it is better to solve the
| > issue sooner (highest RT priority) or leave it to be solved later (highest
| > priority in the memcg). I have the impression the general case points to the
| > simpler solution.
|
| I think the highest RT priority isn't a good solution.
| As I mentioned, some RT functions don't want to be preempted by other processes
| which cause memory pressure. That would break the RT task.

For the RT case, if you reached a system OOM situation, your determinism has
already been hurt. If the memcg OOM happens in the same memcg as your RT task
(which will probably be the case most of the time), again, the determinism
has deteriorated. For both these cases, giving the dying task SCHED_FIFO
MAX_RT_PRIO-1 means a faster recovery.
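For comparison, the closest userspace equivalent of "give the dying task SCHED_FIFO at the top RT priority" is a sched_setscheduler() call. This only illustrates the policy under discussion and is not how the in-kernel patches apply it; boost_to_fifo_max() is a hypothetical helper, and the call needs CAP_SYS_NICE (or a sufficient RLIMIT_RTPRIO) to succeed.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/* Hypothetical helper: move task `pid` (0 = caller) to SCHED_FIFO at the
 * highest priority the system reports. Returns 0 on success or the errno
 * value on failure (typically EPERM without CAP_SYS_NICE). */
static int boost_to_fifo_max(pid_t pid)
{
    struct sched_param param = {0};
    int max = sched_get_priority_max(SCHED_FIFO); /* 99 on Linux */

    assert(pid >= 0);
    if (max == -1)
        return errno;
    param.sched_priority = max;
    if (sched_setscheduler(pid, SCHED_FIFO, &param) == -1)
        return errno;
    return 0;
}
```

Swapping sched_get_priority_max() for sched_get_priority_min() gives the opposite choice, the lowest SCHED_FIFO priority, which still preempts every SCHED_NORMAL task but none of the existing RT tasks.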

I don't know what the system-wide latency effect of a memcg OOM is, if any,
or whether it would affect an RT task running in another memcg. That is the
case where a more careful priority selection could be necessary.

| On the other hand, normal processes don't have RT requirements,
| so it isn't a big problem if they lose a little time slice, I think.
| So how about raising the task to the maximum normal priority?
| But I am not sure this is the right solution.
| Let's hear others' opinions.
| I believe Peter has a good idea.
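The alternative of raising the dying task to the maximum normal priority keeps it in SCHED_NORMAL and only gives it the most favourable nice value. From userspace that corresponds to setpriority() with nice -20, as in the hypothetical helper below; a SCHED_NORMAL task, however favoured, never preempts an RT task, which is the property being preserved. Lowering the nice value requires CAP_SYS_NICE or an RLIMIT_NICE that allows it.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <sys/types.h>

#define NICE_BEST (-20) /* most favourable nice value on Linux */

/* Hypothetical helper: keep task `pid` (0 = caller) in SCHED_NORMAL but give
 * it the best nice value, i.e. the largest CFS weight. Returns 0 on success
 * or the errno value on failure (EACCES/EPERM without enough privilege). */
static int boost_to_max_nice(pid_t pid)
{
    assert(pid >= 0);
    if (setpriority(PRIO_PROCESS, (id_t)pid, NICE_BEST) == -1)
        return errno;
    return 0;
}
```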

Thanks again for helping to discuss this idea.

Luis
--
[ Luis Claudio R. Goncalves Bass - Gospel - RT ]
[ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9 2696 7203 D980 A448 C8F8 ]
