From: KOSAKI Motohiro on
> > @@ -291,9 +309,10 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
> > * Otherwise we could get an easy OOM deadlock.
> > */
> > if (p->flags & PF_EXITING) {
> > - if (p != current)
> > + if (p != current) {
> > + boost_dying_task_prio(p, mem);
> > return ERR_PTR(-1UL);
> > -
> > + }
> > chosen = p;
> > *ppoints = ULONG_MAX;
> > }
>
> This has the potential to actually make it harder to free memory if p is
> waiting to acquire a writelock on mm->mmap_sem in the exit path while the
> thread holding mm->mmap_sem is trying to run.

if p is waiting, changing prio have no effect. It continue tol wait to release mmap_sem.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Luis Claudio R. Goncalves on
On Wed, Jun 02, 2010 at 10:54:01PM +0900, KOSAKI Motohiro wrote:
| > > @@ -291,9 +309,10 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
| > > * Otherwise we could get an easy OOM deadlock.
| > > */
| > > if (p->flags & PF_EXITING) {
| > > - if (p != current)
| > > + if (p != current) {
| > > + boost_dying_task_prio(p, mem);
| > > return ERR_PTR(-1UL);
| > > -
| > > + }
| > > chosen = p;
| > > *ppoints = ULONG_MAX;
| > > }
| >
| > This has the potential to actually make it harder to free memory if p is
| > waiting to acquire a writelock on mm->mmap_sem in the exit path while the
| > thread holding mm->mmap_sem is trying to run.
|
| if p is waiting, changing prio have no effect. It continue tol wait to release mmap_sem.

Ok, that was not a good idea after all :)

But I understand the !rt_task(p) test is necessary to avoid decrementing
the priority of an eventual RT task selected to die. Though it may also be
a corner case in badness().

Luis
--
[ Luis Claudio R. Goncalves Bass - Gospel - RT ]
[ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9 2696 7203 D980 A448 C8F8 ]

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: David Rientjes on
On Wed, 2 Jun 2010, KOSAKI Motohiro wrote:

> > > @@ -291,9 +309,10 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
> > > * Otherwise we could get an easy OOM deadlock.
> > > */
> > > if (p->flags & PF_EXITING) {
> > > - if (p != current)
> > > + if (p != current) {
> > > + boost_dying_task_prio(p, mem);
> > > return ERR_PTR(-1UL);
> > > -
> > > + }
> > > chosen = p;
> > > *ppoints = ULONG_MAX;
> > > }
> >
> > This has the potential to actually make it harder to free memory if p is
> > waiting to acquire a writelock on mm->mmap_sem in the exit path while the
> > thread holding mm->mmap_sem is trying to run.
>
> if p is waiting, changing prio have no effect. It continue tol wait to release mmap_sem.
>

And that can reduce the runtime of the thread holding a writelock on
mm->mmap_sem, making the exit actually take longer than without the patch
if its priority is significantly higher, especially on smaller machines.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: KOSAKI Motohiro on
> On Wed, 2 Jun 2010, KOSAKI Motohiro wrote:
>
> > > > @@ -291,9 +309,10 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
> > > > * Otherwise we could get an easy OOM deadlock.
> > > > */
> > > > if (p->flags & PF_EXITING) {
> > > > - if (p != current)
> > > > + if (p != current) {
> > > > + boost_dying_task_prio(p, mem);
> > > > return ERR_PTR(-1UL);
> > > > -
> > > > + }
> > > > chosen = p;
> > > > *ppoints = ULONG_MAX;
> > > > }
> > >
> > > This has the potential to actually make it harder to free memory if p is
> > > waiting to acquire a writelock on mm->mmap_sem in the exit path while the
> > > thread holding mm->mmap_sem is trying to run.
> >
> > if p is waiting, changing prio have no effect. It continue tol wait to release mmap_sem.
> >
>
> And that can reduce the runtime of the thread holding a writelock on
> mm->mmap_sem, making the exit actually take longer than without the patch
> if its priority is significantly higher, especially on smaller machines.

If p need mmap_sem, p is going to sleep to wait mmap_sem. if p doesn't,
quickly exit is good thing. In other word, task fairness is not our goal
when oom occur.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on
On Thu, Jun 3, 2010 at 8:36 AM, KOSAKI Motohiro
<kosaki.motohiro(a)jp.fujitsu.com> wrote:
>> On Wed, 2 Jun 2010, KOSAKI Motohiro wrote:
>>
>> > > > @@ -291,9 +309,10 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
>> > > >                  * Otherwise we could get an easy OOM deadlock.
>> > > >                  */
>> > > >                 if (p->flags & PF_EXITING) {
>> > > > -                       if (p != current)
>> > > > +                       if (p != current) {
>> > > > +                               boost_dying_task_prio(p, mem);
>> > > >                                 return ERR_PTR(-1UL);
>> > > > -
>> > > > +                       }
>> > > >                         chosen = p;
>> > > >                         *ppoints = ULONG_MAX;
>> > > >                 }
>> > >
>> > > This has the potential to actually make it harder to free memory if p is
>> > > waiting to acquire a writelock on mm->mmap_sem in the exit path while the
>> > > thread holding mm->mmap_sem is trying to run.
>> >
>> > if p is waiting, changing prio have no effect. It continue tol wait to release mmap_sem.
>> >
>>
>> And that can reduce the runtime of the thread holding a writelock on
>> mm->mmap_sem, making the exit actually take longer than without the patch
>> if its priority is significantly higher, especially on smaller machines.
>
> If p need mmap_sem, p is going to sleep to wait mmap_sem. if p doesn't,
> quickly exit is good thing. In other word, task fairness is not our goal
> when oom occur.
>

Tend to agree. I didn't agree boosting of whole threads' priority.

Task fairness VS system hang is trade off. task fairness is best
effort but system hang is critical.
Also, we have tried to it.

/*
* We give our sacrificial lamb high priority and access to
* all the memory it needs. That way it should be able to
* exit() and clear out its resources quickly...
*/
p->rt.time_slice = HZ;
set_tsk_thread_flag(p, TIF_MEMDIE);

But I think above code is meaningless unless p use SCHED_RR.
So boosting of lowest RT priority with FIFO is to meet above comment's
goal, I think.

--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/