From: KOSAKI Motohiro on
> On Thu, 2010-07-08 at 19:57 +0900, KOSAKI Motohiro wrote:
> > > On Thu, 2010-07-08 at 03:39 -0700, Michel Lespinasse wrote:
> > > >
> > > >
> > > > One way to fix this is to have T4 wake from the oom queue and return an
> > > > allocation failure instead of insisting on going oom itself when T1
> > > > decides to take down the task.
> > > >
> > > > How would you have T4 figure out the deadlock situation? T1 is taking down T2, not T4...
> > >
> > > If T2 and T4 share a mmap_sem they belong to the same process. OOM takes
> > > down the whole process by sending around signals of sorts (SIGKILL?), so
> > > if T4 gets a fatal signal while it is waiting to enter the oom thingy,
> > > have it abort and return an allocation failure.
> > >
> > > That alloc failure (along with a pending fatal signal) will very likely
> > > lead to the release of its mmap_sem (if not, there's more things to
> > > cure).
> > >
> > > At which point the cycle is broken and stuff continues as it was
> > > intended.
> >
> > Now I've reread the current code. I think mmotm already has this.
>
> <snip code>
>
> [ small note: we really should kill __GFP_NOFAIL, it's utter
> deadlock potential ]

I disagree. __GFP_NOFAIL means that a failure of this allocation can have
really dangerous results. Instead, the OOM killer should try to kill the
next process, I think.

> > Thoughts?
>
> So either it's not working or Google never tried that code?

Michel?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Thu, 2010-07-08 at 19:57 +0900, KOSAKI Motohiro wrote:
> > On Thu, 2010-07-08 at 03:39 -0700, Michel Lespinasse wrote:
> > >
> > >
> > > One way to fix this is to have T4 wake from the oom queue and return an
> > > allocation failure instead of insisting on going oom itself when T1
> > > decides to take down the task.
> > >
> > > How would you have T4 figure out the deadlock situation? T1 is taking down T2, not T4...
> >
> > If T2 and T4 share a mmap_sem they belong to the same process. OOM takes
> > down the whole process by sending around signals of sorts (SIGKILL?), so
> > if T4 gets a fatal signal while it is waiting to enter the oom thingy,
> > have it abort and return an allocation failure.
> >
> > That alloc failure (along with a pending fatal signal) will very likely
> > lead to the release of its mmap_sem (if not, there's more things to
> > cure).
> >
> > At which point the cycle is broken and stuff continues as it was
> > intended.
>
> Now I've reread the current code. I think mmotm already has this.

<snip code>

[ small note: we really should kill __GFP_NOFAIL, it's utter
deadlock potential ]

> Thoughts?

So either it's not working or Google never tried that code?
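
A minimal userspace sketch of the abort-on-fatal-signal idea quoted above,
for illustration only. It is not kernel code: `struct task`, `try_alloc`
and `unwind_after_failure` are hypothetical names, and the real allocator
slow path and OOM killer are far more involved. It only models the
decision "a thread with a fatal signal pending returns an allocation
failure instead of going OOM, then drops its mmap_sem".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a thread; not the kernel's task_struct. */
struct task {
    bool fatal_signal_pending; /* models a pending SIGKILL from the OOM killer */
    bool holds_mmap_sem;       /* models holding the process-wide mmap_sem */
};

enum alloc_result {
    ALLOC_OK,       /* memory was available */
    ALLOC_FAILED,   /* gave up; the caller must handle the failure */
    ALLOC_WAIT_OOM  /* would queue up to enter the OOM path */
};

/* Decide what a thread does when it asks for memory. */
static enum alloc_result try_alloc(const struct task *t, bool memory_available)
{
    assert(t != NULL);
    if (memory_available)
        return ALLOC_OK;
    if (t->fatal_signal_pending)
        return ALLOC_FAILED;   /* dying anyway: do not insist on going OOM */
    return ALLOC_WAIT_OOM;
}

/* On failure the dying thread unwinds and drops the lock others wait on. */
static void unwind_after_failure(struct task *t)
{
    assert(t != NULL);
    t->holds_mmap_sem = false;
}
```

In this model, T4 would call `try_alloc()` while holding the lock; once
the signal is pending it gets `ALLOC_FAILED`, and `unwind_after_failure()`
stands in for the error path that releases mmap_sem so T2 can exit.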


--
From: Peter Zijlstra on
On Thu, 2010-07-08 at 20:06 +0900, KOSAKI Motohiro wrote:
> > [ small note: we really should kill __GFP_NOFAIL, it's utter
> > deadlock potential ]
>
> I disagree. __GFP_NOFAIL means that a failure of this allocation can have
> really dangerous results. Instead, the OOM killer should try to kill the
> next process, I think.

Say _what_?! You think NOFAIL is a sane thing? Pretty much everybody has
been agreeing for years that the thing should die.
--
From: Peter Zijlstra on
Could you please educate your mailer to not send HTML garbage?
--
From: KOSAKI Motohiro on
> On Thu, 2010-07-08 at 20:06 +0900, KOSAKI Motohiro wrote:
> > > [ small note: we really should kill __GFP_NOFAIL, it's utter
> > > deadlock potential ]
> >
> > I disagree. __GFP_NOFAIL means that a failure of this allocation can have
> > really dangerous results. Instead, the OOM killer should try to kill the
> > next process, I think.
>
> Say _what_?! You think NOFAIL is a sane thing?

Insane, obviously ;)
But in my experience, some embedded systems prefer to use NOFAIL, so I
don't want to swing a big hammer and crash them. Killing NOFAIL will take
many more years than you expect, I guess.


> Pretty much everybody has
> been agreeing for years that the thing should die.

I'm not against this at all. But until it dies, it should work correctly.

