From: Oleg Nesterov on
On 03/24, Peter Zijlstra wrote:
>
> On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote:
> >
> > - do_fork() clears PF_STARTING and then calls wake_up_new_task()
> > which finally does s/WAKING/RUNNING.
> >
> > But. Nobody can take rq->lock in between. This means a signal
> > from irq (quite possible with CLONE_THREAD) or another rt
> > thread which preempts us can lockup.
>
> Hmm, the signal case might indeed be a problem, however I cannot see how
> the RT thread can be a problem because until we do wake_up_new_task()
> the child will not be runnable and can thus not be preempted.

Indeed, but I meant the _parent_ can be preempted ;)

In short. TASK_WAKING acts as a spinlock in fact. And since ttwu() can
be called from any context, it should be irq-safe: any owner must disable
inerrupts and preemption.

> The reason we have that TASK_WAKING stuff for fork is because
> wake_up_new_task() needs p->cpus_allowed to be stable

Sure! But it is very easy to change wake_up_new_task() to set TASK_WAKING
like ttwu() does. Of course, this needs raw_spin_lock_irq(rq->lock) for
a moment, but afaics that is all?

> So the below patch makes select_task_rq_fair unlock the rq when needed,
> and then puts all ->select_task_rq() calls under rq->lock. This should
> allow us to remove the TASK_WAKING thing from fork which in turn allows
> us to remove the PF_STARTING check in task_is_waking.
>
> How does that look?

I'll try to read this patch tomorrow. But could you please consider
the suggestion above?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Wed, 2010-03-24 at 19:09 +0100, Oleg Nesterov wrote:
> On 03/24, Peter Zijlstra wrote:
> >
> > On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote:
> > >
> > > - do_fork() clears PF_STARTING and then calls wake_up_new_task()
> > > which finally does s/WAKING/RUNNING.
> > >
> > > But. Nobody can take rq->lock in between. This means a signal
> > > from irq (quite possible with CLONE_THREAD) or another rt
> > > thread which preempts us can lockup.
> >
> > Hmm, the signal case might indeed be a problem, however I cannot see how
> > the RT thread can be a problem because until we do wake_up_new_task()
> > the child will not be runnable and can thus not be preempted.
>
> Indeed, but I meant the _parent_ can be preempted ;)

I still can't see how that would be a problem..

> In short. TASK_WAKING acts as a spinlock in fact. And since ttwu() can
> be called from any context, it should be irq-safe: any owner must disable
> inerrupts and preemption.

Agreed, and I think that's corrected with my patch.

> > The reason we have that TASK_WAKING stuff for fork is because
> > wake_up_new_task() needs p->cpus_allowed to be stable
>
> Sure! But it is very easy to change wake_up_new_task() to set TASK_WAKING
> like ttwu() does. Of course, this needs raw_spin_lock_irq(rq->lock) for
> a moment, but afaics that is all?

My patch does that.

> > So the below patch makes select_task_rq_fair unlock the rq when needed,
> > and then puts all ->select_task_rq() calls under rq->lock. This should
> > allow us to remove the TASK_WAKING thing from fork which in turn allows
> > us to remove the PF_STARTING check in task_is_waking.
> >
> > How does that look?
>
> I'll try to read this patch tomorrow. But could you please consider
> the suggestion above?

I think I do all those :-)

I was still looking at removing the TASK_WAKING check from
task_rq_lock() since now we do set_task_cpu() under rq->lock again so it
should be good again.

Hmm, except for sched_fork() that still does set_task_cpu() without
holding rq->lock, but that is before the child gets exposed so there
should not be any concurrency.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Oleg Nesterov on
On 03/25, Peter Zijlstra wrote:
>
> On Wed, 2010-03-24 at 19:09 +0100, Oleg Nesterov wrote:
> > On 03/24, Peter Zijlstra wrote:
> > >
> > > On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote:
> > > >
> > > > - do_fork() clears PF_STARTING and then calls wake_up_new_task()
> > > > which finally does s/WAKING/RUNNING.
> > > >
> > > > But. Nobody can take rq->lock in between. This means a signal
> > > > from irq (quite possible with CLONE_THREAD) or another rt
> > > > thread which preempts us can lockup.
> > >
> > > Hmm, the signal case might indeed be a problem, however I cannot see how
> > > the RT thread can be a problem because until we do wake_up_new_task()
> > > the child will not be runnable and can thus not be preempted.
> >
> > Indeed, but I meant the _parent_ can be preempted ;)
>
> I still can't see how that would be a problem..


The parent P does do_fork(), copy_process returns the new child C with
TASK_WAKING at PF_STARTING set.

do_fork() clears PF_STARTING, but TASK_WAKING is still set, and C is
already visible to the user-space

An rt-thread X preempts P and calls ttwu() (say, it sends a signal to C)

ttwu() loops in task_rq_lock() "forever", because TASK_WAKING is still
set.

> > > The reason we have that TASK_WAKING stuff for fork is because
> > > wake_up_new_task() needs p->cpus_allowed to be stable
> >
> > Sure! But it is very easy to change wake_up_new_task() to set TASK_WAKING
> > like ttwu() does. Of course, this needs raw_spin_lock_irq(rq->lock) for
> > a moment, but afaics that is all?
>
> My patch does that.

OK, that is what I meant. Now, why sched_fork() can't just set TASK_RUNNING ?
This way the "spurious" wakeup can do nothing with the new child, and we
do not have problems with cpuset which needs rq->lock for set_cpus_allowed().

> > > So the below patch makes select_task_rq_fair unlock the rq when needed,
> > > and then puts all ->select_task_rq() calls under rq->lock.

Yes, I thought about this too. I tried to preserve the current
"->select_task_rq() is called without rq->lock" logic.

> This should
> > > allow us to remove the TASK_WAKING thing from fork

Confused. Why can't we simply remove TASK_WAKING from fork without any
changes except in wake_up_new_task() ?

> which in turn allows
> > > us to remove the PF_STARTING check in task_is_waking.

Even if I do not think I understand sched.c above, I'd say we must do
this in any case ;)

> I was still looking at removing the TASK_WAKING check from
> task_rq_lock()

Peter, I can't apply your patch due to rejects (will try again later)
but at first glance, it makes TASK_WAKING unneeded? Since we do not
drop the lock after we set TASK_WAKING, why do we need this state at
all ?

Aha... select_task_rq_fair() can drop the lock, yes? Well, in this
case probably select_task_rq_fair() can set TASK_WAKING before unlock?


I like the current idea to call select_task_rq() without rq->lock, but
of course this is up to you.

However, once again, can't we make a simpler patch?

- remove PF_STARTING from task_waking()

- change sched_fork() to set RUNNING instead of WAKING

- change wake_up_new_task() to set WAKING under rq->lock

This looks simpler to me, and allows to drop rq->lock in ttwu() right
after it sets WAKING.

Another change which seems reasonable is to allow ttwu() to take rq->lock
even if WAKING is set, it can do nothing but check task->state in this case.

What do you think?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Oleg Nesterov on
On 03/25, Oleg Nesterov wrote:
>
> I like the current idea to call select_task_rq() without rq->lock, but
> of course this is up to you.
>
> However, once again, can't we make a simpler patch?
>
> - remove PF_STARTING from task_waking()
>
> - change sched_fork() to set RUNNING instead of WAKING
>
> - change wake_up_new_task() to set WAKING under rq->lock
>
> This looks simpler to me, and allows to drop rq->lock in ttwu() right
> after it sets WAKING.

IOW, something like the (unchecked, uncompiled) patch below.

What do you think?

Oleg.

--- x/kernel/sched.c
+++ x/kernel/sched.c
@@ -912,7 +912,7 @@ static inline void finish_lock_switch(st
*/
static inline int task_is_waking(struct task_struct *p)
{
- return unlikely((p->state == TASK_WAKING) && !(p->flags & PF_STARTING));
+ return unlikely(p->state == TASK_WAKING);
}

/*
@@ -2568,11 +2568,10 @@ void sched_fork(struct task_struct *p, i

__sched_fork(p);
/*
- * We mark the process as waking here. This guarantees that
- * nobody will actually run it, and a signal or other external
- * event cannot wake it up and insert it on the runqueue either.
+ * We mark the process as running here. This guarantees that
+ * nobody will actually wake it up until wake_up_new_task().
*/
- p->state = TASK_WAKING;
+ p->state = TASK_RUNNING;

/*
* Revert to default priority/policy on fork if requested.
@@ -2638,15 +2637,18 @@ void wake_up_new_task(struct task_struct
struct rq *rq;
int cpu = get_cpu();

+ p->state = TASK_WAKING;
+ smp_mb();
+ raw_spin_unlock_wait(&rq->lock);
+
#ifdef CONFIG_SMP
/*
* Fork balancing, do it here and not earlier because:
* - cpus_allowed can change in the fork path
* - any previously selected cpu might disappear through hotplug
*
- * We still have TASK_WAKING but PF_STARTING is gone now, meaning
- * ->cpus_allowed is stable, we have preemption disabled, meaning
- * cpu_online_mask is stable.
+ * TASK_WAKING means ->cpus_allowed is stable, we have preemption
+ * disabled, meaning cpu_online_mask is stable.
*/
cpu = select_task_rq(p, SD_BALANCE_FORK, 0);
set_task_cpu(p, cpu);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Oleg Nesterov on
Argh, sorry for noise...

On 03/25, Oleg Nesterov wrote:
>
> On 03/25, Oleg Nesterov wrote:
> >
> > I like the current idea to call select_task_rq() without rq->lock, but
> > of course this is up to you.
> >
> > However, once again, can't we make a simpler patch?
> >
> > - remove PF_STARTING from task_waking()
> >
> > - change sched_fork() to set RUNNING instead of WAKING

When I reread this thread, suddenly finally I noticed you mentioned
_twice_ your patch does this too ;) Not to mention the patch itself
which I misread. Sorry.

> IOW, something like the (unchecked, uncompiled) patch below.

Still, what do you think?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/