From: Peter Zijlstra on
On Fri, 2010-04-16 at 07:32 -0700, Paul E. McKenney wrote:
> On Fri, Apr 16, 2010 at 04:23:39PM +0200, Peter Zijlstra wrote:
> > On Fri, 2010-04-16 at 07:17 -0700, Paul E. McKenney wrote:
> >
> > > > > Of course, with call_rcu_sched(), the corresponding RCU read-side critical
> > > > > sections are non-preemptible. Therefore, in CONFIG_PREEMPT_RT, these
> > > > > read-side critical sections must use raw spinlocks.
> > > >
> > > > OK, so if we fully remove CONFIG_TREE_PREEMPT_RCU (defaulting to y),
> > > > rename all the {call_rcu, rcu_read_lock, rcu_read_unlock,
> > > > synchronize_rcu} functions to {*}_preempt and then add a new
> > > > CONFIG_PREEMPT_RCU that simply maps {*} to either {*}_sched or
> > > > {*}_preempt, we've basically got what I've been asking for for a while,
> > > > no?
> > >
> > > What would rcu_read_lock_preempt() do in a !CONFIG_PREEMPT kernel?
> >
> > Same as for a preempt one, since you'd have to be able to schedule()
> > while holding it to be able to do things like mutex_lock().
>
> So what you really want is something like rcu_read_lock_sleep() rather
> than rcu_read_lock_preempt(), right? The point is that you want to do
> more than merely preempt, given that it is legal to do general blocking
> while holding a mutex, correct?

Right, but CONFIG_TREE_PREEMPT_RCU=y ends up being that. We could change
the name to _sleep, but we've been calling it preemptible-rcu for a long
while now.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Paul E. McKenney on
On Fri, Apr 16, 2010 at 04:56:50PM +0200, Peter Zijlstra wrote:
> On Fri, 2010-04-16 at 07:32 -0700, Paul E. McKenney wrote:
> > On Fri, Apr 16, 2010 at 04:23:39PM +0200, Peter Zijlstra wrote:
> > > On Fri, 2010-04-16 at 07:17 -0700, Paul E. McKenney wrote:
> > >
> > > > > > Of course, with call_rcu_sched(), the corresponding RCU read-side critical
> > > > > > sections are non-preemptible. Therefore, in CONFIG_PREEMPT_RT, these
> > > > > > read-side critical sections must use raw spinlocks.
> > > > >
> > > > > OK, so if we fully remove CONFIG_TREE_PREEMPT_RCU (defaulting to y),
> > > > > rename all the {call_rcu, rcu_read_lock, rcu_read_unlock,
> > > > > synchronize_rcu} functions to {*}_preempt and then add a new
> > > > > CONFIG_PREEMPT_RCU that simply maps {*} to either {*}_sched or
> > > > > {*}_preempt, we've basically got what I've been asking for for a while,
> > > > > no?
> > > >
> > > > What would rcu_read_lock_preempt() do in a !CONFIG_PREEMPT kernel?
> > >
> > > Same as for a preempt one, since you'd have to be able to schedule()
> > > while holding it to be able to do things like mutex_lock().
> >
> > So what you really want is something like rcu_read_lock_sleep() rather
> > than rcu_read_lock_preempt(), right? The point is that you want to do
> > more than merely preempt, given that it is legal to do general blocking
> > while holding a mutex, correct?
>
> Right, but CONFIG_TREE_PREEMPT_RCU=y ends up being that. We could change
> the name to _sleep, but we've been calling it preemptible-rcu for a long
> while now.

It is actually not permitted to do general blocking in a preemptible RCU
read-side critical section. Otherwise, someone is going to block waiting
for a network packet that never comes, thus OOMing the system.

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Fri, 2010-04-16 at 08:09 -0700, Paul E. McKenney wrote:
> On Fri, Apr 16, 2010 at 04:56:50PM +0200, Peter Zijlstra wrote:
> > On Fri, 2010-04-16 at 07:32 -0700, Paul E. McKenney wrote:
> > > On Fri, Apr 16, 2010 at 04:23:39PM +0200, Peter Zijlstra wrote:
> > > > On Fri, 2010-04-16 at 07:17 -0700, Paul E. McKenney wrote:
> > > >
> > > > > > > Of course, with call_rcu_sched(), the corresponding RCU read-side critical
> > > > > > > sections are non-preemptible. Therefore, in CONFIG_PREEMPT_RT, these
> > > > > > > read-side critical sections must use raw spinlocks.
> > > > > >
> > > > > > OK, so if we fully remove CONFIG_TREE_PREEMPT_RCU (defaulting to y),
> > > > > > rename all the {call_rcu, rcu_read_lock, rcu_read_unlock,
> > > > > > synchronize_rcu} functions to {*}_preempt and then add a new
> > > > > > CONFIG_PREEMPT_RCU that simply maps {*} to either {*}_sched or
> > > > > > {*}_preempt, we've basically got what I've been asking for for a while,
> > > > > > no?
> > > > >
> > > > > What would rcu_read_lock_preempt() do in a !CONFIG_PREEMPT kernel?
> > > >
> > > > Same as for a preempt one, since you'd have to be able to schedule()
> > > > while holding it to be able to do things like mutex_lock().
> > >
> > > So what you really want is something like rcu_read_lock_sleep() rather
> > > than rcu_read_lock_preempt(), right? The point is that you want to do
> > > more than merely preempt, given that it is legal to do general blocking
> > > while holding a mutex, correct?
> >
> > Right, but CONFIG_TREE_PREEMPT_RCU=y ends up being that. We could change
> > the name to _sleep, but we've been calling it preemptible-rcu for a long
> > while now.
>
> It is actually not permitted to do general blocking in a preemptible RCU
> read-side critical section. Otherwise, someone is going to block waiting
> for a network packet that never comes, thus OOMing the system.

Sure, something that guarantees progress seems like a sensible
restriction for any lock, and in particular RCU :-)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Paul E. McKenney on
On Fri, Apr 16, 2010 at 05:14:15PM +0200, Peter Zijlstra wrote:
> On Fri, 2010-04-16 at 08:09 -0700, Paul E. McKenney wrote:
> > On Fri, Apr 16, 2010 at 04:56:50PM +0200, Peter Zijlstra wrote:
> > > On Fri, 2010-04-16 at 07:32 -0700, Paul E. McKenney wrote:
> > > > On Fri, Apr 16, 2010 at 04:23:39PM +0200, Peter Zijlstra wrote:
> > > > > On Fri, 2010-04-16 at 07:17 -0700, Paul E. McKenney wrote:
> > > > >
> > > > > > > > Of course, with call_rcu_sched(), the corresponding RCU read-side critical
> > > > > > > > sections are non-preemptible. Therefore, in CONFIG_PREEMPT_RT, these
> > > > > > > > read-side critical sections must use raw spinlocks.
> > > > > > >
> > > > > > > OK, so if we fully remove CONFIG_TREE_PREEMPT_RCU (defaulting to y),
> > > > > > > rename all the {call_rcu, rcu_read_lock, rcu_read_unlock,
> > > > > > > synchronize_rcu} functions to {*}_preempt and then add a new
> > > > > > > CONFIG_PREEMPT_RCU that simply maps {*} to either {*}_sched or
> > > > > > > {*}_preempt, we've basically got what I've been asking for for a while,
> > > > > > > no?
> > > > > >
> > > > > > What would rcu_read_lock_preempt() do in a !CONFIG_PREEMPT kernel?
> > > > >
> > > > > Same as for a preempt one, since you'd have to be able to schedule()
> > > > > while holding it to be able to do things like mutex_lock().
> > > >
> > > > So what you really want is something like rcu_read_lock_sleep() rather
> > > > than rcu_read_lock_preempt(), right? The point is that you want to do
> > > > more than merely preempt, given that it is legal to do general blocking
> > > > while holding a mutex, correct?
> > >
> > > Right, but CONFIG_TREE_PREEMPT_RCU=y ends up being that. We could change
> > > the name to _sleep, but we've been calling it preemptible-rcu for a long
> > > while now.
> >
> > It is actually not permitted to do general blocking in a preemptible RCU
> > read-side critical section. Otherwise, someone is going to block waiting
> > for a network packet that never comes, thus OOMing the system.
>
> Sure, something that guarantees progress seems like a sensible
> restriction for any lock, and in particular RCU :-)

Excellent point -- much of the issue really does center around
forward-progress guarantees. In fact the Linux kernel has a number of
locking primitives that require different degrees of forward-progress
guarantee from the code in their respective critical sections:

o spin_lock_irqsave(): Critical sections must guarantee forward
progress against everything except NMI handlers.

o raw_spin_lock(): Critical sections must guarantee forward
progress against everything except IRQ (including softirq)
and NMI handlers.

o spin_lock(): Critical sections must guarantee forward
progress against everything except IRQ (again including softirq)
and NMI handlers and (given CONFIG_PREEMPT_RT) higher-priority
realtime tasks.

o mutex_lock(): Critical sections need not guarantee
forward progress, as general blocking is permitted.

The other issue is the scope of the lock. The Linux kernel has
the following:

o BKL: global scope.

o Everything else: scope defined by the use of the underlying
lock variable.

One of the many reasons that we are trying to get rid of BKL is because
it combines global scope with relatively weak forward-progress guarantees.

So here is how the various RCU flavors stack up:

o rcu_read_lock_bh(): critical sections must guarantee forward
progress against everything except NMI handlers and IRQ handlers,
but not against softirq handlers. Global in scope, so that
violating the forward-progress guarantee risks OOMing the system.

o rcu_read_lock_sched(): critical sections must guarantee
forward progress against everything except NMI and IRQ handlers,
including softirq handlers. Global in scope, so that violating
the forward-progress guarantee risks OOMing the system.

o rcu_read_lock(): critical sections must guarantee forward
progress against everything except NMI handlers, IRQ handlers,
softirq handlers, and (in CONFIG_PREEMPT_RT) higher-priority
realtime tasks. Global in scope, so that violating the
forward-progress guarantee risks OOMing the system.

o srcu_read_lock(): critical sections need not guarantee forward
progress, as general blocking is permitted. Scope is controlled
by the use of the underlying srcu_struct structure.

As you say, one can block in rcu_read_lock() critical sections, but
the only blocking that is really safe is blocking that is subject to
priority inheritance. This prohibits mutexes, because although the
mutexes themselves are subject to priority inheritance, the mutexes'
critical sections might well not be.

So the easy response is "just use SRCU." Of course, SRCU has some
disadvantages at the moment:

o The return value from srcu_read_lock() must be passed to
srcu_read_unlock(). I believe that I can fix this.

o There is no call_srcu(). I believe that I can fix this.

o SRCU uses a flat per-CPU counter scheme that is not particularly
scalable. I believe that I can fix this.

o SRCU's current implementation makes it almost impossible to
implement priority boosting. I believe that I can fix this.

o SRCU requires explicit initialization of the underlying
srcu_struct. Unfortunately, I don't see a reasonable way
around this. Not yet, anyway.

So, is there anything else that you don't like about SRCU?

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Fri, 2010-04-16 at 09:45 -0700, Paul E. McKenney wrote:
> o mutex_lock(): Critical sections need not guarantee
> forward progress, as general blocking is permitted.
>
Right, I would argue that they should guarantee fwd progress, but due to
being able to schedule while holding them, its harder to enforce.

Anything that is waiting for uncertainty should do so without any locks
held and simply re-acquire them once such an event does occur.

> So the easy response is "just use SRCU." Of course, SRCU has some
> disadvantages at the moment:
>
> o The return value from srcu_read_lock() must be passed to
> srcu_read_unlock(). I believe that I can fix this.
>
> o There is no call_srcu(). I believe that I can fix this.
>
> o SRCU uses a flat per-CPU counter scheme that is not particularly
> scalable. I believe that I can fix this.
>
> o SRCU's current implementation makes it almost impossible to
> implement priority boosting. I believe that I can fix this.
>
> o SRCU requires explicit initialization of the underlying
> srcu_struct. Unfortunately, I don't see a reasonable way
> around this. Not yet, anyway.
>
> So, is there anything else that you don't like about SRCU?

No, I quite like SRCU when implemented as preemptible tree RCU, and I
don't at all mind that last point, all dynamic things need some sort of
init. All locks certainly have.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/