rcu: optionally leave lockdep enabled after RCU lockdep splat [Kernel]

Prev: [PATCH -next] usb gadget webcam: depends on VIDEO_DEV
Next: [PATCHv1 2.6.34-rc6 1/3] mxc: gadget: remove 60mhz clock requirement for freescale mx51 usb core

From: Mathieu Desnoyers on 5 May 2010 18:50

* Paul E. McKenney (paulmck(a)linux.vnet.ibm.com) wrote:
> From: Lai Jiangshan <laijs(a)cn.fujitsu.com>
>
> There is no need to disable lockdep after an RCU lockdep splat,
> so remove the debug_lockdeps_off() from lockdep_rcu_dereference().
> To avoid repeated lockdep splats, use a static variable in the inlined
> rcu_dereference_check() and rcu_dereference_protected() macros so that
> a given instance splats only once, but so that multiple instances can
> be detected per boot.
>
> This is controlled by a new config variable CONFIG_PROVE_RCU_REPEATEDLY,
> which is disabled by default. This provides the normal lockdep behavior
> by default, but permits people who want to find multiple RCU-lockdep
> splats per boot to easily do so.

I'll play the devil's advocate here. (just because that's so much fun)
;-)

If we look at:

include/linux/debug_locks.h:

static inline int __debug_locks_off(void)
{
return xchg(&debug_locks, 0);
}

We see that all code following a call to "debug_locks_off()" can assume
that it cannot possibly run concurrently with other code following
"debug_locks_off()". Now, I'm not saying that the code we currently have
will necessarily break, but I think it is worth asking if there is any
assumption in lockdep (or RCU lockdep more specifically) about mutual
exclusion after debug_locks_off() ?

Because if there is, then this patch is definitely breaking something by
not protecting lockdep against multiple concurrent executions.

Thanks,

Mathieu

>
> Requested-by: Eric Paris <eparis(a)redhat.com>
> Signed-off-by: Lai Jiangshan <laijs(a)cn.fujitsu.com>
> Tested-by: Eric Paris <eparis(a)redhat.com>
> Signed-off-by: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>
> ---
> include/linux/rcupdate.h | 6 ++----
> kernel/lockdep.c | 2 ++
> lib/Kconfig.debug | 12 ++++++++++++
> 3 files changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index a8b2e03..4dca275 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -230,8 +230,7 @@ extern int rcu_my_thread_group_empty(void);
> */
> #define rcu_dereference_check(p, c) \
> ({ \
> - if (debug_lockdep_rcu_enabled() && !(c)) \
> - lockdep_rcu_dereference(__FILE__, __LINE__); \
> + __do_rcu_dereference_check(c); \
> rcu_dereference_raw(p); \
> })
>
> @@ -248,8 +247,7 @@ extern int rcu_my_thread_group_empty(void);
> */
> #define rcu_dereference_protected(p, c) \
> ({ \
> - if (debug_lockdep_rcu_enabled() && !(c)) \
> - lockdep_rcu_dereference(__FILE__, __LINE__); \
> + __do_rcu_dereference_check(c); \
> (p); \
> })
>
> diff --git a/kernel/lockdep.c b/kernel/lockdep.c
> index 2594e1c..73747b7 100644
> --- a/kernel/lockdep.c
> +++ b/kernel/lockdep.c
> @@ -3801,8 +3801,10 @@ void lockdep_rcu_dereference(const char *file, const int line)
> {
> struct task_struct *curr = current;
>
> +#ifndef CONFIG_PROVE_RCU_REPEATEDLY
> if (!debug_locks_off())
> return;
> +#endif /* #ifdef CONFIG_PROVE_RCU_REPEATEDLY */
> printk("\n===================================================\n");
> printk( "[ INFO: suspicious rcu_dereference_check() usage. ]\n");
> printk( "---------------------------------------------------\n");
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 935248b..94090b4 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -512,6 +512,18 @@ config PROVE_RCU
>
> Say N if you are unsure.
>
> +config PROVE_RCU_REPEATEDLY
> + bool "RCU debugging: don't disable PROVE_RCU on first splat"
> + depends on PROVE_RCU
> + default n
> + help
> + By itself, PROVE_RCU will disable checking upon issuing the
> + first warning (or "splat"). This feature prevents such
> + disabling, allowing multiple RCU-lockdep warnings to be printed
> + on a single reboot.
> +
> + Say N if you are unsure.
> +
> config LOCKDEP
> bool
> depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
> --
> 1.7.0
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Paul E. McKenney on 5 May 2010 19:10

On Wed, May 05, 2010 at 06:46:41PM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney (paulmck(a)linux.vnet.ibm.com) wrote:
> > From: Lai Jiangshan <laijs(a)cn.fujitsu.com>
> >
> > There is no need to disable lockdep after an RCU lockdep splat,
> > so remove the debug_lockdeps_off() from lockdep_rcu_dereference().
> > To avoid repeated lockdep splats, use a static variable in the inlined
> > rcu_dereference_check() and rcu_dereference_protected() macros so that
> > a given instance splats only once, but so that multiple instances can
> > be detected per boot.
> >
> > This is controlled by a new config variable CONFIG_PROVE_RCU_REPEATEDLY,
> > which is disabled by default. This provides the normal lockdep behavior
> > by default, but permits people who want to find multiple RCU-lockdep
> > splats per boot to easily do so.
>
> I'll play the devil's advocate here. (just because that's so much fun)
> ;-)
>
> If we look at:
>
> include/linux/debug_locks.h:
>
> static inline int __debug_locks_off(void)
> {
> return xchg(&debug_locks, 0);
> }
>
> We see that all code following a call to "debug_locks_off()" can assume
> that it cannot possibly run concurrently with other code following
> "debug_locks_off()". Now, I'm not saying that the code we currently have
> will necessarily break, but I think it is worth asking if there is any
> assumption in lockdep (or RCU lockdep more specifically) about mutual
> exclusion after debug_locks_off() ?
>
> Because if there is, then this patch is definitely breaking something by
> not protecting lockdep against multiple concurrent executions.

So what in lockdep_rcu_dereference() needs to be protected from concurrent
execution? All that happens beyond that point is a bunch of printk()s,
printing the locks held by this task, and dumping this task's stack.

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Mathieu Desnoyers on 5 May 2010 19:30

* Paul E. McKenney (paulmck(a)linux.vnet.ibm.com) wrote:
> On Wed, May 05, 2010 at 06:46:41PM -0400, Mathieu Desnoyers wrote:
> > * Paul E. McKenney (paulmck(a)linux.vnet.ibm.com) wrote:
> > > From: Lai Jiangshan <laijs(a)cn.fujitsu.com>
> > >
> > > There is no need to disable lockdep after an RCU lockdep splat,
> > > so remove the debug_lockdeps_off() from lockdep_rcu_dereference().
> > > To avoid repeated lockdep splats, use a static variable in the inlined
> > > rcu_dereference_check() and rcu_dereference_protected() macros so that
> > > a given instance splats only once, but so that multiple instances can
> > > be detected per boot.
> > >
> > > This is controlled by a new config variable CONFIG_PROVE_RCU_REPEATEDLY,
> > > which is disabled by default. This provides the normal lockdep behavior
> > > by default, but permits people who want to find multiple RCU-lockdep
> > > splats per boot to easily do so.
> >
> > I'll play the devil's advocate here. (just because that's so much fun)
> > ;-)
> >
> > If we look at:
> >
> > include/linux/debug_locks.h:
> >
> > static inline int __debug_locks_off(void)
> > {
> > return xchg(&debug_locks, 0);
> > }
> >
> > We see that all code following a call to "debug_locks_off()" can assume
> > that it cannot possibly run concurrently with other code following
> > "debug_locks_off()". Now, I'm not saying that the code we currently have
> > will necessarily break, but I think it is worth asking if there is any
> > assumption in lockdep (or RCU lockdep more specifically) about mutual
> > exclusion after debug_locks_off() ?
> >
> > Because if there is, then this patch is definitely breaking something by
> > not protecting lockdep against multiple concurrent executions.
>
> So what in lockdep_rcu_dereference() needs to be protected from concurrent
> execution? All that happens beyond that point is a bunch of printk()s,
> printing the locks held by this task, and dumping this task's stack.
>
> Thanx, Paul

I agree with you that printing the current task information should be safe.
However, I am not sure that concurrent updates to the lock_class while printk()
is showing its information will end up doing what we expect it to do.

It could be acceptable to have unreliable information in these rare cases, but
the important thing would be to ensure that the kernel does not OOPS.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Paul E. McKenney on 5 May 2010 19:40

On Wed, May 05, 2010 at 07:24:57PM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney (paulmck(a)linux.vnet.ibm.com) wrote:
> > On Wed, May 05, 2010 at 06:46:41PM -0400, Mathieu Desnoyers wrote:
> > > * Paul E. McKenney (paulmck(a)linux.vnet.ibm.com) wrote:
> > > > From: Lai Jiangshan <laijs(a)cn.fujitsu.com>
> > > >
> > > > There is no need to disable lockdep after an RCU lockdep splat,
> > > > so remove the debug_lockdeps_off() from lockdep_rcu_dereference().
> > > > To avoid repeated lockdep splats, use a static variable in the inlined
> > > > rcu_dereference_check() and rcu_dereference_protected() macros so that
> > > > a given instance splats only once, but so that multiple instances can
> > > > be detected per boot.
> > > >
> > > > This is controlled by a new config variable CONFIG_PROVE_RCU_REPEATEDLY,
> > > > which is disabled by default. This provides the normal lockdep behavior
> > > > by default, but permits people who want to find multiple RCU-lockdep
> > > > splats per boot to easily do so.
> > >
> > > I'll play the devil's advocate here. (just because that's so much fun)
> > > ;-)
> > >
> > > If we look at:
> > >
> > > include/linux/debug_locks.h:
> > >
> > > static inline int __debug_locks_off(void)
> > > {
> > > return xchg(&debug_locks, 0);
> > > }
> > >
> > > We see that all code following a call to "debug_locks_off()" can assume
> > > that it cannot possibly run concurrently with other code following
> > > "debug_locks_off()". Now, I'm not saying that the code we currently have
> > > will necessarily break, but I think it is worth asking if there is any
> > > assumption in lockdep (or RCU lockdep more specifically) about mutual
> > > exclusion after debug_locks_off() ?
> > >
> > > Because if there is, then this patch is definitely breaking something by
> > > not protecting lockdep against multiple concurrent executions.
> >
> > So what in lockdep_rcu_dereference() needs to be protected from concurrent
> > execution? All that happens beyond that point is a bunch of printk()s,
> > printing the locks held by this task, and dumping this task's stack.
> >
> > Thanx, Paul
>
> I agree with you that printing the current task information should be safe.
> However, I am not sure that concurrent updates to the lock_class while printk()
> is showing its information will end up doing what we expect it to do.
>
> It could be acceptable to have unreliable information in these rare cases, but
> the important thing would be to ensure that the kernel does not OOPS.

But any races other than the printk()s can already happen as follows:

o CPU 0 needs to update some information about the lock. It
checks debug_locks and finds that it is non-zero.

o CPU 1 detects a deadlock, and invokes __debug_locks_off(),
which atomically sets debug_locks to zero.

o CPU 1 then proceeds to printk() information that CPU 0
is concurrently modifying. Which looks to be OK in any case.

Or is there some other race that cannot already happen that I am
introducing?

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Mathieu Desnoyers on 5 May 2010 22:10

* Paul E. McKenney (paulmck(a)linux.vnet.ibm.com) wrote:
> On Wed, May 05, 2010 at 07:24:57PM -0400, Mathieu Desnoyers wrote:
> > * Paul E. McKenney (paulmck(a)linux.vnet.ibm.com) wrote:
> > > On Wed, May 05, 2010 at 06:46:41PM -0400, Mathieu Desnoyers wrote:
> > > > * Paul E. McKenney (paulmck(a)linux.vnet.ibm.com) wrote:
> > > > > From: Lai Jiangshan <laijs(a)cn.fujitsu.com>
> > > > >
> > > > > There is no need to disable lockdep after an RCU lockdep splat,
> > > > > so remove the debug_lockdeps_off() from lockdep_rcu_dereference().
> > > > > To avoid repeated lockdep splats, use a static variable in the inlined
> > > > > rcu_dereference_check() and rcu_dereference_protected() macros so that
> > > > > a given instance splats only once, but so that multiple instances can
> > > > > be detected per boot.
> > > > >
> > > > > This is controlled by a new config variable CONFIG_PROVE_RCU_REPEATEDLY,
> > > > > which is disabled by default. This provides the normal lockdep behavior
> > > > > by default, but permits people who want to find multiple RCU-lockdep
> > > > > splats per boot to easily do so.
> > > >
> > > > I'll play the devil's advocate here. (just because that's so much fun)
> > > > ;-)
> > > >
> > > > If we look at:
> > > >
> > > > include/linux/debug_locks.h:
> > > >
> > > > static inline int __debug_locks_off(void)
> > > > {
> > > > return xchg(&debug_locks, 0);
> > > > }
> > > >
> > > > We see that all code following a call to "debug_locks_off()" can assume
> > > > that it cannot possibly run concurrently with other code following
> > > > "debug_locks_off()". Now, I'm not saying that the code we currently have
> > > > will necessarily break, but I think it is worth asking if there is any
> > > > assumption in lockdep (or RCU lockdep more specifically) about mutual
> > > > exclusion after debug_locks_off() ?
> > > >
> > > > Because if there is, then this patch is definitely breaking something by
> > > > not protecting lockdep against multiple concurrent executions.
> > >
> > > So what in lockdep_rcu_dereference() needs to be protected from concurrent
> > > execution? All that happens beyond that point is a bunch of printk()s,
> > > printing the locks held by this task, and dumping this task's stack.
> > >
> > > Thanx, Paul
> >
> > I agree with you that printing the current task information should be safe.
> > However, I am not sure that concurrent updates to the lock_class while printk()
> > is showing its information will end up doing what we expect it to do.
> >
> > It could be acceptable to have unreliable information in these rare cases, but
> > the important thing would be to ensure that the kernel does not OOPS.
>
> But any races other than the printk()s can already happen as follows:
>
> o CPU 0 needs to update some information about the lock. It
> checks debug_locks and finds that it is non-zero.
>
> o CPU 1 detects a deadlock, and invokes __debug_locks_off(),
> which atomically sets debug_locks to zero.
>
> o CPU 1 then proceeds to printk() information that CPU 0
> is concurrently modifying. Which looks to be OK in any case.
>
> Or is there some other race that cannot already happen that I am
> introducing?

Nope, I don't think so. Although it's probably worth putting a comment in
lockdep_rcu_dereference() to state that lockdep can be used by multiple
concurrent instances here, just in case someone ever consider adding code
to this splat handler thinking lockdep is always only used by a single "splat"
at a time.

Thanks,

Mathieu

>
> Thanx, Paul

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

| Next | Last
Pages: 1 2
Prev: [PATCH -next] usb gadget webcam: depends on VIDEO_DEV
Next: [PATCHv1 2.6.34-rc6 1/3] mxc: gadget: remove 60mhz clock requirement for freescale mx51 usb core