From: Benjamin Herrenschmidt on
On Wed, 2010-04-28 at 14:38 +1000, Benjamin Herrenschmidt wrote:
> Currently, we can hit a nasty case with optimistic spinning on mutexes:
>
> CPU A tries to take a mutex, while holding the BKL
>
> CPU B tried to take the BLK while holding the mutex
>
> This looks like a AB-BA scenario but in practice, is allowed and happens
> due to the auto-release-on-schedule nature of the BKL.

.../...

BTW. The patch is only compile-tested so far :-) It's going to be
hammered with the test case that triggered the original bug hopefully
tonight.

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Arnd Bergmann on
On Wednesday 28 April 2010, Benjamin Herrenschmidt wrote:
> Now, we -could- make it a bit smarter about the BKL by introducing a
> contention counter and only go out if we own the BKL and it is contended,
> but I didn't feel like this was worth the effort, time is better spent
> removing the BKL from sensitive code path instead.

Agreed.

> - for (;;) {
> + for (timeout = jiffies + 2; jiffies < timeout;) {
> [...]
> - for (;;) {
> + while (jiffies < timeout) {

This needs to use time_before() to avoid problems on jiffies wraparound.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Benjamin Herrenschmidt on
On Wed, 2010-04-28 at 14:06 +0200, Arnd Bergmann wrote:
>
> This needs to use time_before() to avoid problems on jiffies
> wraparound.

Ah right, forgot about that, been a while I didn't use jiffies for
anything :-)

I'll respin later today.

Cheers,
Ben.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Tony Breeds on
On Thu, Apr 29, 2010 at 08:35:21AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2010-04-28 at 14:06 +0200, Arnd Bergmann wrote:
> >
> > This needs to use time_before() to avoid problems on jiffies
> > wraparound.
>
> Ah right, forgot about that, been a while I didn't use jiffies for
> anything :-)
>
> I'll respin later today.

Like thie perhaps? If this looks good it would be great to get this in .34

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 70abfd3..991d86f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -351,7 +351,8 @@ extern signed long schedule_timeout_killable(signed long timeout);
extern signed long schedule_timeout_uninterruptible(signed long timeout);
asmlinkage void __schedule(void);
asmlinkage void schedule(void);
-extern int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner);
+extern int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner,
+ unsigned long timeout);

struct nsproxy;
struct user_namespace;
diff --git a/kernel/mutex.c b/kernel/mutex.c
index 947b3ad..fcf2573 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -145,6 +145,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
struct task_struct *task = current;
struct mutex_waiter waiter;
unsigned long flags;
+ unsigned long timeout;

preempt_disable();
mutex_acquire(&lock->dep_map, subclass, 0, ip);
@@ -168,15 +169,22 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
* to serialize everything.
*/

- for (;;) {
+ for (timeout = jiffies + 2; time_before(jiffies, timeout);) {
struct thread_info *owner;

/*
+ * If we own the BKL, then don't spin. The owner of the mutex
+ * might be waiting on us to release the BKL.
+ */
+ if (current->lock_depth >= 0)
+ break;
+
+ /*
* If there's an owner, wait for it to either
* release the lock or go to sleep.
*/
owner = ACCESS_ONCE(lock->owner);
- if (owner && !mutex_spin_on_owner(lock, owner))
+ if (owner && !mutex_spin_on_owner(lock, owner, timeout))
break;

if (atomic_cmpxchg(&lock->count, 1, 0) == 1) {
diff --git a/kernel/sched.c b/kernel/sched.c
index 663a1d0..ef58dc0 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5523,7 +5523,8 @@ EXPORT_SYMBOL(schedule);
* Look out! "owner" is an entirely speculative pointer
* access and not reliable.
*/
-int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner)
+int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner,
+ unsigned long timeout)
{
unsigned int cpu;
struct rq *rq;
@@ -5559,7 +5560,7 @@ int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner)

rq = cpu_rq(cpu);

- for (;;) {
+ while (time_before(jiffies, timeout)) {
/*
* Owner changed, break to re-assess state.
*/

Yours Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Frederic Weisbecker on
On Fri, May 07, 2010 at 02:20:10PM +1000, Tony Breeds wrote:
> On Thu, Apr 29, 2010 at 08:35:21AM +1000, Benjamin Herrenschmidt wrote:
> > On Wed, 2010-04-28 at 14:06 +0200, Arnd Bergmann wrote:
> > >
> > > This needs to use time_before() to avoid problems on jiffies
> > > wraparound.
> >
> > Ah right, forgot about that, been a while I didn't use jiffies for
> > anything :-)
> >
> > I'll respin later today.
>
> Like thie perhaps? If this looks good it would be great to get this in .34
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 70abfd3..991d86f 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -351,7 +351,8 @@ extern signed long schedule_timeout_killable(signed long timeout);
> extern signed long schedule_timeout_uninterruptible(signed long timeout);
> asmlinkage void __schedule(void);
> asmlinkage void schedule(void);
> -extern int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner);
> +extern int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner,
> + unsigned long timeout);
>
> struct nsproxy;
> struct user_namespace;
> diff --git a/kernel/mutex.c b/kernel/mutex.c
> index 947b3ad..fcf2573 100644
> --- a/kernel/mutex.c
> +++ b/kernel/mutex.c
> @@ -145,6 +145,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> struct task_struct *task = current;
> struct mutex_waiter waiter;
> unsigned long flags;
> + unsigned long timeout;
>
> preempt_disable();
> mutex_acquire(&lock->dep_map, subclass, 0, ip);
> @@ -168,15 +169,22 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> * to serialize everything.
> */
>
> - for (;;) {
> + for (timeout = jiffies + 2; time_before(jiffies, timeout);) {
> struct thread_info *owner;
>
> /*
> + * If we own the BKL, then don't spin. The owner of the mutex
> + * might be waiting on us to release the BKL.
> + */
> + if (current->lock_depth >= 0)
> + break;
> +
> + /*
> * If there's an owner, wait for it to either
> * release the lock or go to sleep.
> */
> owner = ACCESS_ONCE(lock->owner);
> - if (owner && !mutex_spin_on_owner(lock, owner))
> + if (owner && !mutex_spin_on_owner(lock, owner, timeout))
> break;



I like the safeguard against the bkl, it looks indeed like something
we should have in .34

But I really don't like the timeout.

This is going to make the things even worse if we have another cause of
deadlock by hiding the worst part of the consequences without actually
solving the problem.
And since the induced latency or deadlock won't be easily visible anymore,
we'll miss there is a problem. So we are going to spin for two jiffies
and only someone doing specific latency measurements will notice, if he's
lucky enough to meet the bug.

Moreover that adds some unnessary (small) overhead in this path.

May be can we have it as a debugging option, something that would
be part of lockdep, which would require CONFIG_DEBUG_MUTEX to
support mutex adaptive spinning.

A debugging option that could just dump the held locks and the
current one if we spin for an excessive timeslice.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/