From: Ingo Molnar on

* Linus Torvalds <torvalds(a)linux-foundation.org> wrote:

> > But my preferred option would indeed be just turning it back into a
> > spinlock - and screw latency and BKL preemption - and having the RT
> > people who care deeply just work on removing the BKL in the long
> > run.
>
> Here's a trial balloon patch to do that.

here's a simpler trial baloon test-patch (well, hack) that is also
reasonably well tested. It turns the BKL into a "spin-semaphore". If
this resolves the performance problem then it's all due to the BKL's
scheduling/preemption properties.

this approach is ugly (it's just a more expensive spinlock), but has an
advantage: the code logic is obviously correct, and it would also make
it much easier later on to turn the BKL back into a sleeping lock again
- once the TTY code's BKL use is fixed. (i think Alan said it might
happen in the next few months) The BKL is more expensive than a simple
spinlock anyway.

Ingo

------------->
Subject: BKL: spin on acquire
From: Ingo Molnar <mingo(a)elte.hu>
Date: Wed May 07 19:05:40 CEST 2008

NOT-Signed-off-by: Ingo Molnar <mingo(a)elte.hu>
---
lib/kernel_lock.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

Index: linux/lib/kernel_lock.c
===================================================================
--- linux.orig/lib/kernel_lock.c
+++ linux/lib/kernel_lock.c
@@ -46,7 +46,8 @@ int __lockfunc __reacquire_kernel_lock(v
task->lock_depth = -1;
preempt_enable_no_resched();

- down(&kernel_sem);
+ while (down_trylock(&kernel_sem))
+ cpu_relax();

preempt_disable();
task->lock_depth = saved_lock_depth;
@@ -67,11 +68,13 @@ void __lockfunc lock_kernel(void)
struct task_struct *task = current;
int depth = task->lock_depth + 1;

- if (likely(!depth))
+ if (likely(!depth)) {
/*
* No recursion worries - we set up lock_depth _after_
*/
- down(&kernel_sem);
+ while (down_trylock(&kernel_sem))
+ cpu_relax();
+ }

task->lock_depth = depth;
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andrew Morton on
On Wed, 7 May 2008 10:08:18 -0700 (PDT) Linus Torvalds <torvalds(a)linux-foundation.org> wrote:

> Which is why I'm 100% convinced it's not even worth saving the old code.
> It needs to use mutexes, or spinlocks. I bet it has *nothing* to do with
> "slow path" other than the fact that it gets to that slow path much more
> these days.

Stupid question: why doesn't lock_kernel() use a mutex?

(stupid answer: it'll trigger might_sleep() checks when we do it early in
boot with irqs disabled, but we can fix that)

(And __might_sleep()'s system_state check might even save us from that)

Of course, we shouldn't change anything until we've worked out why the new
semaphores got slower.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar on

* Ingo Molnar <mingo(a)elte.hu> wrote:

> There's far more normal mutex fastpath use during an AIM7 run than any
> BKL use. So if it's due to any direct fastpath overhead and the
> resulting widening of the window for the real slowdown, we should see
> a severe slowdown on AIM7 with CONFIG_MUTEX_DEBUG=y. Agreed?

my own guesstimate about AIM7 performance impact resulting out of
CONFIG_MUTEX_DEBUG=y: performance overhead will not be measurable, or
will at most be in the sub-1% range. But i've been badly wrong before :)

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Wed, 7 May 2008, Andrew Morton wrote:

> On Wed, 7 May 2008 10:08:18 -0700 (PDT) Linus Torvalds <torvalds(a)linux-foundation.org> wrote:
>
> > Which is why I'm 100% convinced it's not even worth saving the old code.
> > It needs to use mutexes, or spinlocks. I bet it has *nothing* to do with
> > "slow path" other than the fact that it gets to that slow path much more
> > these days.
>
> Stupid question: why doesn't lock_kernel() use a mutex?

Not stupid.

The only reason some code didn't get turned over to mutexes was literally
that they didn't want the debugging because they were doing intentionally
bad things.

I think the BKL is one of them (the console semaphore was another, iirc).

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar on

* Linus Torvalds <torvalds(a)linux-foundation.org> wrote:

> Which is why I'm 100% convinced it's not even worth saving the old
> code. It needs to use mutexes, or spinlocks. I bet it has *nothing* to
> do with "slow path" other than the fact that it gets to that slow path
> much more these days.

i think your theory should be easy to test: Yanmin, could you turn on
CONFIG_MUTEX_DEBUG=y and check by how much AIM7 regresses?

Because in the CONFIG_MUTEX_DEBUG=y case the mutex debug code does
exactly that: it doesnt use the single-instruction fastpath [it uses
asm-generic/mutex-null.h] but always drops into the slowpath (to be able
to access debug state). That debug code is about as expensive as the
generic semaphore code's current fastpath. (perhaps even more
expensive.)

There's far more normal mutex fastpath use during an AIM7 run than any
BKL use. So if it's due to any direct fastpath overhead and the
resulting widening of the window for the real slowdown, we should see a
severe slowdown on AIM7 with CONFIG_MUTEX_DEBUG=y. Agreed?

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/