From: Ingo Molnar on

* Linus Torvalds <torvalds(a)linux-foundation.org> wrote:

> > [ this patch should in fact be a bit worse, because there's two more
> > atomics in the fastpath - the fastpath atomics of the old
> > semaphore code. ]
>
> Well, it doesn't have the irq stuff, which is also pretty costly.
> Also, it doesn't nest the accesses the same way (with the counts being
> *inside* the spinlock and serialized against each other), so I'm not
> 100% sure you'd get the same behaviour.
>
> But yes, it certainly has the potential to show the same slowdown. But
> it's not a very good patch, since not showing it doesn't really prove
> much.

ok, the one below does irq ops and the counter behavior - and because
the critical section also has the old-semaphore atomics i think this
should definitely be a more expensive fastpath than what the new generic
code introduces. So if this patch produces a 40% AIM7 slowdown on
v2.6.25 it's the fastpath overhead (and its effects on slowpath
probability) that makes the difference.

Ingo

------------------->
Subject: add BKL atomic overhead
From: Ingo Molnar <mingo(a)elte.hu>
Date: Wed May 07 20:09:13 CEST 2008

NOT-Signed-off-by: Ingo Molnar <mingo(a)elte.hu>
---
lib/kernel_lock.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)

Index: linux-2.6.25/lib/kernel_lock.c
===================================================================
--- linux-2.6.25.orig/lib/kernel_lock.c
+++ linux-2.6.25/lib/kernel_lock.c
@@ -24,6 +24,8 @@
* Don't use in new code.
*/
static DECLARE_MUTEX(kernel_sem);
+static int global_count;
+static DEFINE_SPINLOCK(global_lock);

/*
* Re-acquire the kernel semaphore.
@@ -39,6 +41,7 @@ int __lockfunc __reacquire_kernel_lock(v
{
struct task_struct *task = current;
int saved_lock_depth = task->lock_depth;
+ unsigned long flags;

BUG_ON(saved_lock_depth < 0);

@@ -47,6 +50,10 @@ int __lockfunc __reacquire_kernel_lock(v

down(&kernel_sem);

+ spin_lock_irqsave(&global_lock, flags);
+ global_count++;
+ spin_unlock_irqrestore(&global_lock, flags);
+
preempt_disable();
task->lock_depth = saved_lock_depth;

@@ -55,6 +62,10 @@ int __lockfunc __reacquire_kernel_lock(v

void __lockfunc __release_kernel_lock(void)
{
+ spin_lock_irqsave(&global_lock, flags);
+ global_count--;
+ spin_unlock_irqrestore(&global_lock, flags);
+
up(&kernel_sem);
}

@@ -66,12 +77,17 @@ void __lockfunc lock_kernel(void)
struct task_struct *task = current;
int depth = task->lock_depth + 1;

- if (likely(!depth))
+ if (likely(!depth)) {
/*
* No recursion worries - we set up lock_depth _after_
*/
down(&kernel_sem);

+ spin_lock_irqsave(&global_lock, flags);
+ global_count++;
+ spin_unlock_irqrestore(&global_lock, flags);
+ }
+
task->lock_depth = depth;
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar on

* Linus Torvalds <torvalds(a)linux-foundation.org> wrote:

> .. Hmm ... Time passes. Linus looks at git history.
>
> It does look like "cond_resched()" has not worked with the BKL since
> 2005, and hasn't taken the BKL into account. Commit 5bbcfd9000:
>
> [PATCH] cond_resched(): fix bogus might_sleep() warning
>
> + if (unlikely(preempt_count()))
> + return;
>
> which talks about the BKS, ie it only took the *semaphore*
> implementation into account. Never the spinlock-with-preemption-count
> one.
>
> Or am I blind?

hm, i think you are right.

most latency reduction was concentrated on the PREEMPT+PREEMPT_BKL case,
and not getting proper cond_resched() behavior in case of !PREEMPT_BKL
would certainly not be noticed by distros or users.

We made CONFIG_PREEMPT_BKL=y the default on SMP in v2.6.8, in this
post-2.6.7 commit that introduced the feature:

| commit fb8f6499abc6a847109d9602b797aa6afd2d5a3d
| Author: Ingo Molnar <mingo(a)elte.hu>
| Date: Fri Jan 7 21:59:57 2005 -0800
|
| [PATCH] remove the BKL by turning it into a semaphore

There was constant trouble around all these variations of preemptability
and their combination with debugging helpers. (So i was rather happy to
get rid of !PREEMPT_BKL - in the (apparently wrong) assumption that no
tears will be shed.)

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Wed, 7 May 2008, Ingo Molnar wrote:
>
> ok, the one below does irq ops and the counter behavior

No it doesn't. The counter isn't used for any actual *testing*, so the
locking around it and the serialization of it has absolutely no impact on
the scheduling behaviour!

Since the big slowdown was clearly accompanied by sleeping behaviour (the
processes who didn't get the lock end up sleeping!), that is a *big* part
of the slowdown.

Is it possible that your patch gets similar behaviour? Absolutely. But
you're missing the whole point here. Anybody can make code behave badly
and perform worse. But if you want to just verify that it's about the
sleeping behaviour and timings of the BKL, then you need to do exactly
that: emulate the sleeping behavior, not just the timings _outside_ of the
sleeping behavior.

The thing is, we definitely are interested to see whether it's the BKL or
some other semaphore that is the problem. But the best way to test that is
to just try my patch that *guarantees* that the BKL doesn't have any
semaphore behaviour AT ALL.

Could it be something else entirely? Yes. We know it's semaphore- related.
We don't know for a fact that it's the BKL itself. There could be other
semaphores that are that hot. It sounds unlikely, but quite frankly,
regardless, I don't really see the point of your patches.

If Yanmin tries my patch, it is *guaranteed* to show something. It either
shows that it's about the BKL (and that we absolutely have to do the BKL
as something _else_ than a generic semaphore), or it shows that it's not
about the BKL (and that _all_ the patches in this discussion are likely
pointless).

In contrast, these "try to emulate bad behavior with the old known-ok
semaphores" don't show anything AT ALL. We already know it's related to
semaphores. And your patches aren't even guaranteed to show the same
issue.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar on

* Linus Torvalds <torvalds(a)linux-foundation.org> wrote:

> In contrast, these "try to emulate bad behavior with the old known-ok
> semaphores" don't show anything AT ALL. We already know it's related
> to semaphores. And your patches aren't even guaranteed to show the
> same issue.

yeah, i was just trying to come up with patches to probe which one of
the following two possibilities is actually the case:

- if the regression is due to the difference in scheduling behavior of
new semaphores (different wakeup patterns, etc.), that's fixable in
the new semaphore code => then the BKL code need not change.

- if the regression is due due to difference in the fastpath cost, then
the new semaphores can probably not be improved (much of their appeal
comes from them not being complex and not being in assembly) => then
the BKL code needs to change to become cheaper [i.e. then we want
your patch].

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Matthew Wilcox on
On Wed, May 07, 2008 at 12:01:28PM -0700, Linus Torvalds wrote:
> The thing is, we definitely are interested to see whether it's the BKL or
> some other semaphore that is the problem. But the best way to test that is
> to just try my patch that *guarantees* that the BKL doesn't have any
> semaphore behaviour AT ALL.
>
> Could it be something else entirely? Yes. We know it's semaphore- related.
> We don't know for a fact that it's the BKL itself. There could be other
> semaphores that are that hot. It sounds unlikely, but quite frankly,
> regardless, I don't really see the point of your patches.
>
> If Yanmin tries my patch, it is *guaranteed* to show something. It either
> shows that it's about the BKL (and that we absolutely have to do the BKL
> as something _else_ than a generic semaphore), or it shows that it's not
> about the BKL (and that _all_ the patches in this discussion are likely
> pointless).

One patch I'd still like Yanmin to test is my one from yesterday which
removes the BKL from fs/locks.c.

http://marc.info/?l=linux-fsdevel&m=121009123427437&w=2

Obviously, it won't help if the problem isn't the BKL.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/