From: Benjamin Herrenschmidt on
On Thu, 2010-04-01 at 12:06 +0100, David Howells wrote:
> Can we provide a kmem_cache_create_early()? One that takes no locks and gets
> cleaned up with the other __init stuff?

Yuck. I hate having to expose more APIs. Also the problem with that is
means callers have to know. So we need to propagate up all call chains
etc... (ie, radix_tree_init_early(), etc...)

This is pretty much exactly the discussion we had when moving sl*b
early, and back then, the final word from Linus (heh, for once he agreed
with me :-) was that this made no sense.

We can bury logic inside kmem_cache_create() though, it's not -that- a
hot path.

Cheers,
Ben.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: David Howells on
Linus Torvalds <torvalds(a)linux-foundation.org> wrote:

> Ahh, yes. In this case, that doesn't likely change anything. The
> save/restore versions of the irq-safe locks shouldn't be appreciably more
> expensive than the non-saving ones. And architectures that really care
> should have done their own per-arch optimized version anyway.

That depends on the CPU. Some CPUs have quite expensive interrupt disablement
instructions. FRV does for instance; but fortunately, on the FRV, I can use
some of the excessive quantities of conditional registers to pretend that I
disable interrupts, and only actually do so if an interrupt actually happens.

> Maybe we should even document that - so that nobody else makes the mistake
> x86-64 did of thinking that the "generic spinlock" version of the rwsem's
> is anything but a hacky and bad fallback case.

In some cases, it's actually the best way. On a UP machine, for instance,
where they reduce to nothing or where your only atomic instruction is an XCHG
equivalent.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Fri, 2 Apr 2010, David Howells wrote:

> Linus Torvalds <torvalds(a)linux-foundation.org> wrote:
>
> > Ahh, yes. In this case, that doesn't likely change anything. The
> > save/restore versions of the irq-safe locks shouldn't be appreciably more
> > expensive than the non-saving ones. And architectures that really care
> > should have done their own per-arch optimized version anyway.
>
> That depends on the CPU. Some CPUs have quite expensive interrupt disablement
> instructions. FRV does for instance; but fortunately, on the FRV, I can use
> some of the excessive quantities of conditional registers to pretend that I
> disable interrupts, and only actually do so if an interrupt actually happens.

I think you're missing the part where we're not _adding_ any irq disables:
we're just changing the unconditional irq disable to a save-and-disable
(and the unconditional irq enable to a restore).

So even if irq's are expensive to disable, the change from

spin_lock_irq()

to

spin_lock_irqsave()

won't make that code any more expensive.

> > Maybe we should even document that - so that nobody else makes the mistake
> > x86-64 did of thinking that the "generic spinlock" version of the rwsem's
> > is anything but a hacky and bad fallback case.
>
> In some cases, it's actually the best way. On a UP machine, for instance,
> where they reduce to nothing or where your only atomic instruction is an XCHG
> equivalent.

Again, you seem to think that we used to have just a plain spin_lock. Not
so. We currently have a spin_lock_irq(), and it is NOT a no-op even on UP.
It does that irq disable.

Anyway, I suspect that even with just an atomic xchg, you can do a better
job at doing down_read() than using the generic spin-lock version (likely
by busy-looping on a special "we're busy" value). But if you do want to
use the generic spin-lock version, I doubt any architecture makes that
irqsave version noticeable slower than the unconditional irq version.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Kevin Hilman on
Linus Torvalds <torvalds(a)linux-foundation.org> writes:

> On Wed, 31 Mar 2010, H. Peter Anvin wrote:
>>
>> The obvious way to fix this would be to use
>> spin_lock_irqsave..spin_lock_irqrestore in __down_read as well as in the
>> other locations; I don't have a good feel for what the cost of doing so
>> would be, though. On x86 it's fairly expensive simply because the only
>> way to save the state is to push it on the stack, which the compiler
>> doesn't deal well with, but this code isn't used on x86.
>

[...]

> So making the slow-path do the spin_[un]lock_irq{save,restore}() versions
> sounds like the right thing. It won't be a performance issue: it _is_ the
> slow-path, and we're already doing the expensive part (the spinlock itself
> and the irq thing).
>
> So ACK on the idea. Who wants to write the trivial patch and test it?

OK, I'll bite since I was seeing boot-time hangs on ARM (TI OMAP3) due
to this. Patch below.

Kevin


From 7baff4008353bbfd2a2e2a4da22b87bc4efa4194 Mon Sep 17 00:00:00 2001
From: Kevin Hilman <khilman(a)deeprootsystems.com>
Date: Wed, 7 Apr 2010 11:52:46 -0700
Subject: [PATCH] rwsem generic spinlock: use IRQ save/restore spinlocks

rwsems can be used with IRQs disabled, particularily in early boot
before IRQs are enabled. Currently the spin_unlock_irq() usage in the
slow-patch will unconditionally enable interrupts and cause problems
since interrupts are not yet initialized or enabled.

This patch uses save/restore versions of IRQ spinlocks in the slowpath
to ensure interrupts are not unintentionally disabled.

Signed-off-by: Kevin Hilman <khilman(a)deeprootsystems.com>
---
lib/rwsem-spinlock.c | 14 ++++++++------
1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/lib/rwsem-spinlock.c b/lib/rwsem-spinlock.c
index ccf95bf..ffc9fc7 100644
--- a/lib/rwsem-spinlock.c
+++ b/lib/rwsem-spinlock.c
@@ -143,13 +143,14 @@ void __sched __down_read(struct rw_semaphore *sem)
{
struct rwsem_waiter waiter;
struct task_struct *tsk;
+ unsigned long flags;

- spin_lock_irq(&sem->wait_lock);
+ spin_lock_irqsave(&sem->wait_lock, flags);

if (sem->activity >= 0 && list_empty(&sem->wait_list)) {
/* granted */
sem->activity++;
- spin_unlock_irq(&sem->wait_lock);
+ spin_unlock_irqrestore(&sem->wait_lock, flags);
goto out;
}

@@ -164,7 +165,7 @@ void __sched __down_read(struct rw_semaphore *sem)
list_add_tail(&waiter.list, &sem->wait_list);

/* we don't need to touch the semaphore struct anymore */
- spin_unlock_irq(&sem->wait_lock);
+ spin_unlock_irqrestore(&sem->wait_lock, flags);

/* wait to be given the lock */
for (;;) {
@@ -209,13 +210,14 @@ void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
{
struct rwsem_waiter waiter;
struct task_struct *tsk;
+ unsigned long flags;

- spin_lock_irq(&sem->wait_lock);
+ spin_lock_irqsave(&sem->wait_lock, flags);

if (sem->activity == 0 && list_empty(&sem->wait_list)) {
/* granted */
sem->activity = -1;
- spin_unlock_irq(&sem->wait_lock);
+ spin_unlock_irqrestore(&sem->wait_lock, flags);
goto out;
}

@@ -230,7 +232,7 @@ void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
list_add_tail(&waiter.list, &sem->wait_list);

/* we don't need to touch the semaphore struct anymore */
- spin_unlock_irq(&sem->wait_lock);
+ spin_unlock_irqrestore(&sem->wait_lock, flags);

/* wait to be given the lock */
for (;;) {
--
1.7.0.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: H. Peter Anvin on
On 04/07/2010 02:13 PM, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a summary report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.33. Please verify if it still should be listed and let the tracking team
> know (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15668
> Subject : start_kernel(): bug: interrupts were enabled early
> Submitter : Rabin Vincent <rabin(a)rab.in>
> Date : 2010-03-25 19:53 (14 days old)
> First-Bad-Commit: http://git.kernel.org/git/linus/773e3eb7b81e5ba13b5155dfb3bb75b8ce37f8f9
^^^^

FWIW, this /git is redundant.

-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/