From: Russell King on
On Wed, Mar 31, 2010 at 01:47:23PM -0700, Yinghai Lu wrote:
> On 03/31/2010 01:40 PM, Andrew Morton wrote:
> > We have two checks in start_kernel():
> >
> > if (!irqs_disabled()) {
> > printk(KERN_WARNING "start_kernel(): bug: interrupts were "
> > "enabled *very* early, fixing it\n");
> > local_irq_disable();
> > }
> > rcu_init();
> > radix_tree_init();
> > /* init some links before init_ISA_irqs() */
> > early_irq_init();
> > init_IRQ();
> > prio_tree_init();
> > init_timers();
> > hrtimers_init();
> > softirq_init();
> > timekeeping_init();
> > time_init();
> > profile_init();
> > if (!irqs_disabled())
> > printk(KERN_CRIT "start_kernel(): bug: interrupts were "
> > "enabled early\n");
> >
> > perhaps the second one isn't needed? Perhaps no architecture requires
> > that local interrupts be disabled across the above initialisations?
>
> spin_unlock_irq from arm is different from other archs?

We use the standard generic kernel implementation. Is x86 different? ;)

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Matthew Wilcox on
On Wed, Mar 31, 2010 at 02:05:00PM -0700, H. Peter Anvin wrote:
> What I note is that lib/rwsem-spinlock.c seems to be rather inconsistent
> in its use of spin_lock_irqsave/spin_lock_irqrestore versus
> spin_lock_irq/spin_unlock_irq... in fact, __down_read is the *only*
> place where we use the latter as opposed to the former.
>
> Is that a bug? If so, it would certainly explain this behavior.

It's based on down_read() and down_write() not being callable from
interrupt context, or with interrupts disabled (since they can sleep).
up_read(), up_write(), down_read_trylock(), down_write_trylock(),
downgrade_write() can all be called from interrupt context since they
cannot sleep.

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: H. Peter Anvin on
On 03/31/2010 02:01 PM, Matthew Wilcox wrote:
> On Wed, Mar 31, 2010 at 01:47:23PM -0700, Yinghai Lu wrote:
>>>>> This appears to be caused by:
>>>>>
>>>>> start_kernel -> radix_tree_init -> kmem_cache_create (slub) ->
>>>>> down_write -> __down_write (lib/rwsem-spinlock.c) -> spin_unlock_irq
>>>>>
>>> That's going to be hard to fix.
>>>
>> spin_unlock_irq from arm is different from other archs?
>
> Not all arches use lib/rwsem-spinlock.c. In particular, x86 doesn't
> when X86_XADD is set.
>

What I note is that lib/rwsem-spinlock.c seems to be rather inconsistent
in its use of spin_lock_irqsave/spin_lock_irqrestore versus
spin_lock_irq/spin_unlock_irq... in fact, __down_read is the *only*
place where we use the latter as opposed to the former.

Is that a bug? If so, it would certainly explain this behavior.

-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: H. Peter Anvin on
On 03/31/2010 01:52 PM, Andrew Morton wrote:
> On Wed, 31 Mar 2010 13:47:23 -0700
> Yinghai Lu <yinghai(a)kernel.org> wrote:
>
>> spin_unlock_irq from arm is different from other archs?
>
> No, spin_unlock_irq() unconditionally enables interrupts on all
> architectures.

So I found checkin 60ba96e546da45d9e22bb04b84971a25684e4d46 in the
bk-historic git tree:

[PATCH] rwsem: Make rwsems use interrupt disabling spinlocks

The attached patch makes read/write semaphores use interrupt disabling
spinlocks in the slow path, thus rendering the up functions and trylock
functions available for use in interrupt context. This matches the
regular semaphore behaviour.

I've assumed that the normal down functions must be called with
interrupts enabled (since they might schedule), and used the
irq-disabling spinlock variants that don't save the flags.

Signed-Off-By: David Howells <dhowells(a)redhat.com>
Tested-by: Badari Pulavarty <pbadari(a)us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds(a)osdl.org>

What we have here is a case of this assumption being violated, because
the lock is taken with interrupts disabled on a path where contention
cannot happen (because the code is single-threaded at this point), but
the lock is taken due to reuse of generic code.

The obvious way to fix this would be to use
spin_lock_irqsave..spin_lock_irqrestore in __down_read as well as in the
other locations; I don't have a good feel for what the cost of doing so
would be, though. On x86 it's fairly expensive simply because the only
way to save the state is to push it on the stack, which the compiler
doesn't deal well with, but this code isn't used on x86.

-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andrew Morton on
On Wed, 31 Mar 2010 14:12:54 -0700
"H. Peter Anvin" <hpa(a)zytor.com> wrote:

> On 03/31/2010 01:52 PM, Andrew Morton wrote:
> > On Wed, 31 Mar 2010 13:47:23 -0700
> > Yinghai Lu <yinghai(a)kernel.org> wrote:
> >
> >> spin_unlock_irq from arm is different from other archs?
> >
> > No, spin_unlock_irq() unconditionally enables interrupts on all
> > architectures.
>
> So I found checkin 60ba96e546da45d9e22bb04b84971a25684e4d46 in the
> bk-historic git tree:
>
> [PATCH] rwsem: Make rwsems use interrupt disabling spinlocks
>
> The attached patch makes read/write semaphores use interrupt disabling
> spinlocks in the slow path, thus rendering the up functions and trylock
> functions available for use in interrupt context. This matches the
> regular semaphore behaviour.
>
> I've assumed that the normal down functions must be called with
> interrupts enabled (since they might schedule), and used the
> irq-disabling spinlock variants that don't save the flags.
>
> Signed-Off-By: David Howells <dhowells(a)redhat.com>
> Tested-by: Badari Pulavarty <pbadari(a)us.ibm.com>
> Signed-off-by: Linus Torvalds <torvalds(a)osdl.org>
>
> What we have here is a case of this assumption being violated, because
> the lock is taken with interrupts disabled on a path where contention
> cannot happen (because the code is single-threaded at this point), but
> the lock is taken due to reuse of generic code.
>
> The obvious way to fix this would be to use
> spin_lock_irqsave..spin_lock_irqrestore in __down_read as well as in the
> other locations; I don't have a good feel for what the cost of doing so
> would be, though. On x86 it's fairly expensive simply because the only
> way to save the state is to push it on the stack, which the compiler
> doesn't deal well with, but this code isn't used on x86.
>

Well, it's all a bit nasty. kmem_cache_create() does a lot of stuff,
including calling into the page allocator with GFP_KERNEL - expecting
kmem_cache_create() to preserve local_irq_disable() is a bit optimistic.

radix_tree_init() calls hotcpu_notifier() which also does
mutex_lock(&cpu_add_remove_lock);

The easiest fix is to reposition the interrutps-are-now-enabled point
in start_kernel(). But I have a feeling that some versions of
early_irq_init() won't like that.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/