From: Linus Torvalds


On Wed, 31 Mar 2010, H. Peter Anvin wrote:
>
> The obvious way to fix this would be to use
> spin_lock_irqsave()..spin_unlock_irqrestore() in __down_read as well as in the
> other locations; I don't have a good feel for what the cost of doing so
> would be, though. On x86 it's fairly expensive simply because the only
> way to save the state is to push it on the stack, which the compiler
> doesn't deal well with, but this code isn't used on x86.

I think that's what we should just do, with a good comment both in the
code and the changelog. I'm not entirely happy with it, because obviously
it's conceptually kind of dubious to take a lock with interrupts disabled
in the first place, but this is not a new issue per se.

The whole bootup code is special, and we already make similar guarantees
about memory allocators and friends - just because it's too dang painful
to have some special code that does GFP_ATOMIC for early bootup when the
same code is often shared and used at run-time too.

So we've accepted that people can do GFP_KERNEL allocations and we won't
care about them if we're in the boot phase (and suspend/resume), and we
have that whole 'gfp_allowed_mask' thing for that.

I think this probably falls under exactly the same heading of "not pretty,
but let's not blow up".

So making the slow-path do the spin_[un]lock_irq{save,restore}() versions
sounds like the right thing. It won't be a performance issue: it _is_ the
slow-path, and we're already doing the expensive part (the spinlock itself
and the irq thing).

So ACK on the idea. Who wants to write the trivial patch and test it?
Preferably somebody who sees the problem in the first place - x86 should
not be impacted, since the irq-disabling slow-path should never be hit
without contention anyway (and contention cannot/must not happen for this
case).

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: H. Peter Anvin
On 03/31/2010 11:48 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2010-03-31 at 23:33 -0400, Andrew Morton wrote:
>> Just a few instructions, I guess. But we can do it with zero.
>>
>> And from a design POV, pretending that down_read()/down_write() can be
>> called with interrupts disabled is daft - they cannot! Why muck up the
>> usual code paths with this startup-specific hack?
>
> Because the problem of when interrupts are enabled for the first time
> is a nasty one, and having an entire layer of things not usable at the
> right level of init because somewhere something might do an irq enable
> due to calling generic code that down's a semaphore is a PITA.
>
> Seriously, Andrew, I don't see a clean solution... Something -somewhere-
> will have to be ugly.
>
> Allocation is a pretty basic service that a lot of stuff expect
> especially when booting.
>
> We went through that discussion before when we moved the SLAB init
> earlier during boot, because it makes no sense to have tons of code to
> have to figure out what allocator to call depending on what phase of the
> moon it's called from (especially when said code can also be called
> later during boot, say for hotplug reasons).
>
> So we moved sl*b init earlier, thus we ought to be able to also
> kmem_cache_alloc() earlier. We -fixed- that problem already afaik.

I would like to point out that initialization is a particular subcase of
a more general rule:

- It is safe to down a semaphore/rwsem with IRQs disabled *if and
only if* the caller can guarantee non-contention.

Initialization is an obvious subcase, but there might be others.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

From: David Howells

Can we provide a kmem_cache_create_early()? One that takes no locks and gets
cleaned up with the other __init stuff?

David
From: Andrew Morton
On Thu, 1 Apr 2010 09:13:31 -0700 (PDT) Linus Torvalds <torvalds(a)linux-foundation.org> wrote:

>
>
> On Wed, 31 Mar 2010, H. Peter Anvin wrote:
> >
> > The obvious way to fix this would be to use
> > spin_lock_irqsave()..spin_unlock_irqrestore() in __down_read as well as in the
> > other locations; I don't have a good feel for what the cost of doing so
> > would be, though. On x86 it's fairly expensive simply because the only
> > way to save the state is to push it on the stack, which the compiler
> > doesn't deal well with, but this code isn't used on x86.
>
> I think that's what we should just do, with a good comment both in the
> code and the changelog. I'm not entirely happy with it, because obviously
> it's conceptually kind of dubious to take a lock with interrupts disabled
> in the first place, but this is not a new issue per se.
>
> The whole bootup code is special, and we already make similar guarantees
> about memory allocators and friends - just because it's too dang painful
> to have some special code that does GFP_ATOMIC for early bootup when the
> same code is often shared and used at run-time too.
>
> So we've accepted that people can do GFP_KERNEL allocations and we won't
> care about them if we're in the boot phase (and suspend/resume), and we
> have that whole 'gfp_allowed_mask' thing for that.
>
> I think this probably falls under exactly the same heading of "not pretty,
> but let's not blow up".
>
> So making the slow-path do the spin_[un]lock_irq{save,restore}() versions
> sounds like the right thing. It won't be a performance issue: it _is_ the
> slow-path, and we're already doing the expensive part (the spinlock itself
> and the irq thing).

It's actually on the fastpath for lib/rwsem-spinlock.c.

> So ACK on the idea. Who wants to write the trivial patch and test it?
> Preferably somebody who sees the problem in the first place - x86 should
> not be impacted, since the irq-disabling slow-path should never be hit
> without contention anyway (and contention cannot/must not happen for this
> case).
>
> Linus
From: Linus Torvalds


On Thu, 1 Apr 2010, Andrew Morton wrote:
>
> > So making the slow-path do the spin_[un]lock_irq{save,restore}() versions
> > sounds like the right thing. It won't be a performance issue: it _is_ the
> > slow-path, and we're already doing the expensive part (the spinlock itself
> > and the irq thing).
>
> It's actually on the fastpath for lib/rwsem-spinlock.c.

Ahh, yes. In this case, that likely doesn't change anything. The
save/restore versions of the irq-safe locks shouldn't be appreciably more
expensive than the non-saving ones. And architectures that really care
should have done their own per-arch optimized version anyway.

Maybe we should even document that - so that nobody else makes the mistake
x86-64 made of thinking that the "generic spinlock" version of the rwsems
is anything but a hacky and bad fallback case.

Linus