From: Linus Torvalds on
On Wed, Aug 11, 2010 at 10:09 PM, Michel Lespinasse <walken(a)google.com> wrote:
> On Wed, Aug 11, 2010 at 10:02 PM, Michel Lespinasse <walken(a)google.com> wrote:
>> In arch/ia64/include/asm/rwsem.h I see RWSEM_WAITING_BIAS defined as
>> -__IA64_UL_CONST(0x0000000100000000)
>>
>> This makes it a large, positive unsigned value. This is probably
>> throwing off the rwsem_atomic_update(0, sem) < RWSEM_WAITING_BIAS
>> comparison in my patch (supposed to be long versus long, but actually
>> is long versus unsigned long on ia64).
>
> FYI, I just verified that RWSEM_WAITING_BIAS is defined as signed on
> all architectures except for ia64. So, this would be consistent with
> the issue being observed only on ia64.

Good.

Tony, can you verify that just dropping the "__IA64_UL_CONST" makes
things work for you? Having a non-working rwsem makes me really
nervous, but considering the lack of reports on other architectures, I
really do hope this is just a trivial ia64 signedness difference.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
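The signedness mismatch discussed above can be reproduced in user space. The following is only a sketch under stated assumptions: UL_CONST is a hypothetical stand-in for __IA64_UL_CONST (assumed here to append a UL suffix), the two helper functions are invented for illustration, and long is assumed to be 64 bits wide as on ia64. It is not the comparison code from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch assumes an LP64 platform (64-bit long), as on ia64. */
_Static_assert(sizeof(long) == 8, "sketch assumes a 64-bit long");

/* Hypothetical stand-in for __IA64_UL_CONST: yields an unsigned long. */
#define UL_CONST(x)            x##UL

/* Negating an unsigned long wraps around to a large positive value. */
#define WAITING_BIAS_UNSIGNED  (-UL_CONST(0x0000000100000000))
/* The signed form, as on the other architectures. */
#define WAITING_BIAS_SIGNED    (-0x100000000L)

/* long versus long: an ordinary signed comparison. */
bool below_bias_signed(long count)
{
    return count < WAITING_BIAS_SIGNED;
}

/* long versus unsigned long: C's usual arithmetic conversions turn
 * count into unsigned long before comparing, so any small
 * non-negative count looks "below" the huge positive bias. */
bool below_bias_unsigned(long count)
{
    return count < WAITING_BIAS_UNSIGNED;
}

/* Self-check showing where the two comparisons disagree. */
void signedness_self_check(void)
{
    assert(WAITING_BIAS_UNSIGNED == 0xffffffff00000000UL);
    assert(!below_bias_signed(0));   /* 0 is not below -2^32 */
    assert(below_bias_unsigned(0));  /* but 0 is below 0xffffffff00000000 */
}
```

With these definitions the two comparisons disagree for every non-negative count, which would be consistent with the problem showing up only where the constant is unsigned.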
From: Tony Luck on
On Wed, Aug 11, 2010 at 10:02 PM, Michel Lespinasse <walken(a)google.com> wrote:
> #define RWSEM_UNLOCKED_VALUE __IA64_UL_CONST(0x0000000000000000)
> #define RWSEM_ACTIVE_BIAS (1L)
> #define RWSEM_ACTIVE_MASK (0xffffffffL)
> #define RWSEM_WAITING_BIAS (-0x100000000L)
> #define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
> #define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

Thanks - that fixes it for me.

-Tony
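To see how the signed constants quoted above fit together, here is a small user-space sketch of the counter arithmetic. The macro values are copied from the quoted definitions, except that RWSEM_UNLOCKED_VALUE is written as a plain 0L for simplicity (an assumption, the quoted line still uses __IA64_UL_CONST). The helper functions are hypothetical and exist only for illustration; a 64-bit two's-complement long is assumed.

```c
#include <assert.h>

/* Sketch assumes an LP64 platform (64-bit long), as on ia64. */
_Static_assert(sizeof(long) == 8, "sketch assumes a 64-bit long");

#define RWSEM_UNLOCKED_VALUE    (0L)   /* assumption: signed zero */
#define RWSEM_ACTIVE_BIAS       (1L)
#define RWSEM_ACTIVE_MASK       (0xffffffffL)
#define RWSEM_WAITING_BIAS      (-0x100000000L)
#define RWSEM_ACTIVE_READ_BIAS  RWSEM_ACTIVE_BIAS
#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

/* Counter value with n active readers and nobody waiting. */
long count_with_readers(long n)
{
    return RWSEM_UNLOCKED_VALUE + n * RWSEM_ACTIVE_READ_BIAS;
}

/* Counter value with a single active writer. */
long count_with_writer(void)
{
    return RWSEM_UNLOCKED_VALUE + RWSEM_ACTIVE_WRITE_BIAS;
}

/* The low 32 bits hold the number of active lock holders. */
long active_holders(long count)
{
    return count & RWSEM_ACTIVE_MASK;
}

/* Self-check of the arithmetic under the assumptions above. */
void rwsem_bias_self_check(void)
{
    assert(count_with_readers(3) == 3);
    assert(count_with_writer() == -0xffffffffL);
    assert(count_with_writer() < 0);              /* negative while write-locked */
    assert(active_holders(count_with_writer()) == 1);
}
```

With every constant signed, a write-locked counter is a negative long, so sign-based comparisons against RWSEM_WAITING_BIAS behave the same way as on the other architectures.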
From: Tony Luck on
On Thu, Aug 12, 2010 at 9:24 AM, Tony Luck <tony.luck(a)intel.com> wrote:
> Yes it does. Please pull from
>
>  git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6.git release

I spoke too soon. Life is much better (boots with no issues). But when
using this kernel to rebuild all my different configs, it locked up on
the third "make -j32". That is to say that the build stopped making
progress. The system is still alive and can run other things.

top(1) shows few processes consuming cpu time. There's a pile of
defunct processes and one of the "make" processes is stuck in "D" wait
state.

I'm going to try the current kernel with the 424acaae reverted (and
with the ia64 rwsem.h reverted too) to check whether this is still the
same root cause - or whether I need to look for some other problem.

-Tony
From: Tony Luck on
On Thu, Aug 12, 2010 at 3:23 PM, Tony Luck <tony.luck(a)gmail.com> wrote:
> I'm going to try the current kernel with the 424acaae reverted (and
> with the ia64 rwsem.h reverted too) to check whether this is still the same
> root cause

This kernel ran "make clean ; make -j32" all night long. It's completed 415
cycles with no apparent problems. Average cycle time is 147.2 seconds.
Minimum is 144, max is 158 ... which all looks very normal.

It's probably a good thing that the unsigned RWSEM_WAITING_BIAS
problem led me to this commit with a problem that showed up during
boot. I'd hate to be bisecting this based on a problem that only shows
up after an unknown number of kernel builds :-)

Still no reports from other architectures? So this still seems to be an
ia64 specific problem. Time to start reading rwsem.c

-Tony
From: Tony Luck on
On Fri, Aug 13, 2010 at 9:09 AM, Tony Luck <tony.luck(a)intel.com> wrote:
> Still no reports from other architectures? So this still seems to be an
> ia64 specific problem. Time to start reading rwsem.c

My hung process is "make" (GNU Make 3.81). Stuck forever here:

SYSCALL_DEFINE1(brk, unsigned long, brk)
{
	unsigned long rlim, retval;
	unsigned long newbrk, oldbrk;
	struct mm_struct *mm = current->mm;
	unsigned long min_brk;

	down_write(&mm->mmap_sem); <<<<<<<<<<<<<<<

Is make multi-threaded these days? If not, I'm confused about how anything
complicated could have happened on mmap_sem.

-Tony