From: Michal Piotrowski on
Hi,

On 28/09/06, Andrew Morton <akpm(a)osdl.org> wrote:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18/2.6.18-mm2/
>
>

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.18-mm2 #1
-------------------------------------------------------
nash/1264 is trying to acquire lock:
(&bdev_part_lock_key){--..}, at: [<c0310d4a>] mutex_lock+0x1c/0x1f

but task is already holding lock:
(&new->reconfig_mutex){--..}, at: [<c03108ff>] mutex_lock_interruptible+0x1c/0x1f

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&new->reconfig_mutex){--..}:
[<c01390b8>] add_lock_to_list+0x5c/0x7a
[<c013b1dd>] __lock_acquire+0x9f3/0xaef
[<c013b643>] lock_acquire+0x71/0x91
[<c031068f>] __mutex_lock_interruptible_slowpath+0xd2/0x326
[<c03108ff>] mutex_lock_interruptible+0x1c/0x1f
[<c02ba4e3>] md_open+0x28/0x5d
[<c0197853>] do_open+0x8b/0x377
[<c0197cd5>] blkdev_open+0x1d/0x46
[<c0172f36>] __dentry_open+0x133/0x260
[<c01730d1>] nameidata_to_filp+0x1c/0x2e
[<c0173111>] do_filp_open+0x2e/0x35
[<c0173170>] do_sys_open+0x58/0xde
[<c0173222>] sys_open+0x16/0x18
[<c0103297>] syscall_call+0x7/0xb
[<ffffffff>] 0xffffffff

-> #1 (&bdev->bd_mutex){--..}:
[<c01390b8>] add_lock_to_list+0x5c/0x7a
[<c013b1dd>] __lock_acquire+0x9f3/0xaef
[<c013b643>] lock_acquire+0x71/0x91
[<c0310b0f>] __mutex_lock_slowpath+0xd2/0x2f1
[<c0310d4a>] mutex_lock+0x1c/0x1f
[<c0197824>] do_open+0x5c/0x377
[<c0197bab>] blkdev_get+0x6c/0x77
[<c01978d0>] do_open+0x108/0x377
[<c0197bab>] blkdev_get+0x6c/0x77
[<c0197eb1>] open_by_devnum+0x30/0x3c
[<c0147419>] swsusp_check+0x14/0xc5
[<c0145865>] software_resume+0x7e/0x100
[<c010049e>] init+0x121/0x29f
[<c0103f23>] kernel_thread_helper+0x7/0x10
[<c0109523>] save_stack_trace+0x17/0x30
[<c0138fb0>] save_trace+0x4f/0xfb
[<c01390b8>] add_lock_to_list+0x5c/0x7a
[<c013b1dd>] __lock_acquire+0x9f3/0xaef
[<c013b643>] lock_acquire+0x71/0x91
[<c0310b0f>] __mutex_lock_slowpath+0xd2/0x2f1
[<c0310d4a>] mutex_lock+0x1c/0x1f
[<c0197824>] do_open+0x5c/0x377
[<c0197bab>] blkdev_get+0x6c/0x77
[<c01978d0>] do_open+0x108/0x377
[<c0197bab>] blkdev_get+0x6c/0x77
[<c0197eb1>] open_by_devnum+0x30/0x3c
[<c0147419>] swsusp_check+0x14/0xc5
[<c0145865>] software_resume+0x7e/0x100
[<c010049e>] init+0x121/0x29f
[<c0103f23>] kernel_thread_helper+0x7/0x10
[<ffffffff>] 0xffffffff

-> #0 (&bdev_part_lock_key){--..}:
[<c013a7b6>] print_circular_bug_tail+0x30/0x64
[<c013b114>] __lock_acquire+0x92a/0xaef
[<c013b643>] lock_acquire+0x71/0x91
[<c0310b0f>] __mutex_lock_slowpath+0xd2/0x2f1
[<c0310d4a>] mutex_lock+0x1c/0x1f
[<c0197323>] bd_claim_by_disk+0x5f/0x18e
[<c02b44ec>] bind_rdev_to_array+0x1f0/0x20e
[<c02b6453>] autostart_arrays+0x24b/0x322
[<c02b9158>] md_ioctl+0x91/0x13f4
[<c01ea5bc>] blkdev_driver_ioctl+0x49/0x5b
[<c01ead23>] blkdev_ioctl+0x755/0x7a2
[<c0196f9d>] block_ioctl+0x16/0x1b
[<c01801d2>] do_ioctl+0x22/0x67
[<c0180460>] vfs_ioctl+0x249/0x25c
[<c01804ba>] sys_ioctl+0x47/0x75
[<c0103297>] syscall_call+0x7/0xb
[<ffffffff>] 0xffffffff

other info that might help us debug this:

1 lock held by nash/1264:
#0: (&new->reconfig_mutex){--..}, at: [<c03108ff>] mutex_lock_interruptible+0x1c/0x1f
stack backtrace:
[<c0104215>] dump_trace+0x64/0x1cd
[<c0104390>] show_trace_log_lvl+0x12/0x25
[<c01049e5>] show_trace+0xd/0x10
[<c0104aad>] dump_stack+0x19/0x1b
[<c013a7df>] print_circular_bug_tail+0x59/0x64
[<c013b114>] __lock_acquire+0x92a/0xaef
[<c013b643>] lock_acquire+0x71/0x91
[<c0310b0f>] __mutex_lock_slowpath+0xd2/0x2f1
[<c0310d4a>] mutex_lock+0x1c/0x1f
[<c0197323>] bd_claim_by_disk+0x5f/0x18e
[<c02b44ec>] bind_rdev_to_array+0x1f0/0x20e
[<c02b6453>] autostart_arrays+0x24b/0x322
[<c02b9158>] md_ioctl+0x91/0x13f4
[<c01ea5bc>] blkdev_driver_ioctl+0x49/0x5b
[<c01ead23>] blkdev_ioctl+0x755/0x7a2
[<c0196f9d>] block_ioctl+0x16/0x1b
[<c01801d2>] do_ioctl+0x22/0x67
[<c0180460>] vfs_ioctl+0x249/0x25c
[<c01804ba>] sys_ioctl+0x47/0x75
[<c0103297>] syscall_call+0x7/0xb
DWARF2 unwinder stuck at syscall_call+0x7/0xb

Leftover inexact backtrace:

=======================
md: bind<hdb2>

config & dmesg http://www.stardust.webpages.pl/files/mm/2.6.18-mm2/
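
For anyone reading the report above: lockdep records an ordering edge each time a lock is taken while another is already held, and it warns when a new acquisition would close a cycle. Below is a minimal Python sketch of that check. The edge table is only my reading of chain entries #1 and #2 (bdev_part_lock_key before bd_mutex, bd_mutex before reconfig_mutex), and `would_close_cycle` is a hypothetical helper, not lockdep's actual implementation.

```python
# Sketch only: a toy model of the ordering check, not lockdep's real code.
# EDGES[a] lists locks that were previously acquired while `a` was held.
# The entries are an assumption based on chain entries #1 and #2 above.
EDGES = {
    "bdev_part_lock_key": ["bd_mutex"],   # entry #1: nested do_open()
    "bd_mutex": ["reconfig_mutex"],       # entry #2: do_open() -> md_open()
}


def would_close_cycle(edges, held, wanted):
    """Return True if taking `wanted` while holding `held` closes a cycle.

    The new acquisition adds the edge held -> wanted, so a cycle exists
    exactly when `held` is already reachable from `wanted`.
    """
    stack, seen = [wanted], set()
    while stack:
        lock = stack.pop()
        if lock == held:
            return True
        if lock in seen:
            continue
        seen.add(lock)
        stack.extend(edges.get(lock, ()))
    return False


if __name__ == "__main__":
    # nash/1264 holds reconfig_mutex and wants bdev_part_lock_key (entry #0).
    print(would_close_cycle(EDGES, "reconfig_mutex", "bdev_part_lock_key"))
```

In this model the md_ioctl path (reconfig_mutex held, bdev_part_lock_key wanted) closes the loop, while the opposite order does not.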

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Steve Fox on
On Thu, 28 Sep 2006 01:46:23 -0700, Andrew Morton wrote:

> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18/2.6.18-mm2/

Panic on boot. This machine booted 2.6.18-mm1 fine. em64t machine.

TCP bic registered
TCP westwood registered
TCP htcp registered
NET: Registered protocol family 1
NET: Registered protocol family 17
Unable to handle kernel paging request at ffffffffffffffff RIP:
[<ffffffff8047ef93>] packet_notifier+0x163/0x1a0
PGD 203027 PUD 2b031067 PMD 0
Oops: 0000 [1] SMP
last sysfs file:
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18-mm2-autokern1 #1
RIP: 0010:[<ffffffff8047ef93>] [<ffffffff8047ef93>] packet_notifier+0x163/0x1a0
RSP: 0000:ffff810bffcbde90 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff810bff4a1000 RCX: 2222222222222222
RDX: ffff810bff4a1000 RSI: 0000000000000005 RDI: ffffffff8055f5e0
RBP: ffffffffffffffff R08: 0000000000007616 R09: 000000000000000e
R10: 0000000000000006 R11: ffffffff803373f0 R12: 0000000000000000
R13: 0000000000000005 R14: ffff810bff4a1000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff805d8000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffffffffffffff CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff810bffcbc000, task ffff810bffcbb510)
Stack: ffff810bff4a1000 ffffffff8055f4c0 0000000000000000 ffff810bffcbdef0
0000000000000000 ffffffff8042736e 0000000000000000 0000000000000000
0000000000000000 ffffffff8061c68d ffffffff806260f0 ffffffff80207182
Call Trace:
[<ffffffff8042736e>] register_netdevice_notifier+0x3e/0x70
[<ffffffff8061c68d>] packet_init+0x2d/0x53
[<ffffffff80207182>] init+0x162/0x330
[<ffffffff8020a9d8>] child_rip+0xa/0x12
[<ffffffff8033c2a2>] acpi_ds_init_one_object+0x0/0x82
[<ffffffff80207020>] init+0x0/0x330
[<ffffffff8020a9ce>] child_rip+0x0/0x12


Code: 48 8b 45 00 0f 18 08 49 83 fd 02 4c 8d 65 f8 0f 84 f8 fe ff
RIP [<ffffffff8047ef93>] packet_notifier+0x163/0x1a0
RSP <ffff810bffcbde90>
CR2: ffffffffffffffff
<0>Kernel panic - not syncing: Attempted to kill init!

--

Steve Fox
IBM Linux Technology Center

From: thunder7 on
From: Steve Fox <drfickle(a)us.ibm.com>
Date: Thu, Sep 28, 2006 at 05:50:31PM +0000
> On Thu, 28 Sep 2006 01:46:23 -0700, Andrew Morton wrote:
>
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18/2.6.18-mm2/
>
> Panic on boot. This machine booted 2.6.18-mm1 fine. em64t machine.
>
> TCP bic registered
> TCP westwood registered
> TCP htcp registered
> NET: Registered protocol family 1
> NET: Registered protocol family 17
> Unable to handle kernel paging request at ffffffffffffffff RIP:

I think you need to post additional details, such as .config files.
2.6.18-mm2 boots fine here (x86-64, X2 4600 cpu, smp)

Linux version 2.6.18-mm2 (jurriaan(a)middle) (gcc version 4.1.2 20060920 (prerelease) (Debian 4.1.1-14)) #5 SMP Thu Sep 28 19:56:29 CEST 2006
Command line: root=/dev/md2 video=nvidiafb:1600x1200-32@85 atkbd.softrepeat=1
protocol family 1
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
NET: Registered protocol family 15
NET: Registered protocol family 8
NET: Registered protocol family 20
powernow-k8: Found 2 AMD Athlon(tm) 64 X2 Dual Core Processor 4600+ processors (version 2.00.00)

Kind regards,
Jurriaan
--
"I resent it as well," said Scharde. "I am working to keep my rage under
control."
Jack Vance - Ecce and Old Earth
Debian (Unstable) GNU/Linux 2.6.18-mm2 2x4826 bogomips load 1.35
From: Andrew Morton on

(please always do reply-to-all)

On Thu, 28 Sep 2006 17:50:31 +0000 (UTC)
"Steve Fox" <drfickle(a)us.ibm.com> wrote:

> On Thu, 28 Sep 2006 01:46:23 -0700, Andrew Morton wrote:
>
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18/2.6.18-mm2/
>
> Panic on boot. This machine booted 2.6.18-mm1 fine. em64t machine.
>
> TCP bic registered
> TCP westwood registered
> TCP htcp registered
> NET: Registered protocol family 1
> NET: Registered protocol family 17
> Unable to handle kernel paging request at ffffffffffffffff RIP:
> [<ffffffff8047ef93>] packet_notifier+0x163/0x1a0
> PGD 203027 PUD 2b031067 PMD 0
> Oops: 0000 [1] SMP
> last sysfs file:
> CPU 0
> Modules linked in:
> Pid: 1, comm: swapper Not tainted 2.6.18-mm2-autokern1 #1
> RIP: 0010:[<ffffffff8047ef93>] [<ffffffff8047ef93>] packet_notifier+0x163/0x1a0
> RSP: 0000:ffff810bffcbde90 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: ffff810bff4a1000 RCX: 2222222222222222
> RDX: ffff810bff4a1000 RSI: 0000000000000005 RDI: ffffffff8055f5e0
> RBP: ffffffffffffffff R08: 0000000000007616 R09: 000000000000000e
> R10: 0000000000000006 R11: ffffffff803373f0 R12: 0000000000000000
> R13: 0000000000000005 R14: ffff810bff4a1000 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffffffff805d8000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: ffffffffffffffff CR3: 0000000000201000 CR4: 00000000000006e0
> Process swapper (pid: 1, threadinfo ffff810bffcbc000, task ffff810bffcbb510)
> Stack: ffff810bff4a1000 ffffffff8055f4c0 0000000000000000 ffff810bffcbdef0
> 0000000000000000 ffffffff8042736e 0000000000000000 0000000000000000
> 0000000000000000 ffffffff8061c68d ffffffff806260f0 ffffffff80207182
> Call Trace:
> [<ffffffff8042736e>] register_netdevice_notifier+0x3e/0x70
> [<ffffffff8061c68d>] packet_init+0x2d/0x53
> [<ffffffff80207182>] init+0x162/0x330
> [<ffffffff8020a9d8>] child_rip+0xa/0x12
> [<ffffffff8033c2a2>] acpi_ds_init_one_object+0x0/0x82
> [<ffffffff80207020>] init+0x0/0x330
> [<ffffffff8020a9ce>] child_rip+0x0/0x12
>
>
> Code: 48 8b 45 00 0f 18 08 49 83 fd 02 4c 8d 65 f8 0f 84 f8 fe ff
> RIP [<ffffffff8047ef93>] packet_notifier+0x163/0x1a0
> RSP <ffff810bffcbde90>
> CR2: ffffffffffffffff
> <0>Kernel panic - not syncing: Attempted to kill init!
>

I'm really struggling to work out what went wrong there. Comparing your
miserable 20 bytes of code to my object code makes me think that this:

struct packet_sock *po = pkt_sk(sk);

returned -1, perhaps in %rbp. But it's all very crude.

Perhaps you could compile that kernel with CONFIG_DEBUG_INFO, rerun it (the
addresses might change) then have a poke around with `gdb vmlinux' (or
maybe just addr2line) to work out where it's really oopsing?

I don't see much which has changed in that area recently.
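
A rough sketch of the lookup step suggested above, in Python. It parses a `symbol+0xoffset/0xsize` frame from the oops and builds an `addr2line` command line for a rebuilt vmlinux. The helper names and the idea of taking the symbol's base address from `nm vmlinux` or System.map are my assumptions; since the addresses may change after a rebuild, the base must come from the new image, not from the old oops.

```python
import re

# Matches frames such as "packet_notifier+0x163/0x1a0" in an oops line.
FRAME_RE = re.compile(
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frame(text):
    """Return (symbol, offset, size) for the first frame found in `text`."""
    match = FRAME_RE.search(text)
    if match is None:
        raise ValueError("no symbol+offset/size frame in: %r" % text)
    return (
        match.group("sym"),
        int(match.group("off"), 16),
        int(match.group("size"), 16),
    )


def addr2line_command(symbol_base, offset, vmlinux="vmlinux"):
    """Build (but do not run) an addr2line invocation for base + offset.

    `symbol_base` is assumed to come from the rebuilt image, for example
    from `nm vmlinux` or System.map.
    """
    return ["addr2line", "-e", vmlinux, "-f", "-i", hex(symbol_base + offset)]


if __name__ == "__main__":
    sym, off, size = parse_frame(
        "RIP [<ffffffff8047ef93>] packet_notifier+0x163/0x1a0"
    )
    # Base address implied by the original oops: RIP minus the offset.
    base = 0xFFFFFFFF8047EF93 - off
    print(sym, " ".join(addr2line_command(base, off)))
```

With a CONFIG_DEBUG_INFO build, `list *(packet_notifier+0x163)` inside `gdb vmlinux` should give the same source line without any address arithmetic, provided the offset within the function has not moved.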
From: Jim Cromie on

[jimc(a)harpo linux-2.6.18-mm2-sk]$ make
CHK include/linux/version.h
CHK include/linux/utsrelease.h
CHK include/linux/compile.h
GEN .version
CHK include/linux/compile.h
UPD include/linux/compile.h
CC init/version.o
LD init/built-in.o
LD .tmp_vmlinux1
arch/i386/kernel/built-in.o(.text+0x34f1): In function `do_nmi':
arch/i386/kernel/traps.c:752: undefined reference to `panic_on_unrecovered_nmi'
arch/i386/kernel/built-in.o(.text+0x3564):arch/i386/kernel/traps.c:712: undefined reference to `panic_on_unrecovered_nmi'


$ grep nmi arch/i386/kernel/Makefile
obj-$(CONFIG_X86_LOCAL_APIC) += apic.o nmi.o

nmi.o is built only with CONFIG_X86_LOCAL_APIC, which I don't have enabled.

It looks to be due to changes in x86_64-mm-nmi-sysctl-cleanup.patch.