2.6.34-rc1: rcu lockdep bug? [Kernel]

Prev: sched: remove AFFINE_WAKEUPS feature
Next: [PATCH] kbuild: Really don't clean bounds.h and asm-offsets.h

From: Américo Wang on 11 Mar 2010 05:10

Hello, Paul and Peter,

Attached is the lockdep warning that I triggered today.

I am not sure if this is a bug of rcu lockdep, because I am
testing my patch when this occurred. However, in the backtrace,
there is none of the functions that I touched, weird.

So, please help to check if this is a bug of rcu lockdep.

Please Cc netdev when necessary.

Thanks much!

From: Paul E. McKenney on 11 Mar 2010 08:50

On Thu, Mar 11, 2010 at 06:05:38PM +0800, Am�rico Wang wrote:
> Hello, Paul and Peter,
>
> Attached is the lockdep warning that I triggered today.
>
> I am not sure if this is a bug of rcu lockdep, because I am
> testing my patch when this occurred. However, in the backtrace,
> there is none of the functions that I touched, weird.
>
> So, please help to check if this is a bug of rcu lockdep.

This sort of thing is caused by acquiring the same lock with softirq
(AKA BH) blocked and not, which can result in self-deadlock.

There was such a bug in the RCU lockdep stuff in -tip, but it has long
since been fixed. If you were seeing that bug, rcu_do_batch() would
be on the stack, which it does not appear to be.

So does your patch involve the usbfs_mutex? Or attempt to manipulate
vfs/fs state from withing networking softirq/BH context?

Thanx, Paul

> Please Cc netdev when necessary.
>
> Thanks much!

> Mar 11 17:11:22 dhcp-66-70-5 kernel: =================================
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [ INFO: inconsistent lock state ]
> Mar 11 17:11:22 dhcp-66-70-5 kernel: 2.6.34-rc1 #58
> Mar 11 17:11:22 dhcp-66-70-5 kernel: ---------------------------------
> Mar 11 17:11:22 dhcp-66-70-5 kernel: inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> Mar 11 17:11:22 dhcp-66-70-5 kernel: swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> Mar 11 17:11:22 dhcp-66-70-5 kernel: (usbfs_mutex){+.?...}, at: [<ffffffff8146419f>] netif_receive_skb+0xe7/0x819
> Mar 11 17:11:22 dhcp-66-70-5 kernel: {SOFTIRQ-ON-W} state was registered at:
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff810a6dc2>] __lock_acquire+0xaec/0xe55
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff810a9585>] lock_acquire+0x163/0x1aa
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff8154f5f9>] mutex_lock_nested+0x64/0x4e9
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff813e5ec5>] usbdev_open+0x115/0x4e9
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff8117af83>] chrdev_open+0x27a/0x2fe
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff81172f98>] __dentry_open+0x2d4/0x4d2
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff81174560>] nameidata_to_filp+0x58/0x7e
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff811858e9>] do_last+0x81b/0xa3d
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff81187220>] do_filp_open+0x2ff/0x869
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff81172b53>] do_sys_open+0x8c/0x187
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff81172c96>] sys_open+0x27/0x30
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff81003a5b>] system_call_fastpath+0x16/0x1b
> Mar 11 17:11:22 dhcp-66-70-5 kernel: irq event stamp: 7135619
> Mar 11 17:11:22 dhcp-66-70-5 kernel: hardirqs last enabled at (7135618): [<ffffffff8116c69f>] __kmalloc_node_track_caller+0x18d/0x256
> Mar 11 17:11:22 dhcp-66-70-5 kernel: hardirqs last disabled at (7135619): [<ffffffff8155220a>] _raw_spin_lock_irqsave+0x3f/0xe2
> Mar 11 17:11:22 dhcp-66-70-5 kernel: softirqs last enabled at (7135604): [<ffffffff8106a9a8>] __do_softirq+0x334/0x35f
> Mar 11 17:11:22 dhcp-66-70-5 kernel: softirqs last disabled at (7135611): [<ffffffff8100498c>] call_softirq+0x1c/0x4c
> Mar 11 17:11:22 dhcp-66-70-5 kernel:
> Mar 11 17:11:22 dhcp-66-70-5 kernel: other info that might help us debug this:
> Mar 11 17:11:22 dhcp-66-70-5 kernel: 2 locks held by swapper/0:
> Mar 11 17:11:22 dhcp-66-70-5 kernel: #0: (rcu_read_lock){.+.+..}, at: [<ffffffff81468baa>] net_rx_action+0xc9/0x40f
> Mar 11 17:11:22 dhcp-66-70-5 kernel: #1: (&(&napi->poll_lock)->rlock){+.-...}, at: [<ffffffff81468c39>] net_rx_action+0x158/0x40f
> Mar 11 17:11:22 dhcp-66-70-5 kernel:
> Mar 11 17:11:22 dhcp-66-70-5 kernel: stack backtrace:
> Mar 11 17:11:22 dhcp-66-70-5 kernel: Pid: 0, comm: swapper Not tainted 2.6.34-rc1 #58
> Mar 11 17:11:22 dhcp-66-70-5 kernel: Call Trace:
> Mar 11 17:11:22 dhcp-66-70-5 kernel: <IRQ> [<ffffffff810a1985>] print_usage_bug+0x26b/0x283
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff810a4486>] ? check_usage_forwards+0x0/0x141
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff810a1e7d>] mark_lock+0x4e0/0x8ff
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff810a6ce3>] __lock_acquire+0xa0d/0xe55
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff810a9585>] lock_acquire+0x163/0x1aa
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff8146419f>] ? netif_receive_skb+0xe7/0x819
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff81552237>] _raw_spin_lock_irqsave+0x6c/0xe2
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff8146419f>] ? netif_receive_skb+0xe7/0x819
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff8146419f>] netif_receive_skb+0xe7/0x819
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff814649a8>] napi_skb_finish+0x37/0x6a
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff81465118>] napi_gro_receive+0x44/0x50
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffffa01d8bdc>] tg3_poll_work+0x9c6/0xfab [tg3]
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff810a7063>] ? __lock_acquire+0xd8d/0xe55
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffffa01d92bc>] tg3_poll+0xfb/0x322 [tg3]
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff81468c95>] net_rx_action+0x1b4/0x40f
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff81468baa>] ? net_rx_action+0xc9/0x40f
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff8106a808>] __do_softirq+0x194/0x35f
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff8100498c>] call_softirq+0x1c/0x4c
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff81006d10>] do_softirq+0x8c/0x181
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff8106a66b>] irq_exit+0xa5/0xae
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff8155c2aa>] do_IRQ+0x10a/0x136
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff81553453>] ret_from_intr+0x0/0xf
> Mar 11 17:11:22 dhcp-66-70-5 kernel: <EOI> [<ffffffff81332544>] ? acpi_safe_halt+0x61/0x84
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff8133253b>] ? acpi_safe_halt+0x58/0x84
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff8133270e>] acpi_idle_enter_c1+0xd7/0x17b
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff815581b4>] ? __atomic_notifier_call_chain+0x92/0xa7
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff8142960f>] ? ladder_select_state+0x42/0x23d
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff81428439>] cpuidle_idle_call+0x104/0x1b4
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff810020df>] cpu_idle+0xfd/0x163
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff8152cb36>] rest_init+0x13a/0x145
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff8152c9fc>] ? rest_init+0x0/0x145
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff81c92702>] start_kernel+0x789/0x79d
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff81c912f4>] x86_64_start_reservations+0x104/0x10f
> Mar 11 17:11:22 dhcp-66-70-5 kernel: [<ffffffff81c914df>] x86_64_start_kernel+0x1e0/0x1ee

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Américo Wang on 11 Mar 2010 11:20

On Thu, Mar 11, 2010 at 05:45:56AM -0800, Paul E. McKenney wrote:
>On Thu, Mar 11, 2010 at 06:05:38PM +0800, Américo Wang wrote:
>> Hello, Paul and Peter,
>>
>> Attached is the lockdep warning that I triggered today.
>>
>> I am not sure if this is a bug of rcu lockdep, because I am
>> testing my patch when this occurred. However, in the backtrace,
>> there is none of the functions that I touched, weird.
>>
>> So, please help to check if this is a bug of rcu lockdep.
>
>This sort of thing is caused by acquiring the same lock with softirq
>(AKA BH) blocked and not, which can result in self-deadlock.
>
>There was such a bug in the RCU lockdep stuff in -tip, but it has long
>since been fixed. If you were seeing that bug, rcu_do_batch() would
>be on the stack, which it does not appear to be.
>
>So does your patch involve the usbfs_mutex? Or attempt to manipulate
>vfs/fs state from withing networking softirq/BH context?
>

Nope, it is a patch for netpoll, nothing related with usb, nor vfs.

Thanks much!

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Américo Wang on 12 Mar 2010 03:00

(Cc'ing netdev)

On Fri, Mar 12, 2010 at 12:17 AM, Américo Wang <xiyou.wangcong(a)gmail.com> wrote:
> On Thu, Mar 11, 2010 at 05:45:56AM -0800, Paul E. McKenney wrote:
>>On Thu, Mar 11, 2010 at 06:05:38PM +0800, Américo Wang wrote:
>>> Hello, Paul and Peter,
>>>
>>> Attached is the lockdep warning that I triggered today.
>>>
>>> I am not sure if this is a bug of rcu lockdep, because I am
>>> testing my patch when this occurred. However, in the backtrace,
>>> there is none of the functions that I touched, weird.
>>>
>>> So, please help to check if this is a bug of rcu lockdep.
>>
>>This sort of thing is caused by acquiring the same lock with softirq
>>(AKA BH) blocked and not, which can result in self-deadlock.
>>
>>There was such a bug in the RCU lockdep stuff in -tip, but it has long
>>since been fixed. If you were seeing that bug, rcu_do_batch() would
>>be on the stack, which it does not appear to be.
>>
>>So does your patch involve the usbfs_mutex? Or attempt to manipulate
>>vfs/fs state from withing networking softirq/BH context?
>>
>
> Nope, it is a patch for netpoll, nothing related with usb, nor vfs.
>

Ok, after decoding the lockdep output, it looks like that
netif_receive_skb() should call rcu_read_lock_bh() instead of rcu_read_lock()?
But I don't know if all callers of netif_receive_skb() are in softirq context.

Paul, what do you think?

Thank you.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: David Miller on 12 Mar 2010 03:10

From: Am�rico Wang <xiyou.wangcong(a)gmail.com>
Date: Fri, 12 Mar 2010 15:56:03 +0800

> Ok, after decoding the lockdep output, it looks like that
> netif_receive_skb() should call rcu_read_lock_bh() instead of rcu_read_lock()?
> But I don't know if all callers of netif_receive_skb() are in softirq context.

Normally, netif_receive_skb() is invoked from softirq context.

However, via netpoll it can be invoked essentially from any context.

But, when this happens, the networking receive path makes amends such
that this works fine. That's what the netpoll_receive_skb() check in
netif_receive_skb() is for. That check makes it bail out early if the
call to netif_receive_skb() is via a netpoll invocation.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

| Next | Last
Pages: 1 2 3 4
Prev: sched: remove AFFINE_WAKEUPS feature
Next: [PATCH] kbuild: Really don't clean bounds.h and asm-offsets.h