From: Andy Gospodarek on
On Mon, Apr 05, 2010 at 05:12:40AM -0400, Amerigo Wang wrote:
>
> Based on Andy's work, but I modified a lot.
>
> Similar to the patch for bridge, this patch does:
>
> 1) implement the 2 methods to support netpoll for bonding;
>
> 2) modify netpoll during forwarding packets via bonding;
>
> 3) disable netpoll support of bonding when a netpoll-unabled device
> is added to bonding;
>
> 4) enable netpoll support when all underlying devices support netpoll.
>
> Cc: Andy Gospodarek <gospo(a)redhat.com>
> Cc: Jeff Moyer <jmoyer(a)redhat.com>
> Cc: Matt Mackall <mpm(a)selenic.com>
> Cc: Neil Horman <nhorman(a)tuxdriver.com>
> Cc: Jay Vosburgh <fubar(a)us.ibm.com>
> Cc: David Miller <davem(a)davemloft.net>
> Signed-off-by: WANG Cong <amwang(a)redhat.com>
>

I tried these patches on top of Linus' latest tree and still get
deadlocks. Your line numbers might differ a bit, but you should be
seeing them too.

# echo 7 4 1 7 > /proc/sys/kernel/printk
# ifup bond0
bonding: bond0: setting mode to balance-rr (0).
bonding: bond0: Setting MII monitoring interval to 1000.
ADDRCONF(NETDEV_UP): bond0: link is not ready
bonding: bond0: Adding slave eth4.
bnx2 0000:10:00.0: eth4: using MSIX
bonding: bond0: enslaving eth4 as an active interface with a down link.
bonding: bond0: Adding slave eth5.
bnx2 0000:10:00.1: eth5: using MSIX
bonding: bond0: enslaving eth5 as an active interface with a down link.
bnx2 0000:10:00.0: eth4: NIC Copper Link is Up, 100 Mbps full duplex,
receive & transmit flow control ON
bonding: bond0: link status definitely up for interface eth4.
ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
bnx2 0000:10:00.1: eth5: NIC Copper Link is Up, 100 Mbps full duplex,
receive & transmit flow control ON
bond0: IPv6 duplicate address fe80::210:18ff:fe36:ad4 detected!
bonding: bond0: link status definitely up for interface eth5.
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 1000
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth4
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:10:18:36:0a:d4

Slave Interface: eth5
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:10:18:36:0a:d6
# modprobe netconsole
netconsole: local port 1234
netconsole: local IP 10.0.100.2
netconsole: interface 'bond0'
netconsole: remote port 6666
netconsole: remote IP 10.0.100.1
netconsole: remote ethernet address 00:e0:81:71:ee:aa
console [netcon0] enabled
netconsole: network logging started
# echo -eth4 > /sys/class/net/bond0/bonding/slaves
bonding: bond0: Removing slave eth4

[ now the system is hung ]

My suspicion from dealing with this problem in the past is that there is
contention over bond->lock.

Since there are statements that will result in netconsole messages inside
the write_lock_bh section in bond_release:

1882	write_lock_bh(&bond->lock);
1883
1884	slave = bond_get_slave_by_dev(bond, slave_dev);
1885	if (!slave) {
1886		/* not a slave of this bond */
1887		pr_info("%s: %s not enslaved\n",
1888			bond_dev->name, slave_dev->name);
1889		write_unlock_bh(&bond->lock);
1890		return -EINVAL;
1891	}
1892
1893	if (!bond->params.fail_over_mac) {
1894		if (!compare_ether_addr(bond_dev->dev_addr, slave->perm_hwaddr) &&
1895		    bond->slave_cnt > 1)
1896			pr_warning("%s: Warning: the permanent HWaddr of %s - %pM - is still in use by %s.

we are getting stuck at 1896 since bond_xmit_roundrobin (in my case)
will try to acquire bond->lock for reading.

One valuable aspect of the netpoll_start_xmit routine was that it could
be used to check whether bond->lock could be taken for writing. That let
us be sure we were not in a call stack that had already taken the lock,
and queuing the skb to be sent later would prevent the imminent
deadlock.

A way to prevent this is needed, and a first pass might be to do
something similar to what I did below for all the xmit routines. I
confirmed that the following patch prevents that deadlock:

# git diff drivers/net/bonding/
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 4a41886..53b39cc 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4232,7 +4232,8 @@ static int bond_xmit_roundrobin(struct sk_buff *skb, struc
 	int i, slave_no, res = 1;
 	struct iphdr *iph = ip_hdr(skb);

-	read_lock(&bond->lock);
+	if (!read_trylock(&bond->lock))
+		return NETDEV_TX_BUSY;

 	if (!BOND_IS_OK(bond))
 		goto out;

The kernel no longer hangs, but a new warning message shows up (over
netconsole even!):

------------[ cut here ]------------
WARNING: at kernel/softirq.c:143 local_bh_enable+0x43/0xba()
Hardware name: HP xw4400 Workstation
Modules linked in: tg3 netconsole bonding ipt_REJECT bridge stp autofs4
i2c_dev i2c_core hidp rfcomm l2cap crc16 bluetooth rfkill sunrpc 8021q
iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter
ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq dm_multipath
video output sbs sbshc battery acpi_memhotplug ac lp sg ide_cd_mod
tpm_tis rtc_cmos rtc_core serio_raw cdrom libphy e1000e floppy
parport_pc parport button tpm tpm_bios bnx2 rtc_lib tulip pcspkr shpchp
dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ata_piix ahci
libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last
unloaded: tg3]
Pid: 9, comm: events/0 Not tainted 2.6.34-rc3 #6
Call Trace:
[<ffffffff81058754>] ? cpu_clock+0x2d/0x41
[<ffffffff810404d9>] ? local_bh_enable+0x43/0xba
[<ffffffff8103a350>] warn_slowpath_common+0x77/0x8f
[<ffffffff812a4659>] ? dev_queue_xmit+0x408/0x467
[<ffffffff8103a377>] warn_slowpath_null+0xf/0x11
[<ffffffff810404d9>] local_bh_enable+0x43/0xba
[<ffffffff812a4659>] dev_queue_xmit+0x408/0x467
[<ffffffff812a435e>] ? dev_queue_xmit+0x10d/0x467
[<ffffffffa04a3868>] bond_dev_queue_xmit+0x1cd/0x1f9 [bonding]
[<ffffffffa04a4217>] bond_start_xmit+0x139/0x3e9 [bonding]
[<ffffffff812b0e9a>] queue_process+0xa8/0x160
[<ffffffff812b0df2>] ? queue_process+0x0/0x160
[<ffffffff81003794>] kernel_thread_helper+0x4/0x10
[<ffffffff813362bc>] ? restore_args+0x0/0x30
[<ffffffff81053884>] ? kthread+0x0/0x85

This points to possible locking issues (probably in netpoll_send_skb)
that I would suggest you investigate further. It may point to why we
cannot perform an:

# rmmod bonding

without the system deadlocking (even with my patch above).

> ---
>
> Index: linux-2.6/drivers/net/bonding/bond_main.c
> ===================================================================
> --- linux-2.6.orig/drivers/net/bonding/bond_main.c
> +++ linux-2.6/drivers/net/bonding/bond_main.c
> @@ -59,6 +59,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/errno.h>
>  #include <linux/netdevice.h>
> +#include <linux/netpoll.h>
>  #include <linux/inetdevice.h>
>  #include <linux/igmp.h>
>  #include <linux/etherdevice.h>
> @@ -430,7 +431,18 @@ int bond_dev_queue_xmit(struct bonding *
>  	}
>
>  	skb->priority = 1;
> -	dev_queue_xmit(skb);
> +#ifdef CONFIG_NET_POLL_CONTROLLER
> +	if (bond->dev->priv_flags & IFF_IN_NETPOLL) {
> +		struct netpoll *np = bond->dev->npinfo->netpoll;
> +		slave_dev->npinfo = bond->dev->npinfo;
> +		np->real_dev = np->dev = skb->dev;
> +		slave_dev->priv_flags |= IFF_IN_NETPOLL;
> +		netpoll_send_skb(np, skb);
> +		slave_dev->priv_flags &= ~IFF_IN_NETPOLL;
> +		np->dev = bond->dev;
> +	} else
> +#endif
> +		dev_queue_xmit(skb);
>
>  	return 0;
>  }
> @@ -1329,6 +1341,60 @@ static void bond_detach_slave(struct bon
>  	bond->slave_cnt--;
>  }
>
> +#ifdef CONFIG_NET_POLL_CONTROLLER
> +static bool slaves_support_netpoll(struct net_device *bond_dev)
> +{
> +	struct bonding *bond = netdev_priv(bond_dev);
> +	struct slave *slave;
> +	int i = 0;
> +	bool ret = true;
> +
> +	read_lock(&bond->lock);
> +	bond_for_each_slave(bond, slave, i) {
> +		if ((slave->dev->priv_flags & IFF_DISABLE_NETPOLL)
> +		    || !slave->dev->netdev_ops->ndo_poll_controller)
> +			ret = false;
> +	}
> +	read_unlock(&bond->lock);
> +	return i != 0 && ret;
> +}
> +
> +static void bond_poll_controller(struct net_device *bond_dev)
> +{
> +	struct net_device *dev = bond_dev->npinfo->netpoll->real_dev;
> +	if (dev != bond_dev)
> +		netpoll_poll_dev(dev);
> +}
> +
> +static void bond_netpoll_cleanup(struct net_device *bond_dev)
> +{
> +	struct bonding *bond = netdev_priv(bond_dev);
> +	struct slave *slave;
> +	const struct net_device_ops *ops;
> +	int i;
> +
> +	read_lock(&bond->lock);
> +	bond_dev->npinfo = NULL;
> +	bond_for_each_slave(bond, slave, i) {
> +		if (slave->dev) {
> +			ops = slave->dev->netdev_ops;
> +			if (ops->ndo_netpoll_cleanup)
> +				ops->ndo_netpoll_cleanup(slave->dev);
> +			else
> +				slave->dev->npinfo = NULL;
> +		}
> +	}
> +	read_unlock(&bond->lock);
> +}
> +
> +#else
> +
> +static void bond_netpoll_cleanup(struct net_device *bond_dev)
> +{
> +}
> +
> +#endif
> +
>  /*---------------------------------- IOCTL ----------------------------------*/
>
>  static int bond_sethwaddr(struct net_device *bond_dev,
> @@ -1746,6 +1812,18 @@ int bond_enslave(struct net_device *bond
>  		new_slave->state == BOND_STATE_ACTIVE ? "n active" : " backup",
>  		new_slave->link != BOND_LINK_DOWN ? "n up" : " down");
>
> +#ifdef CONFIG_NET_POLL_CONTROLLER
> +	if (slaves_support_netpoll(bond_dev)) {
> +		bond_dev->priv_flags &= ~IFF_DISABLE_NETPOLL;
> +		if (bond_dev->npinfo)
> +			slave_dev->npinfo = bond_dev->npinfo;
> +	} else if (!(bond_dev->priv_flags & IFF_DISABLE_NETPOLL)) {
> +		bond_dev->priv_flags |= IFF_DISABLE_NETPOLL;
> +		pr_info("New slave device %s does not support netpoll\n",
> +			slave_dev->name);
> +		pr_info("Disabling netpoll support for %s\n", bond_dev->name);
> +	}
> +#endif
>  	/* enslave is successful */
>  	return 0;
>
> @@ -1929,6 +2007,15 @@ int bond_release(struct net_device *bond
>
>  	netdev_set_master(slave_dev, NULL);
>
> +#ifdef CONFIG_NET_POLL_CONTROLLER
> +	if (slaves_support_netpoll(bond_dev))
> +		bond_dev->priv_flags &= ~IFF_DISABLE_NETPOLL;
> +	if (slave_dev->netdev_ops->ndo_netpoll_cleanup)
> +		slave_dev->netdev_ops->ndo_netpoll_cleanup(slave_dev);
> +	else
> +		slave_dev->npinfo = NULL;
> +#endif
> +
>  	/* close slave before restoring its mac address */
>  	dev_close(slave_dev);
>
> @@ -4448,6 +4535,10 @@ static const struct net_device_ops bond_
>  	.ndo_vlan_rx_register = bond_vlan_rx_register,
>  	.ndo_vlan_rx_add_vid = bond_vlan_rx_add_vid,
>  	.ndo_vlan_rx_kill_vid = bond_vlan_rx_kill_vid,
> +#ifdef CONFIG_NET_POLL_CONTROLLER
> +	.ndo_netpoll_cleanup = bond_netpoll_cleanup,
> +	.ndo_poll_controller = bond_poll_controller,
> +#endif
>  };
>
>  static void bond_setup(struct net_device *bond_dev)
> @@ -4533,6 +4624,8 @@ static void bond_uninit(struct net_devic
>  {
>  	struct bonding *bond = netdev_priv(bond_dev);
>
> +	bond_netpoll_cleanup(bond_dev);
> +
>  	/* Release the bonded slaves */
>  	bond_release_all(bond_dev);
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Cong Wang on
Andy Gospodarek wrote:
>
> I tried these patches on top of Linus' latest tree and still get
> deadlocks. Your line numbers might differ a bit, but you should be
> seeing them too.
>


Yeah, my local clone is some days behind Linus' latest tree. :)


> # echo 7 4 1 7 > /proc/sys/kernel/printk
> # ifup bond0
> bonding: bond0: setting mode to balance-rr (0).
> bonding: bond0: Setting MII monitoring interval to 1000.
> ADDRCONF(NETDEV_UP): bond0: link is not ready
> bonding: bond0: Adding slave eth4.
> bnx2 0000:10:00.0: eth4: using MSIX
> bonding: bond0: enslaving eth4 as an active interface with a down link.
> bonding: bond0: Adding slave eth5.
> bnx2 0000:10:00.1: eth5: using MSIX
> bonding: bond0: enslaving eth5 as an active interface with a down link.
> bnx2 0000:10:00.0: eth4: NIC Copper Link is Up, 100 Mbps full duplex,
> receive & transmit flow control ON
> bonding: bond0: link status definitely up for interface eth4.
> ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
> bnx2 0000:10:00.1: eth5: NIC Copper Link is Up, 100 Mbps full duplex,
> receive & transmit flow control ON
> bond0: IPv6 duplicate address fe80::210:18ff:fe36:ad4 detected!
> bonding: bond0: link status definitely up for interface eth5.
> # cat /proc/net/bonding/bond0
> Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
>
> Bonding Mode: load balancing (round-robin)
> MII Status: up
> MII Polling Interval (ms): 1000
> Up Delay (ms): 0
> Down Delay (ms): 0
>
> Slave Interface: eth4
> MII Status: up
> Link Failure Count: 0
> Permanent HW addr: 00:10:18:36:0a:d4
>
> Slave Interface: eth5
> MII Status: up
> Link Failure Count: 0
> Permanent HW addr: 00:10:18:36:0a:d6
> # modprobe netconsole
> netconsole: local port 1234
> netconsole: local IP 10.0.100.2
> netconsole: interface 'bond0'
> netconsole: remote port 6666
> netconsole: remote IP 10.0.100.1
> netconsole: remote ethernet address 00:e0:81:71:ee:aa
> console [netcon0] enabled
> netconsole: network logging started
> # echo -eth4 > /sys/class/net/bond0/bonding/slaves
> bonding: bond0: Removing slave eth4
>
> [ now the system is hung ]
>
> My suspicion from dealing with this problem in the past is that there is
> contention over bond->lock.
>
> Since there statements that will result in netconsole messages inside
> the write_lock_bh in bond_release:
>
> 1882 write_lock_bh(&bond->lock);
> 1883
> 1884 slave = bond_get_slave_by_dev(bond, slave_dev);
> 1885 if (!slave) {
> 1886 /* not a slave of this bond */
> 1887 pr_info("%s: %s not enslaved\n",
> 1888 bond_dev->name, slave_dev->name);
> 1889 write_unlock_bh(&bond->lock);
> 1890 return -EINVAL;
> 1891 }
> 1892
> 1893 if (!bond->params.fail_over_mac) {
> 1894 if (!compare_ether_addr(bond_dev->dev_addr, slave->perm_hwaddr) &&
> 1895 bond->slave_cnt > 1)
> 1896 pr_warning("%s: Warning: the permanent HWaddr of %s - %pM - is still in use by %s.
>
> we are getting stuck at 1986 since bond_xmit_roundrobin (in my case)
> will try and acquire bond->lock for reading.
>
> One valuable aspect netpoll_start_xmit routine was that is could be used
> to check to be sure that bond->lock could be taken for writing. This
> made us sure that we were not in a call stack that has already taken the
> lock and queuing the skb to be sent later would prevent the imminent
> deadlock.
>
> A way to prevent this is needed and a first-pass might be to do
> something similar to what I below above for all the xmit routines. I
> confirmed the following patch prevents that deadlock:
>
> # git diff drivers/net/bonding/
> diff --git a/drivers/net/bonding/bond_main.c
> b/drivers/net/bonding/bond_main.c
> index 4a41886..53b39cc 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -4232,7 +4232,8 @@ static int bond_xmit_roundrobin(struct sk_buff *skb, struc
> int i, slave_no, res = 1;
> struct iphdr *iph = ip_hdr(skb);
>
> - read_lock(&bond->lock);
> + if (!read_trylock(&bond->lock))
> + return NETDEV_TX_BUSY;
>
> if (!BOND_IS_OK(bond))
> goto out;
>
> The kernel no longer hangs, but a new warning message shows up (over
> netconsole even!):
>
> ------------[ cut here ]------------
> WARNING: at kernel/softirq.c:143 local_bh_enable+0x43/0xba()
> Hardware name: HP xw4400 Workstation
> Modules linked in: tg3 netconsole bonding ipt_REJECT bridge stp autofs4
> i2c_dev i2c_core hidp rfcomm l2cap crc16 bluetooth rfkill sunrpc 8021q
> iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter
> ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq dm_multipath
> video output sbs sbshc battery acpi_memhotplug ac lp sg ide_cd_mod
> tpm_tis rtc_cmos rtc_core serio_raw cdrom libphy e1000e floppy
> parport_pc parport button tpm tpm_bios bnx2 rtc_lib tulip pcspkr shpchp
> dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ata_piix ahci
> libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last
> unloaded: tg3]
> Pid: 9, comm: events/0 Not tainted 2.6.34-rc3 #6
> Call Trace:
> [<ffffffff81058754>] ? cpu_clock+0x2d/0x41
> [<ffffffff810404d9>] ? local_bh_enable+0x43/0xba
> [<ffffffff8103a350>] warn_slowpath_common+0x77/0x8f
> [<ffffffff812a4659>] ? dev_queue_xmit+0x408/0x467
> [<ffffffff8103a377>] warn_slowpath_null+0xf/0x11
> [<ffffffff810404d9>] local_bh_enable+0x43/0xba
> [<ffffffff812a4659>] dev_queue_xmit+0x408/0x467
> [<ffffffff812a435e>] ? dev_queue_xmit+0x10d/0x467
> [<ffffffffa04a3868>] bond_dev_queue_xmit+0x1cd/0x1f9 [bonding]
> [<ffffffffa04a4217>] bond_start_xmit+0x139/0x3e9 [bonding]
> [<ffffffff812b0e9a>] queue_process+0xa8/0x160
> [<ffffffff812b0df2>] ? queue_process+0x0/0x160
> [<ffffffff81003794>] kernel_thread_helper+0x4/0x10
> [<ffffffff813362bc>] ? restore_args+0x0/0x30
> [<ffffffff81053884>] ? kthread+0x0/0x85
>
> to point out possible locking issues (probably in netpoll_send_skb) that
> I would suggest you investigate further. It may point to why we cannot
> perform an:
>
> # rmmod bonding
>
> without the system deadlocking (even with my patch above).
>

Thanks a lot for testing!

Before I try to reproduce it, could you please try replacing the 'read_lock()'
in slaves_support_netpoll() with 'read_lock_bh()' (and 'read_unlock()' with
'read_unlock_bh()')? See if this helps.

After I reproduce this, I will try it too.
From: Cong Wang on
Cong Wang wrote:
> Before I try to reproduce it, could you please try to replace the
> 'read_lock()'
> in slaves_support_netpoll() with 'read_lock_bh()'? (read_unlock() too)
> Try if this helps.
>

Confirmed. Please use the attached patch instead, for your testing.

Thanks!

From: Andy Gospodarek on
On Tue, Apr 06, 2010 at 12:38:16PM +0800, Cong Wang wrote:
> Cong Wang wrote:
>> Before I try to reproduce it, could you please try to replace the
>> 'read_lock()'
>> in slaves_support_netpoll() with 'read_lock_bh()'? (read_unlock() too)
>> Try if this helps.
>>
>
> Confirmed. Please use the attached patch instead, for your testing.
>
> Thanks!
>

Moving those locks to bh-locks will not resolve this. I tried that
yesterday and tried your new patch today without success. That warning
is a WARN_ON_ONCE so you need to reboot to see that it is still a
problem. Simply unloading and loading the new module is not an accurate
test.

Also, my system still hangs when removing the bonding module. I do not
think you intended to fix this with the patch, but I wanted it to be
clear to everyone on the list.

You should also configure your kernel with some of the lock debugging
options enabled. I've been using the following:

CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_LOCK_STAT=y
CONFIG_DEBUG_LOCKDEP=y

Here is the output when I remove a slave from the bond. My
xmit_roundrobin patch from earlier (replacing read_lock with
read_trylock) was applied. It might be helpful for you when debugging
these issues.

------------[ cut here ]------------
WARNING: at kernel/softirq.c:143 local_bh_enable+0x43/0xba()
Hardware name: HP xw4400 Workstation
Modules linked in: netconsole bonding ipt_REJECT bridge stp autofs4 i2c_dev i2c_core hidp rfcomm
l2cap crc16 bluetooth rfki]
Pid: 10, comm: events/1 Not tainted 2.6.34-rc3 #6
Call Trace:
[<ffffffff81058754>] ? cpu_clock+0x2d/0x41
[<ffffffff810404d9>] ? local_bh_enable+0x43/0xba
[<ffffffff8103a350>] warn_slowpath_common+0x77/0x8f
[<ffffffff812a4659>] ? dev_queue_xmit+0x408/0x467
[<ffffffff8103a377>] warn_slowpath_null+0xf/0x11
[<ffffffff810404d9>] local_bh_enable+0x43/0xba
[<ffffffff812a4659>] dev_queue_xmit+0x408/0x467
[<ffffffff812a435e>] ? dev_queue_xmit+0x10d/0x467
[<ffffffffa04a383f>] bond_dev_queue_xmit+0x1cd/0x1f9 [bonding]
[<ffffffffa04a41ee>] bond_start_xmit+0x139/0x3e9 [bonding]
[<ffffffff812b0e9a>] queue_process+0xa8/0x160
[<ffffffff812b0df2>] ? queue_process+0x0/0x160
[<ffffffff81050bfb>] worker_thread+0x1af/0x2ae
[<ffffffff81050ba2>] ? worker_thread+0x156/0x2ae
[<ffffffff81053c34>] ? autoremove_wake_function+0x0/0x38
[<ffffffff81050a4c>] ? worker_thread+0x0/0x2ae
[<ffffffff81053901>] kthread+0x7d/0x85
[<ffffffff81003794>] kernel_thread_helper+0x4/0x10
[<ffffffff813362bc>] ? restore_args+0x0/0x30
[<ffffffff81053884>] ? kthread+0x0/0x85
[<ffffffff81003790>] ? kernel_thread_helper+0x0/0x10
---[ end trace 241f49bf65e0f4f0 ]---

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.34-rc3 #6
---------------------------------------------------------
events/1/10 just changed the state of lock:
(&bonding_netdev_xmit_lock_key){+.+...}, at: [<ffffffff812b0e75>] queue_process+0x83/0x160
but this lock was taken by another, SOFTIRQ-safe lock in the past:
(&(&dev->tx_global_lock)->rlock){+.-...}

and interrupts could create inverse lock ordering between them.


other info that might help us debug this:
4 locks held by events/1/10:
#0: (events){+.+.+.}, at: [<ffffffff81050ba2>] worker_thread+0x156/0x2ae
#1: ((&(&npinfo->tx_work)->work)){+.+...}, at: [<ffffffff81050ba2>] worker_thread+0x156/0x2ae
#2: (&bonding_netdev_xmit_lock_key){+.+...}, at: [<ffffffff812b0e75>] queue_process+0x83/0x160
#3: (&bond->lock){++.+..}, at: [<ffffffffa04a4107>] bond_start_xmit+0x52/0x3e9 [bonding]

the shortest dependencies between 2nd lock and 1st lock:
-> (&(&dev->tx_global_lock)->rlock){+.-...} ops: 129 {
HARDIRQ-ON-W at:
[<ffffffff810651ef>] __lock_acquire+0x643/0x813
[<ffffffff81065487>] lock_acquire+0xc8/0xed
[<ffffffff81335742>] _raw_spin_lock+0x31/0x66
[<ffffffff812b64bd>] dev_deactivate+0x6f/0x195
[<ffffffff812ad7c4>] linkwatch_do_dev+0x9a/0xae
[<ffffffff812ada6a>] __linkwatch_run_queue+0x106/0x14a
[<ffffffff812adad8>] linkwatch_event+0x2a/0x31
[<ffffffff81050bfb>] worker_thread+0x1af/0x2ae
[<ffffffff81053901>] kthread+0x7d/0x85
[<ffffffff81003794>] kernel_thread_helper+0x4/0x10
IN-SOFTIRQ-W at:
[<ffffffff810651a3>] __lock_acquire+0x5f7/0x813
[<ffffffff81065487>] lock_acquire+0xc8/0xed
[<ffffffff81335742>] _raw_spin_lock+0x31/0x66
[<ffffffff812b6606>] dev_watchdog+0x23/0x1f2
[<ffffffff8104701b>] run_timer_softirq+0x1d1/0x285
[<ffffffff81040021>] __do_softirq+0xdb/0x1ab
[<ffffffff8100388c>] call_softirq+0x1c/0x34
[<ffffffff81004f9d>] do_softirq+0x38/0x83
[<ffffffff8103ff44>] irq_exit+0x45/0x47
[<ffffffff810193bc>] smp_apic_timer_interrupt+0x88/0x98
[<ffffffff81003353>] apic_timer_interrupt+0x13/0x20
[<ffffffff81001a21>] cpu_idle+0x4d/0x6b
[<ffffffff8131da3a>] rest_init+0xbe/0xc2
[<ffffffff81a00d4e>] start_kernel+0x38c/0x399
[<ffffffff81a002a5>] x86_64_start_reservations+0xb5/0xb9
[<ffffffff81a0038f>] x86_64_start_kernel+0xe6/0xed
INITIAL USE at:
[<ffffffff8106525c>] __lock_acquire+0x6b0/0x813
[<ffffffff81065487>] lock_acquire+0xc8/0xed
[<ffffffff81335742>] _raw_spin_lock+0x31/0x66
[<ffffffff812b64bd>] dev_deactivate+0x6f/0x195
[<ffffffff812ad7c4>] linkwatch_do_dev+0x9a/0xae
[<ffffffff812ada6a>] __linkwatch_run_queue+0x106/0x14a
[<ffffffff812adad8>] linkwatch_event+0x2a/0x31
[<ffffffff81050bfb>] worker_thread+0x1af/0x2ae
[<ffffffff81053901>] kthread+0x7d/0x85
[<ffffffff81003794>] kernel_thread_helper+0x4/0x10
}
... key at: [<ffffffff8282ceb0>] __key.51521+0x0/0x8
... acquired at:
[<ffffffff810649f9>] validate_chain+0xb87/0xd3a
[<ffffffff81065359>] __lock_acquire+0x7ad/0x813
[<ffffffff81065487>] lock_acquire+0xc8/0xed
[<ffffffff81335742>] _raw_spin_lock+0x31/0x66
[<ffffffff812b64e4>] dev_deactivate+0x96/0x195
[<ffffffff812a17fc>] __dev_close+0x69/0x86
[<ffffffff8129f8ed>] __dev_change_flags+0xa8/0x12b
[<ffffffff812a148c>] dev_change_flags+0x1c/0x51
[<ffffffff812eee8a>] devinet_ioctl+0x26e/0x5d0
[<ffffffff812ef978>] inet_ioctl+0x8a/0xa2
[<ffffffff8128fc28>] sock_do_ioctl+0x26/0x45
[<ffffffff8128fe5a>] sock_ioctl+0x213/0x226
[<ffffffff810e5988>] vfs_ioctl+0x2a/0x9d
[<ffffffff810e5f13>] do_vfs_ioctl+0x491/0x4e2
[<ffffffff810e5fbb>] sys_ioctl+0x57/0x7a
[<ffffffff8100296b>] system_call_fastpath+0x16/0x1b

-> (&bonding_netdev_xmit_lock_key){+.+...} ops: 2 {
HARDIRQ-ON-W at:
[<ffffffff810651ef>] __lock_acquire+0x643/0x813
[<ffffffff81065487>] lock_acquire+0xc8/0xed
[<ffffffff81335742>] _raw_spin_lock+0x31/0x66
[<ffffffff812b64e4>] dev_deactivate+0x96/0x195
[<ffffffff812a17fc>] __dev_close+0x69/0x86
[<ffffffff8129f8ed>] __dev_change_flags+0xa8/0x12b
[<ffffffff812a148c>] dev_change_flags+0x1c/0x51
[<ffffffff812eee8a>] devinet_ioctl+0x26e/0x5d0
[<ffffffff812ef978>] inet_ioctl+0x8a/0xa2
[<ffffffff8128fc28>] sock_do_ioctl+0x26/0x45
[<ffffffff8128fe5a>] sock_ioctl+0x213/0x226
[<ffffffff810e5988>] vfs_ioctl+0x2a/0x9d
[<ffffffff810e5f13>] do_vfs_ioctl+0x491/0x4e2
[<ffffffff810e5fbb>] sys_ioctl+0x57/0x7a
[<ffffffff8100296b>] system_call_fastpath+0x16/0x1b
SOFTIRQ-ON-W at:
[<ffffffff81062006>] mark_held_locks+0x49/0x69
[<ffffffff81062139>] trace_hardirqs_on_caller+0x113/0x13e
[<ffffffff81062171>] trace_hardirqs_on+0xd/0xf
[<ffffffff81040548>] local_bh_enable+0xb2/0xba
[<ffffffff812a4659>] dev_queue_xmit+0x408/0x467
[<ffffffffa04a383f>] bond_dev_queue_xmit+0x1cd/0x1f9 [bonding]
[<ffffffffa04a41ee>] bond_start_xmit+0x139/0x3e9 [bonding]
[<ffffffff812b0e9a>] queue_process+0xa8/0x160
[<ffffffff81050bfb>] worker_thread+0x1af/0x2ae
[<ffffffff81053901>] kthread+0x7d/0x85
[<ffffffff81003794>] kernel_thread_helper+0x4/0x10
INITIAL USE at:
[<ffffffff8106525c>] __lock_acquire+0x6b0/0x813
[<ffffffff81065487>] lock_acquire+0xc8/0xed
[<ffffffff81335742>] _raw_spin_lock+0x31/0x66
[<ffffffff812b64e4>] dev_deactivate+0x96/0x195
[<ffffffff812a17fc>] __dev_close+0x69/0x86
[<ffffffff8129f8ed>] __dev_change_flags+0xa8/0x12b
[<ffffffff812a148c>] dev_change_flags+0x1c/0x51
[<ffffffff812eee8a>] devinet_ioctl+0x26e/0x5d0
[<ffffffff812ef978>] inet_ioctl+0x8a/0xa2
[<ffffffff8128fc28>] sock_do_ioctl+0x26/0x45
[<ffffffff8128fe5a>] sock_ioctl+0x213/0x226
[<ffffffff810e5988>] vfs_ioctl+0x2a/0x9d
[<ffffffff810e5f13>] do_vfs_ioctl+0x491/0x4e2
[<ffffffff810e5fbb>] sys_ioctl+0x57/0x7a
[<ffffffff8100296b>] system_call_fastpath+0x16/0x1b
}
... key at: [<ffffffffa04b1968>] bonding_netdev_xmit_lock_key+0x0/0xffffffffffffa78c [bonding]
... acquired at:
[<ffffffff8106386d>] check_usage_backwards+0xb8/0xc7
[<ffffffff81061d81>] mark_lock+0x311/0x54d
[<ffffffff81062006>] mark_held_locks+0x49/0x69
[<ffffffff81062139>] trace_hardirqs_on_caller+0x113/0x13e
[<ffffffff81062171>] trace_hardirqs_on+0xd/0xf
[<ffffffff81040548>] local_bh_enable+0xb2/0xba
[<ffffffff812a4659>] dev_queue_xmit+0x408/0x467
[<ffffffffa04a383f>] bond_dev_queue_xmit+0x1cd/0x1f9 [bonding]
[<ffffffffa04a41ee>] bond_start_xmit+0x139/0x3e9 [bonding]
[<ffffffff812b0e9a>] queue_process+0xa8/0x160
[<ffffffff81050bfb>] worker_thread+0x1af/0x2ae
[<ffffffff81053901>] kthread+0x7d/0x85
[<ffffffff81003794>] kernel_thread_helper+0x4/0x10


stack backtrace:
Pid: 10, comm: events/1 Tainted: G W 2.6.34-rc3 #6
Call Trace:
[<ffffffff8106189e>] print_irq_inversion_bug+0x121/0x130
[<ffffffff8106386d>] check_usage_backwards+0xb8/0xc7
[<ffffffff810637b5>] ? check_usage_backwards+0x0/0xc7
[<ffffffff81061d81>] mark_lock+0x311/0x54d
[<ffffffff81062006>] mark_held_locks+0x49/0x69
[<ffffffff81040548>] ? local_bh_enable+0xb2/0xba
[<ffffffff81062139>] trace_hardirqs_on_caller+0x113/0x13e
[<ffffffff812a4659>] ? dev_queue_xmit+0x408/0x467
[<ffffffff81062171>] trace_hardirqs_on+0xd/0xf
[<ffffffff81040548>] local_bh_enable+0xb2/0xba
[<ffffffff812a4659>] dev_queue_xmit+0x408/0x467
[<ffffffff812a435e>] ? dev_queue_xmit+0x10d/0x467
[<ffffffffa04a383f>] bond_dev_queue_xmit+0x1cd/0x1f9 [bonding]
[<ffffffffa04a41ee>] bond_start_xmit+0x139/0x3e9 [bonding]
[<ffffffff812b0e9a>] queue_process+0xa8/0x160
[<ffffffff812b0df2>] ? queue_process+0x0/0x160
[<ffffffff81050bfb>] worker_thread+0x1af/0x2ae
[<ffffffff81050ba2>] ? worker_thread+0x156/0x2ae
[<ffffffff81053c34>] ? autoremove_wake_function+0x0/0x38
[<ffffffff81050a4c>] ? worker_thread+0x0/0x2ae
[<ffffffff81053901>] kthread+0x7d/0x85
[<ffffffff81003794>] kernel_thread_helper+0x4/0x10
[<ffffffff813362bc>] ? restore_args+0x0/0x30
[<ffffffff81053884>] ? kthread+0x0/0x85
[<ffffffff81003790>] ? kernel_thread_helper+0x0/0x10
Dead loop on virtual device bond0, fix it urgently!

From: Cong Wang on
Andy Gospodarek wrote:
> On Tue, Apr 06, 2010 at 12:38:16PM +0800, Cong Wang wrote:
>> Cong Wang wrote:
>>> Before I try to reproduce it, could you please try to replace the
>>> 'read_lock()'
>>> in slaves_support_netpoll() with 'read_lock_bh()'? (read_unlock() too)
>>> Try if this helps.
>>>
>> Confirmed. Please use the attached patch instead, for your testing.
>>
>> Thanks!
>>
>
> Moving those locks to bh-locks will not resolve this. I tried that
> yesterday and tried your new patch today without success. That warning
> is a WARN_ON_ONCE so you need to reboot to see that it is still a
> problem. Simply unloading and loading the new module is not an accurate
> test.
>
> Also, my system still hangs when removing the bonding module. I do not
> think you intended to fix this with the patch, but wanted it to be clear
> to everyone on the list.


Actually I did reboot and then tested the module. I didn't get any warning.
I just tried again today, and no warnings at all.

For removing the bonding module, you may need another fix of mine,
which fixes a potential workqueue deadlock. Try:

http://lkml.org/lkml/2010/4/1/58

>
> You should also configure your kernel with a some of the lock debugging
> enabled. I've been using the following:
>
> CONFIG_DETECT_HUNG_TASK=y
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_MUTEXES=y
> CONFIG_DEBUG_LOCK_ALLOC=y
> CONFIG_PROVE_LOCKING=y
> CONFIG_LOCKDEP=y
> CONFIG_LOCK_STAT=y
> CONFIG_DEBUG_LOCKDEP=y


Sure, I always keep these.

>
> Here is the output when I remove a slave from the bond. My
> xmit_roundrobin patch from earlier (replacing read_lock with
> read_trylock) was applied. It might be helpful for you when debugging
> these issues.


I didn't apply your patch; I just tested mine.

>
> Dead loop on virtual device bond0, fix it urgently!
>

Please provide your bonding configuration and steps to reproduce it.

What I did is:

1. Load bonding module with "mode=0 miimon=100"
2. Enslave eth0 and activate bond0
3. Load netconsole and send messages via bond0
4. Remove eth0 from bond0
5. Remove bonding module
6. Remove netconsole module

And no deadlocks, no warnings.

Thanks.