netpoll: add generic support for bridge and bonding devices [Kernel]

Prev: [PATCH -tip 2/2] [BUGFIX] kprobes/x86: Fix removed int3 checking order
Next: Fix migration races in rmap_walk() V2

From: Flavio Leitner on 27 May 2010 17:30

On Thu, May 27, 2010 at 01:35:59PM -0700, David Miller wrote:
> From: Flavio Leitner <fbl(a)sysclose.org>
> Date: Thu, 27 May 2010 15:05:45 -0300
>
> > I did the following patch to discard the packet if it was IN_NETPOLL
> > and the read_lock() fails, so I could go ahead testing it:
>
> This is disgusting, let's just disallow console output from such
> locations. Defer them to a workqueue if their output is so critical.

I did that patch just to see the backtrace in the serial console
and to keep testing it. It's not a solution at all.

Just to be clear, the second problem isn't related to that patch
and the console message is already in a workqueue.

--
Flavio
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Flavio Leitner on 28 May 2010 15:50

On Fri, May 28, 2010 at 10:47:01AM +0800, Cong Wang wrote:
> On 05/28/10 02:05, Flavio Leitner wrote:
> >
> >Hi guys!
> >
> >I finally could test this to see if an old problem reported on bugzilla[1] was
> >fixed now, but unfortunately it is still there.
> >
> >The ticket is private I guess, but basically the problem happens when bonding
> >driver tries to print something after it had taken the write_lock (monitor
> >functions, enslave/de-enslave), so the printk() will pass through netpoll, then
> >on bonding again which no matter what mode you use, it will try to read_lock()
> >the lock again. The result is a deadlock and the entire system hangs.
>
>
> This is true, I already fixed some similar issues.
>
> >
> >I manage to get a fresh backtrace with mode 1, see below:
> >
> >
> >[ 93.167079] Call Trace:
> >[ 93.167079] [<ffffffff81034cf9>] warn_slowpath_common+0x77/0x8f
> >[ 93.167079] [<ffffffff81034d5e>] warn_slowpath_fmt+0x3c/0x3e
> >[ 93.167079] [<ffffffff81366aef>] ? _raw_read_trylock+0x11/0x4b
> >[ 93.167079] [<ffffffffa02a2c42>] ? bond_start_xmit+0x12b/0x401 [bonding]
> >-> read_lock fails
> >[ 93.167079] [<ffffffffa02a2c9f>] bond_start_xmit+0x188/0x401 [bonding]
> >[ 93.167079] [<ffffffff81055b37>] ? trace_hardirqs_off+0xd/0xf
> >[ 93.167079] [<ffffffff812dfdb9>] netpoll_send_skb+0xbd/0x1f3
> >[ 93.167079] [<ffffffff812e00ed>] netpoll_send_udp+0x1fe/0x20d
> >[ 93.167079] [<ffffffffa02c017c>] write_msg+0x89/0xcd [netconsole]
> >[ 93.167079] [<ffffffff81034e65>] __call_console_drivers+0x67/0x79
> >[ 93.167079] [<ffffffff81034ed0>] _call_console_drivers+0x59/0x5d
> >[ 93.167079] [<ffffffff810352d3>] release_console_sem+0x121/0x1d7
> >[ 93.167079] [<ffffffff8103590a>] vprintk+0x35d/0x393
> >[ 93.167079] [<ffffffff8103f947>] ? add_timer+0x17/0x19
> >[ 93.167079] [<ffffffff81046ddf>] ? queue_delayed_work_on+0xa2/0xa9
> >[ 93.167079] [<ffffffff81363bb8>] printk+0x3c/0x44
> >[ 93.167079] [<ffffffffa02a3b17>] bond_select_active_slave+0x105/0x109 [bonding]
> >-> write_locked
> >[ 93.167079] [<ffffffffa02a4798>] bond_mii_monitor+0x479/0x4ed [bonding]
> >[ 93.167079] [<ffffffff81046009>] worker_thread+0x1ef/0x2e2
> >
> >In this case, the message should be
> > "bonding: bond0: making interface eth0 the new active one"
>
>
> Hmm, you triggered a warning here, let me check the source code
> and try to reproduce it here.

Okay, just pull the cable from one slave or both slaves.

> >I did the following patch to discard the packet if it was IN_NETPOLL
> >and the read_lock() fails, so I could go ahead testing it:
> >
> >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> >index 5e12462..a3b8bad 100644
> >--- a/drivers/net/bonding/bond_main.c
> >+++ b/drivers/net/bonding/bond_main.c
> >@@ -4258,8 +4258,19 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
> > struct bonding *bond = netdev_priv(bond_dev);
> > int res = 1;
> >
> >- read_lock(&bond->lock);
> >- read_lock(&bond->curr_slave_lock);
> >+ if (read_trylock(&bond->lock) == 0&&
> >+ (bond_dev->flags& IFF_IN_NETPOLL)) {
> >+ dev_kfree_skb(skb);
> >+ return NETDEV_TX_OK;
> >+ }
> >+
> >+ if (read_trylock(&bond->curr_slave_lock) == 0&&
> >+ (bond_dev->flags& IFF_IN_NETPOLL)) {
> >+ read_unlock(&bond->lock);
> >+ dev_kfree_skb(skb);
> >+ return NETDEV_TX_OK;
> >+ }
> >+
> >
> > if (!BOND_IS_OK(bond))
> > goto out;
> >
>
>
> This looks like a workaround, not a fix. :)

No, it's a debugging patch, otherwise I couldn't see anything on
serial console when it deadlocks or to keep testing it. :)

> >and I found another problem. The function netpoll_send_skb() checks
> >if the npinfo's queue length is zero and if it's not, it will queue
> >the packet to make sure it's in order and then schedule the thread
> >to run. Later, the thread wakes up running queue_process() which disables
> >interrupts before calling ndo_start_xmit(). However, dev_queue_xmit()
> >uses rcu_*_bh() and before return, it will enable the interrupts again,
> >spitting this:
> >
> >------------[ cut here ]------------
> >WARNING: at kernel/softirq.c:143 local_bh_enable+0x3c/0x86()
> >Hardware name: Precision WorkStation 490
> >Modules linked in: netconsole bonding sunrpc ip6t_REJECT xt_tcpudp nf_conntrack_ipv6]
> >Pid: 17, comm: events/2 Not tainted 2.6.34-04700-gd938a70 #21
> >Call Trace:
> > [<ffffffff810381d6>] warn_slowpath_common+0x77/0x8f
> > [<ffffffff810381fd>] warn_slowpath_null+0xf/0x11
> > [<ffffffff8103d691>] local_bh_enable+0x3c/0x86
> > [<ffffffff812e4d85>] dev_queue_xmit+0x462/0x493
> > [<ffffffffa018805f>] bond_dev_queue_xmit+0x1bd/0x1e3 [bonding]
> > [<ffffffffa01881dd>] bond_start_xmit+0x158/0x37b [bonding]
> >-> interrupts disabled
> > [<ffffffff812f3fca>] queue_process+0x9d/0xf9
> > [<ffffffff8104d022>] worker_thread+0x19d/0x224
> > [<ffffffff812f3f2d>] ? queue_process+0x0/0xf9
> > [<ffffffff81050819>] ? autoremove_wake_function+0x0/0x34
> > [<ffffffff8104ce85>] ? worker_thread+0x0/0x224
> > [<ffffffff8105040b>] kthread+0x7a/0x82
> > [<ffffffff810036d4>] kernel_thread_helper+0x4/0x10
> > [<ffffffff81050391>] ? kthread+0x0/0x82
> > [<ffffffff810036d0>] ? kernel_thread_helper+0x0/0x10
> >---[ end trace 74e3904503fdb632 ]---
> >
> >kernel/softirq.c:
> >141 static inline void _local_bh_enable_ip(unsigned long ip)
> >142 {
> >143 WARN_ON_ONCE(in_irq() || irqs_disabled());
> >144 #ifdef CONFIG_TRACE_IRQFLAGS
> >145 local_irq_disable();
> >146 #endif
> >147 /*
> >148 * Are softirqs going to be turned on now:
> >149 */
> >
> >
>
> I am wondering if this was caused by the previous issue.

Yeah, the first problem can help to trigger this.
The spinlock target_list_lock in write_msg() seems to serialize
this, but I'm not sure yet if the queue will always be empty by
the time another call to write_msg() is made because the
netpoll_send_skb() will queue the packet if it fails to send.

> >The git is updated up to:
> > d938a70 be2net: increase POST timeout for EEH recovery
> >
> >Two slave interfaces, bonding mode 1, netconsole over bond0.
> >
> >[1] https://bugzilla.redhat.com/show_bug.cgi?id=248374#c5
>
> How did you reproduce this?
> I will check that BZ to see if I can find how to reproduce this.
>
> Thanks.
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo(a)vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Flavio
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Flavio Leitner on 28 May 2010 16:50

On Fri, May 28, 2010 at 04:16:34PM +0800, Cong Wang wrote:
> On 05/28/10 02:05, Flavio Leitner wrote:
> >
> >Hi guys!
> >
> >I finally could test this to see if an old problem reported on bugzilla[1] was
> >fixed now, but unfortunately it is still there.
> >
> >The ticket is private I guess, but basically the problem happens when bonding
> >driver tries to print something after it had taken the write_lock (monitor
> >functions, enslave/de-enslave), so the printk() will pass through netpoll, then
> >on bonding again which no matter what mode you use, it will try to read_lock()
> >the lock again. The result is a deadlock and the entire system hangs.
> >
>
> Does the attached patch fix this hang?

I got another issue now:

[ 89.523062] bonding: bond0: enslaving eth0 as a backup interface with a down link.
[ 89.580746] bonding: bond0: enslaving eth2 as a backup interface with a down link.
[ 91.198527] e1000: eth2 NIC Link is Up 100 Mbps Half Duplex, Flow Control: None
[ 91.238245] bonding: bond0: link status definitely up for interface eth2.

[ 91.245381] BUG: scheduling while atomic: bond0/2716/0x10000100
[ 91.251565] 5 locks held by bond0/2716:
[ 91.255663] #0: ((bond_dev->name)){+.+.+.}, at: [<ffffffff81045fb4>] worker_thread+0x19a/0x2e2
[ 91.265179] #1: ((&(&bond->mii_work)->work)){+.+.+.}, at: [<ffffffff81045fb4>] worker_thread+0x19a/0x2e2
[ 91.275554] #2: (rtnl_mutex){+.+.+.}, at: [<ffffffff812daf38>] rtnl_lock+0x12/0x14
[ 91.284018] #3: (&bond->lock){++.+.+}, at: [<ffffffffa029e06a>] bond_mii_monitor+0x2a2/0x4ed [bonding]
[ 91.294230] #4: (&bond->curr_slave_lock){+...+.}, at: [<ffffffffa029e239>] bond_mii_monitor+0x471/0x4ed [bonding]
[ 91.305387] Modules linked in: bonding sunrpc ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 dm_mirror dm_region_hash dm_log dm_multipath uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm ppdev parport_pc parport rtc_cmos snd_timer tg3 snd ide_cd_mod i5000_edac i2c_i801 libphy rtc_core rtc_lib edac_core pcspkr e1000 dcdbas uhci_hcd tulip shpchp i2c_core cdrom serio_raw soundcore sg snd_page_alloc raid0 sd_mod button [last unloaded: mperf]
[ 91.357735] Pid: 2716, comm: bond0 Not tainted 2.6.34-04700-gd938a70-dirty #36
[ 91.371112] Call Trace:
[ 91.373825] [<ffffffff81056002>] ? __debug_show_held_locks+0x22/0x24
[ 91.380530] [<ffffffff8102e4a2>] __schedule_bug+0x6d/0x72
[ 91.386284] [<ffffffff81363f6e>] schedule+0xc9/0x791
[ 91.391600] [<ffffffff81032540>] __cond_resched+0x25/0x30
[ 91.397350] [<ffffffff81364757>] _cond_resched+0x27/0x32
[ 91.403013] [<ffffffff810ab243>] kmem_cache_alloc+0x2b/0xac
[ 91.408936] [<ffffffff812c61fd>] skb_clone+0x42/0x5d
[ 91.414253] [<ffffffff812ec696>] netlink_broadcast+0x192/0x369
[ 91.420436] [<ffffffff812ecdc3>] nlmsg_notify+0x43/0x89
[ 91.426012] [<ffffffff812dabc7>] rtnl_notify+0x2b/0x2d
[ 91.431501] [<ffffffff812dacbc>] rtmsg_ifinfo+0xf3/0x118
[ 91.437165] [<ffffffff812dad0c>] rtnetlink_event+0x2b/0x2f
[ 91.443003] [<ffffffff81369fe4>] notifier_call_chain+0x32/0x5e
[ 91.449188] [<ffffffff8104d618>] raw_notifier_call_chain+0xf/0x11
[ 91.455634] [<ffffffff812cfc73>] call_netdevice_notifiers+0x45/0x4a
[ 91.462253] [<ffffffff812d04f7>] netdev_bonding_change+0x12/0x14
[ 91.468614] [<ffffffffa029d589>] bond_select_active_slave+0xe8/0x123 [bonding]
[ 91.476408] [<ffffffffa029e241>] bond_mii_monitor+0x479/0x4ed [bonding]
[ 91.483375] [<ffffffff81046009>] worker_thread+0x1ef/0x2e2
[ 91.489212] [<ffffffff81045fb4>] ? worker_thread+0x19a/0x2e2
[ 91.495227] [<ffffffffa029ddc8>] ? bond_mii_monitor+0x0/0x4ed [bonding]
[ 91.502192] [<ffffffff81049c71>] ? autoremove_wake_function+0x0/0x34
[ 91.508897] [<ffffffff81045e1a>] ? worker_thread+0x0/0x2e2
[ 91.514734] [<ffffffff810498bb>] kthread+0x7a/0x82
[ 91.519878] [<ffffffff81003714>] kernel_thread_helper+0x4/0x10
[ 91.526060] [<ffffffff81366ffc>] ? restore_args+0x0/0x30
[ 91.531723] [<ffffffff81049841>] ? kthread+0x0/0x82
[ 91.536953] [<ffffffff81003710>] ? kernel_thread_helper+0x0/0x10
[ 91.543343] bonding: bond0: making interface eth2 the new active one.
[ 91.550554] bonding: bond0: first active interface up!
[ 91.556859] ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready

No other patch applied. Just started netconsole over bonding, so no need
to pull the cable from slaves. Reproduced twice, one I got the
backtrace above, and on the other one the system hangs completely
after the BUG: scheduling message.

fbl

>
> Thanks!
>
> ----------------------->
>
> We should notify netconsole that bond is changing its slaves
> when we use active-backup mode.
>
> Signed-off-by: WANG Cong <amwang(a)redhat.com>
>
> ----
>

> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 5e12462..9494c02 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1199,6 +1199,7 @@ void bond_select_active_slave(struct bonding *bond)
>
> best_slave = bond_find_best_slave(bond);
> if (best_slave != bond->curr_active_slave) {
> + netdev_bonding_change(bond->dev, NETDEV_BONDING_DESLAVE);
> bond_change_active_slave(bond, best_slave);
> rv = bond_set_carrier(bond);
> if (!rv)
> @@ -2154,6 +2155,7 @@ static int bond_ioctl_change_active(struct net_device *bond_dev, struct net_devi
> (old_active) &&
> (new_active->link == BOND_LINK_UP) &&
> IS_UP(new_active->dev)) {
> + netdev_bonding_change(bond->dev, NETDEV_BONDING_DESLAVE);
> write_lock_bh(&bond->curr_slave_lock);
> bond_change_active_slave(bond, new_active);
> write_unlock_bh(&bond->curr_slave_lock);

--
Flavio
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Flavio Leitner on 31 May 2010 15:10

On Mon, May 31, 2010 at 01:56:52PM +0800, Cong Wang wrote:
> Hi, Flavio,
>
> Please use the attached patch instead, try to see if it solves
> all your problems.

I tried and it hangs. No backtraces this time.
The bond_change_active_slave() prints before NETDEV_BONDING_FAILOVER
notification, so I think it won't work.

Please, correct if I'm wrong, but when a failover happens with your
patch applied, the netconsole would be disabled forever even with
another healthy slave, right?

fbl

>
> Thanks a lot!
>

> diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
> index ca142c4..2d1d594 100644
> --- a/drivers/net/netconsole.c
> +++ b/drivers/net/netconsole.c
> @@ -666,7 +666,8 @@ static int netconsole_netdev_event(struct notifier_block *this,
> struct net_device *dev = ptr;
>
> if (!(event == NETDEV_CHANGENAME || event == NETDEV_UNREGISTER ||
> - event == NETDEV_BONDING_DESLAVE || event == NETDEV_GOING_DOWN))
> + event == NETDEV_BONDING_DESLAVE || event == NETDEV_GOING_DOWN ||
> + event == NETDEV_BONDING_FAILOVER))
> goto done;
>
> spin_lock_irqsave(&target_list_lock, flags);
> @@ -682,6 +683,7 @@ static int netconsole_netdev_event(struct notifier_block *this,
> /* Fall through */
> case NETDEV_GOING_DOWN:
> case NETDEV_BONDING_DESLAVE:
> + case NETDEV_BONDING_FAILOVER:
> nt->enabled = 0;
> break;
> }

--
Flavio
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: David Miller on 7 Jun 2010 06:10

From: Cong Wang <amwang(a)redhat.com>
Date: Mon, 07 Jun 2010 17:57:49 +0800

> Hmm, I still feel like this way is ugly, although it may work.
> I guess David doesn't like it either.

Of course I don't like it. :-)

I suspect the locking scheme will need to be changed.

Besides, if we're going to hack this up and do write lock attempts in
the read locking paths, there is no point in using a rwlock any more.
And I'm personally in disfavor of all rwlock usage anyways (it dirties
the cacheline for readers just as equally for writers, and if the
critically protected code path is short enough, that shared cache
line atomic operation will be the predominant cost).

So I'd say, 1) make this a spinlock and 2) try to use RCU for the
read path.

That would fix everything.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First | Prev |
Pages: 1 2
Prev: [PATCH -tip 2/2] [BUGFIX] kprobes/x86: Fix removed int3 checking order
Next: Fix migration races in rmap_walk() V2