From: Michael Leun on
On Thu, 05 Aug 2010 02:51:29 -0700
ebiederm(a)xmission.com (Eric W. Biederman) wrote:

> >> > Jul 10 20:02:36 doris kernel: unregister_netdevice: waiting for
> >> > lo to become free. Usage count = 3 [repeated]
> >>
> >> How many times?
> >
> > Unfortunately looks like indefinitely. Never watched longer so far
> > (rebooted soon), but I'm seeing this message now repeated every 10
> > secs for ~10 minutes on a idle system.
>
> Ugh. A real bug then. These can be a pain to track down and fix. I
> think the last one of these I tracked down took a couple of weeks. I
> will start digging in when I get back from vacation.

OK, fortunately (hopefully) you have not put too much time into that so
far - because everything I told you about the usage of tun and the
difference between ssh and openvpn is complete nonsense.

I happen to have a script in that openvpn config which puts an ipv6
address on the vpn device.

Putting an ipv6 address on a device seems to be the trigger:

OrigNS > # ip link add type veth
OrigNS > # ip link set dev veth0 up
OrigNS > # unshare -n /bin/bash
NewNS > # echo $$
<SomePID>
OrigNS > # ip link set dev veth1 netns <SomePID> # this, of course is on a different terminal
NewNS > # ip link set dev veth1 up
NewNS > # ip -6 addr add dev veth1 fd50:dead:beef::1/64
NewNS > # exit

Yields

kernel: unregister_netdevice: waiting for veth1 to become free. Usage count = 3

Oh - it's veth1 this time, not lo. Add an "ip link set up dev lo" to the above scenario just after the unshare, and you get the message with lo.

One might ask whether

> # unshare -n /bin/bash
> # ip link set up dev lo
> # ip -6 addr add dev veth1 fd50:dead:beef::1/64
> # exit

also does the trick, so I tried it - and it does NOT.

In the above scenario, leaving veth0 and veth1 down also makes it not happen. Setting only veth1 up is not enough either (the device seems to need to be "really up", which, as you surely know, is only the case with veth when both sides are up).
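The steps above could be collected into one script roughly as follows. This is only a sketch under assumptions that are not part of the original report: it uses nsenter from util-linux to run commands inside the new namespace instead of a second terminal, a background "sleep" as the process holding the namespace, and the hypothetical names run, repro and RUN. By default it only prints the commands; whether killing the holder process triggers the message exactly like typing "exit" in the shell is assumed, not verified.

```shell
#!/bin/sh
# Sketch only: prints the reproduction steps by default (dry run).
# Set RUN=1 and run as root on a disposable test machine to execute them.
# Assumes veth0/veth1 are free names and that nsenter(1) is available.

run() {
    # Echo each command; execute it only when RUN=1.
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

repro() {
    with_lo=$1                      # "yes" to also bring up lo in the new namespace
    run ip link add type veth
    run ip link set dev veth0 up
    if [ "${RUN:-0}" = 1 ]; then
        unshare -n sleep 30 &       # holder process for the new namespace
        ns_pid=$!
        sleep 1                     # crude wait until unshare has switched namespaces
    else
        ns_pid='<SomePID>'          # placeholder, as in the transcript above
    fi
    if [ "$with_lo" = yes ]; then
        run nsenter -t "$ns_pid" -n ip link set up dev lo
    fi
    run ip link set dev veth1 netns "$ns_pid"
    run nsenter -t "$ns_pid" -n ip link set dev veth1 up
    run nsenter -t "$ns_pid" -n ip -6 addr add dev veth1 fd50:dead:beef::1/64
    # Ending the last process in the namespace stands in for "exit" above.
    if [ "${RUN:-0}" = 1 ]; then kill "$ns_pid"; fi
}

repro "${1:-no}"
```

Called without arguments it prints the veth1 variant; with "yes" as first argument it adds the "ip link set up dev lo" step right after the namespace is created.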

I hope this makes it somewhat easier to track this down.

--
MfG,

Michael Leun

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: David Miller on
From: ebiederm(a)xmission.com (Eric W. Biederman)
Date: Thu, 05 Aug 2010 12:57:59 -0700

> I wonder what has changed with ipv6 recently.

There was a recent fix to the IGMP snooping code we have in
the bridging layer, if parsing of an ipv6 IGMP packet failed
we'd leak the packet (and thus references to whatever device
it referenced).

commit 6d1d1d398cb7db7a12c5d652d50f85355345234f
Author: Herbert Xu <herbert(a)gondor.apana.org.au>
Date: Thu Jul 29 01:12:31 2010 +0000

bridge: Fix skb leak when multicast parsing fails on TX

On the bridge TX path we're leaking an skb when br_multicast_rcv
returns an error.

Reported-by: David Lamparter <equinox(a)diac24.net>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: David S. Miller <davem(a)davemloft.net>

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 4cec805..f49bcd9 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -48,8 +48,10 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)

 	rcu_read_lock();
 	if (is_multicast_ether_addr(dest)) {
-		if (br_multicast_rcv(br, NULL, skb))
+		if (br_multicast_rcv(br, NULL, skb)) {
+			kfree_skb(skb);
 			goto out;
+		}

 		mdst = br_mdb_get(br, skb);
 		if (mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb))
--
From: lkml20100708 on
On Thu, 05 Aug 2010 13:11:28 -0700 (PDT)
David Miller <davem(a)davemloft.net> wrote:

> From: ebiederm(a)xmission.com (Eric W. Biederman)
> Date: Thu, 05 Aug 2010 12:57:59 -0700
>
> > I wonder what has changed with ipv6 recently.
>
> There was a recent fix to the IGMP snooping code we have in
> the bridging layer, if parsing of an ipv6 IGMP packet failed
> we'd leak the packet (and thus references to whatever device
> it referenced).
>
> commit 6d1d1d398cb7db7a12c5d652d50f85355345234f
[...]

But this patch is not in 2.6.35 and therefore cannot make the
difference Eric sees (or believes he sees) between his modified 2.6.32
and 2.6.35.

Also, this patch, if I understand it correctly, only changes bridging,
and in my scenario bridge.ko (I have it as a module) was not even
loaded, so applying this patch should not make any difference for the
bug I see. Or am I overlooking something?

So I guess your answer was general information on Eric's question about
what changed with ipv6, not related to the particular bug we are hunting?

--
MfG,

Michael Leun

From: Michael Leun on
On Thu, 05 Aug 2010 12:57:59 -0700
ebiederm(a)xmission.com (Eric W. Biederman) wrote:

> What puzzles me is that on a slightly patched 2.6.32 (so sysfs works)
> and I am doing very similar things (openvpn tunnels, ipv6 to the
> network as a whole etc), and I am not seeing the infinite
> unregister_netdevice: messages you are talking about.

Hmmm, I think there are 2 possibilities:

- You send me a patch against plain 2.6.32, so I can check my
scenarios against that kernel

or

- You could try it yourself; it's really just those few lines on a
freshly booted system in a clean, easy-to-reproduce state

(Only if you think that would yield useful information, of course.)

> When a network device is removed most references to it are redirected
> to the loopback device so a normal network device should not see the
> worst of the problems. That is why lo showed up.
>
> In that context I'm a bit surprised you managed trigger a problem on
> veth1.

The difference was that when the message showed up with veth1, lo in
that namespace was down during the test. When lo was up, it showed up
on lo.
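To see at a glance which device a test run ended up reporting, something like the following could be used. It is a sketch, not part of the original tests: the function name summarise_waits is hypothetical, and the pattern assumes the message wording quoted in this thread.

```shell
#!/bin/sh
# Sketch: summarise "unregister_netdevice: waiting for ..." lines read from stdin.
# Prints "<occurrences> <device> <usage count>" per distinct device/count pair.
summarise_waits() {
    sed -n 's/.*unregister_netdevice: waiting for \([^ ]*\) to become free\. Usage count = \([0-9][0-9]*\).*/\1 \2/p' |
        sort | uniq -c | awk '{ print $1, $2, $3 }'
}

# Example with the line quoted in this thread, repeated twice.
printf '%s\n' \
    'kernel: unregister_netdevice: waiting for veth1 to become free. Usage count = 3' \
    'kernel: unregister_netdevice: waiting for veth1 to become free. Usage count = 3' |
    summarise_waits
# prints "2 veth1 3"
```

On a live system the input would presumably come from dmesg or the syslog file rather than from printf.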

--
MfG,

Michael Leun
