From: Eric Dumazet on
Eric Dumazet a �crit :
> Francis Moreau a �crit :
>> Hello Eric,
>>
>> It seems I still have a related bug, please have a look to the following oops.
>>
>> This happened on a 2.6.32-rc5 where your patch is included.
>>
>> [107304.558821] nfsd: last server has exited, flushing export cache
>> [107304.558848] ------------[ cut here ]------------
>> [107304.558858] WARNING: at net/ipv4/af_inet.c:153
>> inet_sock_destruct+0x161/0x17c()
>> [107304.558862] Hardware name: P5K-VM
>> [107304.558865] Modules linked in: kvm_intel kvm jfs loop nfsd lockd
>> nfs_acl auth_rpcgss exportfs sunrpc [last unloaded: microcode]
>> [107304.558889] Pid: 8198, comm: nfsd Tainted: G M 2.6.32-rc5 #25
>> [107304.558892] Call Trace:
>> [107304.558899] [<ffffffff81429f19>] ? inet_sock_destruct+0x161/0x17c
>> [107304.558907] [<ffffffff810487e9>] warn_slowpath_common+0x7c/0xa9
>> [107304.558914] [<ffffffff8104882a>] warn_slowpath_null+0x14/0x16
>> [107304.558920] [<ffffffff81429f19>] inet_sock_destruct+0x161/0x17c
>> [107304.558927] [<ffffffff813c8741>] __sk_free+0x23/0xe7
>> [107304.558933] [<ffffffff813c8881>] sk_free+0x1f/0x21
>> [107304.558939] [<ffffffff813c894b>] sk_common_release+0xc8/0xcd
>> [107304.558944] [<ffffffff81420b59>] udp_lib_close+0xe/0x10
>> [107304.558951] [<ffffffff814299bf>] inet_release+0x55/0x5c
>> [107304.558957] [<ffffffff813c5aa9>] sock_release+0x1f/0x71
>> [107304.558962] [<ffffffff813c5b22>] sock_close+0x27/0x2b
>> [107304.558968] [<ffffffff810eb60f>] __fput+0xfb/0x1c0
>> [107304.558973] [<ffffffff810eb6f1>] fput+0x1d/0x1f
>> [107304.558995] [<ffffffffa0013e23>] svc_sock_free+0x40/0x56 [sunrpc]
>> [107304.559018] [<ffffffffa001f392>] svc_xprt_free+0x43/0x53 [sunrpc]
>> [107304.559038] [<ffffffffa001f34f>] ? svc_xprt_free+0x0/0x53 [sunrpc]
>> [107304.559048] [<ffffffff811d9275>] kref_put+0x43/0x4f
>> [107304.559069] [<ffffffffa001e67a>] svc_close_xprt+0x55/0x5e [sunrpc]
>> [107304.559088] [<ffffffffa001e6d3>] svc_close_all+0x50/0x69 [sunrpc]
>> [107304.559107] [<ffffffffa0012a2b>] svc_destroy+0x9e/0x142 [sunrpc]
>> [107304.559126] [<ffffffffa0012b88>] svc_exit_thread+0xb9/0xc2 [sunrpc]
>> [107304.559138] [<ffffffffa008981b>] ? nfsd+0x0/0x13f [nfsd]
>> [107304.559149] [<ffffffffa0089940>] nfsd+0x125/0x13f [nfsd]
>> [107304.559157] [<ffffffff810685e3>] kthread+0x82/0x8a
>> [107304.559164] [<ffffffff8100c13a>] child_rip+0xa/0x20
>> [107304.559172] [<ffffffff8100baad>] ? restore_args+0x0/0x30
>> [107304.559179] [<ffffffff81068561>] ? kthread+0x0/0x8a
>> [107304.559185] [<ffffffff8100c130>] ? child_rip+0x0/0x20
>> [107304.559191] ---[ end trace c107131f4762168c ]---
>> [107304.927931] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state
>> recovery directory
>> [107304.932765] NFSD: starting 90-second grace period
>>
>
> This oops occurring again and again with SUNRPC finally gave me the right pointer.
>
> David, we added two years ago memory accounting to UDP, and this changed
> requirements about calling skb_free_datagram() in the right context.
>
> I wish we had an ASSERT_SOCK_LOCKED() debugging facility :(
>
> Francis, would you please test following patch ?

Here is the updated patch (against linux-2.6) , in case other people are interested to test it.

[PATCH take2] net: fix sk_forward_alloc corruption

On UDP sockets, we must call skb_free_datagram() with socket locked,
or risk sk_forward_alloc corruption. This requirement is not respected
in SUNRPC.

Add a convenient helper, skb_free_datagram_locked() and use it in SUNRPC

Reported-by: Francis Moreau <francis.moro(a)gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet(a)gmail.com>
---
include/linux/skbuff.h | 2 ++
net/core/datagram.c | 10 +++++++++-
net/ipv4/udp.c | 4 +---
net/ipv6/udp.c | 4 +---
net/sunrpc/svcsock.c | 10 +++++-----
5 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index df7b23a..266878f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1757,6 +1757,8 @@ extern int skb_copy_datagram_const_iovec(const struct sk_buff *from,
int to_offset,
int size);
extern void skb_free_datagram(struct sock *sk, struct sk_buff *skb);
+extern void skb_free_datagram_locked(struct sock *sk,
+ struct sk_buff *skb);
extern int skb_kill_datagram(struct sock *sk, struct sk_buff *skb,
unsigned int flags);
extern __wsum skb_checksum(const struct sk_buff *skb, int offset,
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 1c6cf3a..4ade301 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -224,6 +224,15 @@ void skb_free_datagram(struct sock *sk, struct sk_buff *skb)
consume_skb(skb);
sk_mem_reclaim_partial(sk);
}
+EXPORT_SYMBOL(skb_free_datagram);
+
+void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
+{
+ lock_sock(sk);
+ skb_free_datagram(sk, skb);
+ release_sock(sk);
+}
+EXPORT_SYMBOL(skb_free_datagram_locked);

/**
* skb_kill_datagram - Free a datagram skbuff forcibly
@@ -752,5 +761,4 @@ unsigned int datagram_poll(struct file *file, struct socket *sock,
EXPORT_SYMBOL(datagram_poll);
EXPORT_SYMBOL(skb_copy_and_csum_datagram_iovec);
EXPORT_SYMBOL(skb_copy_datagram_iovec);
-EXPORT_SYMBOL(skb_free_datagram);
EXPORT_SYMBOL(skb_recv_datagram);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d0d436d..0fa9f70 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -999,9 +999,7 @@ try_again:
err = ulen;

out_free:
- lock_sock(sk);
- skb_free_datagram(sk, skb);
- release_sock(sk);
+ skb_free_datagram_locked(sk, skb);
out:
return err;

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 3a60f12..cf538ed 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -288,9 +288,7 @@ try_again:
err = ulen;

out_free:
- lock_sock(sk);
- skb_free_datagram(sk, skb);
- release_sock(sk);
+ skb_free_datagram_locked(sk, skb);
out:
return err;

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index ccc5e83..1c246a4 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -111,7 +111,7 @@ static void svc_release_skb(struct svc_rqst *rqstp)
rqstp->rq_xprt_ctxt = NULL;

dprintk("svc: service %p, releasing skb %p\n", rqstp, skb);
- skb_free_datagram(svsk->sk_sk, skb);
+ skb_free_datagram_locked(svsk->sk_sk, skb);
}
}

@@ -578,7 +578,7 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
"svc: received unknown control message %d/%d; "
"dropping RPC reply datagram\n",
cmh->cmsg_level, cmh->cmsg_type);
- skb_free_datagram(svsk->sk_sk, skb);
+ skb_free_datagram_locked(svsk->sk_sk, skb);
return 0;
}

@@ -588,18 +588,18 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
if (csum_partial_copy_to_xdr(&rqstp->rq_arg, skb)) {
local_bh_enable();
/* checksum error */
- skb_free_datagram(svsk->sk_sk, skb);
+ skb_free_datagram_locked(svsk->sk_sk, skb);
return 0;
}
local_bh_enable();
- skb_free_datagram(svsk->sk_sk, skb);
+ skb_free_datagram_locked(svsk->sk_sk, skb);
} else {
/* we can use it in-place */
rqstp->rq_arg.head[0].iov_base = skb->data +
sizeof(struct udphdr);
rqstp->rq_arg.head[0].iov_len = len;
if (skb_checksum_complete(skb)) {
- skb_free_datagram(svsk->sk_sk, skb);
+ skb_free_datagram_locked(svsk->sk_sk, skb);
return 0;
}
rqstp->rq_xprt_ctxt = skb;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: David Miller on
From: Eric Dumazet <eric.dumazet(a)gmail.com>
Date: Fri, 30 Oct 2009 16:03:53 +0100

> [PATCH take2] net: fix sk_forward_alloc corruption
>
> On UDP sockets, we must call skb_free_datagram() with socket locked,
> or risk sk_forward_alloc corruption. This requirement is not respected
> in SUNRPC.
>
> Add a convenient helper, skb_free_datagram_locked() and use it in SUNRPC
>
> Reported-by: Francis Moreau <francis.moro(a)gmail.com>
> Signed-off-by: Eric Dumazet <eric.dumazet(a)gmail.com>

I've tentatively applied this to my net-2.6 tree, I won't
push it out until we have positive testing results.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Eric Dumazet on
David Miller a �crit :
> From: Eric Dumazet <eric.dumazet(a)gmail.com>
> Date: Fri, 30 Oct 2009 16:03:53 +0100
>
>> [PATCH take2] net: fix sk_forward_alloc corruption
>>
>> On UDP sockets, we must call skb_free_datagram() with socket locked,
>> or risk sk_forward_alloc corruption. This requirement is not respected
>> in SUNRPC.
>>
>> Add a convenient helper, skb_free_datagram_locked() and use it in SUNRPC
>>
>> Reported-by: Francis Moreau <francis.moro(a)gmail.com>
>> Signed-off-by: Eric Dumazet <eric.dumazet(a)gmail.com>
>
> I've tentatively applied this to my net-2.6 tree, I won't
> push it out until we have positive testing results.

I tested nfs/nfsd with fillowing additional debugging patch.

Without the fix (net: fix sk_forward_alloc corruption), I got warnings,
while with fix applied, I got no warnings. Its probably hard to hit
original bug since its a very small race window.

Trace without fix :

[ 226.686788] RPC: Registered udp transport module.
[ 226.686791] RPC: Registered tcp transport module.
[ 226.686792] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 226.851677] Installing knfsd (copyright (C) 1996 okir(a)monad.swb.de).
[ 226.869381] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[ 226.869443] NFSD: unable to find recovery directory /var/lib/nfs/v4recovery
[ 226.869488] NFSD: starting 90-second grace period
[ 226.870564] svc: failed to register lockdv1 RPC service (errno 97).
[ 244.401237] ------------[ cut here ]------------
[ 244.401288] WARNING: at include/net/sock.h:886 sock_rfree+0x54/0x70()
[ 244.401334] Hardware name: ProLiant BL460c G1
[ 244.401379] Modules linked in: nfsd lockd auth_rpcgss sunrpc exportfs bonding ipv6
[ 244.401699] Pid: 5295, comm: nfsd Not tainted 2.6.32-rc3-00920-gbdd6be3-dirty #342
[ 244.401745] Call Trace:
[ 244.401784] [<c055355a>] ? printk+0x1d/0x23
[ 244.401827] [<c023c3f2>] warn_slowpath_common+0x72/0xa0
[ 244.401869] [<c04b0d54>] ? sock_rfree+0x54/0x70
[ 244.401909] [<c04b0d54>] ? sock_rfree+0x54/0x70
[ 244.401950] [<c023c43a>] warn_slowpath_null+0x1a/0x20
[ 244.401991] [<c04b0d54>] sock_rfree+0x54/0x70
[ 244.402033] [<c04b4cf5>] skb_release_head_state+0x45/0xe0
[ 244.402084] [<c04b49f0>] __kfree_skb+0x10/0x90
[ 244.402125] [<c04b4a8c>] consume_skb+0x1c/0x40
[ 244.402166] [<c04b7c62>] skb_free_datagram+0x12/0x70
[ 244.402217] [<fc9fa177>] svc_release_skb+0x37/0x60 [sunrpc]
[ 244.402261] [<c026f452>] ? module_put+0x62/0xf0
[ 244.402308] [<fca05c98>] svc_send+0x28/0xc0 [sunrpc]
[ 244.402356] [<fc9f8acb>] svc_process+0x22b/0x770 [sunrpc]
[ 244.402416] [<fd04895c>] nfsd+0xac/0x140 [nfsd]
[ 244.402484] [<fd0488b0>] ? nfsd+0x0/0x140 [nfsd]
[ 244.402543] [<c0257ab4>] kthread+0x74/0x80
[ 244.402602] [<c0257a40>] ? kthread+0x0/0x80
[ 244.402656] [<c020390f>] kernel_thread_helper+0x7/0x10
[ 244.402711] ---[ end trace e202cdb1b491aa15 ]---


Debugging patch :

Intent is to make sure sock is locked by user or by softirq handler,
unless we are currently dismantling socket (sk_refcnt==0)

To speedup this test, we could use several bits of sk_lock.owned instead
of one for the user case.
(one for lock owned by user, one for lowk owned by softirq, one in dismantle phase)

diff --git a/include/net/sock.h b/include/net/sock.h
index 55de3bd..08ecb0e 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -849,10 +849,15 @@ static inline int sk_rmem_schedule(struct sock *sk, int size)
__sk_mem_schedule(sk, size, SK_MEM_RECV);
}

+#define sock_owned_by_user(sk) ((sk)->sk_lock.owned)
+#define sock_owned(sk) ((sk)->sk_lock.owned || \
+ spin_is_locked(&(sk)->sk_lock.slock))
+
static inline void sk_mem_reclaim(struct sock *sk)
{
if (!sk_has_account(sk))
return;
+ WARN_ON(!(sock_owned(sk) || atomic_read(&sk->sk_refcnt) == 0));
if (sk->sk_forward_alloc >= SK_MEM_QUANTUM)
__sk_mem_reclaim(sk);
}
@@ -861,6 +866,7 @@ static inline void sk_mem_reclaim_partial(struct sock *sk)
{
if (!sk_has_account(sk))
return;
+ WARN_ON(!(sock_owned(sk) || atomic_read(&sk->sk_refcnt) == 0));
if (sk->sk_forward_alloc > SK_MEM_QUANTUM)
__sk_mem_reclaim(sk);
}
@@ -869,6 +875,7 @@ static inline void sk_mem_charge(struct sock *sk, int size)
{
if (!sk_has_account(sk))
return;
+ WARN_ON(!sock_owned(sk));
sk->sk_forward_alloc -= size;
}

@@ -876,6 +883,7 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
{
if (!sk_has_account(sk))
return;
+ WARN_ON(!sock_owned(sk));
sk->sk_forward_alloc += size;
}

@@ -900,7 +908,6 @@ static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
* Since ~2.3.5 it is also exclusive sleep lock serializing
* accesses from user process context.
*/
-#define sock_owned_by_user(sk) ((sk)->sk_lock.owned)

/*
* Macro so as to not evaluate some arguments when

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Francis Moreau on
On Fri, Oct 30, 2009 at 4:03 PM, Eric Dumazet <eric.dumazet(a)gmail.com> wrote:

[...]

>
> [PATCH take2] net: fix sk_forward_alloc corruption
>

With this patch applied 4 days ago, I haven't got any oops so far.
--
Francis
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Eric Dumazet on
Francis Moreau a �crit :
> On Fri, Oct 30, 2009 at 4:03 PM, Eric Dumazet <eric.dumazet(a)gmail.com> wrote:
>
> [...]
>
>> [PATCH take2] net: fix sk_forward_alloc corruption
>>
>
> With this patch applied 4 days ago, I haven't got any oops so far.

Thanks a lot for testing and report Francis, I believe David will push it for stable
(if not already done)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/