From: Christoph Hellwig on
This should actually be on its way to Linus for .35, shouldn't it?

On Thu, Jun 24, 2010 at 01:02:14PM +1000, npiggin(a)suse.de wrote:
> list_for_each_entry_safe is not suitable to protect against concurrent
> modification of the list. 6754af6 introduced a race in sb walking.
>
> list_for_each_entry can use the trick of pinning the current entry in
> the list before we drop and retake the lock because it subsequently
> follows cur->next. However list_for_each_entry_safe saves n=cur->next
> for following before entering the loop body, so when the lock is
> dropped, n may be deleted.
>
> Signed-off-by: Nick Piggin <npiggin(a)suse.de>
> ---
> fs/dcache.c | 2 ++
> fs/super.c | 6 ++++++
> include/linux/list.h | 15 +++++++++++++++
> 3 files changed, 23 insertions(+)
>
> Index: linux-2.6/fs/dcache.c
> ===================================================================
> --- linux-2.6.orig/fs/dcache.c
> +++ linux-2.6/fs/dcache.c
> @@ -590,6 +590,8 @@ static void prune_dcache(int count)
> up_read(&sb->s_umount);
> }
> spin_lock(&sb_lock);
> + /* lock was dropped, must reset next */
> + list_safe_reset_next(sb, n, s_list);
> count -= pruned;
> __put_super(sb);
> /* more work left to do? */
> Index: linux-2.6/fs/super.c
> ===================================================================
> --- linux-2.6.orig/fs/super.c
> +++ linux-2.6/fs/super.c
> @@ -374,6 +374,8 @@ void sync_supers(void)
> up_read(&sb->s_umount);
>
> spin_lock(&sb_lock);
> + /* lock was dropped, must reset next */
> + list_safe_reset_next(sb, n, s_list);
> __put_super(sb);
> }
> }
> @@ -405,6 +407,8 @@ void iterate_supers(void (*f)(struct sup
> up_read(&sb->s_umount);
>
> spin_lock(&sb_lock);
> + /* lock was dropped, must reset next */
> + list_safe_reset_next(sb, n, s_list);
> __put_super(sb);
> }
> spin_unlock(&sb_lock);
> @@ -585,6 +589,8 @@ static void do_emergency_remount(struct
> }
> up_write(&sb->s_umount);
> spin_lock(&sb_lock);
> + /* lock was dropped, must reset next */
> + list_safe_reset_next(sb, n, s_list);
> __put_super(sb);
> }
> spin_unlock(&sb_lock);
> Index: linux-2.6/include/linux/list.h
> ===================================================================
> --- linux-2.6.orig/include/linux/list.h
> +++ linux-2.6/include/linux/list.h
> @@ -544,6 +544,21 @@ static inline void list_splice_tail_init
> &pos->member != (head); \
> pos = n, n = list_entry(n->member.prev, typeof(*n), member))
>
> +/**
> + * list_safe_reset_next - reset a stale list_for_each_entry_safe loop
> + * @pos: the loop cursor used in the list_for_each_entry_safe loop
> + * @n: temporary storage used in list_for_each_entry_safe
> + * @member: the name of the list_struct within the struct.
> + *
> + * list_safe_reset_next is not safe to use in general if the list may be
> + * modified concurrently (eg. the lock is dropped in the loop body). An
> + * exception to this is if the cursor element (pos) is pinned in the list,
> + * and list_safe_reset_next is called after re-taking the lock and before
> + * completing the current iteration of the loop body.
> + */
> +#define list_safe_reset_next(pos, n, member) \
> + n = list_entry(pos->member.next, typeof(*pos), member)
> +
> /*
> * Double linked lists with a single pointer list head.
> * Mostly useful for hash tables where the two pointer list head is
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo(a)vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
---end quoted text---
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Nick Piggin on
On Tue, Jun 29, 2010 at 09:02:14AM -0400, Christoph Hellwig wrote:
> This should actually be on its way to Linus for .35, shouldn't it?

Yeah, I was waiting for Al to reappear, but I think this is
probably the nicest way to solve the problem. Linus?
--
fs: fix superblock iteration race

list_for_each_entry_safe is not suitable to protect against concurrent
modification of the list. 6754af6 introduced a race in sb walking.

list_for_each_entry can use the trick of pinning the current entry in
the list before we drop and retake the lock because it subsequently
follows cur->next. However list_for_each_entry_safe saves n=cur->next
for following before entering the loop body, so when the lock is
dropped, n may be deleted.

Signed-off-by: Nick Piggin <npiggin(a)suse.de>
---
fs/dcache.c | 2 ++
fs/super.c | 6 ++++++
include/linux/list.h | 15 +++++++++++++++
3 files changed, 23 insertions(+)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -590,6 +590,8 @@ static void prune_dcache(int count)
up_read(&sb->s_umount);
}
spin_lock(&sb_lock);
+ /* lock was dropped, must reset next */
+ list_safe_reset_next(sb, n, s_list);
count -= pruned;
__put_super(sb);
/* more work left to do? */
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c
+++ linux-2.6/fs/super.c
@@ -374,6 +374,8 @@ void sync_supers(void)
up_read(&sb->s_umount);

spin_lock(&sb_lock);
+ /* lock was dropped, must reset next */
+ list_safe_reset_next(sb, n, s_list);
__put_super(sb);
}
}
@@ -405,6 +407,8 @@ void iterate_supers(void (*f)(struct sup
up_read(&sb->s_umount);

spin_lock(&sb_lock);
+ /* lock was dropped, must reset next */
+ list_safe_reset_next(sb, n, s_list);
__put_super(sb);
}
spin_unlock(&sb_lock);
@@ -585,6 +589,8 @@ static void do_emergency_remount(struct
}
up_write(&sb->s_umount);
spin_lock(&sb_lock);
+ /* lock was dropped, must reset next */
+ list_safe_reset_next(sb, n, s_list);
__put_super(sb);
}
spin_unlock(&sb_lock);
Index: linux-2.6/include/linux/list.h
===================================================================
--- linux-2.6.orig/include/linux/list.h
+++ linux-2.6/include/linux/list.h
@@ -544,6 +544,21 @@ static inline void list_splice_tail_init
&pos->member != (head); \
pos = n, n = list_entry(n->member.prev, typeof(*n), member))

+/**
+ * list_safe_reset_next - reset a stale list_for_each_entry_safe loop
+ * @pos: the loop cursor used in the list_for_each_entry_safe loop
+ * @n: temporary storage used in list_for_each_entry_safe
+ * @member: the name of the list_struct within the struct.
+ *
+ * list_safe_reset_next is not safe to use in general if the list may be
+ * modified concurrently (eg. the lock is dropped in the loop body). An
+ * exception to this is if the cursor element (pos) is pinned in the list,
+ * and list_safe_reset_next is called after re-taking the lock and before
+ * completing the current iteration of the loop body.
+ */
+#define list_safe_reset_next(pos, n, member) \
+ n = list_entry(pos->member.next, typeof(*pos), member)
+
/*
* Double linked lists with a single pointer list head.
* Mostly useful for hash tables where the two pointer list head is
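
The race described in the changelog can be reproduced in a single thread. The following is a minimal userspace sketch, not kernel code: the list helpers are simplified stand-ins for the ones in include/linux/list.h, and struct item, list_unlink() and walk_items() are hypothetical names made up for the illustration. The "other thread" is simulated by unlinking the saved next entry from inside the loop body, at the point where the real code drops and retakes sb_lock. The unlink deliberately leaves the removed entry's own pointers intact so that the stale walk stays well defined.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each_entry_safe(pos, n, head, member)			\
	for (pos = list_entry((head)->next, __typeof__(*pos), member),	\
	     n = list_entry(pos->member.next, __typeof__(*pos), member);	\
	     &pos->member != (head);					\
	     pos = n, n = list_entry(n->member.next, __typeof__(*n), member))

/* Same definition as in the patch (typeof spelled __typeof__). */
#define list_safe_reset_next(pos, n, member) \
	n = list_entry(pos->member.next, __typeof__(*pos), member)

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical helper: unlink but keep the entry's own pointers. */
static void list_unlink(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Hypothetical element type standing in for struct super_block. */
struct item {
	int id;
	int deleted;
	struct list_head link;
};

/* Returns how many already-deleted items the walk visited. */
int walk_items(int reset_next)
{
	struct list_head head;
	struct item items[3];
	struct item *pos, *n;
	int stale_visits = 0;
	int i;

	list_init(&head);
	for (i = 0; i < 3; i++) {
		items[i].id = i;
		items[i].deleted = 0;
		list_add_tail(&items[i].link, &head);
	}
	assert(head.next == &items[0].link);

	list_for_each_entry_safe(pos, n, &head, link) {
		if (pos->deleted)
			stale_visits++;
		if (pos->id == 0) {
			/* "lock dropped": someone removes the saved next */
			list_unlink(&items[1].link);
			items[1].deleted = 1;
			/* "lock retaken": pos is still on the list */
			if (reset_next)
				list_safe_reset_next(pos, n, link);
		}
	}
	return stale_visits;
}
```

Calling walk_items(0) walks onto the removed entry once, because n was saved before the body ran. Calling walk_items(1) re-reads pos->link.next after the simulated relock and never sees the removed entry.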
From: Linus Torvalds on
On Tue, Jun 29, 2010 at 7:56 AM, Nick Piggin <npiggin(a)suse.de> wrote:
> On Tue, Jun 29, 2010 at 09:02:14AM -0400, Christoph Hellwig wrote:
>> This should actually be on its way to Linus for .35, shouldn't it?
>
> Yeah, I was waiting for Al to reappear, but I think this is
> probably the nicest way to solve the problem. Linus?

I'll apply it. We have a couple of oopses listed for the superblock
iterator, and I haven't heard from Al. And the patch looks obviously
fine, whether it's actually the cause of some of the bugs or not.

Linus
From: Nick Piggin on
On Tue, Jun 29, 2010 at 10:35:47AM -0700, Linus Torvalds wrote:
> On Tue, Jun 29, 2010 at 7:56 AM, Nick Piggin <npiggin(a)suse.de> wrote:
> > On Tue, Jun 29, 2010 at 09:02:14AM -0400, Christoph Hellwig wrote:
> >> This should actually be on its way to Linus for .35, shouldn't it?
> >
> > Yeah, I was waiting for Al to reappear, but I think this is
> > probably the nicest way to solve the problem. Linus?
>
> I'll apply it. We have a couple of oopses listed for the superblock
> iterator, and I haven't heard from Al. And the patch looks obviously
> fine, whether it's actually the cause of some of the bugs or not.

OK. I have only managed to get it into an infinite loop, but I think
it would surely be possible to oops it, because the next pointer can
be uninitialised memory at that point.
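
One way a stale next pointer can turn into an infinite loop, sketched in userspace C. This assumes the removed entry was unlinked in the style of list_del_init(), which leaves the entry pointing at itself; whether the superblock removal path did exactly that is an assumption here, not something stated in this thread. The helper names (list_del_init_sketch, steps_to_head, stale_next_demo) are hypothetical, and the walk is bounded by max_steps so the example terminates.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink the entry, then make it point at itself (list_del_init style). */
static void list_del_init_sketch(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	init_list_head(entry);
}

/*
 * Follow ->next from start until head is reached.
 * Returns the number of steps taken, or -1 if head was not
 * reached within max_steps.
 */
int steps_to_head(const struct list_head *start,
		  const struct list_head *head, int max_steps)
{
	int steps = 0;

	while (start != head) {
		if (steps == max_steps)
			return -1;
		start = start->next;
		steps++;
	}
	return steps;
}

int stale_next_demo(int refresh, int max_steps)
{
	struct list_head head, a, b, c;
	struct list_head *saved_next;

	init_list_head(&head);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);
	list_add_tail(&c, &head);
	assert(a.next == &b);

	/* What the _safe iterator stashes while visiting a. */
	saved_next = a.next;

	/* b is removed while the lock is dropped. */
	list_del_init_sketch(&b);

	/* With the fix, next is re-read from the pinned cursor a. */
	if (refresh)
		saved_next = a.next;

	return steps_to_head(saved_next, &head, max_steps);
}
```

With refresh set to 0 the walk starts at b, whose next pointer is b itself, so it never reaches the head and the function returns -1. With refresh set to 1 the walk starts at c and reaches the head in one step. If the removed entry has been freed instead, its next pointer is whatever now occupies that memory, which is the oops case.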

From: Linus Torvalds on
On Tue, Jun 29, 2010 at 10:41 AM, Nick Piggin <npiggin(a)suse.de> wrote:
> On Tue, Jun 29, 2010 at 10:35:47AM -0700, Linus Torvalds wrote:
>>
>> I'll apply it. We have a couple of oopses listed for the superblock
>> iterator, and I haven't heard from Al. And the patch looks obviously
>> fine, whether it's actually the cause of some of the bugs or not.
>
> OK. I have only managed to get it into an infinite loop, but I think
> it would surely be possible to oops it, because the next pointer can
> be uninitialised memory at that point.

Look for "2.6.35-rc3 oops trying to suspend" on lkml, for example. No
guarantee that it's the same thing, but it's "iterate_supers()"
getting an oops when it does "down_read(&sb->s_umount)". Which really
looks suspiciously like "sb" just being totally bogus, most likely
because of this same issue.

So I dunno, but I asked Al to look at it, and haven't heard back.

Regardless, I think your patch is the right thing to do (modulo any
syntactic issues - and I think your final version was the best of the
lot).

Linus