From: Jens Axboe on
On 2010-07-06 01:34, Linus Torvalds wrote:
> On Mon, Jul 5, 2010 at 2:19 PM, Bill Davidsen <davidsen(a)tmr.com> wrote:
>>
>> Absent a small fix, and given that the big fix has a lot more testing than
>> any new patch might, in this case the quickie might be undesirable.
>> Particularly since posters here seem sure that code will be replaced in the
>> next version anyway, and lightly tested patch to obsolete code is actually
>> less conservative.
>
> I have to agree. Especially as the "big patch" just removes the
> fragile code that caused the problem in the first place. So in this
> case I do suspect that the bigger patch ends up being the safer one.

OK, it's up to you.

> But I obviously don't actually see the problem, so it would be good to
> get confirmation that Christoph's patch actually fixes things first.
> Ingo, does the one in this thread apply for you?

I'll update for-linus, Ingo can try that.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar on

* Linus Torvalds <torvalds(a)linux-foundation.org> wrote:

> On Mon, Jul 5, 2010 at 2:19 PM, Bill Davidsen <davidsen(a)tmr.com> wrote:
> >
> > Absent a small fix, and given that the big fix has a lot more testing than
> > any new patch might, in this case the quickie might be undesirable.
> > Particularly since posters here seem sure that code will be replaced in the
> > next version anyway, and lightly tested patch to obsolete code is actually
> > less conservative.
>
> I have to agree. Especially as the "big patch" just removes the fragile code
> that caused the problem in the first place. So in this case I do suspect
> that the bigger patch ends up being the safer one.

Yeah, i agree - especially since the smaller patch is still pretty large (not
a oneliner), plus it does not appear that the precise failure mode is fully
understood either.

> But I obviously don't actually see the problem, so it would be good to get
> confirmation that Christoph's patch actually fixes things first. Ingo, does
> the one in this thread apply for you?

Yes, the three larger patches survived overnight testing with 300+ iterations
and i did some other tests as well, which passed too. These are the patches i
applied:

a73dd720 writeback: remove writeback_inodes_wbc
9f98c0fa writeback: split writeback_inodes_wb
79338d2a writeback: simplify the write back thread queue

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jens Axboe on
On 2010-07-06 08:47, Ingo Molnar wrote:
>
> * Linus Torvalds <torvalds(a)linux-foundation.org> wrote:
>
>> On Mon, Jul 5, 2010 at 2:19 PM, Bill Davidsen <davidsen(a)tmr.com> wrote:
>>>
>>> Absent a small fix, and given that the big fix has a lot more testing than
>>> any new patch might, in this case the quickie might be undesirable.
>>> Particularly since posters here seem sure that code will be replaced in the
>>> next version anyway, and lightly tested patch to obsolete code is actually
>>> less conservative.
>>
>> I have to agree. Especially as the "big patch" just removes the fragile code
>> that caused the problem in the first place. So in this case I do suspect
>> that the bigger patch ends up being the safer one.
>
> Yeah, i agree - especially since the smaller patch is still pretty large (not
> a oneliner), plus it does not appear that the precise failure mode is fully
> understood either.

http://lkml.org/lkml/2010/7/5/232

It's about as straight forward as it can be :-)
It definitely fixes _a_ bug, but whether it's only that bug is not certain.
As long as Linus is fine with the larger fix, then I have no issues going
in that direction.

>> But I obviously don't actually see the problem, so it would be good to get
>> confirmation that Christoph's patch actually fixes things first. Ingo, does
>> the one in this thread apply for you?
>
> Yes, the three larger patches survived overnight testing with 300+ iterations
> and i did some other tests as well, which passed too. These are the patches i
> applied:
>
> a73dd720 writeback: remove writeback_inodes_wbc
> 9f98c0fa writeback: split writeback_inodes_wb
> 79338d2a writeback: simplify the write back thread queue

Great, I'll upstream these bits today. Thanks Ingo.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar on

* Jens Axboe <axboe(a)kernel.dk> wrote:

> On 2010-07-06 08:47, Ingo Molnar wrote:
> >
> > * Linus Torvalds <torvalds(a)linux-foundation.org> wrote:
> >
> >> On Mon, Jul 5, 2010 at 2:19 PM, Bill Davidsen <davidsen(a)tmr.com> wrote:
> >>>
> >>> Absent a small fix, and given that the big fix has a lot more testing than
> >>> any new patch might, in this case the quickie might be undesirable.
> >>> Particularly since posters here seem sure that code will be replaced in the
> >>> next version anyway, and lightly tested patch to obsolete code is actually
> >>> less conservative.
> >>
> >> I have to agree. Especially as the "big patch" just removes the fragile code
> >> that caused the problem in the first place. So in this case I do suspect
> >> that the bigger patch ends up being the safer one.
> >
> > Yeah, i agree - especially since the smaller patch is still pretty large (not
> > a oneliner), plus it does not appear that the precise failure mode is fully
> > understood either.
>
> http://lkml.org/lkml/2010/7/5/232
>
> It's about as straight forward as it can be :-)

Yeah, i meant Christoph's first patch. Yours looks simple enough but from the
discussion it wasnt clear (to me) yet whether it would fix the bug.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Christoph Hellwig on
On Tue, Jul 06, 2010 at 08:47:33AM +0200, Ingo Molnar wrote:
> Yeah, i agree - especially since the smaller patch is still pretty large (not
> a oneliner), plus it does not appear that the precise failure mode is fully
> understood either.

The two failure modes we saw are very precicely understood:

- access to ->state after it's been freed
- access to ->state after the caller which has the bdi_work struct on
stack has been woken up.

But I seriously fear that there are even more dragons hidden in this
maze of bit wakeups and rcu, so replacing it with a proper completion
seems by far safer to me.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/