From: Linus Torvalds on
On Thu, Jun 10, 2010 at 9:40 AM, Christoph Hellwig <hch(a)lst.de> wrote:
>
> Maybe give it a bit more beating in linux-next and send it off to Linus
> once he's back from his vacation?

Yes, I'd probably be happier with that. As long as it's "just" a
performance regression (and clearly not one that actually affects most
people all that noticeably), I'd much rather not have to worry about
any potential new issues being introduced right now.

So I'm more looking for "let's fix catastrophic events" for -rc3.
After I get back, I'm way more open to "let's fix any real bugs",
because then I can at least react to any fallout.

(I'm planning on trying to roughly keep up on email etc - I'm today
doing all my email using gmail to test how well that works as an
alternative to my usual "inside two firewalls" behavior. But I'm
seriously trying to avoid even having a laptop with me, because if I
have something where I can do kernel development and git pulls, I know
I'm not going to be just on vacation).

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jens Axboe on
On 2010-06-10 18:40, Christoph Hellwig wrote:
> On Thu, Jun 10, 2010 at 06:25:17PM +0200, Jens Axboe wrote:
>> I agree, it's late and it makes me nervous too. I had them cook for
>> a day, didn't see any problems. And Christoph would not send it in
>> unless it passes at least xfs qa, which is what found the problems
>> last time (the ones we reverted).
>>
>> It's fixing a regression where umount takes a LONG time if you have
>> a lot of dirty inodes, since it basically degenerates to a data
>> integrity writeback instead of a simple WB_SYNC_NONE. If it wasn't
>> fixing a nasty regression (the distros are all wanting a real fix
>> for this, it's a user problem), I would not be submitting this code
>> at this point in time.
>
> Maybe give it a bit more beating in linux-next and send it off to Linus
> once he's back from his vacation?

We can do that. Linus, I'll split off the writeback parts and send
you a new pull request.

--
Jens Axboe

From: Jens Axboe on
On 2010-06-10 18:55, Linus Torvalds wrote:
> On Thu, Jun 10, 2010 at 9:25 AM, Jens Axboe <jaxboe(a)fusionio.com> wrote:
>>
>> It's fixing a regression where umount takes a LONG time if you have
>> a lot of dirty inodes, since it basically degenerates to a data
>> integrity writeback instead of a simple WB_SYNC_NONE. If it wasn't
>> fixing a nasty regression (the distros are all wanting a real fix
>> for this, it's a user problem), I would not be submitting this code
>> at this point in time.
>
> I'm not sure if you noticed, we had a separate thread with Dave
> Chinner that resulted in three hopefully fairly minimal patches going
> in instead.
>
> See commits
>
> git log -3 d87815cb2090
>
> and I thought that last one (first one applied: "pay attention to
> wbc->nr_to_write") was the one that had fixed the worst XFS issues.
>
> But maybe it was an unrelated thing.

That's a different bug.

--
Jens Axboe

From: Mark Lord on
On 10/06/10 12:44 PM, Brian Bloniarz wrote:
> On 06/10/2010 12:25 PM, Jens Axboe wrote:
>> On 2010-06-10 17:55, Linus Torvalds wrote:
>>> On Thu, Jun 10, 2010 at 6:44 AM, Jens Axboe <jaxboe(a)fusionio.com> wrote:
>>>>
>>>> - A set of patches fixing the WB_SYNC_NONE writeback from Christoph. So
>>>> we should finally have both functional and working WB_SYNC_NONE from
>>>> umount context.
>>>
>>> I _really_ think this is too late, considering how broken it has been.
>>> We already reverted the WB_SYNC_NONE things exactly because it didn't
>>> work, didn't we? I'm going to be off-line in two days, and this part
>>> of the pull request really makes me nervous, if only simply because of
>>> the history of it all (ie it's always been broken, why shouldn't it be
>>> broken now?).
>>>
>>> IOW, that's a lot of scary changes, that have historically not been
>>> safe or sufficiently tested, and have caused problems for various
>>> filesystems. Convince me why they should suddenly be ok to merge?
>>
>> I agree, it's late and it makes me nervous too. I had them cook for
>> a day, didn't see any problems. And Christoph would not send it in
>> unless it passes at least xfs qa, which is what found the problems
>> last time (the ones we reverted).
>>
>> It's fixing a regression where umount takes a LONG time if you have
>> a lot of dirty inodes, since it basically degenerates to a data
>> integrity writeback instead of a simple WB_SYNC_NONE. If it wasn't
>> fixing a nasty regression (the distros are all wanting a real fix
>> for this, it's a user problem), I would not be submitting this code
>> at this point in time.
>>
>
> Reinforcing that last point: from what I could figure out, Fedora 13
> is shipping the buggy WB_SYNC_NONE patch currently. Ubuntu 10.04 is
> shipping an in-kernel workaround that has serious performance
> drawbacks.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=15906 has links to the
> downstream bugs.
...

Jens, this bug has been biting my servers badly here for the past
few months -- umount after a backup (from ext4 to ext4) takes 3-4 minutes
instead of the expected 3-4 seconds.

Is there a patch file for this against 2.6.34 that I (and others) could use?

Thanks
From: Jens Axboe on
On 2010-06-28 01:10, Mark Lord wrote:
> On 10/06/10 12:44 PM, Brian Bloniarz wrote:
>> On 06/10/2010 12:25 PM, Jens Axboe wrote:
>>> On 2010-06-10 17:55, Linus Torvalds wrote:
>>>> On Thu, Jun 10, 2010 at 6:44 AM, Jens Axboe <jaxboe(a)fusionio.com> wrote:
>>>>>
>>>>> - A set of patches fixing the WB_SYNC_NONE writeback from Christoph. So
>>>>> we should finally have both functional and working WB_SYNC_NONE from
>>>>> umount context.
>>>>
>>>> I _really_ think this is too late, considering how broken it has been.
>>>> We already reverted the WB_SYNC_NONE things exactly because it didn't
>>>> work, didn't we? I'm going to be off-line in two days, and this part
>>>> of the pull request really makes me nervous, if only simply because of
>>>> the history of it all (ie it's always been broken, why shouldn't it be
>>>> broken now?).
>>>>
>>>> IOW, that's a lot of scary changes, that have historically not been
>>>> safe or sufficiently tested, and have caused problems for various
>>>> filesystems. Convince me why they should suddenly be ok to merge?
>>>
>>> I agree, it's late and it makes me nervous too. I had them cook for
>>> a day, didn't see any problems. And Christoph would not send it in
>>> unless it passes at least xfs qa, which is what found the problems
>>> last time (the ones we reverted).
>>>
>>> It's fixing a regression where umount takes a LONG time if you have
>>> a lot of dirty inodes, since it basically degenerates to a data
>>> integrity writeback instead of a simple WB_SYNC_NONE. If it wasn't
>>> fixing a nasty regression (the distros are all wanting a real fix
>>> for this, it's a user problem), I would not be submitting this code
>>> at this point in time.
>>>
>>
>> Reinforcing that last point: from what I could figure out, Fedora 13
>> is shipping the buggy WB_SYNC_NONE patch currently. Ubuntu 10.04 is
>> shipping an in-kernel workaround that has serious performance
>> drawbacks.
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=15906 has links to the
>> downstream bugs.
> ..
>
> Jens, this bug has been biting my servers badly here for the past
> few months -- umount after a backup (from ext4 to ext4) takes 3-4 minutes
> instead of the expected 3-4 seconds.
>
> Is there a patch file for this against 2.6.34 that I (and others) could use?

It's the patch series from Christoph in my for-linus branch. I intend
to push it upstream when Linus is back and taking patches.

--
Jens Axboe
