From: Jan Kara on
On Tue 03-08-10 12:55:20, Jan Kara wrote:
> On Tue 03-08-10 11:01:25, Wu Fengguang wrote:
> > On Tue, Aug 03, 2010 at 04:51:52AM +0800, Jan Kara wrote:
> > > On Fri 30-07-10 12:03:06, Wu Fengguang wrote:
> > > > On Fri, Jul 30, 2010 at 12:20:27AM +0800, Jan Kara wrote:
> > > > > On Thu 29-07-10 19:51:44, Wu Fengguang wrote:
> > > > > > The periodic/background writeback can run forever. So when any
> > > > > > sync work is enqueued, increase bdi->sync_works to notify the
> > > > > > active non-sync works to exit. Non-sync works queued after sync
> > > > > > works won't be affected.
> > > > > Hmm, wouldn't it be simpler logic to just make for_kupdate and
> > > > > for_background work always yield when there's some other work to do (as
> > > > > they are livelockable from the definition of the target they have) and
> > > > > make sure any other work isn't livelockable?
> > > >
> > > > Good idea!
> > > >
> > > > > The only downside is that
> > > > > non-livelockable work cannot be "fair" in the sense that we cannot switch
> > > > > inodes after writing MAX_WRITEBACK_PAGES.
> > > >
> > > > Cannot switch indoes _before_ finish with the current
> > > > MAX_WRITEBACK_PAGES batch?
> > > Well, even after writing all those MAX_WRITEBACK_PAGES. Because what you
> > > want to do in a non-livelockable work is: take inode, write it, never look at
> > > it again for this work. Because if you later return to the inode, it can
> > > have newer dirty pages and thus you cannot really avoid livelock. Of
> > > course, this all assumes .nr_to_write isn't set to something small. That
> > > avoids the livelock as well.
> >
> > I do have a poor man's solution that can handle this case.
> > https://kerneltrap.org/mailarchive/linux-fsdevel/2009/10/7/6476473/thread
> > It may do more extra works, but will stop livelock in theory.
> So I don't think sync work on it's own is a problem. There we can just
> give up any fairness and just go inode by inode. IMHO it's much simpler that
> way. The remaining types of work we have are "for_reclaim" and then ones
> triggered by filesystems to get rid of delayed allocated data. These cases
> can easily have well defined and low nr_to_write so they wouldn't be
> livelockable either. What do you think?
Fengguang, how about merging also the attached simple patch together with
my fix? With these two patches, I'm not able to trigger any sync livelock
while without one of them I hit them quite easily...

Honza
--
Jan Kara <jack(a)suse.cz>
SUSE Labs, CR
From: Wu Fengguang on
On Tue, Aug 03, 2010 at 08:39:22PM +0800, Jan Kara wrote:
> On Tue 03-08-10 12:55:20, Jan Kara wrote:
> > On Tue 03-08-10 11:01:25, Wu Fengguang wrote:
> > > On Tue, Aug 03, 2010 at 04:51:52AM +0800, Jan Kara wrote:
> > > > On Fri 30-07-10 12:03:06, Wu Fengguang wrote:
> > > > > On Fri, Jul 30, 2010 at 12:20:27AM +0800, Jan Kara wrote:
> > > > > > On Thu 29-07-10 19:51:44, Wu Fengguang wrote:
> > > > > > > The periodic/background writeback can run forever. So when any
> > > > > > > sync work is enqueued, increase bdi->sync_works to notify the
> > > > > > > active non-sync works to exit. Non-sync works queued after sync
> > > > > > > works won't be affected.
> > > > > > Hmm, wouldn't it be simpler logic to just make for_kupdate and
> > > > > > for_background work always yield when there's some other work to do (as
> > > > > > they are livelockable from the definition of the target they have) and
> > > > > > make sure any other work isn't livelockable?
> > > > >
> > > > > Good idea!
> > > > >
> > > > > > The only downside is that
> > > > > > non-livelockable work cannot be "fair" in the sense that we cannot switch
> > > > > > inodes after writing MAX_WRITEBACK_PAGES.
> > > > >
> > > > > Cannot switch indoes _before_ finish with the current
> > > > > MAX_WRITEBACK_PAGES batch?
> > > > Well, even after writing all those MAX_WRITEBACK_PAGES. Because what you
> > > > want to do in a non-livelockable work is: take inode, write it, never look at
> > > > it again for this work. Because if you later return to the inode, it can
> > > > have newer dirty pages and thus you cannot really avoid livelock. Of
> > > > course, this all assumes .nr_to_write isn't set to something small. That
> > > > avoids the livelock as well.
> > >
> > > I do have a poor man's solution that can handle this case.
> > > https://kerneltrap.org/mailarchive/linux-fsdevel/2009/10/7/6476473/thread
> > > It may do more extra works, but will stop livelock in theory.
> > So I don't think sync work on it's own is a problem. There we can just
> > give up any fairness and just go inode by inode. IMHO it's much simpler that
> > way. The remaining types of work we have are "for_reclaim" and then ones
> > triggered by filesystems to get rid of delayed allocated data. These cases
> > can easily have well defined and low nr_to_write so they wouldn't be
> > livelockable either. What do you think?
> Fengguang, how about merging also the attached simple patch together with
> my fix? With these two patches, I'm not able to trigger any sync livelock
> while without one of them I hit them quite easily...

This looks OK. However note that redirty_tail() can modify
dirtied_when unexpectedly. So the more we rely on wb_start, the more
possibility an inode is (wrongly) skipped by sync. I have a bunch of
patches to remove redirty_tail(). However they may not be good
candidates for 2.6.36..

Thanks,
Fengguang


> Honza
> --
> Jan Kara <jack(a)suse.cz>
> SUSE Labs, CR

> From e4b708115825bca6a1020eed2356e2aab0567e3a Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack(a)suse.cz>
> Date: Tue, 3 Aug 2010 10:35:02 +0000
> Subject: [PATCH] mm: Avoid resetting wb_start after each writeback round
>
> WB_SYNC_NONE writeback is done in rounds of 1024 pages so that we don't write
> out some huge inode for too long while starving writeout of other inodes. To
> avoid livelocks, we record time we started writeback in wbc->wb_start and do
> not write out inodes which were dirtied after this time. But currently,
> writeback_inodes_wb() resets wb_start each time it is called thus effectively
> invalidating this logic and making any WB_SYNC_NONE writeback prone to
> livelocks.
>
> This patch makes sure wb_start is set only once when we start writeback.
>
> Signed-off-by: Jan Kara <jack(a)suse.cz>
> ---
> fs/fs-writeback.c | 5 +++--
> 1 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 6bdc924..aa59394 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -530,7 +530,8 @@ void writeback_inodes_wb(struct bdi_writeback *wb,
> {
> int ret = 0;
>
> - wbc->wb_start = jiffies; /* livelock avoidance */
> + if (!wbc->wb_start)
> + wbc->wb_start = jiffies; /* livelock avoidance */
> spin_lock(&inode_lock);
> if (!wbc->for_kupdate || list_empty(&wb->b_io))
> queue_io(wb, wbc->older_than_this);
> @@ -559,7 +560,6 @@ static void __writeback_inodes_sb(struct super_block *sb,
> {
> WARN_ON(!rwsem_is_locked(&sb->s_umount));
>
> - wbc->wb_start = jiffies; /* livelock avoidance */
> spin_lock(&inode_lock);
> if (!wbc->for_kupdate || list_empty(&wb->b_io))
> queue_io(wb, wbc->older_than_this);
> @@ -625,6 +625,7 @@ static long wb_writeback(struct bdi_writeback *wb,
> wbc.range_end = LLONG_MAX;
> }
>
> + wbc.wb_start = jiffies; /* livelock avoidance */
> for (;;) {
> /*
> * Stop writeback when nr_pages has been consumed
> --
> 1.6.0.2
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jan Kara on
On Tue 03-08-10 20:59:24, Wu Fengguang wrote:
> On Tue, Aug 03, 2010 at 08:39:22PM +0800, Jan Kara wrote:
> > On Tue 03-08-10 12:55:20, Jan Kara wrote:
> > > On Tue 03-08-10 11:01:25, Wu Fengguang wrote:
> > > > On Tue, Aug 03, 2010 at 04:51:52AM +0800, Jan Kara wrote:
> > > > > On Fri 30-07-10 12:03:06, Wu Fengguang wrote:
> > > > > > On Fri, Jul 30, 2010 at 12:20:27AM +0800, Jan Kara wrote:
> > > > > > > On Thu 29-07-10 19:51:44, Wu Fengguang wrote:
> > > > > > > > The periodic/background writeback can run forever. So when any
> > > > > > > > sync work is enqueued, increase bdi->sync_works to notify the
> > > > > > > > active non-sync works to exit. Non-sync works queued after sync
> > > > > > > > works won't be affected.
> > > > > > > Hmm, wouldn't it be simpler logic to just make for_kupdate and
> > > > > > > for_background work always yield when there's some other work to do (as
> > > > > > > they are livelockable from the definition of the target they have) and
> > > > > > > make sure any other work isn't livelockable?
> > > > > >
> > > > > > Good idea!
> > > > > >
> > > > > > > The only downside is that
> > > > > > > non-livelockable work cannot be "fair" in the sense that we cannot switch
> > > > > > > inodes after writing MAX_WRITEBACK_PAGES.
> > > > > >
> > > > > > Cannot switch indoes _before_ finish with the current
> > > > > > MAX_WRITEBACK_PAGES batch?
> > > > > Well, even after writing all those MAX_WRITEBACK_PAGES. Because what you
> > > > > want to do in a non-livelockable work is: take inode, write it, never look at
> > > > > it again for this work. Because if you later return to the inode, it can
> > > > > have newer dirty pages and thus you cannot really avoid livelock. Of
> > > > > course, this all assumes .nr_to_write isn't set to something small. That
> > > > > avoids the livelock as well.
> > > >
> > > > I do have a poor man's solution that can handle this case.
> > > > https://kerneltrap.org/mailarchive/linux-fsdevel/2009/10/7/6476473/thread
> > > > It may do more extra works, but will stop livelock in theory.
> > > So I don't think sync work on it's own is a problem. There we can just
> > > give up any fairness and just go inode by inode. IMHO it's much simpler that
> > > way. The remaining types of work we have are "for_reclaim" and then ones
> > > triggered by filesystems to get rid of delayed allocated data. These cases
> > > can easily have well defined and low nr_to_write so they wouldn't be
> > > livelockable either. What do you think?
> > Fengguang, how about merging also the attached simple patch together with
> > my fix? With these two patches, I'm not able to trigger any sync livelock
> > while without one of them I hit them quite easily...
>
> This looks OK. However note that redirty_tail() can modify
> dirtied_when unexpectedly. So the more we rely on wb_start, the more
> possibility an inode is (wrongly) skipped by sync. I have a bunch of
> patches to remove redirty_tail(). However they may not be good
> candidates for 2.6.36..
Yes, I'm aware of this. But if I'm right, after your changes to the
logic in writeback_single_inode() Andrew has in his tree, we use
requeue_io() in case inode still has any dirty pages. Thus after these
patches we should be mostly fine. Shouldn't we?

Honza
--
Jan Kara <jack(a)suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Wu Fengguang on
> > Fengguang, how about merging also the attached simple patch together with
> > my fix? With these two patches, I'm not able to trigger any sync livelock
> > while without one of them I hit them quite easily...
>
> This looks OK. However note that redirty_tail() can modify
> dirtied_when unexpectedly. So the more we rely on wb_start, the more
> possibility an inode is (wrongly) skipped by sync. I have a bunch of
> patches to remove redirty_tail(). However they may not be good
> candidates for 2.6.36..

It looks that setting wb_start at the beginning of
writeback_inodes_wb() won't be easily affected by redirty_tail().

So

Reviewed-by: Wu Fengguang <fengguang.wu(a)intel.com>

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Wu Fengguang on
On Tue, Aug 03, 2010 at 09:44:31PM +0800, Wu Fengguang wrote:
> On Tue, Aug 03, 2010 at 09:22:16PM +0800, Wu Fengguang wrote:
> > > > Fengguang, how about merging also the attached simple patch together with
> > > > my fix? With these two patches, I'm not able to trigger any sync livelock
> > > > while without one of them I hit them quite easily...
> > >
> > > This looks OK. However note that redirty_tail() can modify
> > > dirtied_when unexpectedly. So the more we rely on wb_start, the more
> > > possibility an inode is (wrongly) skipped by sync. I have a bunch of
> > > patches to remove redirty_tail(). However they may not be good
> > > candidates for 2.6.36..
> >
> > It looks that setting wb_start at the beginning of
> > writeback_inodes_wb() won't be easily affected by redirty_tail().
>
> Except for this redirty_tail(), which may mess up the dirtied_when
> ordering in b_dirty and later on break the assumption of
> inode_dirtied_after(inode, wbc->wb_start).

Oh well, it seems still OK (it does sound crazy). Please ignore the
below patch, sorry for the noise..

Thanks,
Fengguang

> It can be replaced by a requeue_io() for now. Christoph mentioned a
> patchset to introduce sb->s_wb, which should be a better solution.
>
> Thanks,
> Fengguang
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index a178828..e56e68b 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -457,12 +457,7 @@ int generic_writeback_sb_inodes(struct super_block *sb,
>
> if (inode->i_sb != sb) {
> if (only_this_sb) {
> - /*
> - * We only want to write back data for this
> - * superblock, move all inodes not belonging
> - * to it back onto the dirty list.
> - */
> - redirty_tail(inode);
> + requeue_io(inode);
> continue;
> }
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/