From: Jan Kara on
On Thu 29-07-10 19:51:44, Wu Fengguang wrote:
> The periodic/background writeback can run forever. So when any
> sync work is enqueued, increase bdi->sync_works to notify the
> active non-sync works to exit. Non-sync works queued after sync
> works won't be affected.
Hmm, wouldn't it be simpler logic to just make for_kupdate and
for_background work always yield when there's some other work to do (as
they are livelockable from the definition of the target they have) and
make sure any other work isn't livelockable? The only downside is that
non-livelockable work cannot be "fair" in the sense that we cannot switch
inodes after writing MAX_WRITEBACK_PAGES.
I even had a patch for this but it's already outdated by now. But I
can refresh it if we decide this is the way to go.

Honza

>
> Signed-off-by: Wu Fengguang <fengguang.wu(a)intel.com>
> ---
> fs/fs-writeback.c | 13 +++++++++++++
> include/linux/backing-dev.h | 6 ++++++
> mm/backing-dev.c | 1 +
> 3 files changed, 20 insertions(+)
>
> --- linux-next.orig/fs/fs-writeback.c 2010-07-29 17:13:23.000000000 +0800
> +++ linux-next/fs/fs-writeback.c 2010-07-29 17:13:49.000000000 +0800
> @@ -80,6 +80,8 @@ static void bdi_queue_work(struct backin
>
> spin_lock(&bdi->wb_lock);
> list_add_tail(&work->list, &bdi->work_list);
> + if (work->for_sync)
> + atomic_inc(&bdi->wb.sync_works);
> spin_unlock(&bdi->wb_lock);
>
> /*
> @@ -633,6 +635,14 @@ static long wb_writeback(struct bdi_writ
> break;
>
> /*
> + * background/periodic works can run forever, need to abort
> + * on seeing any pending sync work, to prevent livelock it.
> + */
> + if (atomic_read(&wb->sync_works) &&
> + (work->for_background || work->for_kupdate))
> + break;
> +
> + /*
> * For background writeout, stop when we are below the
> * background dirty threshold
> */
> @@ -765,6 +775,9 @@ long wb_do_writeback(struct bdi_writebac
>
> wrote += wb_writeback(wb, work);
>
> + if (work->for_sync)
> + atomic_dec(&wb->sync_works);
> +
> /*
> * Notify the caller of completion if this is a synchronous
> * work item, otherwise just free it.
> --- linux-next.orig/include/linux/backing-dev.h 2010-07-29 17:13:23.000000000 +0800
> +++ linux-next/include/linux/backing-dev.h 2010-07-29 17:13:31.000000000 +0800
> @@ -50,6 +50,12 @@ struct bdi_writeback {
>
> unsigned long last_old_flush; /* last old data flush */
>
> + /*
> + * sync works queued, background works shall abort on seeing this,
> + * to prevent livelocking the sync works
> + */
> + atomic_t sync_works;
> +
> struct task_struct *task; /* writeback task */
> struct list_head b_dirty; /* dirty inodes */
> struct list_head b_io; /* parked for writeback */
> --- linux-next.orig/mm/backing-dev.c 2010-07-29 17:13:23.000000000 +0800
> +++ linux-next/mm/backing-dev.c 2010-07-29 17:13:31.000000000 +0800
> @@ -257,6 +257,7 @@ static void bdi_wb_init(struct bdi_write
>
> wb->bdi = bdi;
> wb->last_old_flush = jiffies;
> + atomic_set(&wb->sync_works, 0);
> INIT_LIST_HEAD(&wb->b_dirty);
> INIT_LIST_HEAD(&wb->b_io);
> INIT_LIST_HEAD(&wb->b_more_io);
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo(a)vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <jack(a)suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Wu Fengguang on
On Fri, Jul 30, 2010 at 12:20:27AM +0800, Jan Kara wrote:
> On Thu 29-07-10 19:51:44, Wu Fengguang wrote:
> > The periodic/background writeback can run forever. So when any
> > sync work is enqueued, increase bdi->sync_works to notify the
> > active non-sync works to exit. Non-sync works queued after sync
> > works won't be affected.
> Hmm, wouldn't it be simpler logic to just make for_kupdate and
> for_background work always yield when there's some other work to do (as
> they are livelockable from the definition of the target they have) and
> make sure any other work isn't livelockable?

Good idea!

> The only downside is that
> non-livelockable work cannot be "fair" in the sense that we cannot switch
> inodes after writing MAX_WRITEBACK_PAGES.

Cannot switch indoes _before_ finish with the current
MAX_WRITEBACK_PAGES batch?

> I even had a patch for this but it's already outdated by now. But I
> can refresh it if we decide this is the way to go.

I'm very interested in your old patch, would you post it? Let's see
which one is easier to work with :)

Thanks,
Fengguang

> > Signed-off-by: Wu Fengguang <fengguang.wu(a)intel.com>
> > ---
> > fs/fs-writeback.c | 13 +++++++++++++
> > include/linux/backing-dev.h | 6 ++++++
> > mm/backing-dev.c | 1 +
> > 3 files changed, 20 insertions(+)
> >
> > --- linux-next.orig/fs/fs-writeback.c 2010-07-29 17:13:23.000000000 +0800
> > +++ linux-next/fs/fs-writeback.c 2010-07-29 17:13:49.000000000 +0800
> > @@ -80,6 +80,8 @@ static void bdi_queue_work(struct backin
> >
> > spin_lock(&bdi->wb_lock);
> > list_add_tail(&work->list, &bdi->work_list);
> > + if (work->for_sync)
> > + atomic_inc(&bdi->wb.sync_works);
> > spin_unlock(&bdi->wb_lock);
> >
> > /*
> > @@ -633,6 +635,14 @@ static long wb_writeback(struct bdi_writ
> > break;
> >
> > /*
> > + * background/periodic works can run forever, need to abort
> > + * on seeing any pending sync work, to prevent livelock it.
> > + */
> > + if (atomic_read(&wb->sync_works) &&
> > + (work->for_background || work->for_kupdate))
> > + break;
> > +
> > + /*
> > * For background writeout, stop when we are below the
> > * background dirty threshold
> > */
> > @@ -765,6 +775,9 @@ long wb_do_writeback(struct bdi_writebac
> >
> > wrote += wb_writeback(wb, work);
> >
> > + if (work->for_sync)
> > + atomic_dec(&wb->sync_works);
> > +
> > /*
> > * Notify the caller of completion if this is a synchronous
> > * work item, otherwise just free it.
> > --- linux-next.orig/include/linux/backing-dev.h 2010-07-29 17:13:23.000000000 +0800
> > +++ linux-next/include/linux/backing-dev.h 2010-07-29 17:13:31.000000000 +0800
> > @@ -50,6 +50,12 @@ struct bdi_writeback {
> >
> > unsigned long last_old_flush; /* last old data flush */
> >
> > + /*
> > + * sync works queued, background works shall abort on seeing this,
> > + * to prevent livelocking the sync works
> > + */
> > + atomic_t sync_works;
> > +
> > struct task_struct *task; /* writeback task */
> > struct list_head b_dirty; /* dirty inodes */
> > struct list_head b_io; /* parked for writeback */
> > --- linux-next.orig/mm/backing-dev.c 2010-07-29 17:13:23.000000000 +0800
> > +++ linux-next/mm/backing-dev.c 2010-07-29 17:13:31.000000000 +0800
> > @@ -257,6 +257,7 @@ static void bdi_wb_init(struct bdi_write
> >
> > wb->bdi = bdi;
> > wb->last_old_flush = jiffies;
> > + atomic_set(&wb->sync_works, 0);
> > INIT_LIST_HEAD(&wb->b_dirty);
> > INIT_LIST_HEAD(&wb->b_io);
> > INIT_LIST_HEAD(&wb->b_more_io);
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo(a)vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> Jan Kara <jack(a)suse.cz>
> SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jan Kara on
On Fri 30-07-10 12:03:06, Wu Fengguang wrote:
> On Fri, Jul 30, 2010 at 12:20:27AM +0800, Jan Kara wrote:
> > On Thu 29-07-10 19:51:44, Wu Fengguang wrote:
> > > The periodic/background writeback can run forever. So when any
> > > sync work is enqueued, increase bdi->sync_works to notify the
> > > active non-sync works to exit. Non-sync works queued after sync
> > > works won't be affected.
> > Hmm, wouldn't it be simpler logic to just make for_kupdate and
> > for_background work always yield when there's some other work to do (as
> > they are livelockable from the definition of the target they have) and
> > make sure any other work isn't livelockable?
>
> Good idea!
>
> > The only downside is that
> > non-livelockable work cannot be "fair" in the sense that we cannot switch
> > inodes after writing MAX_WRITEBACK_PAGES.
>
> Cannot switch indoes _before_ finish with the current
> MAX_WRITEBACK_PAGES batch?
Well, even after writing all those MAX_WRITEBACK_PAGES. Because what you
want to do in a non-livelockable work is: take inode, write it, never look at
it again for this work. Because if you later return to the inode, it can
have newer dirty pages and thus you cannot really avoid livelock. Of
course, this all assumes .nr_to_write isn't set to something small. That
avoids the livelock as well.

> > I even had a patch for this but it's already outdated by now. But I
> > can refresh it if we decide this is the way to go.
>
> I'm very interested in your old patch, would you post it? Let's see
> which one is easier to work with :)
OK, attached is the patch. I've rebased it against 2.6.35.

Honza
--
Jan Kara <jack(a)suse.cz>
SUSE Labs, CR
From: Wu Fengguang on
On Tue, Aug 03, 2010 at 04:51:52AM +0800, Jan Kara wrote:
> On Fri 30-07-10 12:03:06, Wu Fengguang wrote:
> > On Fri, Jul 30, 2010 at 12:20:27AM +0800, Jan Kara wrote:
> > > On Thu 29-07-10 19:51:44, Wu Fengguang wrote:
> > > > The periodic/background writeback can run forever. So when any
> > > > sync work is enqueued, increase bdi->sync_works to notify the
> > > > active non-sync works to exit. Non-sync works queued after sync
> > > > works won't be affected.
> > > Hmm, wouldn't it be simpler logic to just make for_kupdate and
> > > for_background work always yield when there's some other work to do (as
> > > they are livelockable from the definition of the target they have) and
> > > make sure any other work isn't livelockable?
> >
> > Good idea!
> >
> > > The only downside is that
> > > non-livelockable work cannot be "fair" in the sense that we cannot switch
> > > inodes after writing MAX_WRITEBACK_PAGES.
> >
> > Cannot switch indoes _before_ finish with the current
> > MAX_WRITEBACK_PAGES batch?
> Well, even after writing all those MAX_WRITEBACK_PAGES. Because what you
> want to do in a non-livelockable work is: take inode, write it, never look at
> it again for this work. Because if you later return to the inode, it can
> have newer dirty pages and thus you cannot really avoid livelock. Of
> course, this all assumes .nr_to_write isn't set to something small. That
> avoids the livelock as well.

I do have a poor man's solution that can handle this case.
https://kerneltrap.org/mailarchive/linux-fsdevel/2009/10/7/6476473/thread
It may do more extra works, but will stop livelock in theory.

A related question is, what if some for_reclaim works get enqueued?
Shall we postpone the sync work as well? The global sync is not likely
to hit the dirty pages in a small memcg, or may take long time. It
seems not a high priority task though.

> > > I even had a patch for this but it's already outdated by now. But I
> > > can refresh it if we decide this is the way to go.
> >
> > I'm very interested in your old patch, would you post it? Let's see
> > which one is easier to work with :)
> OK, attached is the patch. I've rebased it against 2.6.35.
> Honza
> --
> Jan Kara <jack(a)suse.cz>
> SUSE Labs, CR

> From a6df0d4db148f983fe756df4791409db28dff459 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack(a)suse.cz>
> Date: Mon, 2 Aug 2010 22:30:25 +0200
> Subject: [PATCH] mm: Stop background writeback if there is other work queued for the thread
>
> Background writeback and kupdate-style writeback are easily livelockable
> (from a definition of their target). This is inconvenient because it can
> make sync(1) stall forever waiting on its queued work to be finished.
> Fix the problem by interrupting background and kupdate writeback if there
> is some other work to do. We can return to them after completing all the
> queued work.
>
> Signed-off-by: Jan Kara <jack(a)suse.cz>
> ---
> fs/fs-writeback.c | 8 ++++++++
> 1 files changed, 8 insertions(+), 0 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index d5be169..542471e 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -633,6 +633,14 @@ static long wb_writeback(struct bdi_writeback *wb,
> break;
>
> /*
> + * Background writeout and kupdate-style writeback are
> + * easily livelockable. Stop them if there is other work
> + * to do so that e.g. sync can proceed.
> + */
> + if ((work->for_background || work->for_kupdate) &&
> + !list_empty(&wb->bdi->work_list))
> + break;
> + /*

I like it. It's much simpler.

Reviewed-by: Wu Fengguang <fengguang.wu(a)intel.com>


Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jan Kara on
On Tue 03-08-10 11:01:25, Wu Fengguang wrote:
> On Tue, Aug 03, 2010 at 04:51:52AM +0800, Jan Kara wrote:
> > On Fri 30-07-10 12:03:06, Wu Fengguang wrote:
> > > On Fri, Jul 30, 2010 at 12:20:27AM +0800, Jan Kara wrote:
> > > > On Thu 29-07-10 19:51:44, Wu Fengguang wrote:
> > > > > The periodic/background writeback can run forever. So when any
> > > > > sync work is enqueued, increase bdi->sync_works to notify the
> > > > > active non-sync works to exit. Non-sync works queued after sync
> > > > > works won't be affected.
> > > > Hmm, wouldn't it be simpler logic to just make for_kupdate and
> > > > for_background work always yield when there's some other work to do (as
> > > > they are livelockable from the definition of the target they have) and
> > > > make sure any other work isn't livelockable?
> > >
> > > Good idea!
> > >
> > > > The only downside is that
> > > > non-livelockable work cannot be "fair" in the sense that we cannot switch
> > > > inodes after writing MAX_WRITEBACK_PAGES.
> > >
> > > Cannot switch indoes _before_ finish with the current
> > > MAX_WRITEBACK_PAGES batch?
> > Well, even after writing all those MAX_WRITEBACK_PAGES. Because what you
> > want to do in a non-livelockable work is: take inode, write it, never look at
> > it again for this work. Because if you later return to the inode, it can
> > have newer dirty pages and thus you cannot really avoid livelock. Of
> > course, this all assumes .nr_to_write isn't set to something small. That
> > avoids the livelock as well.
>
> I do have a poor man's solution that can handle this case.
> https://kerneltrap.org/mailarchive/linux-fsdevel/2009/10/7/6476473/thread
> It may do more extra works, but will stop livelock in theory.
So I don't think sync work on it's own is a problem. There we can just
give up any fairness and just go inode by inode. IMHO it's much simpler that
way. The remaining types of work we have are "for_reclaim" and then ones
triggered by filesystems to get rid of delayed allocated data. These cases
can easily have well defined and low nr_to_write so they wouldn't be
livelockable either. What do you think?

> A related question is, what if some for_reclaim works get enqueued?
> Shall we postpone the sync work as well? The global sync is not likely
> to hit the dirty pages in a small memcg, or may take long time. It
> seems not a high priority task though.
I see some incentive to do this but the simple thing with for_background
and for_kupdate work is that they are essentially state-less and so they
can be easily (and automatically) restarted. It would be really hard to
implement something like this for sync and still avoid livelocks.

> > > > I even had a patch for this but it's already outdated by now. But I
> > > > can refresh it if we decide this is the way to go.
> > >
> > > I'm very interested in your old patch, would you post it? Let's see
> > > which one is easier to work with :)
> > OK, attached is the patch. I've rebased it against 2.6.35.
> > Honza
> > --
> > Jan Kara <jack(a)suse.cz>
> > SUSE Labs, CR
>
> > From a6df0d4db148f983fe756df4791409db28dff459 Mon Sep 17 00:00:00 2001
> > From: Jan Kara <jack(a)suse.cz>
> > Date: Mon, 2 Aug 2010 22:30:25 +0200
> > Subject: [PATCH] mm: Stop background writeback if there is other work queued for the thread
> >
> > Background writeback and kupdate-style writeback are easily livelockable
> > (from a definition of their target). This is inconvenient because it can
> > make sync(1) stall forever waiting on its queued work to be finished.
> > Fix the problem by interrupting background and kupdate writeback if there
> > is some other work to do. We can return to them after completing all the
> > queued work.
> >
> > Signed-off-by: Jan Kara <jack(a)suse.cz>
> > ---
> > fs/fs-writeback.c | 8 ++++++++
> > 1 files changed, 8 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index d5be169..542471e 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -633,6 +633,14 @@ static long wb_writeback(struct bdi_writeback *wb,
> > break;
> >
> > /*
> > + * Background writeout and kupdate-style writeback are
> > + * easily livelockable. Stop them if there is other work
> > + * to do so that e.g. sync can proceed.
> > + */
> > + if ((work->for_background || work->for_kupdate) &&
> > + !list_empty(&wb->bdi->work_list))
> > + break;
> > + /*
>
> I like it. It's much simpler.
>
> Reviewed-by: Wu Fengguang <fengguang.wu(a)intel.com>
Thanks. I think I'll try to get this merged via Jens' tree in this
merge window.

Honza
--
Jan Kara <jack(a)suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/