From: Jan Kara on
On Thu 22-07-10 13:09:32, Wu Fengguang wrote:
> A background flush work may run forever, so it is reasonable for it to
> mimic the kupdate behavior of syncing old/expired inodes first.
>
> The policy is
> - enqueue all newly expired inodes at each queue_io() time
> - retry with a halved expire interval until some inodes are found to sync
Hmm, this logic looks a bit arbitrary to me. What I don't like very much
about it is that when there aren't any inodes older than, say, 2 seconds,
you'll end up queueing just the inodes between 2s and 1s old. So I'd
rather just queue inodes older than the limit and, if there are none,
queue all other dirty inodes.
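
The difference between the two policies can be sketched in userspace C. This is a toy model, not kernel code: the function names, the millisecond ages, and the "age >= limit" expiry test are assumptions made for illustration, and the real patch walks a list of inodes rather than an array of ages.

```c
#include <stddef.h>

/* Count the inodes whose age (ms since dirtied) has reached the limit. */
static size_t count_expired(const unsigned *age_ms, size_t n, unsigned limit_ms)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		if (age_ms[i] >= limit_ms)
			hits++;
	return hits;
}

/* Patch policy: halve the limit until at least one inode qualifies. */
static size_t queue_halving(const unsigned *age_ms, size_t n, unsigned limit_ms)
{
	size_t hits;

	while ((hits = count_expired(age_ms, n, limit_ms)) == 0 && limit_ms > 0)
		limit_ms >>= 1;
	return hits;
}

/* Suggested policy: one pass at the limit, then fall back to everything. */
static size_t queue_two_sweeps(const unsigned *age_ms, size_t n, unsigned limit_ms)
{
	size_t hits = count_expired(age_ms, n, limit_ms);

	return hits ? hits : n;
}
```

With ages of 1500, 1200, 800, 300 and 100 ms and a 2000 ms limit, the halving variant stops at 1000 ms and queues only the two oldest inodes, while the two-sweep variant queues all five.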

Honza

> CC: Jan Kara <jack(a)suse.cz>
> Signed-off-by: Wu Fengguang <fengguang.wu(a)intel.com>
> ---
> fs/fs-writeback.c | 20 ++++++++++++++------
> 1 file changed, 14 insertions(+), 6 deletions(-)
>
> --- linux-next.orig/fs/fs-writeback.c 2010-07-22 12:56:42.000000000 +0800
> +++ linux-next/fs/fs-writeback.c 2010-07-22 13:07:51.000000000 +0800
> @@ -217,14 +217,14 @@ static void move_expired_inodes(struct l
>  				struct writeback_control *wbc)
>  {
>  	unsigned long expire_interval = 0;
> -	unsigned long older_than_this;
> +	unsigned long older_than_this = 0; /* reset to kill gcc warning */
>  	LIST_HEAD(tmp);
>  	struct list_head *pos, *node;
>  	struct super_block *sb = NULL;
>  	struct inode *inode;
>  	int do_sb_sort = 0;
>
> -	if (wbc->for_kupdate) {
> +	if (wbc->for_kupdate || wbc->for_background) {
>  		expire_interval = msecs_to_jiffies(dirty_expire_interval * 10);
>  		older_than_this = jiffies - expire_interval;
>  	}
> @@ -232,8 +232,15 @@ static void move_expired_inodes(struct l
>  	while (!list_empty(delaying_queue)) {
>  		inode = list_entry(delaying_queue->prev, struct inode, i_list);
>  		if (expire_interval &&
> -		    inode_dirtied_after(inode, older_than_this))
> -			break;
> +		    inode_dirtied_after(inode, older_than_this)) {
> +			if (wbc->for_background &&
> +			    list_empty(dispatch_queue) && list_empty(&tmp)) {
> +				expire_interval >>= 1;
> +				older_than_this = jiffies - expire_interval;
> +				continue;
> +			} else
> +				break;
> +		}
>  		if (sb && sb != inode->i_sb)
>  			do_sb_sort = 1;
>  		sb = inode->i_sb;
> @@ -521,7 +528,8 @@ void writeback_inodes_wb(struct bdi_writ
>
>  	wbc->wb_start = jiffies; /* livelock avoidance */
>  	spin_lock(&inode_lock);
> -	if (!wbc->for_kupdate || list_empty(&wb->b_io))
> +
> +	if (!(wbc->for_kupdate || wbc->for_background) || list_empty(&wb->b_io))
>  		queue_io(wb, wbc);
>
>  	while (!list_empty(&wb->b_io)) {
> @@ -550,7 +558,7 @@ static void __writeback_inodes_sb(struct
>
>  	wbc->wb_start = jiffies; /* livelock avoidance */
>  	spin_lock(&inode_lock);
> -	if (!wbc->for_kupdate || list_empty(&wb->b_io))
> +	if (!(wbc->for_kupdate || wbc->for_background) || list_empty(&wb->b_io))
>  		queue_io(wb, wbc);
>  	writeback_sb_inodes(sb, wb, wbc, true);
>  	spin_unlock(&inode_lock);
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo(a)vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <jack(a)suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Wu Fengguang on
On Sat, Jul 24, 2010 at 02:15:21AM +0800, Jan Kara wrote:
> On Thu 22-07-10 13:09:32, Wu Fengguang wrote:
> > A background flush work may run forever, so it is reasonable for it to
> > mimic the kupdate behavior of syncing old/expired inodes first.
> >
> > The policy is
> > - enqueue all newly expired inodes at each queue_io() time
> > - retry with a halved expire interval until some inodes are found to sync
> Hmm, this logic looks a bit arbitrary to me. What I don't like very much
> about it is that when there aren't any inodes older than, say, 2 seconds,
> you'll end up queueing just the inodes between 2s and 1s old. So I'd
> rather just queue inodes older than the limit and, if there are none,
> queue all other dirty inodes.

You are proposing

- expire_interval >>= 1;
+ expire_interval = 0;

IMO this does not really simplify the code or the concept. If we can get the
"smoother" behavior of the original patch without extra cost, why not?

Thanks,
Fengguang


From: Wu Fengguang on
On Mon, Jul 26, 2010 at 06:57:37PM +0800, Mel Gorman wrote:
> On Thu, Jul 22, 2010 at 01:09:32PM +0800, Wu Fengguang wrote:
> > A background flush work may run forever, so it is reasonable for it to
> > mimic the kupdate behavior of syncing old/expired inodes first.
> >
> > The policy is
> > - enqueue all newly expired inodes at each queue_io() time
> > - retry with a halved expire interval until some inodes are found to sync
> >
> > CC: Jan Kara <jack(a)suse.cz>
> > Signed-off-by: Wu Fengguang <fengguang.wu(a)intel.com>
>
> Ok, intuitively this would appear to tie into pageout where we want
> older inodes to be cleaned first by background flushers to limit the
> number of dirty pages encountered by page reclaim. If this is accurate,
> it should be detailed in the changelog.

Good suggestion. I'll add these lines:

This is to help reduce the number of dirty pages encountered by page
reclaim, e.g. in the pageout() calls. Normally older inodes contain older
dirty pages, which are closer to the end of the LRU lists. So
syncing older inodes first helps reduce the number of dirty pages reached
by the page reclaim code.

Thanks,
Fengguang
From: Jan Kara on
On Mon 26-07-10 19:51:53, Wu Fengguang wrote:
> On Sat, Jul 24, 2010 at 02:15:21AM +0800, Jan Kara wrote:
> > On Thu 22-07-10 13:09:32, Wu Fengguang wrote:
> > > A background flush work may run forever, so it is reasonable for it to
> > > mimic the kupdate behavior of syncing old/expired inodes first.
> > >
> > > The policy is
> > > - enqueue all newly expired inodes at each queue_io() time
> > > - retry with a halved expire interval until some inodes are found to sync
> > Hmm, this logic looks a bit arbitrary to me. What I don't like very much
> > about it is that when there aren't any inodes older than, say, 2 seconds,
> > you'll end up queueing just the inodes between 2s and 1s old. So I'd
> > rather just queue inodes older than the limit and, if there are none,
> > queue all other dirty inodes.
>
> You are proposing
>
> - expire_interval >>= 1;
> + expire_interval = 0;
>
> IMO this does not really simplify the code or the concept. If we can get the
> "smoother" behavior of the original patch without extra cost, why not?
I agree there's no substantial code simplification. But I see a
substantial "behavior" simplification (just two sweeps instead of 10 or
so). I don't really insist on the two sweeps; it's just that I don't
see a justification for the exponential backoff here... I mean, what's the
point if the interval we queue gets really small? Why not just use
expire_interval/2 as a step if you want a smoother behavior?
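
To make the sweep counts concrete, here is a rough userspace model of the three variants under discussion. It is only a sketch: the enum, the function names, and the abstract time units are hypothetical, a pass is assumed to succeed as soon as the interval is no larger than the age of the oldest dirty inode, and the fixed-step variant reads "expire_interval/2 as a step" as subtracting half of the initial interval on each retry, which is one possible interpretation.

```c
/* How an empty pass shrinks the expire interval (hypothetical names). */
enum shrink_policy {
	SHRINK_HALVE,		/* interval >>= 1, as in the patch */
	SHRINK_TO_ZERO,		/* interval = 0, the two-sweep suggestion */
	SHRINK_FIXED_STEP	/* interval -= initial / 2, one reading of the step idea */
};

static unsigned long shrink(unsigned long interval, unsigned long initial,
			    enum shrink_policy policy)
{
	unsigned long step;

	switch (policy) {
	case SHRINK_HALVE:
		return interval >> 1;
	case SHRINK_FIXED_STEP:
		step = initial / 2;
		/* Clamp at zero; this also covers step == 0 when initial == 1. */
		return (step && interval > step) ? interval - step : 0;
	case SHRINK_TO_ZERO:
	default:
		return 0;
	}
}

/*
 * Number of passes over the queue until something is found, given the
 * age of the oldest dirty inode. The loop ends at the latest when the
 * interval reaches zero, because every policy strictly shrinks it.
 */
static unsigned sweeps_needed(unsigned long initial, unsigned long oldest_age,
			      enum shrink_policy policy)
{
	unsigned long interval = initial;
	unsigned sweeps = 1;

	while (interval > oldest_age) {
		interval = shrink(interval, initial, policy);
		sweeps++;
	}
	return sweeps;
}
```

In this model, with an initial interval of 30 units and only freshly dirtied inodes (age 0), halving takes 6 passes, the fixed step takes 3, and the fall-back-to-everything variant takes 2. The halving count grows with the logarithm of the initial interval, so it is larger when the interval is measured in jiffies.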

Honza
--
Jan Kara <jack(a)suse.cz>
SUSE Labs, CR
From: Jan Kara on
On Mon 26-07-10 20:00:11, Wu Fengguang wrote:
> On Mon, Jul 26, 2010 at 06:57:37PM +0800, Mel Gorman wrote:
> > On Thu, Jul 22, 2010 at 01:09:32PM +0800, Wu Fengguang wrote:
> > > A background flush work may run forever, so it is reasonable for it to
> > > mimic the kupdate behavior of syncing old/expired inodes first.
> > >
> > > The policy is
> > > - enqueue all newly expired inodes at each queue_io() time
> > > - retry with a halved expire interval until some inodes are found to sync
> > >
> > > CC: Jan Kara <jack(a)suse.cz>
> > > Signed-off-by: Wu Fengguang <fengguang.wu(a)intel.com>
> >
> > Ok, intuitively this would appear to tie into pageout where we want
> > older inodes to be cleaned first by background flushers to limit the
> > number of dirty pages encountered by page reclaim. If this is accurate,
> > it should be detailed in the changelog.
>
> Good suggestion. I'll add these lines:
>
> This is to help reduce the number of dirty pages encountered by page
> reclaim, e.g. in the pageout() calls. Normally older inodes contain older
> dirty pages, which are closer to the end of the LRU lists. So
Well, this kind of implicitly assumes that once a page is written, it
doesn't get accessed anymore, right? Which I imagine is often true, but
not for all workloads... Anyway, I think this behavior is a good start,
also because it is kind of natural for users to see "old" files written
first.

> syncing older inodes first helps reduce the number of dirty pages reached
> by the page reclaim code.

Honza
--
Jan Kara <jack(a)suse.cz>
SUSE Labs, CR