writeback: dont redirty tail an inode with dirty pages [Kernel]

Prev: perf: Add DWARF register lookup for SH
Next: linux-next: build failure after merge of the vfs tree

From: Dave Chinner on 11 Jul 2010 22:10

On Sun, Jul 11, 2010 at 10:07:00AM +0800, Wu Fengguang wrote:
> This avoids delaying writeback for an expired (XFS) inode with lots of
> dirty pages, but no active dirtier at the moment. Previously we only do
> that for the kupdate case.
>
> CC: Dave Chinner <david(a)fromorbit.com>
> CC: Christoph Hellwig <hch(a)infradead.org>
> Acked-by: Jan Kara <jack(a)suse.cz>
> Signed-off-by: Wu Fengguang <fengguang.wu(a)intel.com>
> ---
> fs/fs-writeback.c | 20 +++++++-------------
> 1 file changed, 7 insertions(+), 13 deletions(-)
>
> --- linux-next.orig/fs/fs-writeback.c 2010-07-11 08:53:44.000000000 +0800
> +++ linux-next/fs/fs-writeback.c 2010-07-11 08:57:35.000000000 +0800
> @@ -367,18 +367,7 @@ writeback_single_inode(struct inode *ino
> spin_lock(&inode_lock);
> inode->i_state &= ~I_SYNC;
> if (!(inode->i_state & I_FREEING)) {
> - if ((inode->i_state & I_DIRTY_PAGES) && wbc->for_kupdate) {
> - /*
> - * More pages get dirtied by a fast dirtier.
> - */
> - goto select_queue;
> - } else if (inode->i_state & I_DIRTY) {
> - /*
> - * At least XFS will redirty the inode during the
> - * writeback (delalloc) and on io completion (isize).
> - */
> - redirty_tail(inode);
> - } else if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> + if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> /*
> * We didn't write back all the pages. nfs_writepages()
> * sometimes bales out without doing anything. Redirty
> @@ -400,7 +389,6 @@ writeback_single_inode(struct inode *ino
> * soon as the queue becomes uncongested.
> */
> inode->i_state |= I_DIRTY_PAGES;
> -select_queue:
> if (wbc->nr_to_write <= 0) {
> /*
> * slice used up: queue for next turn
> @@ -423,6 +411,12 @@ select_queue:
> inode->i_state |= I_DIRTY_PAGES;
> redirty_tail(inode);
> }
> + } else if (inode->i_state & I_DIRTY) {
> + /*
> + * At least XFS will redirty the inode during the
> + * writeback (delalloc) and on io completion (isize).
> + */
> + redirty_tail(inode);

I'd drop the mention of XFS here - any filesystem that does delayed
allocation or unwritten extent conversion after Io completion will
cause this. Perhaps make the comment:

/*
* Filesystems can dirty the inode during writeback
* operations, such as delayed allocation during submission
* or metadata updates after data IO completion.
*/

Cheers,

Dave.
--
Dave Chinner
david(a)fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Wu Fengguang on 12 Jul 2010 11:40

> > + } else if (inode->i_state & I_DIRTY) {
> > + /*
> > + * At least XFS will redirty the inode during the
> > + * writeback (delalloc) and on io completion (isize).
> > + */
> > + redirty_tail(inode);
>
> I'd drop the mention of XFS here - any filesystem that does delayed
> allocation or unwritten extent conversion after Io completion will
> cause this. Perhaps make the comment:
>
> /*
> * Filesystems can dirty the inode during writeback
> * operations, such as delayed allocation during submission
> * or metadata updates after data IO completion.
> */

Thanks, comments updated accordingly.

---
writeback: don't redirty tail an inode with dirty pages

This avoids delaying writeback for an expired (XFS) inode with lots of
dirty pages, but no active dirtier at the moment. Previously we only do
that for the kupdate case.

CC: Dave Chinner <david(a)fromorbit.com>
CC: Christoph Hellwig <hch(a)infradead.org>
Acked-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu(a)intel.com>
---
fs/fs-writeback.c | 22 +++++++++-------------
1 file changed, 9 insertions(+), 13 deletions(-)

--- linux-next.orig/fs/fs-writeback.c 2010-07-11 09:13:30.000000000 +0800
+++ linux-next/fs/fs-writeback.c 2010-07-12 23:26:06.000000000 +0800
@@ -367,18 +367,7 @@ writeback_single_inode(struct inode *ino
spin_lock(&inode_lock);
inode->i_state &= ~I_SYNC;
if (!(inode->i_state & I_FREEING)) {
- if ((inode->i_state & I_DIRTY_PAGES) && wbc->for_kupdate) {
- /*
- * More pages get dirtied by a fast dirtier.
- */
- goto select_queue;
- } else if (inode->i_state & I_DIRTY) {
- /*
- * At least XFS will redirty the inode during the
- * writeback (delalloc) and on io completion (isize).
- */
- redirty_tail(inode);
- } else if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
+ if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
/*
* We didn't write back all the pages. nfs_writepages()
* sometimes bales out without doing anything. Redirty
@@ -400,7 +389,6 @@ writeback_single_inode(struct inode *ino
* soon as the queue becomes uncongested.
*/
inode->i_state |= I_DIRTY_PAGES;
-select_queue:
if (wbc->nr_to_write <= 0) {
/*
* slice used up: queue for next turn
@@ -423,6 +411,14 @@ select_queue:
inode->i_state |= I_DIRTY_PAGES;
redirty_tail(inode);
}
+ } else if (inode->i_state & I_DIRTY) {
+ /*
+ * Filesystems can dirty the inode during writeback
+ * operations, such as delayed allocation during
+ * submission or metadata updates after data IO
+ * completion.
+ */
+ redirty_tail(inode);
} else if (atomic_read(&inode->i_count)) {
/*
* The inode is clean, inuse
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Andrew Morton on 12 Jul 2010 18:20

On Mon, 12 Jul 2010 23:31:27 +0800
Wu Fengguang <fengguang.wu(a)intel.com> wrote:

> > > + } else if (inode->i_state & I_DIRTY) {
> > > + /*
> > > + * At least XFS will redirty the inode during the
> > > + * writeback (delalloc) and on io completion (isize).
> > > + */
> > > + redirty_tail(inode);
> >
> > I'd drop the mention of XFS here - any filesystem that does delayed
> > allocation or unwritten extent conversion after Io completion will
> > cause this. Perhaps make the comment:
> >
> > /*
> > * Filesystems can dirty the inode during writeback
> > * operations, such as delayed allocation during submission
> > * or metadata updates after data IO completion.
> > */
>
> Thanks, comments updated accordingly.
>
> ---
> writeback: don't redirty tail an inode with dirty pages
>
> This avoids delaying writeback for an expired (XFS) inode with lots of
> dirty pages, but no active dirtier at the moment. Previously we only do
> that for the kupdate case.
>

You didn't actually explain the _reason_ for making this change.
Please always do that.

The patch is... surprisingly complicated, although the end result
looks OK. This is not aided by the partial duplication between
mapping_tagged(PAGECACHE_TAG_DIRTY) and I_DIRTY_PAGES. I don't think
we can easily remove I_DIRTY_PAGES because it's used for the
did-someone-just-dirty-a-page test here.

This code is way too complex and fragile and I fear that anything we do
to it will break something :(

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Wu Fengguang on 15 Jul 2010 11:40

On Tue, Jul 13, 2010 at 06:13:17AM +0800, Andrew Morton wrote:
> On Mon, 12 Jul 2010 23:31:27 +0800
> Wu Fengguang <fengguang.wu(a)intel.com> wrote:
>
> > > > + } else if (inode->i_state & I_DIRTY) {
> > > > + /*
> > > > + * At least XFS will redirty the inode during the
> > > > + * writeback (delalloc) and on io completion (isize).
> > > > + */
> > > > + redirty_tail(inode);
> > >
> > > I'd drop the mention of XFS here - any filesystem that does delayed
> > > allocation or unwritten extent conversion after Io completion will
> > > cause this. Perhaps make the comment:
> > >
> > > /*
> > > * Filesystems can dirty the inode during writeback
> > > * operations, such as delayed allocation during submission
> > > * or metadata updates after data IO completion.
> > > */
> >
> > Thanks, comments updated accordingly.
> >
> > ---
> > writeback: don't redirty tail an inode with dirty pages
> >
> > This avoids delaying writeback for an expired (XFS) inode with lots of
> > dirty pages, but no active dirtier at the moment. Previously we only do
> > that for the kupdate case.
> >
>
> You didn't actually explain the _reason_ for making this change.
> Please always do that.

OK. It's actually extending commit b3af9468ae from the kupdate-only case to
both kupdate and !kupdate cases.

The commit documented the reason:

Debug traces show that in per-bdi writeback, the inode under writeback
almost always get redirtied by a busy dirtier. We used to call
redirty_tail() in this case, which could delay inode for up to 30s.

This is unacceptable because it now happens so frequently for plain cp/dd,
that the accumulated delays could make writeback of big files very slow.

So let's distinguish between data redirty and metadata only redirty.
The first one is caused by a busy dirtier, while the latter one could
happen in XFS, NFS, etc. when they are doing delalloc or updating isize.

Commit b3af9468ae only does that for kupdate case because requeue_io() was
only called in the kupdate case. Now we are merging the kupdate and !kupdate
cases in patch 6/6 (why not?), so is this patch.

> The patch is... surprisingly complicated, although the end result
> looks OK. This is not aided by the partial duplication between
> mapping_tagged(PAGECACHE_TAG_DIRTY) and I_DIRTY_PAGES. I don't think
> we can easily remove I_DIRTY_PAGES because it's used for the
> did-someone-just-dirty-a-page test here.

I double checked I_DIRTY_PAGES. The main difference to PAGECACHE_TAG_DIRTY is:
I_DIRTY_PAGES (at the line removed by this patch) means there are _new_ pages
get dirtied during writeback, while PAGECACHE_TAG_DIRTY means there are dirty
pages. In this sense, if the I_DIRTY_PAGES handling is the same as
PAGECACHE_TAG_DIRTY, the code can be merged into PAGECACHE_TAG_DIRTY, as this
patch does.

The other minor differences are

- in *_set_page_dirty*(), PAGECACHE_TAG_DIRTY is set racelessly, while
I_DIRTY_PAGES might be set on the inode for a page just truncated.
The difference has no real impact on this patch (it's actually
slightly better now).

- afs_fsync() always set I_DIRTY_PAGES after calling afs_writepages().
The call was there in the first day (introduce by David Howells).
What was the intention, hmm..?

> This code is way too complex and fragile and I fear that anything we do
> to it will break something :(

Agreed. Let's try to simplify it :)

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

|
Pages: 1
Prev: perf: Add DWARF register lookup for SH
Next: linux-next: build failure after merge of the vfs tree