From: Myklebust, Trond on
On Oct 5, 2009, at 3:11, "Wu Fengguang" <fengguang.wu(a)intel.com> wrote:

> Hi all,
>
> This version makes two standalone functions for easier reuse.
>
> Before the patch, nr_writeback is near 1GB on my 2GB laptop:
>
> nr_writeback nr_dirty nr_unstable
> 203994 2 154469
> 203994 2 154469
>
> After the patch, nr_writeback is limited to nfs_congestion_kb=42MB.
>
> nr_writeback nr_dirty nr_unstable
> 11180 34195 11754
> 9865 36821 8234
> 10137 36695 9338
>
> One minor problem I noticed is that NFS writeback is not very smooth.
> This trace, sampled every 0.1s, shows that it can sometimes get stuck
> for up to 0.5s:
>
> nr_writeback nr_dirty nr_unstable
> 11055 37408 9599
> 10311 37315 10529
> 10869 35920 11459
> 10869 35920 11459
> 10869 35920 11459
> 10869 35920 11459
> 10869 35920 11459
> 10838 35891 10042
> 10466 35891 10414
> 10900 34744 11437
> 10249 34744 12088
> 10249 34744 12088
> 10249 34744 12088
> 10249 34744 12088
> 10249 34744 12088
> 10249 34744 12088
> 10133 34743 10663
> 10505 34743 11035
> 10970 34991 11345
> 10691 34991 11593
> 10691 34991 11593
> 10691 34991 11593
> 10691 34991 11593
> 10691 34991 11593
>
> Trond, I guess nr_writeback/nr_unstable are decreased in async RPC
> "complete" events. It is understandable that nr_dirty can sometimes
> get stuck on local waits, but why do the "locally determined" nr_dirty
> and the "remotely determined" nr_writeback/nr_unstable tend to get
> stuck at the same time? Did I miss something (that could be obvious to you)?

It looks (at 7am in the morning after getting up at 4:30am) as though
the number of unstable pages is remaining constant, which would mean
that we're sending a lot of COMMIT requests (see nfsstat). Since
COMMIT is essentially an fsync call, it means that the server is going
to be slower.
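
To put the cost in concrete terms: a stream of unstable WRITEs followed
by a COMMIT costs the server roughly what buffered writes plus an
fsync() cost locally. A user-space analogue (illustrative only, not
server code):

/* Illustrative analogue of unstable WRITEs + COMMIT: the writes land in
 * the (server's) page cache, the COMMIT forces them to stable storage
 * much like fsync().  Not NFS server code.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[4096] = { 0 };
	int fd = open("/tmp/commit-analogue", O_CREAT | O_WRONLY | O_TRUNC, 0644);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	for (int i = 0; i < 256; i++)		/* "unstable" writes: cached only */
		if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
			perror("write");
			return 1;
		}
	if (fsync(fd) < 0)			/* "COMMIT": flush to stable storage */
		perror("fsync");
	close(fd);
	return 0;
}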

Cheers
Trond
From: Wu Fengguang on
On Mon, Oct 05, 2009 at 06:55:54PM +0800, Myklebust, Trond wrote:
> On Oct 5, 2009, at 3:40, "Wu Fengguang" <fengguang.wu(a)intel.com> wrote:
>
> > On Mon, Oct 05, 2009 at 03:35:51PM +0800, Wu Fengguang wrote:
> >> Trond, I see this trace on linux-next. There are no more dirty pages
> >> when `cp' aborts after filling up the partition:
> >>
> >> cp: writing `/mnt/test/zero3': No space left on device
> >>
> >> I noticed that since then nr_writeback decreases very slowly
> >> (~100 pages per second). Looks like an interesting behavior.
> >
> > In the meantime, there are constant 7-8MB/s writes on the NFS server.
> > The network flow is much smaller, ~400K/s. How can I debug this issue?
>
> Hi Fengguang
>
> This is deliberate behaviour. When asynchronous writes start receiving
> errors, we switch to synchronous write mode until the error
> condition clears.

Ah yes. After ENOSPC, with nfsstat I saw the client-side write/commit
numbers remain constant, while the server-side write number increases by
~200 per second and the commit number also remains static. When the
client-side nr_writeback drops to 0, the server-side write number stops
increasing as well.

> The reason for doing so is firstly that some filesystems (XFS)
> perform very poorly under ENOSPC, and so it takes forever to write
> back pages (we don't want to cancel all writebacks for temporary
> conditions like ENOSPC). It also allows us to deliver the errors more
> promptly to the application.
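
Got it. So conceptually it is a mode switch: queue writes asynchronously
until one fails, then write synchronously until a write succeeds again.
A rough user-space model (hypothetical helper names, not the actual NFS
client code):

/* Rough model of "async until an error, then sync until it clears".
 * The helpers below just simulate the transport; this is not the NFS
 * client implementation.
 */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static bool sync_mode;	/* an earlier asynchronous write failed */

/* Pretend the 3rd async write hits ENOSPC; sync writes succeed. */
static int queue_async_write(void) { static int n; return ++n == 3 ? -ENOSPC : 0; }
static int do_sync_write(void)     { return 0; }

static int issue_write(void)
{
	int err;

	if (!sync_mode) {
		err = queue_async_write();
		if (!err)
			return 0;
		sync_mode = true;	/* remember the async failure ... */
		return err;		/* ... and report it promptly */
	}

	/* Synchronous mode until the error condition clears. */
	err = do_sync_write();
	if (!err)
		sync_mode = false;
	return err;
}

int main(void)
{
	for (int i = 0; i < 6; i++) {
		int ret = issue_write();
		printf("write %d: ret=%d sync_mode=%d\n", i, ret, sync_mode);
	}
	return 0;
}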

Thanks,
Fengguang
From: Wu Fengguang on
On Mon, Oct 05, 2009 at 07:01:10PM +0800, Myklebust, Trond wrote:
> On Oct 5, 2009, at 3:11, "Wu Fengguang" <fengguang.wu(a)intel.com> wrote:
>
> > Hi all,
> >
> > This version makes two standalone functions for easier reuse.
> >
> > Before the patch, nr_writeback is near 1GB on my 2GB laptop:
> >
> > nr_writeback nr_dirty nr_unstable
> > 203994 2 154469
> > 203994 2 154469
> >
> > After the patch, nr_writeback is limited to nfs_congestion_kb=42MB.
> >
> > nr_writeback nr_dirty nr_unstable
> > 11180 34195 11754
> > 9865 36821 8234
> > 10137 36695 9338
> >
> > One minor problem I noticed is that NFS writeback is not very smooth.
> > This trace, sampled every 0.1s, shows that it can sometimes get stuck
> > for up to 0.5s:
> >
> > nr_writeback nr_dirty nr_unstable
> > 11055 37408 9599
> > 10311 37315 10529
> > 10869 35920 11459
> > 10869 35920 11459
> > 10869 35920 11459
> > 10869 35920 11459
> > 10869 35920 11459
> > 10838 35891 10042
> > 10466 35891 10414
> > 10900 34744 11437
> > 10249 34744 12088
> > 10249 34744 12088
> > 10249 34744 12088
> > 10249 34744 12088
> > 10249 34744 12088
> > 10249 34744 12088
> > 10133 34743 10663
> > 10505 34743 11035
> > 10970 34991 11345
> > 10691 34991 11593
> > 10691 34991 11593
> > 10691 34991 11593
> > 10691 34991 11593
> > 10691 34991 11593
> >
> > Trond, I guess nr_writeback/nr_unstable are decreased in async RPC
> > "complete" events. It is understandable that nr_dirty can sometimes
> > get stuck on local waits, but why do the "locally determined" nr_dirty
> > and the "remotely determined" nr_writeback/nr_unstable tend to get
> > stuck at the same time? Did I miss something (that could be obvious to you)?
>
> It looks (at 7am in the morning after getting up at 4:30am) as though

Wow early bird!

> the number of unstable pages is remaining constant, which would mean
> that we're sending a lot of COMMIT requests (see nfsstat). Since
> COMMIT is essentially an fsync call, it means that the server is going
> to be slower.

Here are the numbers:

Client nfs v3:
null         getattr      setattr      lookup       access       readlink
0         0% 18007     3% 67        0% 1752      0% 2499      0% 109       0%
read         write        create       mkdir        symlink      mknod
1742      0% 518695   95% 77        0% 0         0% 0         0% 2         0%
remove       rmdir        rename       link         readdir      readdirplus
104       0% 0         0% 30        0% 40        0% 15        0% 178       0%
fsstat       fsinfo       pathconf     commit
140       0% 2         0% 1         0% 2461      0%

Server nfs v3:
null         getattr      setattr      lookup       access       readlink
2         0% 18000     3% 67        0% 1752      0% 2495      0% 109       0%
read         write        create       mkdir        symlink      mknod
1742      0% 518695   95% 77        0% 0         0% 0         0% 2         0%
remove       rmdir        rename       link         readdir      readdirplus
104       0% 0         0% 30        0% 40        0% 15        0% 178       0%
fsstat       fsinfo       pathconf     commit
140       0% 3         0% 1         0% 2461      0%


I noticed that dd often sleeps in nfs_updatepage/nfs_wait_on_request.
Is it because it's doing 512-byte writes and thus has to wait on
subsequent in-page writes? I guess this may hurt performance when
network latency is high?

[ 268.020588] dd D ffff880002735460 2608 3688 3534 0x00000000
[ 268.020588] ffff8800777c3a38 0000000000000046 0000000000000000 ffff880002735460
[ 268.020588] 000000000000e388 ffff8800777c3fd8 ffff8800775346e0 ffffffff8192c8e0
[ 268.020588] ffff880077534a68 0000000000000082 00000000ffffc9fb ffff8800061d4758
[ 268.020588] Call Trace:
[ 268.020588] [<ffffffff8109365d>] ? trace_hardirqs_on+0xd/0x10
[ 268.020588] [<ffffffff8123dc7a>] nfs_wait_bit_killable+0x2a/0x40
[ 268.020588] [<ffffffff81695a82>] __wait_on_bit+0x62/0x90
[ 268.020588] [<ffffffff8123dc50>] ? nfs_wait_bit_killable+0x0/0x40
[ 268.020588] [<ffffffff8123dc50>] ? nfs_wait_bit_killable+0x0/0x40
[ 268.020588] [<ffffffff81695b29>] out_of_line_wait_on_bit+0x79/0x90
[ 268.020588] [<ffffffff8107e2f0>] ? wake_bit_function+0x0/0x50
[ 268.020588] [<ffffffff81243aef>] nfs_wait_on_request+0x2f/0x40
[ 268.020588] [<ffffffff812490a6>] nfs_updatepage+0x2e6/0x540
[ 268.020588] [<ffffffff81239dc2>] nfs_write_end+0x62/0x2c0
[ 268.020588] [<ffffffff810fd469>] generic_file_buffered_write+0x179/0x2a0
[ 268.020588] [<ffffffff810935f5>] ? trace_hardirqs_on_caller+0x155/0x1b0
[ 268.020588] [<ffffffff810fd99d>] __generic_file_aio_write+0x26d/0x440
[ 268.020588] [<ffffffff810fdbbe>] ? generic_file_aio_write+0x4e/0xd0
[ 268.020588] [<ffffffff810fdbd4>] generic_file_aio_write+0x64/0xd0
[ 268.020588] [<ffffffff8123ae66>] nfs_file_write+0x136/0x210
[ 268.020588] [<ffffffff8114d1e9>] do_sync_write+0xf9/0x140
[ 268.020588] [<ffffffff8107e2b0>] ? autoremove_wake_function+0x0/0x40
[ 268.020588] [<ffffffff8111905c>] ? might_fault+0x5c/0xb0
[ 268.020588] [<ffffffff8114de3f>] vfs_write+0xcf/0x1c0
[ 268.020588] [<ffffffff8114e035>] sys_write+0x55/0x90
[ 268.020588] [<ffffffff8100c0b2>] system_call_fastpath+0x16/0x1b
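
One simple way to check that from user space would be to write the same
amount of data over the NFS mount with 512-byte vs page-sized write()s
and compare the elapsed time. A quick sketch (the path and sizes are
only examples):

/* Compare 512-byte vs page-sized write()s over an NFS mount.  The path
 * and sizes below are examples; adjust for the actual setup.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static double timed_write(const char *path, size_t chunk, size_t total)
{
	char *buf = calloc(1, chunk);
	int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
	struct timespec t0, t1;

	if (!buf || fd < 0) {
		perror(path);
		exit(1);
	}
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (size_t done = 0; done < total; done += chunk) {
		if (write(fd, buf, chunk) != (ssize_t)chunk) {
			perror("write");
			exit(1);
		}
	}
	fsync(fd);				/* include the final flush in the timing */
	clock_gettime(CLOCK_MONOTONIC, &t1);
	close(fd);
	free(buf);
	return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
	size_t total = 64 << 20;		/* 64MB per run */

	printf("512-byte writes:  %.2fs\n", timed_write("/mnt/test/chunk512", 512, total));
	printf("4096-byte writes: %.2fs\n", timed_write("/mnt/test/chunk4k", 4096, total));
	return 0;
}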

Thanks,
Fengguang

From: Wu Fengguang on
On Mon, Oct 05, 2009 at 07:00:11PM +0800, Jens Axboe wrote:
> On Sun, Oct 04 2009, Wu Fengguang wrote:
> > Hi Jens,
> >
> > This is a bug fix for 2.6.32. Maybe other filesystems that are not
> > block-queue based will have similar issues ...
>
> Not that I'm aware of; the NFS use is fairly special. Given that this is

Sorry for the confusion. 2.6.32 is safe. I tested NFS, fuse and cifs.
NFS writes will be throttled at the dirty limit, and fuse/cifs see
near-zero nr_writeback/nr_dirty numbers during heavy writes.

NFS and fuse do set the bdi congestion state and somehow expect it to
back off background flushing. This IO priority thing could be fixed in
the next merge window.
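
For reference, the limiting seen earlier (nr_writeback capped around
nfs_congestion_kb) is just threshold accounting on the per-bdi count of
pages under writeback: mark the bdi congested when the count crosses an
"on" threshold, and clear it again somewhat lower for hysteresis. A
simplified user-space model (names and thresholds are illustrative, not
the in-kernel code):

/* Simplified model of threshold-based congestion marking: the device is
 * marked congested once the writeback page count crosses an "on"
 * threshold derived from nfs_congestion_kb, and uncongested again when
 * it falls below a lower "off" threshold.  Illustrative only.
 */
#include <stdbool.h>
#include <stdio.h>

#define PAGE_KB			4
#define NFS_CONGESTION_KB	(42 * 1024)			/* ~42MB, as above */
#define THRESH_ON		(NFS_CONGESTION_KB / PAGE_KB)
#define THRESH_OFF		(THRESH_ON - (THRESH_ON >> 2))	/* some hysteresis */

static long writeback_pages;
static bool congested;

static void page_writeback_start(void)		/* a WRITE rpc is sent */
{
	if (++writeback_pages > THRESH_ON && !congested) {
		congested = true;		/* flusher should back off this bdi */
		printf("congested at %ld pages\n", writeback_pages);
	}
}

static void page_writeback_done(void)		/* its completion arrives */
{
	if (--writeback_pages < THRESH_OFF && congested) {
		congested = false;		/* flusher may resume */
		printf("uncongested at %ld pages\n", writeback_pages);
	}
}

int main(void)
{
	for (int i = 0; i < 12000; i++)
		page_writeback_start();
	for (int i = 0; i < 12000; i++)
		page_writeback_done();
	return 0;
}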

> purely in the realm of nfs/, I'll let Trond decide how to include and
> push this (when a final patch is agreed upon).
>
> Thanks for looking into this!

Thanks,
Fengguang