From: Vivek Goyal
On Fri, Jul 02, 2010 at 03:58:13PM -0400, Jeff Moyer wrote:

[..]
> Changes from the last posting:
> - Yielding no longer expires the current queue. Instead, it sets up new
> requests from the target process so that they are issued in the yielding
> process' cfqq. This means that we don't need to worry about losing group
> or workload share.
> - Journal commits are now synchronous I/Os, which was required to get any
> sort of performance out of the fs_mark process in the presence of a
> competing reader.
> - WRITE_SYNC I/O no longer sets RQ_NOIDLE, for a similar reason.

Hi Jeff,

So this patchset relies on idling on WRITE_SYNC queues. In general, though,
we don't have good examples of why one should idle on processes doing
WRITE_SYNC IO, because previous IO tells us nothing about the upcoming IO.
I am bringing this point up again to make sure we fundamentally agree that
continuing to idle on WRITE_SYNC is the right thing to do; otherwise this
patchset falls apart.

I have yet to go through the patches in detail, but allowing another process
to dispatch requests into the same queue sounds like queue merging. So could
we use those semantics and call it elv_merge_context() or elv_merge_queue()
instead of elv_yield()? In the code we could just merge the two queues when
the next request comes in and split them apart again at slice expiry, I
guess.
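
Roughly what I have in mind, at the jbd call site. This is purely
illustrative: the elv_merge_context()/elv_unmerge_context() helpers do not
exist and the signatures are made up; q, journal and tid are the obvious
locals from the commit path.

	/*
	 * Hypothetical merge-style interface instead of elv_yield():
	 * treat the journal thread's IO as our own until the queues
	 * are split again.
	 */
	elv_merge_context(q, current, journal->j_task);

	log_start_commit(journal, tid);
	log_wait_commit(journal, tid);

	/*
	 * Split the queues again, either explicitly here or implicitly
	 * at slice expiry as mentioned above.
	 */
	elv_unmerge_context(q, current, journal->j_task);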

Thanks
Vivek

> - I did test OCFS2, and it does experience performance improvements, though
> I forgot to record those.
>
> Previous postings can be found here:
> http://lkml.org/lkml/2010/4/1/344
> http://lkml.org/lkml/2010/4/7/325
> http://lkml.org/lkml/2010/4/14/394
> http://lkml.org/lkml/2010/5/18/365
> http://lkml.org/lkml/2010/6/22/338
>
> [1] http://lkml.org/lkml/2010/6/21/307
>
> [PATCH 1/6] block: Implement a blk_yield function to voluntarily give up the I/O scheduler.
> [PATCH 2/6] jbd: yield the device queue when waiting for commits
> [PATCH 3/6] jbd2: yield the device queue when waiting for journal commits
> [PATCH 4/6] jbd: use WRITE_SYNC for journal I/O
> [PATCH 5/6] jbd2: use WRITE_SYNC for journal I/O
> [PATCH 6/6] block: remove RQ_NOIDLE from WRITE_SYNC
From: Tao Ma
Hi Jeff,

On 07/03/2010 03:58 AM, Jeff Moyer wrote:
> Hi,
>
> Running iozone or fs_mark with fsync enabled, the performance of CFQ is
> far worse than that of deadline for enterprise class storage when dealing
> with file sizes of 8MB or less. I used the following command line as a
> representative test case:
>
> fs_mark -S 1 -D 10000 -N 100000 -d /mnt/test/fs_mark -s 65536 -t 1 -w 4096 -F
>
I ran the script with "35-rc4 + this patch version" on an ocfs2 volume
and got no hang now. Thanks for the work. I also have some numbers for
you; see below.
>
> Because the iozone process is issuing synchronous writes, it is put
> onto CFQ's SYNC service tree. The significance of this is that CFQ
> will idle for up to 8ms waiting for requests on such queues. So,
> what happens is that the iozone process will issue, say, 64KB worth
> of write I/O. That I/O will just land in the page cache. Then, the
> iozone process does an fsync which forces those I/Os to disk as
> synchronous writes. Then, the file system's fsync method is invoked,
> and for ext3/4, it calls log_start_commit followed by log_wait_commit.
> Because those synchronous writes were forced out in the context of the
> iozone process, CFQ will now idle on iozone's cfqq waiting for more I/O.
> However, iozone's progress is gated by the journal thread, now.
>
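
(Just to spell out the path being described, for anyone following along:
the sketch below is a simplified paraphrase of the ext3 fsync path and of
where, as I read the cover letter, patch 2/6 hooks in; it is not the
actual hunks.)

	/*
	 * Simplified: in the ext3 fsync path the task kicks the commit
	 * and then sleeps until kjournald has written it out.
	 */
	log_start_commit(journal, commit_tid);

	/*
	 * Patch 2/6 yields the device queue somewhere in this wait path,
	 * naming the journal thread as the target so that its requests
	 * are issued in the yielding task's cfqq instead of CFQ idling
	 * here.  Call shape paraphrased from the cover letter, roughly:
	 *
	 *	blk_yield(q, journal->j_task);
	 *
	 * See the actual patch for the real signature and placement.
	 */
	log_wait_commit(journal, commit_tid);
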
> With this patch series applied (in addition to the two other patches I
> sent [1]), CFQ now achieves 530.82 files / second.
>
> I also wanted to improve the performance of the fsync-ing process in the
> presence of a competing sequential reader. The workload I used for that
> was a fio job that did sequential buffered 4k reads while running the fs_mark
> process. The run-time was 30 seconds, except where otherwise noted.
>
> Deadline got 450 files/second while achieving a throughput of 78.2 MB/s for
> the sequential reader. CFQ, unpatched, did not finish an fs_mark run
> in 30 seconds. I had to bump the time of the test up to 5 minutes, and then
> CFQ saw an fs_mark performance of 6.6 files/second and sequential reader
> throughput of 137.2MB/s.
>
> The fs_mark process was being starved as the WRITE_SYNC I/O is marked
> with RQ_NOIDLE, and regular WRITES are part of the async workload by
> default. So, a single request would be served from either the fs_mark
> process or the journal thread, and then they would give up the I/O
> scheduler.
>
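
(In other words, before this series the idling decision for these queues
amounts to something like the toy model below. This is only an
illustration of the behaviour described above, not actual CFQ code.)

	/* Toy model, not CFQ internals. */
	struct toy_queue {
		int sync;		/* queue holds synchronous requests */
		int last_noidle;	/* last request carried RQ_NOIDLE */
	};

	/*
	 * WRITE_SYNC requests carry RQ_NOIDLE, so CFQ never waits for
	 * the fs_mark process or the journal thread, and the reader
	 * gets the disk back after every single request.  Patch 6/6
	 * effectively makes last_noidle 0 for WRITE_SYNC IO.
	 */
	static int toy_should_idle(const struct toy_queue *q)
	{
		return q->sync && !q->last_noidle;
	}
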
> After applying this patch set, CFQ can now perform 113.2 files/second while
> achieving a throughput of 78.6 MB/s for the sequential reader. In table
> form, the results (all averages of 5 runs) look like this:
>
>                   just          just     |  mixed
>                   fs_mark       fio      |  fs_mark      reader
>                   (files/sec)   (MB/s)   |  (files/sec)  (MB/s)
> -----------------------------------------+----------------------
> deadline          529.44        151.4    |  450.0         78.2
> vanilla cfq       107.88        164.4    |    6.6        137.2
> patched cfq       530.82        158.7    |  113.2         78.6
Here are some numbers from the same fs_mark test on ocfs2 (files/sec):

               fs_mark
------------------------
deadline         386.3
vanilla cfq       59.7
patched cfq      366.2

So there is really a fantastic improvement, at least from what fs_mark
shows. Thanks a lot.

Regards,
Tao