From: Jens Axboe
On Wed, Apr 14 2010, Jeff Moyer wrote:
> Hi,
>
> The previous two postings can be found here:
> http://lkml.org/lkml/2010/4/1/344
> and here:
> http://lkml.org/lkml/2010/4/7/325
>
> The basic problem is that, when running iozone on smallish files (up to
> 8MB in size) and including fsync in the timings, deadline outperforms
> CFQ by a factor of about 5 for 64KB files, and by about 10% for 8MB
> files. From examining the blktrace data, it appears that iozone will
> issue an fsync() call, and subsequently wait until its CFQ timeslice
> has expired before the journal thread can run to actually commit data to
> disk.
>
> The approach taken to solve this problem is to implement a blk_yield call,
> which tells the I/O scheduler not to idle on this process' queue. The call
> is made from the jbd[2] log_wait_commit function.
>
> This patch set addresses previous concerns that the sync-noidle workload
> would be starved by keeping track of the average think time for that
> workload and using that to decide whether or not to yield the queue.
>
> My testing showed nothing but improvements for mixed workloads, though I
> wouldn't call the testing exhaustive. I'd still very much like feedback
> on the approach from jbd/jbd2 developers. Finally, I will continue to do
> performance analysis of the patches.
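
In rough terms, the mechanism described above is: the task calling fsync()
tells the block layer, from jbd's log_wait_commit(), that it is about to
block waiting on the journal thread, and CFQ gives up the remainder of its
idle window unless doing so would starve the sync-noidle workload. The
stand-alone sketch below models only the think-time gate; the names, the
running average, and the direction of the comparison are illustrative
assumptions, not the code from the posted patches:

/*
 * Stand-alone model of the yield heuristic.  In the real series this
 * logic lives in CFQ and is triggered by blk_yield() from
 * log_wait_commit(); everything here (names, the running average, the
 * threshold) is an illustrative assumption.
 */
#include <stdbool.h>
#include <stdio.h>

#define SLICE_IDLE_US 8000ULL   /* 8ms, CFQ's default slice_idle, in usecs */

struct think_time {
    unsigned long long total_us;    /* sum of request-to-request gaps */
    unsigned long samples;          /* number of gaps seen */
};

/* Called (conceptually) whenever a sync-noidle queue issues a new request:
 * record how long it "thought" since its previous request completed. */
static void think_time_update(struct think_time *tt, unsigned long long gap_us)
{
    tt->total_us += gap_us;
    tt->samples++;
}

static unsigned long long think_time_mean(const struct think_time *tt)
{
    return tt->samples ? tt->total_us / tt->samples : 0;
}

/*
 * Decide whether to honor a blk_yield() request.  The assumption modeled
 * here: if the sync-noidle workload's average think time is longer than
 * the idle window, it was not about to issue I/O anyway, so handing the
 * rest of the slice to the journal thread cannot starve it.
 */
static bool should_yield(const struct think_time *noidle_tt)
{
    return noidle_tt->samples == 0 ||
           think_time_mean(noidle_tt) > SLICE_IDLE_US;
}

int main(void)
{
    struct think_time tt = { 0, 0 };

    think_time_update(&tt, 12000);  /* gaps observed on the noidle tree */
    think_time_update(&tt, 15000);

    printf("mean think time %llu us -> yield: %s\n",
           think_time_mean(&tt), should_yield(&tt) ? "yes" : "no");
    return 0;
}

The posted series presumably keeps equivalent state inside CFQ for the
sync-noidle service tree; the point of the model is only that a queue
yields when the tracked think time says nobody else on that tree is about
to issue I/O.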

This is starting to look better. Can you share what tests you did? I
tried reproducing with fs_mark last time and could not.

--
Jens Axboe

From: Jens Axboe
On Thu, Apr 15 2010, Jeff Moyer wrote:
> Jens Axboe <jens.axboe@oracle.com> writes:
>
> > On Wed, Apr 14 2010, Jeff Moyer wrote:
> >> Hi,
> >>
> >> The previous two postings can be found here:
> >> http://lkml.org/lkml/2010/4/1/344
> >> and here:
> >> http://lkml.org/lkml/2010/4/7/325
> >>
> >> The basic problem is that, when running iozone on smallish files (up to
> >> 8MB in size) and including fsync in the timings, deadline outperforms
> >> CFQ by a factor of about 5 for 64KB files, and by about 10% for 8MB
> >> files. From examining the blktrace data, it appears that iozone will
> >> issue an fsync() call, and subsequently wait until its CFQ timeslice
> >> has expired before the journal thread can run to actually commit data to
> >> disk.
> >>
> >> The approach taken to solve this problem is to implement a blk_yield call,
> >> which tells the I/O scheduler not to idle on this process' queue. The call
> >> is made from the jbd[2] log_wait_commit function.
> >>
> >> This patch set addresses previous concerns that the sync-noidle workload
> >> would be starved by keeping track of the average think time for that
> >> workload and using that to decide whether or not to yield the queue.
> >>
> >> My testing showed nothing but improvements for mixed workloads, though I
> >> wouldn't call the testing exhaustive. I'd still very much like feedback
> >> on the approach from jbd/jbd2 developers. Finally, I will continue to do
> >> performance analysis of the patches.
> >
> > This is starting to look better. Can you share what tests you did? I
> > tried reproducing with fs_mark last time and could not.
>
> Did you use the fs_mark command line I (think I) had posted? What
> storage were you using?

No, I didn't see any references to example command lines. I tested on a
few single disks, rotating and SSD. I expected the single spinning disk
to show the problem to some extent at least, but there was no difference
observed with 64kb blocks.

> I took Vivek's iostest and modified the mixed workload to run a
> buffered random reader, a buffered sequential reader, and a buffered
> writer, each at 1, 2, 4, 8, and 16 threads.
>
> The initial problem was reported against iozone, which can show the
> problem quite easily when run like so:
> iozone -s 64 -e -f /mnt/test/iozone.0 -i 0 -+n
>
> You can also just run iozone in auto mode, but that can take quite a
> while to complete.
>
> All of my tests for this round have been against a NetApp hardware
> RAID. I wanted to test against a simple SATA disk as well, but have
> become swamped with other issues.
>
> I'll include all of this information in the next patch posting. Sorry
> about that.

No problem, I'll try the above.

--
Jens Axboe

From: Jeff Moyer
Jens Axboe <jens.axboe@oracle.com> writes:

> On Wed, Apr 14 2010, Jeff Moyer wrote:
>> Hi,
>>
>> The previous two postings can be found here:
>> http://lkml.org/lkml/2010/4/1/344
>> and here:
>> http://lkml.org/lkml/2010/4/7/325
>>
>> The basic problem is that, when running iozone on smallish files (up to
>> 8MB in size) and including fsync in the timings, deadline outperforms
>> CFQ by a factor of about 5 for 64KB files, and by about 10% for 8MB
>> files. From examining the blktrace data, it appears that iozone will
>> issue an fsync() call, and subsequently wait until its CFQ timeslice
>> has expired before the journal thread can run to actually commit data to
>> disk.
>>
>> The approach taken to solve this problem is to implement a blk_yield call,
>> which tells the I/O scheduler not to idle on this process' queue. The call
>> is made from the jbd[2] log_wait_commit function.
>>
>> This patch set addresses previous concerns that the sync-noidle workload
>> would be starved by keeping track of the average think time for that
>> workload and using that to decide whether or not to yield the queue.
>>
>> My testing showed nothing but improvements for mixed workloads, though I
>> wouldn't call the testing exhaustive. I'd still very much like feedback
>> on the approach from jbd/jbd2 developers. Finally, I will continue to do
>> performance analysis of the patches.
>
> This is starting to look better. Can you share what tests you did? I
> tried reproducing with fs_mark last time and could not.

Did you use the fs_mark command line I (think I) had posted? What
storage were you using?

I took Vivek's iostest and modified the mixed workload to run a
buffered random reader, a buffered sequential reader, and a buffered
writer, each at 1, 2, 4, 8, and 16 threads.

The initial problem was reported against iozone, which can show the
problem quite easily when run like so:
iozone -s 64 -e -f /mnt/test/iozone.0 -i 0 -+n
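
That invocation boils down to writing a 64KB file and including the
fsync() in the timing, which is exactly the pattern that runs into CFQ's
idling. A minimal stand-alone C version of the same pattern (the path,
sizes, and timing here are arbitrary choices, not iozone's code) would be:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define FILE_SIZE (64 * 1024)
#define BLOCK_SIZE 4096

int main(void)
{
    char buf[BLOCK_SIZE];
    struct timespec t0, t1;
    const char *path = "/mnt/test/fsync-repro.0";  /* arbitrary test path */
    size_t off;
    int fd;

    memset(buf, 'a', sizeof(buf));

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }

    /* Write a 64KB file in 4KB chunks, like the iozone write pass. */
    for (off = 0; off < FILE_SIZE; off += BLOCK_SIZE) {
        if (write(fd, buf, BLOCK_SIZE) != BLOCK_SIZE) {
            perror("write");
            close(fd);
            return EXIT_FAILURE;
        }
    }

    /* Time the fsync(); this is where CFQ and deadline diverge, since
     * the commit has to wait for the journal thread to get disk time. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (fsync(fd) < 0) {
        perror("fsync");
        close(fd);
        return EXIT_FAILURE;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("fsync took %ld us\n",
           (long)(t1.tv_sec - t0.tv_sec) * 1000000L +
           (t1.tv_nsec - t0.tv_nsec) / 1000L);

    close(fd);
    return EXIT_SUCCESS;
}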

You can also just run iozone in auto mode, but that can take quite a
while to complete.

All of my tests for this round have been against a NetApp hardware
RAID. I wanted to test against a simple SATA disk as well, but have
become swamped with other issues.

I'll include all of this information in the next patch posting. Sorry
about that.

Cheers,
Jeff

From: Jeff Moyer
Jens Axboe <jens.axboe@oracle.com> writes:

>> > This is starting to look better. Can you share what tests you did? I
>> > tried reproducing with fs_mark last time and could not.
>>
>> Did you use the fs_mark command line I (think I) had posted? What
>> storage were you using?
>
> No, I didn't see any references to example command lines. I tested on a
> few single disks, rotating and SSD. I expected the single spinning disk
> to show the problem to some extent at least, but there was no difference
> observed with 64kb blocks.

Boy, I'm really slipping. Try this one:

../fs_mark -S 1 -D 100 -N 1000 -d /mnt/test/fs_mark -s 65536 -t 1 -w 4096

Cheers,
Jeff

From: Jens Axboe
On Thu, Apr 15 2010, Jeff Moyer wrote:
> Jens Axboe <jens.axboe@oracle.com> writes:
>
> >> > This is starting to look better. Can you share what tests you did? I
> >> > tried reproducing with fs_mark last time and could not.
> >>
> >> Did you use the fs_mark command line I (think I) had posted? What
> >> storage were you using?
> >
> > No, I didn't see any references to example command lines. I tested on a
> > few single disks, rotating and SSD. I expected the single spinning disk
> > to show the problem to some extent at least, but there was no difference
> > observed with 64kb blocks.
>
> Boy, I'm really slipping. Try this one:
>
> ./fs_mark -S 1 -D 100 -N 1000 -d /mnt/test/fs_mark -s 65536 -t 1 -w 4096

Thanks Jeff, I'll give it a spin :-)

--
Jens Axboe
