From: Justin Piszcz on
Hello,

Is it possible to 'optimize' ext4 so it is as fast as XFS for writes?
I see about half the performance as XFS for sequential writes.

I have checked the doc and tried several options, a few of which are shown
below (I have also tried the commit/journal_async/etc options but none of
them get the write speeds anywhere near XFS)?

Sure 'dd' is not a real benchmark, etc, etc, but with 10Gbps between 2
hosts I get 550MiB/s+ on reads from EXT4 but only 100-200MiB/s write.

When it was XFS I used to get 400-600MiB/s for writes for the same RAID
volume.

How do I 'speed' up ext4? Is it possible?

raid0_11 disks: (XFS)
# /dev/md0 /r1 xfs noatime 0 1
p63:/r1# dd if=/dev/zero of=bigfile1 bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 18.1021 s, 593 MB/s
p63:/r1#

raid0_11 disks: (EXT4)
# /dev/md0 /r1 ext4 noatime 0 1
# dd if=/dev/zero of=file bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 35.3741 s, 304 MB/s
p63:/r1#

Other tests (ext4)
p63:~# mount /dev/md0 /r1 -o data=writeback
p63:~# cd /r1
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 39.8746 s, 269 MB/s
p63:/r1#

p63:~# mount /dev/md0 /r1 -o data=writeback,nobarrier
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 40.0656 s, 268 MB/s

Justin.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Dmitry Monakhov on
Justin Piszcz <jpiszcz(a)lucidpixels.com> writes:

> Hello,
>
> Is it possible to 'optimize' ext4 so it is as fast as XFS for writes?
> I see about half the performance as XFS for sequential writes.
>
> I have checked the doc and tried several options, a few of which are shown
> below (I have also tried the commit/journal_async/etc options but none of
> them get the write speeds anywhere near XFS)?
>
> Sure 'dd' is not a real benchmark, etc, etc, but with 10Gbps between 2
> hosts I get 550MiB/s+ on reads from EXT4 but only 100-200MiB/s write.
>
> When it was XFS I used to get 400-600MiB/s for writes for the same RAID
> volume.
>
> How do I 'speed' up ext4? Is it possible?
I don't know how to speed up ext4, but I do know how to slow down XFS :)
It seems you forgot to call fsync at the end of the file write.
In that case some data may still reside in the memory cache.
Please add "conv=fsync" or "conv=fdatasync" to the dd command
and redo your measurements.
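
A minimal sketch of the suggestion above, assuming GNU dd. The helper name timed_write and the 16 MiB scratch size are illustrative only and are not from the thread; the thread's own runs used count=10240 on /r1.

```shell
#!/bin/sh
# Hypothetical helper: write SIZE_MB mebibytes of zeros to FILE and have dd
# flush the file data to storage before it reports elapsed time and rate.
timed_write() {
    file=$1
    size_mb=$2
    # conv=fdatasync makes dd call fdatasync() on the output file before it
    # exits, so the reported rate includes flushing cached data.
    dd if=/dev/zero of="$file" bs=1M count="$size_mb" conv=fdatasync
}

# Small demonstration on a scratch file (16 MiB, so it finishes quickly).
scratch=$(mktemp)
timed_write "$scratch" 16
ls -l "$scratch"
rm -f "$scratch"
```

Without the conv= option, dd can return while part of the data is still only in memory, which would inflate the reported MB/s.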
From: Eric Sandeen on
Justin Piszcz wrote:
> Hello,
>
> Is it possible to 'optimize' ext4 so it is as fast as XFS for writes?
> I see about half the performance as XFS for sequential writes.
>
> I have checked the doc and tried several options, a few of which are shown
> below (I have also tried the commit/journal_async/etc options but none
> of them get the write speeds anywhere near XFS)?
>
> Sure 'dd' is not a real benchmark, etc, etc, but with 10Gbps between 2
> hosts I get 550MiB/s+ on reads from EXT4 but only 100-200MiB/s write.
>
> When it was XFS I used to get 400-600MiB/s for writes for the same RAID
> volume.
>
> How do I 'speed' up ext4? Is it possible?

Aside from Dmitry's suggestion to time the sync as well (although with a 10G
file, you are likely not leaving much in cache), I'd ask:

What kernel version? What xfsprogs/e2fsprogs version?

Were the filesystems created to align with raid geometry?

mkfs.xfs has done that forever; mkfs.ext4 only will do so (automatically)
with recent kernel+e2fsprogs.
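
For reference, a rough sketch of how the ext4 RAID geometry values are usually derived when they have to be set by hand: stride is the md chunk size divided by the filesystem block size, and stripe width is stride times the number of data disks. The helper name ext4_raid_opts and the 64 KiB chunk size are assumptions for illustration; the thread does not state the array's chunk size (mdadm --detail /dev/md0 reports it).

```shell
#!/bin/sh
# Hypothetical helper.
# Usage: ext4_raid_opts CHUNK_KIB BLOCK_KIB DATA_DISKS
ext4_raid_opts() {
    chunk_kib=$1
    block_kib=$2
    data_disks=$3
    # stride: filesystem blocks per RAID chunk
    stride=$((chunk_kib / block_kib))
    # stripe width: filesystem blocks per full stripe across all data disks
    stripe_width=$((stride * data_disks))
    echo "stride=${stride},stripe_width=${stripe_width}"
}

# Assumed values: 64 KiB chunk, 4 KiB ext4 blocks, 11-disk RAID0
# (in RAID0 every disk is a data disk).
ext4_raid_opts 64 4 11
# The result is in the form accepted by the mkfs.ext4 -E option, e.g.:
#   mkfs.ext4 -E "$(ext4_raid_opts 64 4 11)" /dev/md0
```

With these assumed inputs the helper prints stride=16,stripe_width=176.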

-Eric

From: Justin Piszcz on


On Sat, 27 Feb 2010, Dmitry Monakhov wrote:

> Justin Piszcz <jpiszcz(a)lucidpixels.com> writes:
>
>> Hello,
>>
>> Is it possible to 'optimize' ext4 so it is as fast as XFS for writes?
>> I see about half the performance as XFS for sequential writes.
>>
>> I have checked the doc and tried several options, a few of which are shown
>> below (I have also tried the commit/journal_async/etc options but none of
>> them get the write speeds anywhere near XFS)?
>>
>> Sure 'dd' is not a real benchmark, etc, etc, but with 10Gbps between 2
>> hosts I get 550MiB/s+ on reads from EXT4 but only 100-200MiB/s write.
>>
>> When it was XFS I used to get 400-600MiB/s for writes for the same RAID
>> volume.
>>
>> How do I 'speed' up ext4? Is it possible?
> I don't know how to speed up ext4, but I do know how to slow down XFS :)
> It seems you forgot to call fsync at the end of the file write.
> In that case some data may still reside in the memory cache.
> Please add "conv=fsync" or "conv=fdatasync" to the dd command
> and redo your measurements.

Hi,

First, with a sync included in the total time (XFS is still 2x as fast):

EXT4:
p63:~# mount /dev/md0 -o nobarrier,data=writeback /r1
p63:~# cd /r1
p63:/r1# /usr/bin/time bash -c 'dd if=/dev/zero of=file bs=1M count=10240; sync'
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 35.4163 s, 303 MB/s
0.02user 19.85system 0:36.97elapsed 53%CPU (0avgtext+0avgdata 7296maxresident)k
0inputs+0outputs (5major+1145minor)pagefaults 0swaps

XFS:
p63:/r1# /usr/bin/time bash -c 'dd if=/dev/zero of=file bs=1M count=10240; sync'
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 18.08 s, 594 MB/s
0.03user 16.15system 0:18.67elapsed 86%CPU (0avgtext+0avgdata 7312maxresident)k
0inputs+0outputs (5major+1147minor)pagefaults 0swaps
p63:/r1#

Per your request: conv=fsync & conv=fdatasync


XFS:
p63:/r1# /usr/bin/time bash -c 'dd if=/dev/zero of=file bs=1M conv=fsync count=10240'
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 18.2142 s, 590 MB/s
0.03user 16.05system 0:18.21elapsed 88%CPU (0avgtext+0avgdata 7312maxresident)k
0inputs+0outputs (0major+832minor)pagefaults 0swaps
p63:/r1#

EXT4:
p63:/r1# /usr/bin/time bash -c 'dd if=/dev/zero of=file bs=1M conv=fdatasync count=10240'
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 39.5562 s, 271 MB/s

XFS:
p63:/r1# /usr/bin/time bash -c 'dd if=/dev/zero of=file bs=1M conv=fdatasync count=10240'
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 18.513 s, 580 MB/s
0.03user 16.25system 0:18.51elapsed 87%CPU (0avgtext+0avgdata 7312maxresident)k
0inputs+0outputs (5major+828minor)pagefaults 0swaps
p63:/r1#

EXT4:
p63:/r1# /usr/bin/time bash -c 'dd if=/dev/zero of=file bs=1M conv=fsync count=10240'
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 39.7859 s, 270 MB/s
0.02user 24.20system 0:39.79elapsed 60%CPU (0avgtext+0avgdata 7328maxresident)k
0inputs+0outputs (5major+829minor)pagefaults 0swaps
p63:/r1#

XFS is still 2x as fast.
Is there some other option I am missing here or is this correct?
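
As a quick check of the "2x" figure, the arithmetic behind the conv=fsync runs above can be sketched like this. The helper names rate_mb_s and speedup are made up for illustration; dd reports MB as 10^6 bytes.

```shell
#!/bin/sh
# Hypothetical helpers for re-deriving the numbers dd printed.
# rate_mb_s BYTES SECONDS: throughput in MB/s (1 MB = 1000000 bytes)
rate_mb_s() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.0f\n", b / s / 1000000 }'
}
# speedup SLOW_SECONDS FAST_SECONDS: how many times faster the second run is
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

rate_mb_s 10737418240 18.2142   # XFS, conv=fsync  -> 590
rate_mb_s 10737418240 39.7859   # ext,  conv=fsync -> 270
speedup 39.7859 18.2142         # -> 2.18
```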

Justin.

From: Eric Sandeen on
Justin Piszcz wrote:
....

>> Were the filesystems created to align with raid geometry?
> Only default options were used except the mount options. If that is the
> culprit, I have some more testing to do, thanks, will look into it.
>
>>
>> mkfs.xfs has done that forever; mkfs.ext4 only will do so (automatically)
>> with recent kernel+e2fsprogs.
> How recent?

You're recent enough. :)

mkfs.ext4 output should include the stripe info if it was found.

    printf(_("Block size=%u (log=%u)\n"), fs->blocksize,
           s->s_log_block_size);
    printf(_("Fragment size=%u (log=%u)\n"), fs->fragsize,
           s->s_log_frag_size);
    printf(_("Stride=%u blocks, Stripe width=%u blocks\n"),
           s->s_raid_stride, s->s_raid_stripe_width);
    printf(_("%u inodes, %llu blocks\n"), s->s_inodes_count,
           ext2fs_blocks_count(s));

etc.
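
One way to check for that line, as a sketch: filter saved mkfs.ext4 output for the "Stride=... Stripe width=..." format shown in the printf above. The function name stripe_info is hypothetical and the sample values are made up.

```shell
#!/bin/sh
# Hypothetical filter: reads mkfs.ext4 output on stdin and prints
# "<stride> <stripe width>" (both in blocks) if the stripe line is present.
# Prints nothing when the line is absent.
stripe_info() {
    sed -n 's/^Stride=\([0-9][0-9]*\) blocks, Stripe width=\([0-9][0-9]*\) blocks$/\1 \2/p'
}

# Sample input in the format of the printf calls above (values are made up).
printf 'Block size=4096 (log=2)\nStride=16 blocks, Stripe width=176 blocks\n' |
    stripe_info
```

For the sample input this prints "16 176". On an already-created filesystem, the superblock header printed by dumpe2fs -h should show the same stride and stripe width values if they were set.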

-Eric