From: Eric Sandeen
Eric Sandeen wrote:
> Justin Piszcz wrote:
> ...
>
>>> Were the filesystems created to align with raid geometry?
>> Only default options were used except the mount options. If that is the
>> culprit, I have some more testing to do, thanks, will look into it.
>>
>>> mkfs.xfs has done that forever; mkfs.ext4 will only do so (automatically)
>>> with a recent kernel+e2fsprogs.
>> How recent?
>
> You're recent enough. :)

Oh, you need a very recent util-linux-ng as well; to use libblkid from there,
build e2fsprogs with:

[e2fsprogs] # ./configure --disable-libblkid

Otherwise you can just feed mkfs.ext4 stripe & stride manually.

-Eric
From: Justin Piszcz


On Fri, 26 Feb 2010, Eric Sandeen wrote:

> [...]
> Otherwise you can just feed mkfs.ext4 stripe & stride manually.

Hi,

Even when set, there is still poor performance:

http://busybox.net/~aldot/mkfs_stride.html
Raid Level: 0
Number of Physical Disks: 11
RAID chunk size (in KiB): 1024
Filesystem block size (in KiB): 4
mkfs.ext4 -b 4096 -E stride=256,stripe-width=2816
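
For what it's worth, the arithmetic behind those values (assuming all 11
disks carry data, as they do in RAID-0) is:

  stride       = chunk size / block size = 1024 KiB / 4 KiB = 256 blocks
  stripe-width = stride * 11 data disks  = 2816 blocks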

p63:~# /usr/bin/time mkfs.ext4 -b 4096 -E stride=256,stripe-width=2816 /dev/md0
mke2fs 1.41.10 (10-Feb-2009)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=256 blocks, Stripe width=2816 blocks
335765504 inodes, 1343055824 blocks
67152791 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
40987 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544

Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 38 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
p63:~#
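
As a sanity check that the geometry actually made it into the superblock,
something like this should work (dumpe2fs ships with e2fsprogs and prints the
recorded RAID settings):

  dumpe2fs -h /dev/md0 | grep -i raid
  # expected to show something like:
  #   RAID stride:              256
  #   RAID stripe width:        2816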

p63:~# mount /dev/md0 /r1 -o nobarrier,data=writeback
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 39.3674 s, 273 MB/s
p63:/r1#

Still very slow?

Let's try with some optimizations:
p63:/r1# mount /dev/md0 /r1 -o noatime,barrier=0,data=writeback,nobh,commit=100,nouser_xattr,nodelalloc,max_batch_time=0

Still not anywhere near the 500-600MiB/s of XFS:
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 30.4824 s, 352 MB/s
p63:/r1#

Am I doing something wrong, or is there a flag I am missing that will speed
it up? Or is this the expected performance for sequential writes on EXT4?

Justin.

From: Justin Piszcz


On Sat, 27 Feb 2010, Justin Piszcz wrote:

> [...]


I also tried the default chunk size (64 KiB), in case ext4 had a problem with
chunk sizes > 64 KiB; the results were the same for ext4. I also tried ext2 &
ext3 as well, just to see what their performance would be.
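
The stride/stripe-width values in the runs below follow from the same
arithmetic, applied to the 64 KiB chunk:

  stride       = 64 KiB / 4 KiB = 16
  stripe-width = 16 * 11        = 176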

p63:~# mkfs.ext2 -b 4096 -E stride=16,stripe-width=176 /dev/md0
p63:~# mount /dev/md0 /r1
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10737418240 bytes (11 GB) copied, 19.9434 s, 538 MB/s
p63:/r1#

p63:~# mkfs.ext3 -b 4096 -E stride=16,stripe-width=176 /dev/md0
p63:~# mount /dev/md0 /r1
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10737418240 bytes (11 GB) copied, 31.0195 s, 346 MB/s

p63:~# mkfs.ext4 -b 4096 -E stride=16,stripe-width=176 /dev/md0
p63:~# mount /dev/md0 /r1
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10737418240 bytes (11 GB) copied, 35.3866 s, 303 MB/s

And, for comparison, XFS:
p63:~# mkfs.xfs -f /dev/md0 > /dev/null 2>&1
p63:~# mount /dev/md0 /r1
p63:~# cd /r1
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 18.1527 s, 592 MB/s
p63:/r1#

From: Justin Piszcz


On Sat, 27 Feb 2010, Justin Piszcz wrote:

> [...]

Hi,

I have found the same results on two different systems:

ext4 seems to peak at ~350MiB/s on mdadm RAID, whether RAID-5 or RAID-0
(two separate machines).

The only mount option I found that takes it from:
10737418240 bytes (11 GB) copied, 48.7335 s, 220 MB/s
to
10737418240 bytes (11 GB) copied, 30.5425 s, 352 MB/s
is -o nodelalloc.

The question is why it does not break the 350MiB/s barrier.
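
One way to narrow down where the ceiling comes from (just a suggestion,
untested here) would be to watch the array while the dd runs and see whether
md0 and the member disks actually stay busy, e.g.:

  # in a second terminal during the dd run; iostat comes from the sysstat package
  iostat -x -k 1

If the member disks sit well below full utilization, the limit is more likely
in ext4's writeback/allocation path than in the hardware.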

Justin.

From: Justin Piszcz


On Sat, 27 Feb 2010, Justin Piszcz wrote:

> [...]

Besides large sequential I/O, ext4 seems to be MUCH faster than XFS when
working with many small files.

EXT4

p63:/r1# sync; /usr/bin/time bash -c 'tar xf linux-2.6.33.tar; sync'
0.18user 2.43system 0:02.86elapsed 91%CPU (0avgtext+0avgdata 5216maxresident)k
0inputs+0outputs (0major+971minor)pagefaults 0swaps
p63:/r1# sync; /usr/bin/time bash -c 'rm -rf linux-2.6.33; sync'
0.02user 0.98system 0:01.03elapsed 97%CPU (0avgtext+0avgdata 5216maxresident)k
0inputs+0outputs (0major+865minor)pagefaults 0swaps

XFS

p63:/r1# sync; /usr/bin/time bash -c 'tar xf linux-2.6.33.tar; sync'
0.20user 2.62system 1:03.90elapsed 4%CPU (0avgtext+0avgdata 5200maxresident)k
0inputs+0outputs (0major+970minor)pagefaults 0swaps
p63:/r1# sync; /usr/bin/time bash -c 'rm -rf linux-2.6.33; sync'
0.03user 2.02system 0:29.04elapsed 7%CPU (0avgtext+0avgdata 5200maxresident)k
0inputs+0outputs (0major+864minor)pagefaults 0swaps

So I guess that's the tradeoff: use XFS for large sequential I/O, and EXT4
otherwise?

I would still like to know, however, why ~350MiB/s seems to be the maximum
performance I can get from two different md RAIDs (which easily do 600MiB/s
with XFS).

Is this a performance issue within ext4 and md-raid?
The problem does not exist with xfs and md-raid.

Justin.

