From: Vivek Goyal on
On Sat, Jul 24, 2010 at 04:51:35AM -0400, Christoph Hellwig wrote:
> To me this sounds like slice_idle=0 is the right default then, as it
> gives useful behaviour for all systems linux runs on. Setups with
> more than a few spindles are for sure more common than setups making
> use of cgroups. Especially given that cgroups are more of a high end
> feature you'd rarely use on a single SATA spindle anyway. So setting
> a parameter to make this useful sounds like the much better option.
>

Setting slice_idle=0 should give a very bad interactive experience on
laptops/desktops with SATA disks. My previous tests showed that if I
started a buffered writer on the disk, launching firefox took more
than 5 minutes.

So slice_idle=0 should not be the default. It should be set selectively
on hardware with multiple spindles, where a single cfq queue can't
keep all the spindles busy.
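
For reference, slice_idle is a per-device CFQ tunable in sysfs, so it can be
turned off only on the arrays where it helps. A minimal sketch, assuming CFQ
is the active scheduler and using "sdb" purely as an example device name:

# Assumption: CFQ is the elevator for sdb; the device name is illustrative only.
# slice_idle is exposed per device under the iosched directory in sysfs.
with open("/sys/block/sdb/queue/iosched/slice_idle", "w") as f:
    f.write("0")  # disable per-queue idling on this multi-spindle device only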

> Especially given that the block cgroup code doesn't work particularly
> well in the presence of barriers, which are on for any kind of real-life
> production setup anyway.

True. I was hoping that on battery-backed storage we should not need
barriers. Last we talked about it, it sounded as if there might be some
bugs in the file systems that we need to fix before we can confidently say
that, on battery-backed storage, one can mount a file system (ext3, ext4,
xfs) with barriers disabled and still expect data integrity.

Thanks
Vivek
From: Vivek Goyal on
On Mon, Jul 26, 2010 at 02:58:16PM +0800, Gui Jianfeng wrote:

[..]
> Hi Vivek,
>
> I did some tests on a single SATA disk on my desktop. With the patches applied, I see
> no regressions so far, and there is some performance improvement in the
> "Direct Random Reader" mode. Here are some numbers from my box.
>

Thanks for testing, Gui. "iostest" seems to be working for you. If you had
to make some fixes to get it working on your box, do send those to me, and
I can commit them to my internal git tree.

After running the script, you can also run "iostest -R <result-dir>" and
that will generate a report. It will not have all these "Starting test..."
lines and looks nicer.

Good to know that you don't see any regressions on a SATA disk in your
cgroup testing with this patchset. The little improvement in "drr" might
be due to the fact that, in the existing code, even with slice_idle=0 we
can still do some extra idling on the service tree, and the first patch
in the series (V4) gets rid of that.

Thanks
Vivek

> Vanilla kernel:
>
> Blkio is already mounted at /cgroup/blkio. Unmounting it
> DIR=/mnt/iostestmnt/fio DEV=/dev/sdb2
> GROUPMODE=1 NRGRP=4
> Will run workloads for increasing number of threads upto a max of 4
> Starting test for [drr] with set=1 numjobs=1 filesz=512M bs=32k runtime=30
> Starting test for [drr] with set=1 numjobs=2 filesz=512M bs=32k runtime=30
> Starting test for [drr] with set=1 numjobs=4 filesz=512M bs=32k runtime=30
> Finished test for workload [drr]
> Host=localhost.localdomain Kernel=2.6.35-rc4-Vivek-+
> GROUPMODE=1 NRGRP=4
> DIR=/mnt/iostestmnt/fio DEV=/dev/sdb2
> Workload=drr iosched=cfq Filesz=512M bs=32k
> group_isolation=1 slice_idle=0 group_idle=8 quantum=8
> =========================================================================
> AVERAGE[drr] [bw in KB/s]
> -------
> job  Set  NR  cgrp1  cgrp2  cgrp3  cgrp4  total
> ---  ---  --  -----------------------------------
> drr    1   1    761    761    762    760   3044
> drr    1   2    185    420    727   1256   2588
> drr    1   4    180    371    588    863   2002
>
>
> Patched kernel:
>
> Blkio is already mounted at /cgroup/blkio. Unmounting it
> DIR=/mnt/iostestmnt/fio DEV=/dev/sdb2
> GROUPMODE=1 NRGRP=4
> Will run workloads for increasing number of threads upto a max of 4
> Starting test for [drr] with set=1 numjobs=1 filesz=512M bs=32k runtime=30
> Starting test for [drr] with set=1 numjobs=2 filesz=512M bs=32k runtime=30
> Starting test for [drr] with set=1 numjobs=4 filesz=512M bs=32k runtime=30
> Finished test for workload [drr]
> Host=localhost.localdomain Kernel=2.6.35-rc4-Vivek-+
> GROUPMODE=1 NRGRP=4
> DIR=/mnt/iostestmnt/fio DEV=/dev/sdb2
> Workload=drr iosched=cfq Filesz=512M bs=32k
> group_isolation=1 slice_idle=0 group_idle=8 quantum=8
> =========================================================================
> AVERAGE[drr] [bw in KB/s]
> -------
> job  Set  NR  cgrp1  cgrp2  cgrp3  cgrp4  total
> ---  ---  --  -----------------------------------
> drr    1   1    323    671   1030   1378   3402
> drr    1   2    165    391    686   1144   2386
> drr    1   4    185    373    612    873   2043
>
> Thanks
> Gui
>
> >
> > Thanks
> > Vivek
> >
From: Vivek Goyal on
On Sat, Jul 24, 2010 at 11:07:07AM +0200, Corrado Zoccolo wrote:
> On Sat, Jul 24, 2010 at 10:51 AM, Christoph Hellwig <hch(a)infradead.org> wrote:
> > To me this sounds like slice_idle=0 is the right default then, as it
> > gives useful behaviour for all systems linux runs on.
> No, it will give bad performance on single disks, possibly worse than
> deadline (deadline at least sorts the requests between different
> queues, while CFQ with slice_idle=0 doesn't even do this for readers).

> Setting slice_idle to 0 should be considered only when a single
> sequential reader cannot saturate the disk bandwidth, and this happens
> only on smart enough hardware with large number of spindles.

I was thinking of writing a user space utility which launches an increasing
number of parallel direct/buffered reads from the device. If the device can
sustain more than one parallel reader with increasing throughput, that is
probably a good indicator that one might be better off with slice_idle=0.

Will try that today...
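
For illustration, a rough sketch of what such a probe could look like
(hypothetical; the device name, chunk size, and reader counts below are
assumptions, not part of any existing tool):

#!/usr/bin/env python3
# Hypothetical probe: launch an increasing number of parallel sequential
# O_DIRECT readers against a device and report aggregate bandwidth.
# Needs root to read the raw device; all names and sizes are illustrative.
import mmap
import os
import threading
import time

DEV = "/dev/sdb"          # device under test (assumption)
CHUNK = 1 << 20           # 1 MiB per read; buffer must be aligned for O_DIRECT
READS_PER_THREAD = 256    # 256 MiB per reader

def reader(idx, results):
    fd = os.open(DEV, os.O_RDONLY | os.O_DIRECT)
    buf = mmap.mmap(-1, CHUNK)                  # anonymous mmap => page aligned
    offset = idx * READS_PER_THREAD * CHUNK     # sequential, but disjoint regions
    done = 0
    try:
        for i in range(READS_PER_THREAD):
            got = os.preadv(fd, [buf], offset + i * CHUNK)
            if got <= 0:                        # hit end of device
                break
            done += got
    finally:
        os.close(fd)
    results[idx] = done

def run(nr_readers):
    results = [0] * nr_readers
    threads = [threading.Thread(target=reader, args=(i, results))
               for i in range(nr_readers)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - start
    return sum(results) / elapsed / (1 << 20)   # aggregate MiB/s

if __name__ == "__main__":
    for nr in (1, 2, 4, 8):
        print("readers=%d  aggregate=%.1f MiB/s" % (nr, run(nr)))

If the aggregate bandwidth keeps climbing as readers are added, the device
can likely keep several streams in flight and slice_idle=0 is worth
considering; if it flattens out at one or two readers, the default idling
is probably the better choice.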

Vivek