IO scheduler based IO controller V10 [Kernel]

From: Jens Axboe on 27 Sep 2009 12:50

On Sun, Sep 27 2009, Mike Galbraith wrote:
> My dd vs load non-cached binary woes seem to be coming from backmerge.
>
> #if 0 /*MIKEDIDIT sand in gearbox?*/
> 	/*
> 	 * See if our hash lookup can find a potential backmerge.
> 	 */
> 	__rq = elv_rqhash_find(q, bio->bi_sector);
> 	if (__rq && elv_rq_merge_ok(__rq, bio)) {
> 		*req = __rq;
> 		return ELEVATOR_BACK_MERGE;
> 	}
> #endif

It's a given that not merging will provide better latency. We can't
disable that or performance will suffer A LOT on some systems. There are
ways to make it better, though. One would be to make the max request
size smaller, but that would also hurt for streamed workloads. Can you
test whether the patch below makes a difference? It basically
disallows merges into a request that isn't the last one.

We should probably make the merging logic a bit more clever, since the
patch below won't work well for two (or more) streamed cases. I'll think
a bit about that.

Note this is totally untested!

diff --git a/block/elevator.c b/block/elevator.c
index 1975b61..d00a72b 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -497,9 +497,17 @@ int elv_merge(struct request_queue *q, struct request **req, struct bio *bio)
 	 * See if our hash lookup can find a potential backmerge.
 	 */
 	__rq = elv_rqhash_find(q, bio->bi_sector);
-	if (__rq && elv_rq_merge_ok(__rq, bio)) {
-		*req = __rq;
-		return ELEVATOR_BACK_MERGE;
+	if (__rq) {
+		/*
+		 * If requests are queued behind this one, disallow merge. This
+		 * prevents streaming IO from continually passing new IO.
+		 */
+		if (elv_latter_request(q, __rq))
+			return ELEVATOR_NO_MERGE;
+		if (elv_rq_merge_ok(__rq, bio)) {
+			*req = __rq;
+			return ELEVATOR_BACK_MERGE;
+		}
 	}
 
 	if (e->ops->elevator_merge_fn)
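
To make the effect of the patch concrete, here is a minimal user-space model of the back-merge decision. It is a sketch under stated assumptions, not kernel code: the queue is modelled as a single array sorted by start sector, and `struct toy_rq`, `toy_find_backmerge()`, `toy_latter_request()` and `toy_back_merge()` are hypothetical stand-ins for the kernel's request, `elv_rqhash_find()`, `elv_latter_request()` and `elv_merge()`. The compatibility checks done by `elv_rq_merge_ok()` are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy request covering sectors [start, start + nr_sectors). */
struct toy_rq {
	unsigned long start;
	unsigned long nr_sectors;
};

enum toy_merge { TOY_NO_MERGE, TOY_BACK_MERGE };

/* Stand-in for elv_rqhash_find(): the request ending where the bio starts. */
static struct toy_rq *toy_find_backmerge(struct toy_rq *q, size_t n,
					 unsigned long bio_sector)
{
	for (size_t i = 0; i < n; i++)
		if (q[i].start + q[i].nr_sectors == bio_sector)
			return &q[i];
	return NULL;
}

/* Stand-in for elv_latter_request(): next request in sort order, if any.
 * The queue array is assumed to be sorted by start sector. */
static struct toy_rq *toy_latter_request(struct toy_rq *q, size_t n,
					 struct toy_rq *rq)
{
	size_t idx = (size_t)(rq - q);

	return (idx + 1 < n) ? &q[idx + 1] : NULL;
}

/*
 * Back-merge decision. With only_last set (the patched behaviour), a
 * merge is refused whenever another request is queued behind the
 * candidate. The elv_rq_merge_ok() compatibility checks are omitted.
 */
static enum toy_merge toy_back_merge(struct toy_rq *q, size_t n,
				     unsigned long bio_sector,
				     unsigned long bio_sectors, bool only_last)
{
	struct toy_rq *rq = toy_find_backmerge(q, n, bio_sector);

	assert(bio_sectors > 0);
	if (!rq)
		return TOY_NO_MERGE;
	if (only_last && toy_latter_request(q, n, rq))
		return TOY_NO_MERGE;
	rq->nr_sectors += bio_sectors;	/* the existing request grows */
	return TOY_BACK_MERGE;
}
```

For example, with a dd request at sectors [0, 8) and a firefox request at [1000, 1008) queued behind it, an 8-sector bio at sector 8 is back-merged without the restriction but refused with it, while a bio at sector 1008 can still be merged into the last request. The streaming request therefore stops growing once something else is waiting behind it.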

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Corrado Zoccolo on 27 Sep 2009 13:10

Hi Vivek,
On Fri, Sep 25, 2009 at 10:26 PM, Vivek Goyal <vgoyal(a)redhat.com> wrote:
> On Fri, Sep 25, 2009 at 04:20:14AM +0200, Ulrich Lukas wrote:
>> Vivek Goyal wrote:
>> > Notes:
>> > - With vanilla CFQ, random writers can overwhelm a random reader,
>> > bringing down its throughput and bumping up its latencies significantly.
>>
>>
>> IIRC, with vanilla CFQ, sequential writing can overwhelm random readers,
>> too.
>>
>> I'm basing this assumption on the observations I made on both OpenSuse
>> 11.1 and Ubuntu 9.10 alpha6 which I described in my posting on LKML
>> titled: "Poor desktop responsiveness with background I/O-operations" of
>> 2009-09-20.
>> (Message ID: 4AB59CBB.8090907(a)datenparkplatz.de)
>>
>>
>> Thus, I'm posting this to show that your work is greatly appreciated,
>> given the rather disappointing status quo of Linux's fairness when it
>> comes to disk IO time.
>>
>> I hope that your efforts lead to better performance for current
>> userland applications; the sooner, the better.
>>
> [Please don't remove people from original CC list. I am putting them back.]
>
> Hi Ulrich,
>
> I quickly went through that mail thread and tried the following on my
> desktop.
>
> ##########################################
> dd if=/home/vgoyal/4G-file of=/dev/null &
> sleep 5
> time firefox
> # close firefox once gui pops up.
> ##########################################
>
> It took close to 1 minute 30 seconds to launch firefox, and dd reported
> the following:
>
> 4294967296 bytes (4.3 GB) copied, 100.602 s, 42.7 MB/s
>
> (Results do vary across runs, especially if the system is freshly booted.
> I don't know why...)
>
>
> Then I tried putting the two applications in separate groups and assigned
> each group a weight of 200.
>
> ##########################################
> dd if=/home/vgoyal/4G-file of=/dev/null &
> echo $! > /cgroup/io/test1/tasks
> sleep 5
> echo $$ > /cgroup/io/test2/tasks
> time firefox
> # close firefox once gui pops up.
> ##########################################
>
> Now firefox pops up in 27 seconds, so it cut the time by about 2/3.
>
> 4294967296 bytes (4.3 GB) copied, 84.6138 s, 50.8 MB/s
>
> Notice that the throughput of dd also improved.
>
> I ran a block trace and noticed that in many cases firefox threads
> immediately preempted "dd", probably because firefox's request was a file
> system request. In those cases latency comes from seek time.
>
> In other cases, threads had to wait for up to 100ms because dd was
> not preempted. There, latency comes both from waiting in the queue and
> from seek time.

I think cfq should already be doing something similar, i.e. giving
firefox 100ms slices that alternate with dd's, unless:
* firefox is too seeky (in this case, the idle window will be too small)
* firefox has too much think time.

To rule out the first case, what happens if you run the test with your
"fairness for seeky processes" patch?
To rule out the second case, what happens if you increase the slice_idle?

Thanks,
Corrado

>
> With the cgroup setup, we run a 100ms slice for the group in which firefox
> is being launched and then give dd an uninterrupted 100ms time slice. This
> should cut down on the number of seeks, which is probably why we
> see this improvement.
>
> So grouping can help in such cases. Maybe you can move your X session into
> one group and launch the big IO in another group. Most likely you will
> have a better desktop experience without compromising dd throughput.
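
The argument that uninterrupted slices cut down on seeks can be illustrated with a deliberately crude counting model. It assumes the two streams sit far apart on disk, so every switch between them costs one long seek, and it expresses a time slice as a fixed number of requests (`batch`); both are simplifications, and `toy_count_switches()` is a hypothetical name.

```c
#include <assert.h>

/*
 * Toy model: streams A (e.g. dd) and B (e.g. firefox) take turns; each
 * turn serves up to `batch` requests from one stream. batch == 1 models
 * request-by-request interleaving; a larger batch models an
 * uninterrupted slice. Returns the number of switches between streams,
 * i.e. the number of long seeks.
 */
static unsigned long toy_count_switches(unsigned long n_a, unsigned long n_b,
					unsigned long batch)
{
	unsigned long switches = 0;
	int cur = 0;	/* stream whose turn it is: 0 = A, 1 = B */
	int last = -1;	/* stream that served the previous request */

	assert(batch > 0);
	while (n_a > 0 || n_b > 0) {
		unsigned long *left = (cur == 0) ? &n_a : &n_b;
		unsigned long take = (*left < batch) ? *left : batch;

		if (take > 0) {
			if (last != -1 && last != cur)
				switches++;	/* head moves to the other stream */
			*left -= take;
			last = cur;
		}
		cur = !cur;
	}
	return switches;
}
```

With 100 requests per stream, interleaving costs 199 long seeks while batches of 25 cost 7. The model ignores the price of slicing: requests from the other stream wait longer, and idle time may be spent waiting for a stream that has nothing queued.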

> Thanks
> Vivek
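
The dd figures in Vivek's runs above can be cross-checked from the byte count, since GNU dd reports decimal units (1 MB = 10^6 bytes). A small sketch; `dd_mb_per_s()` and `time_reduction()` are hypothetical helper names.

```c
#include <assert.h>

/* dd reports decimal units: 1 MB = 1,000,000 bytes. */
static double dd_mb_per_s(double bytes, double seconds)
{
	assert(seconds > 0.0);
	return bytes / seconds / 1e6;
}

/* Fraction by which a duration shrank, e.g. 0.5 for 10 s -> 5 s. */
static double time_reduction(double before_s, double after_s)
{
	assert(before_s > 0.0);
	return (before_s - after_s) / before_s;
}
```

4294967296 bytes in 100.602 s gives about 42.7 MB/s and in 84.6138 s about 50.8 MB/s, matching dd's own reports. With the approximate 90 s and 27 s firefox launch times, the reduction works out to about 70%, in line with the 2/3 estimate in the mail.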

--
__________________________________________________________________________

dott. Corrado Zoccolo mailto:czoccolo(a)gmail.com
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------

From: Mike Galbraith on 27 Sep 2009 14:20

On Sun, 2009-09-27 at 18:42 +0200, Jens Axboe wrote:
> On Sun, Sep 27 2009, Mike Galbraith wrote:
> > My dd vs load non-cached binary woes seem to be coming from backmerge.
> >
> > #if 0 /*MIKEDIDIT sand in gearbox?*/
> > /*
> > * See if our hash lookup can find a potential backmerge.
> > */
> > __rq = elv_rqhash_find(q, bio->bi_sector);
> > if (__rq && elv_rq_merge_ok(__rq, bio)) {
> > *req = __rq;
> > return ELEVATOR_BACK_MERGE;
> > }
> > #endif
>
> It's a given that not merging will provide better latency.

Yeah, absolutely everything I've diddled that reduces the size of queued
data improves the situation, which makes perfect sense. This one was a
bit unexpected. Front merges didn't hurt at all, back merges did, and
lots. After diddling the code a bit, I had the "well _duh_" moment.

> We can't
> disable that or performance will suffer A LOT on some systems. There are
> ways to make it better, though. One would be to make the max request
> size smaller, but that would also hurt for streamed workloads. Can you
> try whether the below patch makes a difference? It will basically
> disallow merges to a request that isn't the last one.

That's what all the looking I've done ends up at. Either you let the
disk be all it can be, and you pay in latency, or you don't, and you pay
in throughput.

> below won't work well for two (or more) streamed cases. I'll think a bit
> about that.

Cool, think away. I've been eyeballing and pondering how to know when
latency is going to become paramount. Absolutely nothing is happening,
even for "it's my root".

> Note this is totally untested!

I'll give it a shot first thing in the A.M.

Note: I tested my stable of kernels today (22->), and for dd vs read we
are better off today than at any point in that span, at least.

(I can't recall ever seeing a system where beating snot outta root
didn't hurt really bad... would be very nice though;)

From: Mike Galbraith on 28 Sep 2009 00:10

On Sun, 2009-09-27 at 20:16 +0200, Mike Galbraith wrote:
> On Sun, 2009-09-27 at 18:42 +0200, Jens Axboe wrote:

> I'll give it a shot first thing in the A.M.

- = virgin tip v2.6.31-10215-ga3c9602
+ = with patchlet

              samples                                 row avg  avg(3)     +/-
dd pre       -    67.4    70.9    65.4    68.9    66.2    67.7
             -    65.9    68.5    69.8    65.2    65.8    67.0
             -    70.4    70.3    65.1    66.4    70.1    68.4    67.7
             +    73.1    64.6    65.3    65.3    64.9    66.6    65.6    .968
             +    63.8    67.9    65.2    65.1    64.4    65.2
             +    64.9    66.3    64.1    65.2    64.8    65.0
perf stat    -    8.66   16.29    9.65   14.88    9.45    11.7
             -   15.36    9.71   15.47   10.44   12.93    12.7
             -   10.55   15.11   10.22   15.35   10.32    12.3    12.2
             +    9.87    7.53   10.62    7.51    9.95     9.0     9.1    .745
             +    7.73   10.12    8.19   11.87    8.07     9.1
             +   11.04    7.62   10.14    8.13   10.23     9.4
dd post      -    63.4    60.5    66.7    64.5    67.3    64.4
             -    64.4    66.8    64.3    61.5    62.0    63.8
             -    63.8    64.9    66.2    65.6    66.9    65.4    64.5
             +    60.9    63.4    60.2    63.4    65.5    62.6    61.8    .958
             +    63.3    59.9    61.9    62.7    61.2    61.8
             +    60.1    63.7    59.5    61.5    60.6    61.0
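
The summary columns can be reproduced from the samples. Working through the numbers, each row average appears to be the row mean truncated (not rounded) to one decimal, the three-row average the mean of those truncated row averages (again truncated), and the ratio "+" over "-" truncated to three decimals. That reading is inferred from the figures, not stated in the mail, so treat this sketch (with hypothetical helper names) as a reconstruction under that assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Truncate a non-negative x to `places` decimals (the cast drops the
 * fraction). The tiny epsilon absorbs binary rounding error, e.g. a mean
 * of 63.8 being stored as 63.7999... */
static double trunc_places(double x, int places)
{
	double scale = 1.0;

	for (int i = 0; i < places; i++)
		scale *= 10.0;
	return (double)(long long)(x * scale + 1e-6) / scale;
}

static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* "row avg" column: mean of one row of five samples, truncated to 0.1. */
static double row_avg(const double *samples)
{
	return trunc_places(mean(samples, 5), 1);
}

/* "avg(3)" column: mean of the three truncated row averages. */
static double group_avg(double rows[][5])
{
	double avgs[3];

	for (size_t i = 0; i < 3; i++)
		avgs[i] = row_avg(rows[i]);
	return trunc_places(mean(avgs, 3), 1);
}

/* "+/-" column: patched over virgin, truncated to 0.001. */
static double group_ratio(double patched, double virgin)
{
	return trunc_places(patched / virgin, 3);
}
```

Fed the perf stat rows, this gives 12.2 (virgin), 9.1 (patched) and .745, matching the table; rounding instead of truncating would give 12.3, 9.2 and .748.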

From: Mike Galbraith on 28 Sep 2009 02:00

P.S.

On Mon, 2009-09-28 at 06:04 +0200, Mike Galbraith wrote:

Deadline and noop fsc^W are less than wonderful choices for this load.

perf stat
anticipatory           12.82    7.19    8.49    5.76    9.32
noop                   16.24  175.82  154.38  228.97  147.16
deadline               43.23   57.39   96.13  148.25  180.09
deadline v2.6.27.35    28.65  167.40  195.95  183.69  178.61

-Mike
