From: Vivek Goyal on
On Mon, Sep 28, 2009 at 06:04:08AM +0200, Mike Galbraith wrote:
> On Sun, 2009-09-27 at 20:16 +0200, Mike Galbraith wrote:
> > On Sun, 2009-09-27 at 18:42 +0200, Jens Axboe wrote:
>
> > I'll give it a shot first thing in the A.M.
>
> > > diff --git a/block/elevator.c b/block/elevator.c
> > > index 1975b61..d00a72b 100644
> > > --- a/block/elevator.c
> > > +++ b/block/elevator.c
> > > @@ -497,9 +497,17 @@ int elv_merge(struct request_queue *q, struct request **req, struct bio *bio)
> > > * See if our hash lookup can find a potential backmerge.
> > > */
> > > __rq = elv_rqhash_find(q, bio->bi_sector);
> > > - if (__rq && elv_rq_merge_ok(__rq, bio)) {
> > > - *req = __rq;
> > > - return ELEVATOR_BACK_MERGE;
> > > + if (__rq) {
> > > + /*
> > > + * If requests are queued behind this one, disallow merge. This
> > > + * prevents streaming IO from continually passing new IO.
> > > + */
> > > + if (elv_latter_request(q, __rq))
> > > + return ELEVATOR_NO_MERGE;
> > > + if (elv_rq_merge_ok(__rq, bio)) {
> > > + *req = __rq;
> > > + return ELEVATOR_BACK_MERGE;
> > > + }
> > > }
> > >
> > > if (e->ops->elevator_merge_fn)
>
> - = virgin tip v2.6.31-10215-ga3c9602
> + = with patchlet
>                                             Avg
> dd pre     67.4  70.9  65.4  68.9  66.2   67.7-
>            65.9  68.5  69.8  65.2  65.8   67.0-    Avg
>            70.4  70.3  65.1  66.4  70.1   68.4-   67.7-
>            73.1  64.6  65.3  65.3  64.9   66.6+   65.6+  .968
>            63.8  67.9  65.2  65.1  64.4   65.2+
>            64.9  66.3  64.1  65.2  64.8   65.0+
> perf stat   8.66 16.29  9.65 14.88  9.45  11.7-
>            15.36  9.71 15.47 10.44 12.93  12.7-
>            10.55 15.11 10.22 15.35 10.32  12.3-   12.2-
>             9.87  7.53 10.62  7.51  9.95   9.0+    9.1+  .745
>             7.73 10.12  8.19 11.87  8.07   9.1+
>            11.04  7.62 10.14  8.13 10.23   9.4+
> dd post    63.4  60.5  66.7  64.5  67.3   64.4-
>            64.4  66.8  64.3  61.5  62.0   63.8-
>            63.8  64.9  66.2  65.6  66.9   65.4-   64.5-
>            60.9  63.4  60.2  63.4  65.5   62.6+   61.8+  .958
>            63.3  59.9  61.9  62.7  61.2   61.8+
>            60.1  63.7  59.5  61.5  60.6   61.0+
>

Hmm.., so close to a 25% reduction on average in the completion time of konsole.
But this is in the presence of a writer. Does this help even in the presence of
one or more competing sequential readers?

So the latency here seems to be coming from three sources:

- wait in CFQ before the request is dispatched (only in the case of competing sequential readers)
- seek latencies
- latencies because bigger requests have already been dispatched to the disk

So limiting the request size will help with the third factor, but not with the
first two, and here seek latency seems to be the biggest contributor.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Mike Galbraith on
On Mon, 2009-09-28 at 17:35 +0200, Corrado Zoccolo wrote:

> Great.
> Can you try the attached patch (on top of 2.6.31)?
> It implements the alternative approach we discussed privately in July,
> and it addresses the possible latency increase that could happen with
> your patch.
>
> To summarize for everyone, we separate sync sequential queues, sync
> seeky queues and async queues into three separate RR structures, and
> alternate servicing requests between them.
>
> When servicing seeky queues (the ones that are usually penalized by
> cfq, for which no fairness is usually provided), we do not idle
> between them, but we do idle for the last queue (the idle can be
> exited when any seeky queue has requests). This allows us to allocate
> disk time globally for all seeky processes, and to reduce seeky
> processes' latencies.
>
> I tested with 'konsole -e exit', while doing a sequential write with
> dd, and the start up time reduced from 37s to 7s, on an old laptop
> disk.

I was fiddling around trying to at least get the IDLE class to behave, and
getting a bit frustrated. Class/priority didn't seem to make much if any
difference for konsole -e exit timings, and now I know why. I saw the
reference to Vivek's patch and gave it a shot. It makes a large difference.
                                            Avg
perf stat  12.82   7.19   8.49   5.76   9.32    8.7  anticipatory
           16.24 175.82 154.38 228.97 147.16  144.5  noop
           43.23  57.39  96.13 148.25 180.09  105.0  deadline
            9.15  14.51   9.39  15.06   9.90   11.6  cfq fairness=0 dd=nice 0
           12.22   9.85  12.55   9.88  15.06   11.9  cfq fairness=0 dd=nice 19
            9.77  13.19  11.78  17.40   9.51   11.9  cfq fairness=0 dd=SCHED_IDLE
            4.59   2.74   4.70   3.45   4.69    4.0  cfq fairness=1 dd=nice 0
            3.79   4.66   2.66   5.15   3.03    3.8  cfq fairness=1 dd=nice 19
            2.79   4.73   2.79   4.02   2.50    3.3  cfq fairness=1 dd=SCHED_IDLE

I'll give your patch a spin as well.

-Mike

From: Mike Galbraith on
On Mon, 2009-09-28 at 13:48 -0400, Vivek Goyal wrote:

> Hmm.., so close to a 25% reduction on average in the completion time of konsole.
> But this is in the presence of a writer. Does this help even in the presence of
> one or more competing sequential readers?

Dunno, I've only tested sequential writer.

> So the latency here seems to be coming from three sources:
>
> - wait in CFQ before the request is dispatched (only in the case of competing sequential readers)
> - seek latencies
> - latencies because bigger requests have already been dispatched to the disk
>
> So limiting the request size will help with the third factor, but not with the
> first two, and here seek latency seems to be the biggest contributor.

Yeah, seek latency seems to dominate.

-Mike

From: Vivek Goyal on
On Mon, Sep 28, 2009 at 07:51:14PM +0200, Mike Galbraith wrote:
> On Mon, 2009-09-28 at 17:35 +0200, Corrado Zoccolo wrote:
>
> > Great.
> > Can you try the attached patch (on top of 2.6.31)?
> > It implements the alternative approach we discussed privately in July,
> > and it addresses the possible latency increase that could happen with
> > your patch.
> >
> > To summarize for everyone, we separate sync sequential queues, sync
> > seeky queues and async queues into three separate RR structures, and
> > alternate servicing requests between them.
> >
> > When servicing seeky queues (the ones that are usually penalized by
> > cfq, for which no fairness is usually provided), we do not idle
> > between them, but we do idle for the last queue (the idle can be
> > exited when any seeky queue has requests). This allows us to allocate
> > disk time globally for all seeky processes, and to reduce seeky
> > processes' latencies.
> >
> > I tested with 'konsole -e exit', while doing a sequential write with
> > dd, and the start up time reduced from 37s to 7s, on an old laptop
> > disk.
>
> I was fiddling around trying to get IDLE class to behave at least, and
> getting a bit frustrated. Class/priority didn't seem to make much if
> any difference for konsole -e exit timings, and now I know why.

You seem to be testing konsole timings against a writer. In the case of a
writer, prio will not make much of a difference, as prio only adjusts the
length of the slice given to a process, and writers rarely get to use their
full slice length. The reader immediately preempts them...

I guess changing the class to IDLE should have helped a bit, as that is now
equivalent to setting the quantum to 1: after dispatching one request to
disk, CFQ will always expire the writer once. So it might happen that by
the time the reader preempts the writer, we have fewer requests in the disk
and lower latency for this reader.

> I saw
> the reference to Vivek's patch, and gave it a shot. Makes a large
> difference.
>                                             Avg
> perf stat  12.82   7.19   8.49   5.76   9.32    8.7  anticipatory
>            16.24 175.82 154.38 228.97 147.16  144.5  noop
>            43.23  57.39  96.13 148.25 180.09  105.0  deadline
>             9.15  14.51   9.39  15.06   9.90   11.6  cfq fairness=0 dd=nice 0
>            12.22   9.85  12.55   9.88  15.06   11.9  cfq fairness=0 dd=nice 19
>             9.77  13.19  11.78  17.40   9.51   11.9  cfq fairness=0 dd=SCHED_IDLE
>             4.59   2.74   4.70   3.45   4.69    4.0  cfq fairness=1 dd=nice 0
>             3.79   4.66   2.66   5.15   3.03    3.8  cfq fairness=1 dd=nice 19
>             2.79   4.73   2.79   4.02   2.50    3.3  cfq fairness=1 dd=SCHED_IDLE
>

Hmm.., it looks like the average latency went down only in the case of
fairness=1 and not in the case of fairness=0. (Looking at the previous mail,
average vanilla cfq latencies were around 12 seconds.)

Are you running all of this in the root group, or have you put the writers
and readers into separate cgroups?

If everything is running in the root group, then I am curious why latency went
down in the case of fairness=1. The only thing the fairness=1 parameter does is
let all the requests from the previous queue complete before dispatching from
the next queue starts. On top of that, this holds only if no preemption took
place. In your test case, konsole should preempt the writer, so in practice
fairness=1 might not make much difference.

In fact, Jens has now committed a patch which achieves a similar effect to
fairness=1 for async queues.

commit 5ad531db6e0f3c3c985666e83d3c1c4d53acccf9
Author: Jens Axboe <jens.axboe(a)oracle.com>
Date: Fri Jul 3 12:57:48 2009 +0200

cfq-iosched: drain device queue before switching to a sync queue

To lessen the impact of async IO on sync IO, let the device drain of
any async IO in progress when switching to a sync cfqq that has idling
enabled.


If everything is in separate cgroups, then we should have seen latency
improvements in the fairness=0 case also. I am a little perplexed here...

Thanks
Vivek
From: Mike Galbraith on
On Mon, 2009-09-28 at 14:18 -0400, Vivek Goyal wrote:
> On Mon, Sep 28, 2009 at 07:51:14PM +0200, Mike Galbraith wrote:

> I guess changing the class to IDLE should have helped a bit, as that is now
> equivalent to setting the quantum to 1: after dispatching one request to
> disk, CFQ will always expire the writer once. So it might happen that by
> the time the reader preempts the writer, we have fewer requests in the disk
> and lower latency for this reader.

I expected SCHED_IDLE to be better than setting the quantum to 1, because the
max is quantum*4 if you aren't IDLE. But that's not what happened. I just
retested with all knobs set back to stock, fairness off, and the quantum set
to 1, with everything running nice 0. 2.8 seconds avg :-/

> > I saw
> > the reference to Vivek's patch, and gave it a shot. Makes a large
> > difference.
> >                                             Avg
> > perf stat  12.82   7.19   8.49   5.76   9.32    8.7  anticipatory
> >            16.24 175.82 154.38 228.97 147.16  144.5  noop
> >            43.23  57.39  96.13 148.25 180.09  105.0  deadline
> >             9.15  14.51   9.39  15.06   9.90   11.6  cfq fairness=0 dd=nice 0
> >            12.22   9.85  12.55   9.88  15.06   11.9  cfq fairness=0 dd=nice 19
> >             9.77  13.19  11.78  17.40   9.51   11.9  cfq fairness=0 dd=SCHED_IDLE
> >             4.59   2.74   4.70   3.45   4.69    4.0  cfq fairness=1 dd=nice 0
> >             3.79   4.66   2.66   5.15   3.03    3.8  cfq fairness=1 dd=nice 19
> >             2.79   4.73   2.79   4.02   2.50    3.3  cfq fairness=1 dd=SCHED_IDLE
> >
>
> Hmm.., it looks like the average latency went down only in the case of
> fairness=1 and not in the case of fairness=0. (Looking at the previous mail,
> average vanilla cfq latencies were around 12 seconds.)

Yup.

> Are you running all of this in the root group, or have you put the writers
> and readers into separate cgroups?

No cgroups here.

> If everything is running in the root group, then I am curious why latency went
> down in the case of fairness=1. The only thing the fairness=1 parameter does is
> let all the requests from the previous queue complete before dispatching from
> the next queue starts. On top of that, this holds only if no preemption took
> place. In your test case, konsole should preempt the writer, so in practice
> fairness=1 might not make much difference.

fairness=1 very definitely makes a very large difference. All of those
cfq numbers were logged in back-to-back runs.

> In fact, Jens has now committed a patch which achieves a similar effect to
> fairness=1 for async queues.

Yeah, I was there yesterday. I speculated that it would hurt my reader,
but rearranging things didn't help one bit. Playing with merge, I managed
to give dd ~7% more throughput, and injured the poor reader even more.
(problem analysis via hammer/axe is not always the most effective ;)

> commit 5ad531db6e0f3c3c985666e83d3c1c4d53acccf9
> Author: Jens Axboe <jens.axboe(a)oracle.com>
> Date: Fri Jul 3 12:57:48 2009 +0200
>
> cfq-iosched: drain device queue before switching to a sync queue
>
> To lessen the impact of async IO on sync IO, let the device drain of
> any async IO in progress when switching to a sync cfqq that has idling
> enabled.
>
>
> If everything is in separate cgroups, then we should have seen latency
> improvements in the fairness=0 case also. I am a little perplexed here...
>
> Thanks
> Vivek
