From: Mike Galbraith on
On Thu, 2009-10-01 at 20:58 +0200, Jens Axboe wrote:
> On Thu, Oct 01 2009, Mike Galbraith wrote:
> > > CIC_SEEK_THR is 8K jiffies so that would be 8 seconds on a 1000HZ system. Try
> > > using one "slice_idle" period of 8 ms. But it might turn out to be too
> > > short depending on the disk speed.
> >
> > Yeah, it is too short, as is even _400_ ms. Trouble is, by the time
> > some new task is determined to be seeky, the damage is already done.
> >
> > The below does better, though not as well as "just say no to overload"
> > of course ;-)
>
> So this essentially takes the "avoid impact from previous slice" to a
> new extreme, but idling even before dispatching requests from the new
> queue. We basically do two things to prevent this already - one is to
> only set the slice when the first request is actually serviced, and the
> other is to drain async requests completely before starting sync ones.
> I'm a bit surprised that the former doesn't solve the problem fully, I
> guess what happens is that if the drive has been flooded with writes, it
> may service the new read immediately and then return to finish emptying
> its writeback cache. This will cause an impact for any sync IO until
> that cache is flushed, and then cause that sync queue to not get as much
> service as it should have.
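
[The two safeguards Jens describes can be reduced to a toy model. This is a compilable sketch with illustrative names, not the actual cfq-iosched.c identifiers:]

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the two existing CFQ safeguards. Struct and function
 * names here are illustrative stand-ins for the real code.
 */
struct mock_sync_queue {
	bool slice_started;	/* slice armed only on first completion */
	int  async_in_driver;	/* async requests still in the driver */
};

/* Safeguard 1: start the sync queue's time slice only when its first
 * request actually completes, so time spent waiting behind old writes
 * is not charged against the slice. */
static void on_first_completion(struct mock_sync_queue *q)
{
	q->slice_started = true;
}

/* Safeguard 2: refuse to dispatch sync IO while async requests are
 * still in the driver, i.e. drain async completely first. */
static bool may_dispatch_sync(const struct mock_sync_queue *q)
{
	return q->async_in_driver == 0;
}
```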

I based the stamping selection (other than "how long have we been solo")
on these possibly wrong speculations:

If we're in the idle window and doing the async drain thing, we're at
the spot where Vivek's patch helps a ton. Seemed like a great time to
limit the size of any io that may land in front of my sync reader to a
plain "you are not alone" quantity.

If we've got sync io in flight, that should mean that my new or old
known seeky queue has been serviced at least once. There's likely to be
more on the way, so delay overloading then too.

The seeky bit is supposed to be the earlier "last time we saw a seeker"
thing, but known seeky is too late to help a new task at all unless you
turn off the overloading for ages, so I added the "if incalculable" check
for good measure, hoping that meant the task is new and may want to exec.

Stamping any place may (see below) possibly limit the size of the io the
reader can generate as well as the writer, but I figured what's good for
the goose is good for the gander, or it ain't really good. The overload
was causing the observed pain, and it definitely ain't good for both at
these times at least, so don't let it do that.
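
[The stamping points listed above can be boiled down to a toy sketch. The names and the 100ms grace window are assumptions for illustration, not the actual patch:]

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the stamping idea, with assumed names and an assumed 100ms
 * grace window; the real patch's values and plumbing differ.
 */
#define OVERLOAD_GRACE_MS 100

static unsigned long last_stamp_ms;	/* last "don't overload" event */

/* Called at the spots listed above: async drain during the idle window,
 * sync IO in flight, or a queue whose seekiness is still incalculable. */
static void stamp(unsigned long now_ms)
{
	last_stamp_ms = now_ms;
}

/* Overloading (dispatching past the normal quantum) is only allowed
 * once we have been "solo" longer than the grace window. */
static bool may_overload(unsigned long now_ms)
{
	return now_ms - last_stamp_ms > OVERLOAD_GRACE_MS;
}
```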

> Perhaps the "set slice on first complete" isn't working correctly? Or
> perhaps we just need to be more extreme.

Dunno, I was just tossing rocks and sticks at it.

I don't really understand the reasoning behind overloading: I can see
that it allows cutting thicker slabs for the disk, but with the streaming
writer vs reader case, it seems only the writers can do that. The reader
is unlikely to be alone, is it? Seems to me that either dd, a flusher
thread or kjournald is going to be there with it, which gives dd a huge
advantage... it has two proxies to help it squabble over the disk; konsole
has none.

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jens Axboe on
On Fri, Oct 02 2009, Mike Galbraith wrote:
> On Thu, 2009-10-01 at 20:58 +0200, Jens Axboe wrote:
> > On Thu, Oct 01 2009, Mike Galbraith wrote:
> > > > CIC_SEEK_THR is 8K jiffies so that would be 8 seconds on a 1000HZ system. Try
> > > > using one "slice_idle" period of 8 ms. But it might turn out to be too
> > > > short depending on the disk speed.
> > >
> > > Yeah, it is too short, as is even _400_ ms. Trouble is, by the time
> > > some new task is determined to be seeky, the damage is already done.
> > >
> > > The below does better, though not as well as "just say no to overload"
> > > of course ;-)
> >
> > So this essentially takes the "avoid impact from previous slice" to a
> > new extreme, but idling even before dispatching requests from the new
> > queue. We basically do two things to prevent this already - one is to
> > only set the slice when the first request is actually serviced, and the
> > other is to drain async requests completely before starting sync ones.
> > I'm a bit surprised that the former doesn't solve the problem fully, I
> > guess what happens is that if the drive has been flooded with writes, it
> > may service the new read immediately and then return to finish emptying
> > its writeback cache. This will cause an impact for any sync IO until
> > that cache is flushed, and then cause that sync queue to not get as much
> > service as it should have.
>
> I based the stamping selection (other than "how long have we been solo")
> on these possibly wrong speculations:
>
> If we're in the idle window and doing the async drain thing, we're at
> the spot where Vivek's patch helps a ton. Seemed like a great time to
> limit the size of any io that may land in front of my sync reader to a
> plain "you are not alone" quantity.

You can't be in the idle window and doing async drain at the same time,
the idle window doesn't start until the sync queue has completed a
request. Hence my above rant on device interference.

> If we've got sync io in flight, that should mean that my new or old
> known seeky queue has been serviced at least once. There's likely to be
> more on the way, so delay overloading then too.
>
> The seeky bit is supposed to be the earlier "last time we saw a seeker"
> thing, but known seeky is too late to help a new task at all unless you
> turn off the overloading for ages, so I added the "if incalculable" check
> for good measure, hoping that meant the task is new and may want to exec.
>
> Stamping any place may (see below) possibly limit the size of the io the
> reader can generate as well as the writer, but I figured what's good for
> the goose is good for the gander, or it ain't really good. The overload
> was causing the observed pain, and it definitely ain't good for both at
> these times at least, so don't let it do that.
>
> > Perhaps the "set slice on first complete" isn't working correctly? Or
> > perhaps we just need to be more extreme.
>
> Dunno, I was just tossing rocks and sticks at it.
>
> I don't really understand the reasoning behind overloading: I can see
> that it allows cutting thicker slabs for the disk, but with the streaming
> writer vs reader case, it seems only the writers can do that. The reader
> is unlikely to be alone, is it? Seems to me that either dd, a flusher
> thread or kjournald is going to be there with it, which gives dd a huge
> advantage... it has two proxies to help it squabble over the disk; konsole
> has none.

That is true, async queues have a huge advantage over sync ones. But
sync vs async is only part of it; any combination of queued sync, queued
sync random, etc. has different ramifications on the behaviour of the
individual queue.

It's not hard to make the latency good, the hard bit is making sure we
also perform well for all other scenarios.

--
Jens Axboe

From: Mike Galbraith on
On Fri, 2009-10-02 at 10:04 +0200, Jens Axboe wrote:
> On Fri, Oct 02 2009, Mike Galbraith wrote:

> > If we're in the idle window and doing the async drain thing, we're at
> > the spot where Vivek's patch helps a ton. Seemed like a great time to
> > limit the size of any io that may land in front of my sync reader to a
> > plain "you are not alone" quantity.
>
> You can't be in the idle window and doing async drain at the same time,
> the idle window doesn't start until the sync queue has completed a
> request. Hence my above rant on device interference.

I'll take your word for it.

	/*
	 * Drain async requests before we start sync IO
	 */
	if (cfq_cfqq_idle_window(cfqq) && cfqd->rq_in_driver[BLK_RW_ASYNC])

Looked about the same to me as..

enable_idle = old_idle = cfq_cfqq_idle_window(cfqq);

...where Vivek prevented turning 1 into 0, so I stamped it ;-)

> > Dunno, I was just tossing rocks and sticks at it.
> >
> > I don't really understand the reasoning behind overloading: I can see
> > that it allows cutting thicker slabs for the disk, but with the streaming
> > writer vs reader case, it seems only the writers can do that. The reader
> > is unlikely to be alone, is it? Seems to me that either dd, a flusher
> > thread or kjournald is going to be there with it, which gives dd a huge
> > advantage... it has two proxies to help it squabble over the disk; konsole
> > has none.
>
> That is true, async queues have a huge advantage over sync ones. But
> sync vs async is only part of it; any combination of queued sync, queued
> sync random, etc. has different ramifications on the behaviour of the
> individual queue.
>
> It's not hard to make the latency good, the hard bit is making sure we
> also perform well for all other scenarios.

Yeah, that's why I'm trying to be careful about what I say, I know full
well this ain't easy to get right. I'm not even thinking of submitting
anything, it's just diagnostic testing.

WRT my "who can overload" theory, I instrumented for my own edification.

Overload totally forbidden, stamps ergo disabled.

fairness=0 11.3 avg (ie == virgin source)
fairness=1 2.8 avg

Back to virgin settings, instrument who is overloading during sequences of..
echo 2 > /proc/sys/vm/drop_caches
sh -c "perf stat -- konsole -e exit" 2>&1|tee -a $LOGFILE
...with dd continually running.

1 second counts for above.
....
[ 916.585880] od_sync: 0 od_async: 87 reject_sync: 0 reject_async: 37
[ 917.662585] od_sync: 0 od_async: 126 reject_sync: 0 reject_async: 53
[ 918.732872] od_sync: 0 od_async: 96 reject_sync: 0 reject_async: 22
[ 919.743730] od_sync: 0 od_async: 75 reject_sync: 0 reject_async: 15
[ 920.914549] od_sync: 0 od_async: 81 reject_sync: 0 reject_async: 17
[ 921.988198] od_sync: 0 od_async: 123 reject_sync: 0 reject_async: 30
....minutes long

(reject == cfqq->dispatched >= 4 * max_dispatch)
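
[That reject condition can be modeled in isolation. The counters mirror the log fields above, but the helper itself and its shape are illustrative assumptions:]

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Minimal model of the instrumented overload check: a queue may dispatch
 * past max_dispatch (overload), but is rejected once it already has
 * 4 * max_dispatch requests in flight. Counter names mirror the printk
 * fields; the function is an assumption for illustration.
 */
static int od_async, reject_async;

static bool async_overload_allowed(int dispatched, int max_dispatch)
{
	if (dispatched >= 4 * max_dispatch) {
		reject_async++;		/* shows up as reject_async above */
		return false;
	}
	od_async++;			/* shows up as od_async above */
	return true;
}
```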

Doing the same with firefox, I did see the burst below one time, dunno
what triggered that. I watched 6 runs, and only saw such a burst once.
Typically, numbers are the same as konsole, with a very rare 4 or
5 for sync sneaking in.

[ 1988.177758] od_sync: 0 od_async: 104 reject_sync: 0 reject_async: 48
[ 1992.291779] od_sync: 19 od_async: 83 reject_sync: 0 reject_async: 82
[ 1993.300850] od_sync: 79 od_async: 0 reject_sync: 28 reject_async: 0
[ 1994.313327] od_sync: 147 od_async: 104 reject_sync: 90 reject_async: 16
[ 1995.378025] od_sync: 14 od_async: 45 reject_sync: 0 reject_async: 2
[ 1996.456871] od_sync: 15 od_async: 74 reject_sync: 1 reject_async: 7
[ 1997.611226] od_sync: 0 od_async: 84 reject_sync: 0 reject_async: 14

Never noticed a sync overload watching a make -j4 for a couple minutes.


From: Mike Galbraith on

> WRT my "who can overload" theory, I instrumented for my own edification.
>
> Overload totally forbidden, stamps ergo disabled.
>
> fairness=0 11.3 avg (ie == virgin source)
> fairness=1 2.8 avg

(oops, quantum was set to 16 as well there. Not that it matters, but
for completeness.)

From: Ingo Molnar on

* Jens Axboe <jens.axboe(a)oracle.com> wrote:

> It's not hard to make the latency good, the hard bit is making sure we
> also perform well for all other scenarios.

Looking at the numbers from Mike:

| dd competing against perf stat -- konsole -e exec timings, 5 back to
| back runs
| Avg
| before 9.15 14.51 9.39 15.06 9.90 11.6
| after [+patch] 1.76 1.54 1.93 1.88 1.56 1.7

_PLEASE_ make read latencies this good - the numbers are _vastly_
better. We'll worry about the 'other' things _after_ we've reached good
latencies.

I thought this principle was a well established basic rule of Linux IO
scheduling. Why do we have to have a 'latency vs. bandwidth' discussion
again and again? I thought latency won hands down.

Ingo