From: Jens Axboe
On Fri, Oct 02 2009, Ingo Molnar wrote:
>
> * Jens Axboe <jens.axboe(a)oracle.com> wrote:
>
> > On Fri, Oct 02 2009, Ingo Molnar wrote:
> > >
> > > * Jens Axboe <jens.axboe(a)oracle.com> wrote:
> > >
> > > > It's not _that_ easy, it depends a lot on the access patterns. A
> > > > good example of that is actually the idling that we already do.
> > > > Say you have two applications, each starting up. If you start them
> > > > both at the same time and just care for the dumb low latency, then
> > > > you'll do one IO from each of them in turn. Latency will be good,
> > > > but throughput will be awful. And this means that in 20s they are
> > > > both started, while with the slice idling and priority disk access
> > > > that CFQ does, you'd hopefully have both up and running in 2s.
> > > >
> > > > So latency is good, definitely, but sometimes you have to worry
> > > > about the bigger picture too. Latency is more than single IOs,
> > > > it's often for a complete operation which may involve lots of IOs.
> > > > Single IO latency is a benchmark thing, it's not a real life
> > > > issue. And that's where it becomes complex and not so black and
> > > > white. Mike's test is a really good example of that.
> > >
> > > To the extent that you're arguing that Mike's test is artificial (I'm not
> > > sure you are arguing that) - Mike certainly did not do an artificial
> > > test - he tested 'konsole' cache-cold startup latency, such as:
> >
> > [snip]
> >
> > I was saying the exact opposite, that Mike's test is a good example of
> > a valid test. It's not measuring single IO latencies, it's doing a
> > sequence of valid events and looking at the latency for those. It's
> > benchmarking the bigger picture, not a microbenchmark.
>
> Good, so we are in violent agreement :-)

Yes, perhaps that last sentence didn't provide enough evidence of which
category I put Mike's test into :-)

So to kick things off, I added an 'interactive' knob to CFQ and
defaulted it to on, along with re-enabling slice idling for hardware
that does tagged command queuing. This is almost completely identical to
what Vivek Goyal originally posted; it's just combined into one and uses
the term 'interactive' instead of 'fairness'. I think the former is a
better umbrella under which to add further tweaks that may sacrifice
throughput slightly, in the quest for better latency.

It's queued up in the for-linus branch.
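
Roughly, the idea looks like this (illustrative sketch only, not the
actual commit; 'cfq_interactive' and the helper below are made-up names,
while hw_tag, cfq_slice_idle and cfq_class_idle() are existing
cfq-iosched bits):

/*
 * Illustrative sketch, not the real patch: an "interactive" tunable
 * that, when set, keeps slice idling enabled even on hardware doing
 * tagged command queuing (hw_tag), trading a bit of throughput for
 * better latency.
 */
static bool cfq_may_idle(struct cfq_data *cfqd, struct cfq_queue *cfqq)
{
	/* idle-class queues never get idle time */
	if (cfq_class_idle(cfqq))
		return false;

	/* idling disabled entirely via the slice_idle tunable */
	if (!cfqd->cfq_slice_idle)
		return false;

	/*
	 * On NCQ-capable hardware idling is normally skipped, since the
	 * device reorders requests anyway; with the interactive knob set
	 * we idle regardless, so a seeky reader keeps its disk slice.
	 */
	if (cfqd->hw_tag && !cfqd->cfq_interactive)
		return false;

	return true;
}

The knob itself would presumably just be another file next to slice_idle
and friends under /sys/block/<dev>/queue/iosched/, togglable at runtime.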

--
Jens Axboe

From: Ingo Molnar

* Jens Axboe <jens.axboe(a)oracle.com> wrote:

> On Fri, Oct 02 2009, Ingo Molnar wrote:
> >
> > * Jens Axboe <jens.axboe(a)oracle.com> wrote:
> >
> > > On Fri, Oct 02 2009, Ingo Molnar wrote:
> > > >
> > > > * Jens Axboe <jens.axboe(a)oracle.com> wrote:
> > > >
> > > > > It's not _that_ easy, it depends a lot on the access patterns. A
> > > > > good example of that is actually the idling that we already do.
> > > > > Say you have two applications, each starting up. If you start them
> > > > > both at the same time and just care for the dumb low latency, then
> > > > > you'll do one IO from each of them in turn. Latency will be good,
> > > > > but throughput will be awful. And this means that in 20s they are
> > > > > both started, while with the slice idling and priority disk access
> > > > > that CFQ does, you'd hopefully have both up and running in 2s.
> > > > >
> > > > > So latency is good, definitely, but sometimes you have to worry
> > > > > about the bigger picture too. Latency is more than single IOs,
> > > > > it's often for a complete operation which may involve lots of IOs.
> > > > > Single IO latency is a benchmark thing, it's not a real life
> > > > > issue. And that's where it becomes complex and not so black and
> > > > > white. Mike's test is a really good example of that.
> > > >
> > > > To the extent that you're arguing that Mike's test is artificial (I'm not
> > > > sure you are arguing that) - Mike certainly did not do an artificial
> > > > test - he tested 'konsole' cache-cold startup latency, such as:
> > >
> > > [snip]
> > >
> > > I was saying the exact opposite, that Mike's test is a good example of
> > > a valid test. It's not measuring single IO latencies, it's doing a
> > > sequence of valid events and looking at the latency for those. It's
> > > benchmarking the bigger picture, not a microbenchmark.
> >
> > Good, so we are in violent agreement :-)
>
> Yes, perhaps that last sentence didn't provide enough evidence of
> which category I put Mike's test into :-)
>
> So to kick things off, I added an 'interactive' knob to CFQ and
> defaulted it to on, along with re-enabling slice idling for hardware
> that does tagged command queuing. This is almost completely identical
> to what Vivek Goyal originally posted; it's just combined into one and
> uses the term 'interactive' instead of 'fairness'. I think the former
> is a better umbrella under which to add further tweaks that may
> sacrifice throughput slightly, in the quest for better latency.
>
> It's queued up in the for-linus branch.

I'd say 'latency' describes it even better. 'Interactivity' as a term is
a bit overloaded.

Ingo
From: Jens Axboe
On Thu, Oct 01 2009, Mike Galbraith wrote:
> 	max_dispatch = cfqd->cfq_quantum;
> 	if (cfq_class_idle(cfqq))
> 		max_dispatch = 1;
>
> +	if (cfqd->busy_queues > 1)
> +		cfqd->od_stamp = jiffies;
> +

->busy_queues > 1 just means that more than one queue has requests ready
for dispatch, not that any of them have actually been dispatched.
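
IOW, roughly (simplified sketch, not verbatim cfq-iosched.c; the service
tree manipulation is elided):

/*
 * busy_queues counts cfq_queues that have requests queued and sit on the
 * service tree -- it is bumped as soon as a queue gets work, long before
 * anything is actually sent to the device.
 */
static void cfq_add_cfqq_rr(struct cfq_data *cfqd, struct cfq_queue *cfqq)
{
	cfq_mark_cfqq_on_rr(cfqq);
	cfqd->busy_queues++;
	/* ... link cfqq into the service tree ... */
}

/*
 * What has actually been dispatched is tracked separately (rq_in_driver,
 * bumped only when a request is handed to the driver), so busy_queues > 1
 * on its own says nothing about requests in flight.
 */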


--
Jens Axboe

From: Mike Galbraith
On Fri, 2009-10-02 at 19:37 +0200, Jens Axboe wrote:
> On Fri, Oct 02 2009, Ingo Molnar wrote:
> >
> > * Jens Axboe <jens.axboe(a)oracle.com> wrote:
> >
> > > On Fri, Oct 02 2009, Ingo Molnar wrote:
> > > >
> > > > * Jens Axboe <jens.axboe(a)oracle.com> wrote:
> > > >
> > > > > It's not _that_ easy, it depends a lot on the access patterns. A
> > > > > good example of that is actually the idling that we already do.
> > > > > Say you have two applications, each starting up. If you start them
> > > > > both at the same time and just care for the dumb low latency, then
> > > > > you'll do one IO from each of them in turn. Latency will be good,
> > > > > but throughput will be awful. And this means that in 20s they are
> > > > > both started, while with the slice idling and priority disk access
> > > > > that CFQ does, you'd hopefully have both up and running in 2s.
> > > > >
> > > > > So latency is good, definitely, but sometimes you have to worry
> > > > > about the bigger picture too. Latency is more than single IOs,
> > > > > it's often for a complete operation which may involve lots of IOs.
> > > > > Single IO latency is a benchmark thing, it's not a real life
> > > > > issue. And that's where it becomes complex and not so black and
> > > > > white. Mike's test is a really good example of that.
> > > >
> > > > To the extent that you're arguing that Mike's test is artificial (I'm not
> > > > sure you are arguing that) - Mike certainly did not do an artificial
> > > > test - he tested 'konsole' cache-cold startup latency, such as:
> > >
> > > [snip]
> > >
> > > I was saying the exact opposite, that Mike's test is a good example of
> > > a valid test. It's not measuring single IO latencies, it's doing a
> > > sequence of valid events and looking at the latency for those. It's
> > > benchmarking the bigger picture, not a microbenchmark.
> >
> > Good, so we are in violent agreement :-)
>
> Yes, perhaps that last sentence didn't provide enough evidence of which
> category I put Mike's test into :-)
>
> So to kick things off, I added an 'interactive' knob to CFQ and
> defaulted it to on, along with re-enabling slice idling for hardware
> that does tagged command queuing. This is almost completely identical to
> what Vivek Goyal originally posted; it's just combined into one and uses
> the term 'interactive' instead of 'fairness'. I think the former is a
> better umbrella under which to add further tweaks that may sacrifice
> throughput slightly, in the quest for better latency.
>
> It's queued up in the for-linus branch.

FWIW, I did a matrix of Vivek's patch combined with my hack. Seems we
do lose a bit of dd throughput over stock with either or both.

                      run1   run2   run3   run4   run5    avg
fairness=1 overload_delay=1
  dd pre              65.1   65.4   67.5   64.8   65.1   65.5
  perf stat           1.70   1.94   1.32   1.89   1.87   1.7
  dd post             69.4   62.3   69.7   70.3   69.6   68.2

fairness=1 overload_delay=0
  dd pre              67.0   67.8   64.7   64.7   64.9   65.8
  perf stat           4.89   3.13   2.98   2.71   2.17   3.1
  dd post             67.2   63.3   62.6   62.8   63.1   63.8

fairness=0 overload_delay=1
  dd pre              65.0   66.0   66.9   64.6   67.0   65.9
  perf stat           4.66   3.81   4.23   2.98   4.23   3.9
  dd post             62.0   60.8   62.4   61.4   62.2   61.7

fairness=0 overload_delay=0
  dd pre              65.3   65.6   64.9   69.5   65.8   66.2
  perf stat          14.79   9.11  14.16   8.44  13.67  12.0
  dd post             64.1   66.5   64.0   66.5   64.4   65.1

(last column is the average of the five runs)



From: Jens Axboe
On Fri, Oct 02 2009, Mike Galbraith wrote:
> On Fri, 2009-10-02 at 19:37 +0200, Jens Axboe wrote:
> > On Fri, Oct 02 2009, Ingo Molnar wrote:
> > >
> > > * Jens Axboe <jens.axboe(a)oracle.com> wrote:
> > >
> > > > On Fri, Oct 02 2009, Ingo Molnar wrote:
> > > > >
> > > > > * Jens Axboe <jens.axboe(a)oracle.com> wrote:
> > > > >
> > > > > > It's not _that_ easy, it depends a lot on the access patterns. A
> > > > > > good example of that is actually the idling that we already do.
> > > > > > Say you have two applications, each starting up. If you start them
> > > > > > both at the same time and just care for the dumb low latency, then
> > > > > > you'll do one IO from each of them in turn. Latency will be good,
> > > > > > but throughput will be awful. And this means that in 20s they are
> > > > > > both started, while with the slice idling and priority disk access
> > > > > > that CFQ does, you'd hopefully have both up and running in 2s.
> > > > > >
> > > > > > So latency is good, definitely, but sometimes you have to worry
> > > > > > about the bigger picture too. Latency is more than single IOs,
> > > > > > it's often for a complete operation which may involve lots of IOs.
> > > > > > Single IO latency is a benchmark thing, it's not a real life
> > > > > > issue. And that's where it becomes complex and not so black and
> > > > > > white. Mike's test is a really good example of that.
> > > > >
> > > > > To the extent that you're arguing that Mike's test is artificial (I'm not
> > > > > sure you are arguing that) - Mike certainly did not do an artificial
> > > > > test - he tested 'konsole' cache-cold startup latency, such as:
> > > >
> > > > [snip]
> > > >
> > > > I was saying the exact opposite, that Mike's test is a good example of
> > > > a valid test. It's not measuring single IO latencies, it's doing a
> > > > sequence of valid events and looking at the latency for those. It's
> > > > benchmarking the bigger picture, not a microbenchmark.
> > >
> > > Good, so we are in violent agreement :-)
> >
> > Yes, perhaps that last sentence didn't provide enough evidence of which
> > category I put Mike's test into :-)
> >
> > So to kick things off, I added an 'interactive' knob to CFQ and
> > defaulted it to on, along with re-enabling slice idling for hardware
> > that does tagged command queuing. This is almost completely identical to
> > what Vivek Goyal originally posted; it's just combined into one and uses
> > the term 'interactive' instead of 'fairness'. I think the former is a
> > better umbrella under which to add further tweaks that may sacrifice
> > throughput slightly, in the quest for better latency.
> >
> > It's queued up in the for-linus branch.
>
> FWIW, I did a matrix of Vivek's patch combined with my hack. Seems we
> do lose a bit of dd throughput over stock with either or both.
>
>                       run1   run2   run3   run4   run5    avg
> fairness=1 overload_delay=1
>   dd pre              65.1   65.4   67.5   64.8   65.1   65.5
>   perf stat           1.70   1.94   1.32   1.89   1.87   1.7
>   dd post             69.4   62.3   69.7   70.3   69.6   68.2
>
> fairness=1 overload_delay=0
>   dd pre              67.0   67.8   64.7   64.7   64.9   65.8
>   perf stat           4.89   3.13   2.98   2.71   2.17   3.1
>   dd post             67.2   63.3   62.6   62.8   63.1   63.8
>
> fairness=0 overload_delay=1
>   dd pre              65.0   66.0   66.9   64.6   67.0   65.9
>   perf stat           4.66   3.81   4.23   2.98   4.23   3.9
>   dd post             62.0   60.8   62.4   61.4   62.2   61.7
>
> fairness=0 overload_delay=0
>   dd pre              65.3   65.6   64.9   69.5   65.8   66.2
>   perf stat          14.79   9.11  14.16   8.44  13.67  12.0
>   dd post             64.1   66.5   64.0   66.5   64.4   65.1
>
> (last column is the average of the five runs)

I'm not too worried about the "single IO producer" scenarios, and from a
quick look most of your numbers are within expected noise levels. It's
the more complex mixes that are likely to cause a bit of a stink, but
let's worry about that later. One quick thing would be to read e.g. 2 or
more files sequentially from disk and see how that performs.
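
Something quick along these lines would do (a sketch, not a polished
benchmark; it assumes a cold cache, e.g. after
'echo 3 > /proc/sys/vm/drop_caches', and forks one sequential reader per
file so the streams compete for the disk):

/*
 * Read each file given on the command line sequentially, one reader
 * process per file, all running concurrently, and report per-file
 * throughput.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

static void read_file(const char *path)
{
	static char buf[1024 * 1024];	/* 1MB reads, roughly dd bs=1M */
	struct timeval start, end;
	long long bytes = 0;
	double secs;
	ssize_t ret;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0) {
		perror(path);
		exit(1);
	}

	gettimeofday(&start, NULL);
	while ((ret = read(fd, buf, sizeof(buf))) > 0)
		bytes += ret;
	gettimeofday(&end, NULL);

	secs = (end.tv_sec - start.tv_sec) +
	       (end.tv_usec - start.tv_usec) / 1000000.0;
	printf("%s: %lld bytes in %.2fs (%.1f MB/s)\n", path, bytes, secs,
	       bytes / 1048576.0 / secs);
	close(fd);
	exit(0);
}

int main(int argc, char **argv)
{
	int i;

	if (argc < 2) {
		fprintf(stderr, "usage: %s file1 [file2 ...]\n", argv[0]);
		return 1;
	}

	/* one sequential reader per file, started back to back */
	for (i = 1; i < argc; i++) {
		if (fork() == 0)
			read_file(argv[i]);
	}

	while (wait(NULL) > 0)
		;
	return 0;
}

Running it against two or three large files, with the interactive knob on
and off, should show whether the mixed sequential case regresses.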

If you could do a cleaned-up version of your overload patch based on
this:

http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=1d2235152dc745c6d94bedb550fea84cffdbf768

then let's take it from there.

--
Jens Axboe
