From: Mike Galbraith on
On Fri, 2009-10-02 at 11:55 +0200, Jens Axboe wrote:
> On Fri, Oct 02 2009, Mike Galbraith wrote:
> >
> > 	/*
> > 	 * Drain async requests before we start sync IO
> > 	 */
> > 	if (cfq_cfqq_idle_window(cfqq) && cfqd->rq_in_driver[BLK_RW_ASYNC])
> >
> > Looked about the same to me as..
> >
> > enable_idle = old_idle = cfq_cfqq_idle_window(cfqq);
> >
> > ..where Vivek prevented turning 1 into 0, so I stamped it ;-)
>
> cfq_cfqq_idle_window(cfqq) just tells you whether this queue may enter
> idling, not that it is currently idling. The actual idling happens from
> cfq_completed_request(), here:
>
> 	else if (cfqq_empty && !cfq_close_cooperator(cfqd, cfqq, 1) &&
> 		 sync && !rq_noidle(rq))
> 		cfq_arm_slice_timer(cfqd);
>
> and after that the queue will be marked as waiting, so
> cfq_cfqq_wait_request(cfqq) is a better indication of whether we are
> currently waiting for a request (idling) or not.
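Put differently, if the intent is "only act while the queue is actually
idling", the check would use the wait flag rather than the window flag.
A minimal illustrative fragment (not from any posted patch, just restating
the point above with the same drain condition carried over):

	/* wait_request is set while cfq_arm_slice_timer() has us idling */
	if (cfq_cfqq_wait_request(cfqq) && cfqd->rq_in_driver[BLK_RW_ASYNC])
		/* drain/stamp decision goes here */;

cfq_cfqq_idle_window(cfqq), by contrast, only says that idling is
permitted for this queue.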

Hm. Then cfq_cfqq_idle_window(cfqq) actually suits my intent better.

(If I want to reduce async's advantage, I should target it specifically,
i.e. only stamp if this queue is a sync queue. OTOH, if this queue is
sync, it is now officially too late, whereas if this queue is dd about to
inflict the wrath of kjournald on my reader's world, stamping now is a
really good idea... scritch scritch scritch <smoke>)
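
A minimal sketch of that targeted variant (untested, and last_sync_stamp
is a made-up name for illustration, not a field from the posted patch):

	/* note sync activity so async dispatch can be throttled against it */
	if (cfq_cfqq_sync(cfqq))
		cfqd->last_sync_stamp = jiffies;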

I'll go tinker with it. Thanks for the clue.

-Mike

From: Vivek Goyal on
On Fri, Oct 02, 2009 at 12:55:25PM +0200, Corrado Zoccolo wrote:
> Hi Jens,
> On Fri, Oct 2, 2009 at 11:28 AM, Jens Axboe <jens.axboe(a)oracle.com> wrote:
> > On Fri, Oct 02 2009, Ingo Molnar wrote:
> >>
> >> * Jens Axboe <jens.axboe(a)oracle.com> wrote:
> >>
> >
> > It's really not that simple, if we go and do easy latency bits, then
> > throughput drops 30% or more. You can't say it's black and white latency
> > vs throughput issue, that's just not how the real world works. The
> > server folks would be most unpleased.
> Could we be more selective when the latency optimization is introduced?
>
> The code that is currently touched by Vivek's patch is:
> 	if (!atomic_read(&cic->ioc->nr_tasks) || !cfqd->cfq_slice_idle ||
> 	    (cfqd->hw_tag && CIC_SEEKY(cic)))
> 		enable_idle = 0;
> basically, when fairness=1, it becomes just:
> 	if (!atomic_read(&cic->ioc->nr_tasks) || !cfqd->cfq_slice_idle)
> 		enable_idle = 0;
>

Actually, I am not touching this code; looking at V10, I have not changed
anything in the idling code here.

I think we are seeing latency improvements with fairness=1 because CFQ
does pure round-robin, and once a seeky reader expires it is put at the
end of the queue.

I retained the same behavior for fairness=0, but with fairness=1 I don't
put the seeky reader at the end of the queue; instead it gets a vdisktime
based on the disk time it has used. So it should get placed ahead of the
streaming readers.

I think the following snippet in "elevator-fq.c" is what makes the
difference:

	/*
	 * We don't want to charge more than allocated slice otherwise this
	 * queue can miss one dispatch round doubling max latencies. On the
	 * other hand we don't want to charge less than allocated slice as
	 * we stick to CFQ theme of queue loosing its share if it does not
	 * use the slice and moves to the back of service tree (almost).
	 */
	if (!ioq->efqd->fairness)
		queue_charge = allocated_slice;

So if a sync reader consumes 100ms and a seeky reader dispatches only one
request, then in CFQ the seeky reader gets to dispatch its next request
only after another 100ms.

With fairness=1, it should get a lower vdisktime when it comes back with
a new request, because its last slice usage was lower (like CFS sleepers,
as Mike said). But this only makes a difference if there is more than one
process in the system; otherwise a vtime jump will have taken place by the
time the seeky reader gets backlogged.
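
A rough sketch of the charging difference being described (illustrative
only, not the actual elevator-fq.c code; used_slice and the direct
vdisktime update are simplifications):

	if (!ioq->efqd->fairness)
		queue_charge = allocated_slice;	/* the full 100ms, even if ~1ms was used */
	else
		queue_charge = used_slice;	/* only what the seeky reader actually used */

	/*
	 * A smaller charge means a smaller vdisktime, so the seeky queue is
	 * placed ahead of the streaming readers in the next dispatch round.
	 */
	ioq->vdisktime += queue_charge;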

Anyway, once I started timestamping the queues and keeping a cache of
expired queues, any queue that gets a new request almost immediately is
assigned a lower vdisktime if it did not use its full time slice in the
previous dispatch round. Hence with fairness=1 the seeky readers get more
of the disk (their fair share), because they are now placed ahead of the
streaming readers and hence see better latencies.

In short, the better latencies are most likely being seen because the
seeky reader gets a lower time stamp (vdisktime) for not having used its
full time slice in the previous dispatch round, and not because we kept
idling enabled on the seeky reader.

Thanks
Vivek

> Note that, even if we enable idling here, cfq_arm_slice_timer will use
> a different idle window for seeky I/O (2ms) than for normal I/O.
>
> I think that the 2ms idle window is good for a single rotational SATA disk scenario,
> even if it supports NCQ. Realistic access times for those disks are still around 8ms
> (though proportional to seek length), and waiting 2ms to see if we get a nearby
> request may pay off, not only in latency and fairness, but also in throughput.
>
> What we don't want to do is to enable idling for NCQ-enabled SSDs
> (and this is already taken care of in cfq_arm_slice_timer) or for hardware RAIDs.
> If we agree that hardware RAIDs should be marked as non-rotational, then that
> code could become:
>
> 	if (!atomic_read(&cic->ioc->nr_tasks) || !cfqd->cfq_slice_idle ||
> 	    (blk_queue_nonrot(cfqd->queue) && cfqd->hw_tag && CIC_SEEKY(cic)))
> 		enable_idle = 0;
> 	else if (sample_valid(cic->ttime_samples)) {
> 		unsigned idle_time = CIC_SEEKY(cic) ? CFQ_MIN_TT : cfqd->cfq_slice_idle;
> 		if (cic->ttime_mean > idle_time)
> 			enable_idle = 0;
> 		else
> 			enable_idle = 1;
> 	}
>
> Thanks,
> Corrado
>
> >
> > --
> > Jens Axboe
> >
>
> --
> __________________________________________________________________________
>
> dott. Corrado Zoccolo mailto:czoccolo(a)gmail.com
> PhD - Department of Computer Science - University of Pisa, Italy
> --------------------------------------------------------------------------
From: Linus Torvalds on


On Fri, 2 Oct 2009, Jens Axboe wrote:
>
> It's really not that simple, if we go and do easy latency bits, then
> throughput drops 30% or more.

Well, if we're talking 500-950% improvement vs 30% deprovement, I think
it's pretty clear, though. Even the server people do care about latencies.

Often they care quite a bit, in fact.

And Mike's patch didn't look big or complicated.

> You can't say it's black and white latency vs throughput issue,

Umm. Almost 1000% vs 30%. Forget latency vs throughput. That's pretty damn
black-and-white _regardless_ of what you're measuring. Plus you probably
made up the 30% - have you tested the patch?

And quite frankly, we get a _lot_ of complaints about latency. A LOT. It's
just harder to measure, so people seldom attach numbers to it. But that
again means that when people _are_ able to attach numbers to it, we should
take those numbers _more_ seriously rather than less.

So the 30% you threw out as a number is pretty much worthless.

Linus
From: Mike Galbraith on
On Fri, 2009-10-02 at 07:24 -0700, Linus Torvalds wrote:
>
> On Fri, 2 Oct 2009, Jens Axboe wrote:
> >
> > It's really not that simple, if we go and do easy latency bits, then
> > throughput drops 30% or more.
>
> Well, if we're talking 500-950% improvement vs 30% deprovement, I think
> it's pretty clear, though. Even the server people do care about latencies.
>
> Often they care quite a bit, in fact.
>
> And Mike's patch didn't look big or complicated.

But it is a hack. (thought about and measured, but hack nonetheless)

I haven't tested it on much other than reader vs streaming writer. It
may well destroy the rest of the IO universe. I don't have the hw to
even test any hairy chested IO.

-Mike

From: Jens Axboe on
On Fri, Oct 02 2009, Mike Galbraith wrote:
> On Fri, 2009-10-02 at 07:24 -0700, Linus Torvalds wrote:
> >
> > On Fri, 2 Oct 2009, Jens Axboe wrote:
> > >
> > > It's really not that simple, if we go and do easy latency bits, then
> > > throughput drops 30% or more.
> >
> > Well, if we're talking 500-950% improvement vs 30% deprovement, I think
> > it's pretty clear, though. Even the server people do care about latencies.
> >
> > Often they care quite a bit, in fact.
> >
> > And Mike's patch didn't look big or complicated.
>
> But it is a hack. (thought about and measured, but hack nonetheless)
>
> I haven't tested it on much other than reader vs streaming writer. It
> may well destroy the rest of the IO universe. I don't have the hw to
> even test any hairy chested IO.

I'll get a desktop box going on this too. The plan is to make the
latency as good as we can without making too many stupid decisions in
the io scheduler, then we can care about the throughput later. Rinse
and repeat.

--
Jens Axboe
