From: Munehiro Ikeda on
Vivek Goyal wrote, on 10/01/2009 10:57 PM:
> Before finishing this mail, I will throw a wacky idea into the ring. I was
> going through the request-based dm-multipath paper. Would it make sense
> to implement a request-based dm-ioband? So basically we implement all the
> group scheduling in CFQ and let dm-ioband implement a request function
> to take the request and break it back into bios. This way we can keep
> all the group control in one place and also meet most of the requirements.
>
> So a request-based dm-ioband will have a request in hand once that request
> has passed group control and prio control. Because dm-ioband is a device-
> mapper target, one can put it on higher-level devices (practically taking
> CFQ to the higher-level device) and provide fairness there. One can also
> put it on those SSDs which don't use an IO scheduler (this kind of forces
> them to use the IO scheduler).
>
> I am sure there will be many issues, but one big issue I can think of is
> that CFQ assumes there is one device beneath it and dispatches requests
> from one queue (in case of idling); that would kill parallelism at the
> higher layer, and throughput would suffer on many dm/md configurations.
>
> Thanks
> Vivek
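
To sketch the quoted idea in code: a request-based dm target of that
era hooks in via map_rq() rather than map(). A minimal skeleton, where
the hook and return code are the real request-based dm interface but
everything ioband-specific is hypothetical and omitted:

	/*
	 * Hypothetical request-based dm-ioband target.  Group and prio
	 * control already happened in the dm device's own CFQ elevator,
	 * so the target is nearly a pass-through: account the cloned
	 * request to its group here, then send it on.
	 */
	static int ioband_map_rq(struct dm_target *ti, struct request *clone,
				 union map_info *map_context)
	{
		return DM_MAPIO_REMAPPED;	/* dispatch to the underlying device */
	}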

As long as CFQ is used, your idea sounds reasonable to me. But what
about the other IO schedulers? In my understanding, one of the keys to
guaranteeing group isolation in your patch is having per-group IO
scheduler internal queues even with the as, deadline, and noop
schedulers. I think this is a great idea, and implementing it
generically for all IO schedulers was the conclusion we reached after
so many IO-scheduler-specific proposals.
If we still need per-group IO scheduler internal queues with
request-based dm-ioband, we have to modify the elevator layer. That
seems out of the scope of dm.
I might be missing something...
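
To make the concern concrete, here is a purely hypothetical sketch of
what per-group internal queues pushed down into the elevator layer
might look like (no such structures exist in any posted patch):

	struct elv_group_queue {
		struct io_group  *iog;       /* owning cgroup (hypothetical) */
		struct list_head fifo;       /* per-group FIFO, noop/deadline style */
		struct rb_root   sort_list;  /* per-group sector-sorted tree */
	};

	/*
	 * Dispatch would become two-level: the elevator core first
	 * picks a group (fairness), then the group's queue picks a
	 * request (the existing scheduler's policy).
	 */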



--
IKEDA, Munehiro
NEC Corporation of America
m-ikeda(a)ds.jp.nec.com

From: Mike Galbraith on
On Fri, 2009-10-02 at 20:57 +0200, Mike Galbraith wrote:
> On Fri, 2009-10-02 at 20:19 +0200, Jens Axboe wrote:
>
> > I'm not too worried about the "single IO producer" scenarios, and it
> > looks like (from a quick look) most of your numbers are within
> > expected noise levels. It's the more complex mixes that are likely to
> > cause a bit of a stink, but let's worry about that later. One quick
> > thing would be to read e.g. 2 or more files sequentially from disk and
> > see how that performs.
>
> Hm. git(s) should be good for a nice repeatable load. Suggestions?
>
> > If you could do a cleaned up version of your overload patch based on
> > this:
> >
> > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=1d2235152dc745c6d94bedb550fea84cffdbf768
> >
> > then let's take it from there.
>
> I'll try to find a good repeatable git beater first. At this point, I
> only know it helps with one load.

Seems to help mixed concurrent read/write a bit too.

perf stat testo.sh, seconds per run (5 runs)         Avg   ratio
108.12 106.33 106.34  97.00 106.52                  104.8  1.000  fairness=0 overload_delay=0
 93.98 102.44  94.47  97.70  98.90                   97.4   .929  fairness=0 overload_delay=1
 90.87  95.40  95.79  93.09  94.25                   93.8   .895  fairness=1 overload_delay=0
 89.93  90.57  89.13  93.43  93.72                   91.3   .871  fairness=1 overload_delay=1
(lower is better; ratio is Avg relative to the stock fairness=0 overload_delay=0 baseline)

#!/bin/sh
# testo.sh: mixed concurrent read/write load.  Four kernel git trees
# each run "git checkout -f" (write-heavy) and "git archive" (read-heavy)
# in alternating order; perf stat times the first operation in each tree.
# Assumes linux-2.6.23 .. linux-2.6.26 clones exist in the current directory.

LOGFILE=testo.log
rm -f $LOGFILE

# Drop the page/dentry/inode caches so the IO actually hits the disk.
echo 3 > /proc/sys/vm/drop_caches
sh -c "(cd linux-2.6.23; perf stat -- git checkout -f; git archive --format=tar HEAD > ../linux-2.6.23.tar)" 2>&1|tee -a $LOGFILE &
sh -c "(cd linux-2.6.24; perf stat -- git archive --format=tar HEAD > ../linux-2.6.24.tar; git checkout -f)" 2>&1|tee -a $LOGFILE &
sh -c "(cd linux-2.6.25; perf stat -- git checkout -f; git archive --format=tar HEAD > ../linux-2.6.25.tar)" 2>&1|tee -a $LOGFILE &
sh -c "(cd linux-2.6.26; perf stat -- git archive --format=tar HEAD > ../linux-2.6.26.tar; git checkout -f)" 2>&1|tee -a $LOGFILE &
wait


From: Mike Galbraith on
On Fri, 2009-10-02 at 20:19 +0200, Jens Axboe wrote:

> If you could do a cleaned up version of your overload patch based on
> this:
>
> http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=1d2235152dc745c6d94bedb550fea84cffdbf768
>
> then let's take it from there.

If "take it from there" ends up meaning apply it and see who squeaks,
feel free to delete the "Not", along with my somewhat defective sense
of humor.

Block: Delay overloading of CFQ queues to improve read latency.

Introduce an overload-delay timestamp, and stamp it when:
1. we encounter a known seeky or possibly new sync IO queue.
2. the current queue may go idle and we're draining async IO.
3. we have sync IO in flight and are servicing an async queue.
4. we are not the sole user of the disk.
Disallow exceeding the quantum if any of these events has occurred recently.

Protect this behavioral change with a "desktop_dispatch" knob, and
default it to "on", providing an easy means of regression verification
prior to hate-mail dispatch :) to the CC list.
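
Distilled, the new gate in the dispatch path amounts to the following
(a consolidated view assembled from the hunks below, not a separate
change):

	/*
	 * desktop_dispatch_ts is stamped at each of the four events
	 * listed above; until a sync slice (cfq_slice_sync) has passed
	 * without one, no queue may exceed its normal quantum.
	 */
	if (cfqd->cfq_desktop_dispatch &&
	    time_before(jiffies, cfqd->desktop_dispatch_ts + cfq_slice_sync))
		return 0;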

Signed-off-by: Mike Galbraith <efault(a)gmx.de>
Cc: Jens Axboe <jens.axboe(a)oracle.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
... and others who let a somewhat hacky tweak slip by

LKML-Reference: <new-submission>

---
block/cfq-iosched.c | 45 +++++++++++++++++++++++++++++++++++++++++----
1 file changed, 41 insertions(+), 4 deletions(-)

Index: linux-2.6/block/cfq-iosched.c
===================================================================
--- linux-2.6.orig/block/cfq-iosched.c
+++ linux-2.6/block/cfq-iosched.c
@@ -174,6 +174,9 @@ struct cfq_data {
unsigned int cfq_slice_async_rq;
unsigned int cfq_slice_idle;
unsigned int cfq_desktop;
+ unsigned int cfq_desktop_dispatch;
+
+ unsigned long desktop_dispatch_ts;

struct list_head cic_list;

@@ -1283,6 +1286,7 @@ static int cfq_dispatch_requests(struct
struct cfq_data *cfqd = q->elevator->elevator_data;
struct cfq_queue *cfqq;
unsigned int max_dispatch;
+ unsigned long delay;

if (!cfqd->busy_queues)
return 0;
@@ -1297,19 +1301,26 @@ static int cfq_dispatch_requests(struct
/*
* Drain async requests before we start sync IO
*/
- if (cfq_cfqq_idle_window(cfqq) && cfqd->rq_in_driver[BLK_RW_ASYNC])
+ if (cfq_cfqq_idle_window(cfqq) && cfqd->rq_in_driver[BLK_RW_ASYNC]) {
+ cfqd->desktop_dispatch_ts = jiffies;
return 0;
+ }

/*
* If this is an async queue and we have sync IO in flight, let it wait
*/
- if (cfqd->sync_flight && !cfq_cfqq_sync(cfqq))
+ if (cfqd->sync_flight && !cfq_cfqq_sync(cfqq)) {
+ cfqd->desktop_dispatch_ts = jiffies;
return 0;
+ }

max_dispatch = cfqd->cfq_quantum;
if (cfq_class_idle(cfqq))
max_dispatch = 1;

+ if (cfqd->busy_queues > 1)
+ cfqd->desktop_dispatch_ts = jiffies;
+
/*
* Does this cfqq already have too much IO in flight?
*/
@@ -1327,6 +1338,16 @@ static int cfq_dispatch_requests(struct
return 0;

/*
+ * Don't start overloading until we've been alone for a bit.
+ */
+ if (cfqd->cfq_desktop_dispatch) {
+ delay = cfqd->desktop_dispatch_ts + cfq_slice_sync;
+
+ if (time_before(jiffies, max_delay))
+ return 0;
+ }
+
+ /*
* we are the only queue, allow up to 4 times of 'quantum'
*/
if (cfqq->dispatched >= 4 * max_dispatch)
@@ -1942,7 +1963,7 @@ static void
cfq_update_idle_window(struct cfq_data *cfqd, struct cfq_queue *cfqq,
struct cfq_io_context *cic)
{
- int old_idle, enable_idle;
+ int old_idle, enable_idle, seeky = 0;

/*
* Don't idle for async or idle io prio class
@@ -1950,10 +1971,20 @@ cfq_update_idle_window(struct cfq_data *
if (!cfq_cfqq_sync(cfqq) || cfq_class_idle(cfqq))
return;

+ if (cfqd->hw_tag) {
+ if (CIC_SEEKY(cic))
+ seeky = 1;
+ /*
+ * If seeky or incalculable seekiness, delay overloading.
+ */
+ if (seeky || !sample_valid(cic->seek_samples))
+ cfqd->desktop_dispatch_ts = jiffies;
+ }
+
enable_idle = old_idle = cfq_cfqq_idle_window(cfqq);

if (!atomic_read(&cic->ioc->nr_tasks) || !cfqd->cfq_slice_idle ||
- (!cfqd->cfq_desktop && cfqd->hw_tag && CIC_SEEKY(cic)))
+ (!cfqd->cfq_desktop && seeky))
enable_idle = 0;
else if (sample_valid(cic->ttime_samples)) {
if (cic->ttime_mean > cfqd->cfq_slice_idle)
@@ -2483,6 +2514,9 @@ static void *cfq_init_queue(struct reque
cfqd->cfq_slice_async_rq = cfq_slice_async_rq;
cfqd->cfq_slice_idle = cfq_slice_idle;
cfqd->cfq_desktop = 1;
+ cfqd->cfq_desktop_dispatch = 1;
+
+ cfqd->desktop_dispatch_ts = INITIAL_JIFFIES;
cfqd->hw_tag = 1;

return cfqd;
@@ -2553,6 +2587,7 @@ SHOW_FUNCTION(cfq_slice_sync_show, cfqd-
SHOW_FUNCTION(cfq_slice_async_show, cfqd->cfq_slice[0], 1);
SHOW_FUNCTION(cfq_slice_async_rq_show, cfqd->cfq_slice_async_rq, 0);
SHOW_FUNCTION(cfq_desktop_show, cfqd->cfq_desktop, 0);
+SHOW_FUNCTION(cfq_desktop_dispatch_show, cfqd->cfq_desktop_dispatch, 0);
#undef SHOW_FUNCTION

#define STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, __CONV) \
@@ -2585,6 +2620,7 @@ STORE_FUNCTION(cfq_slice_async_store, &c
STORE_FUNCTION(cfq_slice_async_rq_store, &cfqd->cfq_slice_async_rq, 1,
UINT_MAX, 0);
STORE_FUNCTION(cfq_desktop_store, &cfqd->cfq_desktop, 0, 1, 0);
+STORE_FUNCTION(cfq_desktop_dispatch_store, &cfqd->cfq_desktop_dispatch, 0, 1, 0);
#undef STORE_FUNCTION

#define CFQ_ATTR(name) \
@@ -2601,6 +2637,7 @@ static struct elv_fs_entry cfq_attrs[] =
CFQ_ATTR(slice_async_rq),
CFQ_ATTR(slice_idle),
CFQ_ATTR(desktop),
+ CFQ_ATTR(desktop_dispatch),
__ATTR_NULL
};



From: Mike Galbraith on
On Sat, 2009-10-03 at 07:49 +0200, Mike Galbraith wrote:
> On Fri, 2009-10-02 at 20:19 +0200, Jens Axboe wrote:
>
> > If you could do a cleaned up version of your overload patch based on
> > this:
> >
> > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=1d2235152dc745c6d94bedb550fea84cffdbf768
> >
> > then let's take it from there.

Note to self: build the darn thing after last minute changes.

Block: Delay overloading of CFQ queues to improve read latency.

Introduce an overload-delay timestamp, and stamp it when:
1. we encounter a known seeky or possibly new sync IO queue.
2. the current queue may go idle and we're draining async IO.
3. we have sync IO in flight and are servicing an async queue.
4. we are not the sole user of the disk.
Disallow exceeding the quantum if any of these events has occurred recently.

Protect this behavioral change with a "desktop_dispatch" knob, and
default it to "on", providing an easy means of regression verification
prior to hate-mail dispatch :) to the CC list.

Signed-off-by: Mike Galbraith <efault(a)gmx.de>
Cc: Jens Axboe <jens.axboe(a)oracle.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
... and others who let a somewhat hacky tweak slip by

---
block/cfq-iosched.c | 45 +++++++++++++++++++++++++++++++++++++++++----
1 file changed, 41 insertions(+), 4 deletions(-)

Index: linux-2.6/block/cfq-iosched.c
===================================================================
--- linux-2.6.orig/block/cfq-iosched.c
+++ linux-2.6/block/cfq-iosched.c
@@ -174,6 +174,9 @@ struct cfq_data {
unsigned int cfq_slice_async_rq;
unsigned int cfq_slice_idle;
unsigned int cfq_desktop;
+ unsigned int cfq_desktop_dispatch;
+
+ unsigned long desktop_dispatch_ts;

struct list_head cic_list;

@@ -1283,6 +1286,7 @@ static int cfq_dispatch_requests(struct
struct cfq_data *cfqd = q->elevator->elevator_data;
struct cfq_queue *cfqq;
unsigned int max_dispatch;
+ unsigned long delay;

if (!cfqd->busy_queues)
return 0;
@@ -1297,19 +1301,26 @@ static int cfq_dispatch_requests(struct
/*
* Drain async requests before we start sync IO
*/
- if (cfq_cfqq_idle_window(cfqq) && cfqd->rq_in_driver[BLK_RW_ASYNC])
+ if (cfq_cfqq_idle_window(cfqq) && cfqd->rq_in_driver[BLK_RW_ASYNC]) {
+ cfqd->desktop_dispatch_ts = jiffies;
return 0;
+ }

/*
* If this is an async queue and we have sync IO in flight, let it wait
*/
- if (cfqd->sync_flight && !cfq_cfqq_sync(cfqq))
+ if (cfqd->sync_flight && !cfq_cfqq_sync(cfqq)) {
+ cfqd->desktop_dispatch_ts = jiffies;
return 0;
+ }

max_dispatch = cfqd->cfq_quantum;
if (cfq_class_idle(cfqq))
max_dispatch = 1;

+ if (cfqd->busy_queues > 1)
+ cfqd->desktop_dispatch_ts = jiffies;
+
/*
* Does this cfqq already have too much IO in flight?
*/
@@ -1327,6 +1338,16 @@ static int cfq_dispatch_requests(struct
return 0;

/*
+ * Don't start overloading until we've been alone for a bit.
+ */
+ if (cfqd->cfq_desktop_dispatch) {
+ delay = cfqd->desktop_dispatch_ts + cfq_slice_sync;
+
+ if (time_before(jiffies, delay))
+ return 0;
+ }
+
+ /*
* we are the only queue, allow up to 4 times of 'quantum'
*/
if (cfqq->dispatched >= 4 * max_dispatch)
@@ -1942,7 +1963,7 @@ static void
cfq_update_idle_window(struct cfq_data *cfqd, struct cfq_queue *cfqq,
struct cfq_io_context *cic)
{
- int old_idle, enable_idle;
+ int old_idle, enable_idle, seeky = 0;

/*
* Don't idle for async or idle io prio class
@@ -1950,10 +1971,20 @@ cfq_update_idle_window(struct cfq_data *
if (!cfq_cfqq_sync(cfqq) || cfq_class_idle(cfqq))
return;

+ if (cfqd->hw_tag) {
+ if (CIC_SEEKY(cic))
+ seeky = 1;
+ /*
+ * If seeky or incalculable seekiness, delay overloading.
+ */
+ if (seeky || !sample_valid(cic->seek_samples))
+ cfqd->desktop_dispatch_ts = jiffies;
+ }
+
enable_idle = old_idle = cfq_cfqq_idle_window(cfqq);

if (!atomic_read(&cic->ioc->nr_tasks) || !cfqd->cfq_slice_idle ||
- (!cfqd->cfq_desktop && cfqd->hw_tag && CIC_SEEKY(cic)))
+ (!cfqd->cfq_desktop && seeky))
enable_idle = 0;
else if (sample_valid(cic->ttime_samples)) {
if (cic->ttime_mean > cfqd->cfq_slice_idle)
@@ -2483,6 +2514,9 @@ static void *cfq_init_queue(struct reque
cfqd->cfq_slice_async_rq = cfq_slice_async_rq;
cfqd->cfq_slice_idle = cfq_slice_idle;
cfqd->cfq_desktop = 1;
+ cfqd->cfq_desktop_dispatch = 1;
+
+ cfqd->desktop_dispatch_ts = INITIAL_JIFFIES;
cfqd->hw_tag = 1;

return cfqd;
@@ -2553,6 +2587,7 @@ SHOW_FUNCTION(cfq_slice_sync_show, cfqd-
SHOW_FUNCTION(cfq_slice_async_show, cfqd->cfq_slice[0], 1);
SHOW_FUNCTION(cfq_slice_async_rq_show, cfqd->cfq_slice_async_rq, 0);
SHOW_FUNCTION(cfq_desktop_show, cfqd->cfq_desktop, 0);
+SHOW_FUNCTION(cfq_desktop_dispatch_show, cfqd->cfq_desktop_dispatch, 0);
#undef SHOW_FUNCTION

#define STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, __CONV) \
@@ -2585,6 +2620,7 @@ STORE_FUNCTION(cfq_slice_async_store, &c
STORE_FUNCTION(cfq_slice_async_rq_store, &cfqd->cfq_slice_async_rq, 1,
UINT_MAX, 0);
STORE_FUNCTION(cfq_desktop_store, &cfqd->cfq_desktop, 0, 1, 0);
+STORE_FUNCTION(cfq_desktop_dispatch_store, &cfqd->cfq_desktop_dispatch, 0, 1, 0);
#undef STORE_FUNCTION

#define CFQ_ATTR(name) \
@@ -2601,6 +2637,7 @@ static struct elv_fs_entry cfq_attrs[] =
CFQ_ATTR(slice_async_rq),
CFQ_ATTR(slice_idle),
CFQ_ATTR(desktop),
+ CFQ_ATTR(desktop_dispatch),
__ATTR_NULL
};



From: Ingo Molnar on

* Mike Galbraith <efault(a)gmx.de> wrote:

> unsigned int cfq_desktop;
> + unsigned int cfq_desktop_dispatch;

> - if (cfq_cfqq_idle_window(cfqq) && cfqd->rq_in_driver[BLK_RW_ASYNC])
> + if (cfq_cfqq_idle_window(cfqq) && cfqd->rq_in_driver[BLK_RW_ASYNC]) {
> + cfqd->desktop_dispatch_ts = jiffies;
> return 0;
> + }

Btw, I hope all those desktop_ things will be renamed latency_ pretty
soon, as the consensus seems to be; the word 'desktop' feels so wrong
in this context.

'Desktop' is one particular way of using computers, and the benefit of
good latencies goes far beyond that category of systems.

Ingo
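
For illustration, the rename Ingo suggests would turn the new tunables
into something like this (hypothetical; no such patch appears in this
thread, though mainline CFQ later did grow a "low_latency" knob):

	unsigned int cfq_latency;           /* was cfq_desktop */
	unsigned int cfq_latency_dispatch;  /* was cfq_desktop_dispatch */

	CFQ_ATTR(latency),                  /* was CFQ_ATTR(desktop) */
	CFQ_ATTR(latency_dispatch),         /* was CFQ_ATTR(desktop_dispatch) */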