From: Jens Axboe on
On 11/06/10 21.07, Vivek Goyal wrote:
> On Fri, Jun 11, 2010 at 11:18:47AM +0200, Jens Axboe wrote:
>> On 2010-06-11 10:55, Ingo Molnar wrote:
>>>>> Caused by the same blkiocg_update_io_add_stats() function. Bootlog and config
>>>>> attached. Reproducible on that sha1 and with that config.
>>>>
>>>> I think I see it, the internal CFQ blkg groups are not properly
>>>> initialized... Will send a patch shortly.
>>>
>>> Cool - can test it with a short turnaround, the bug is easy to reproduce.
>>
>> Here's a nasty patch that should fix it. Not optimal, since we really
>> just want empty functions for these when cfq group scheduling is not
>> defined.
>>
>> CC'ing the guilty parties to come up with a better patch that does NOT
>> involve ifdefs in cfq-iosched.c. We want blk-cgroup.[ch] fixed up.
>> And trimming the CC list a bit.
>
> Jens, Ingo, I am sorry for this mess.
>
> Jens,
>
> How about introducing "block/cfq.h" and declaring additional set of wrapper
> functions to update blkiocg stats and make these do nothing if
> CFQ_GROUP_IOSCHED=n.
>
> For example, in linux-2.6/block/cfq.h, we can define functions as follows.
>
> #ifdef CONFIG_CFQ_GROUP_IOSCHED
> cfq_blkiocg_update_dequeue_stats () {
> blkiocg_update_dequeue_stats()
> }
> #else
> cfq_blkiocg_update_dequeue_stats () {}
> #endif
>
> Fixing it blk-cgroup.[ch] might not be best as BLK_CGROUP is set.
> Secondly, if there are other IO control policies later, they might
> want to make use of BLK_CGROUP while cfq has disabled the group io
> scheduling.

I already tried such a patch, but it's not exactly pretty. How about
splitting blk-cgroup.c into two parts: one that is built for
BLK_CGROUP, and an additional one that is also built for
CFQ_GROUP_IOSCHED? Let's try to improve on the ifdef mess, not extend
it.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jens Axboe on
On 11/06/10 21.48, Vivek Goyal wrote:
> On Fri, Jun 11, 2010 at 09:11:49PM +0200, Jens Axboe wrote:
>> On 11/06/10 21.07, Vivek Goyal wrote:
>>> On Fri, Jun 11, 2010 at 11:18:47AM +0200, Jens Axboe wrote:
>>>> On 2010-06-11 10:55, Ingo Molnar wrote:
>>>>>>> Caused by the same blkiocg_update_io_add_stats() function. Bootlog and config
>>>>>>> attached. Reproducible on that sha1 and with that config.
>>>>>>
>>>>>> I think I see it, the internal CFQ blkg groups are not properly
>>>>>> initialized... Will send a patch shortly.
>>>>>
>>>>> Cool - can test it with a short turnaround, the bug is easy to reproduce.
>>>>
>>>> Here's a nasty patch that should fix it. Not optimal, since we really
>>>> just want empty functions for these when cfq group scheduling is not
>>>> defined.
>>>>
>>>> CC'ing the guilty parties to come up with a better patch that does NOT
>>>> involve ifdefs in cfq-iosched.c. We want blk-cgroup.[ch] fixed up.
>>>> And trimming the CC list a bit.
>>>
>>> Jens, Ingo, I am sorry for this mess.
>>>
>>> Jens,
>>>
>>> How about introducing "block/cfq.h" and declaring additional set of wrapper
>>> functions to update blkiocg stats and make these do nothing if
>>> CFQ_GROUP_IOSCHED=n.
>>>
>>> For example, in linux-2.6/block/cfq.h, we can define functions as follows.
>>>
>>> #ifdef CONFIG_CFQ_GROUP_IOSCHED
>>> cfq_blkiocg_update_dequeue_stats () {
>>> blkiocg_update_dequeue_stats()
>>> }
>>> #else
>>> cfq_blkiocg_update_dequeue_stats () {}
>>> #endif
>>>
>>> Fixing it blk-cgroup.[ch] might not be best as BLK_CGROUP is set.
>>> Secondly, if there are other IO control policies later, they might
>>> want to make use of BLK_CGROUP while cfq has disabled the group io
>>> scheduling.
>>
>> I already tried such a patch, but it's not exactly pretty. How about
>> splitting blk-cgroup.c into two parts, one that is built for
>> BLK_CGROUP and an additional one that is also built for
>> CFQ_GROUP_SCHED? Lets try and improve on the ifdef mess, not extend
>> it.
>
> Sorry, I did not understand your suggestion. Can you please throw some more
> light on it.
>
> blk-cgroup.c does not have any cfq specific parts. So I can't split it
> out and build part of it based on CFQ_GROUP_SCHED.

I know they are not cfq-specific, but cfq is the only one that calls
them currently. If others come to depend on them later on, then let that
blk-cgroup-iosched.o be built for them as well.
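[Editor's note: a minimal sketch of what the suggested split might look like in
block/Makefile. The file name blk-cgroup-iosched.c and the exact rule layout are
illustrative assumptions, not taken from any patch in this thread.]

```makefile
# block/Makefile (illustrative sketch only)

# Core blkio-cgroup plumbing: built whenever the blkio controller is on.
obj-$(CONFIG_BLK_CGROUP)	+= blk-cgroup.o

# Stat-update helpers that only CFQ calls today: built only when CFQ
# group scheduling is enabled, so cfq-iosched.c links against real
# functions without needing ifdefs around every call site.
obj-$(CONFIG_CFQ_GROUP_IOSCHED)	+= blk-cgroup-iosched.o
```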

--
Jens Axboe

From: Vivek Goyal on
On Fri, Jun 11, 2010 at 09:11:49PM +0200, Jens Axboe wrote:
> On 11/06/10 21.07, Vivek Goyal wrote:
> > On Fri, Jun 11, 2010 at 11:18:47AM +0200, Jens Axboe wrote:
> >> On 2010-06-11 10:55, Ingo Molnar wrote:
> >>>>> Caused by the same blkiocg_update_io_add_stats() function. Bootlog and config
> >>>>> attached. Reproducible on that sha1 and with that config.
> >>>>
> >>>> I think I see it, the internal CFQ blkg groups are not properly
> >>>> initialized... Will send a patch shortly.
> >>>
> >>> Cool - can test it with a short turnaround, the bug is easy to reproduce.
> >>
> >> Here's a nasty patch that should fix it. Not optimal, since we really
> >> just want empty functions for these when cfq group scheduling is not
> >> defined.
> >>
> >> CC'ing the guilty parties to come up with a better patch that does NOT
> >> involve ifdefs in cfq-iosched.c. We want blk-cgroup.[ch] fixed up.
> >> And trimming the CC list a bit.
> >
> > Jens, Ingo, I am sorry for this mess.
> >
> > Jens,
> >
> > How about introducing "block/cfq.h" and declaring additional set of wrapper
> > functions to update blkiocg stats and make these do nothing if
> > CFQ_GROUP_IOSCHED=n.
> >
> > For example, in linux-2.6/block/cfq.h, we can define functions as follows.
> >
> > #ifdef CONFIG_CFQ_GROUP_IOSCHED
> > cfq_blkiocg_update_dequeue_stats () {
> > blkiocg_update_dequeue_stats()
> > }
> > #else
> > cfq_blkiocg_update_dequeue_stats () {}
> > #endif
> >
> > Fixing it blk-cgroup.[ch] might not be best as BLK_CGROUP is set.
> > Secondly, if there are other IO control policies later, they might
> > want to make use of BLK_CGROUP while cfq has disabled the group io
> > scheduling.
>
> I already tried such a patch, but it's not exactly pretty. How about
> splitting blk-cgroup.c into two parts, one that is built for
> BLK_CGROUP and an additional one that is also built for
> CFQ_GROUP_SCHED? Lets try and improve on the ifdef mess, not extend
> it.

Sorry, I did not understand your suggestion. Can you please shed some
more light on it?

blk-cgroup.c does not have any cfq-specific parts, so I can't split it
out and build part of it based on CFQ_GROUP_IOSCHED.

Thanks
Vivek
From: Vivek Goyal on
On Fri, Jun 11, 2010 at 09:53:06PM +0200, Jens Axboe wrote:
[..]
> >>> How about introducing "block/cfq.h" and declaring additional set of wrapper
> >>> functions to update blkiocg stats and make these do nothing if
> >>> CFQ_GROUP_IOSCHED=n.
> >>>
> >>> For example, in linux-2.6/block/cfq.h, we can define functions as follows.
> >>>
> >>> #ifdef CONFIG_CFQ_GROUP_IOSCHED
> >>> cfq_blkiocg_update_dequeue_stats () {
> >>> blkiocg_update_dequeue_stats()
> >>> }
> >>> #else
> >>> cfq_blkiocg_update_dequeue_stats () {}
> >>> #endif
> >>>
> >>> Fixing it blk-cgroup.[ch] might not be best as BLK_CGROUP is set.
> >>> Secondly, if there are other IO control policies later, they might
> >>> want to make use of BLK_CGROUP while cfq has disabled the group io
> >>> scheduling.
> >>
> >> I already tried such a patch, but it's not exactly pretty. How about
> >> splitting blk-cgroup.c into two parts, one that is built for
> >> BLK_CGROUP and an additional one that is also built for
> >> CFQ_GROUP_SCHED? Lets try and improve on the ifdef mess, not extend
> >> it.
> >
> > Sorry, I did not understand your suggestion. Can you please throw some more
> > light on it.
> >
> > blk-cgroup.c does not have any cfq specific parts. So I can't split it
> > out and build part of it based on CFQ_GROUP_SCHED.
>
> I know they are not cfq specific, but cfq is the only one that calls
> them currently. If others depend on them later on, then let that other
> blk-cgroup-iosched.o be built for them as well.

Hi Jens,

IIUC, you are suggesting that we pull all the blkiocg interfaces used by
CFQ out into a separate file, say blk-cgroup-iosched.c, and build this
file only if CFQ_GROUP_IOSCHED is enabled.

Two things come to mind.

- If CFQ_GROUP_IOSCHED=n, then nothing much is left in blk-cgroup.c. IOW,
  what good is a blkio controller interface without blk-cgroup-iosched.c?

- More importantly, when a new policy is implemented, it will force
  building blk-cgroup-iosched.o (even if CFQ_GROUP_IOSCHED=n), and then
  we will face the same issue: CFQ_GROUP_IOSCHED=n but CFQ is still
  calling all the stat update functions, and the spinlock warning triggers.

If we create two binaries, one for cfq and one for the new policy, say
blk-cgroup-cfq.o and blk-cgroup-dm-ioband.o, that is also not a very
pleasant situation, because many of the stats interfaces in these two
binaries are common and we would be unnecessarily creating two binary
copies of the same functions under different names:
cfq_blkiocg_update_completion_stats()
dm_ioband_blkiocg_update_completion_stats()

I am still trying to implement what you suggested, but breaking apart
blk-cgroup.c and blk-cgroup.h is making things ugly. In the meantime, I
would request you to reconsider the alternative of providing another set
of wrappers by introducing "cfq.h".

I am attaching a patch for that approach. I have done some basic compile
testing on it. I can do more testing on it in case you decide to change your
mind.

Thanks
Vivek


---
block/cfq-iosched.c | 56 ++++++++++++-------------
block/cfq.h | 115 ++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 143 insertions(+), 28 deletions(-)

Index: linux-2.6/block/cfq-iosched.c
===================================================================
--- linux-2.6.orig/block/cfq-iosched.c 2010-06-12 09:11:39.548160746 -0400
+++ linux-2.6/block/cfq-iosched.c 2010-06-12 09:16:32.364816205 -0400
@@ -14,7 +14,7 @@
#include <linux/rbtree.h>
#include <linux/ioprio.h>
#include <linux/blktrace_api.h>
-#include "blk-cgroup.h"
+#include "cfq.h"

/*
* tunables
@@ -879,7 +879,7 @@
if (!RB_EMPTY_NODE(&cfqg->rb_node))
cfq_rb_erase(&cfqg->rb_node, st);
cfqg->saved_workload_slice = 0;
- blkiocg_update_dequeue_stats(&cfqg->blkg, 1);
+ cfq_blkiocg_update_dequeue_stats(&cfqg->blkg, 1);
}

static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq)
@@ -939,8 +939,8 @@

cfq_log_cfqg(cfqd, cfqg, "served: vt=%llu min_vt=%llu", cfqg->vdisktime,
st->min_vdisktime);
- blkiocg_update_timeslice_used(&cfqg->blkg, used_sl);
- blkiocg_set_start_empty_time(&cfqg->blkg);
+ cfq_blkiocg_update_timeslice_used(&cfqg->blkg, used_sl);
+ cfq_blkiocg_set_start_empty_time(&cfqg->blkg);
}

#ifdef CONFIG_CFQ_GROUP_IOSCHED
@@ -995,7 +995,7 @@

/* Add group onto cgroup list */
sscanf(dev_name(bdi->dev), "%u:%u", &major, &minor);
- blkiocg_add_blkio_group(blkcg, &cfqg->blkg, (void *)cfqd,
+ cfq_blkiocg_add_blkio_group(blkcg, &cfqg->blkg, (void *)cfqd,
MKDEV(major, minor));
cfqg->weight = blkcg_get_weight(blkcg, cfqg->blkg.dev);

@@ -1079,7 +1079,7 @@
* it from cgroup list, then it will take care of destroying
* cfqg also.
*/
- if (!blkiocg_del_blkio_group(&cfqg->blkg))
+ if (!cfq_blkiocg_del_blkio_group(&cfqg->blkg))
cfq_destroy_cfqg(cfqd, cfqg);
}
}
@@ -1421,10 +1421,10 @@
{
elv_rb_del(&cfqq->sort_list, rq);
cfqq->queued[rq_is_sync(rq)]--;
- blkiocg_update_io_remove_stats(&(RQ_CFQG(rq))->blkg, rq_data_dir(rq),
- rq_is_sync(rq));
+ cfq_blkiocg_update_io_remove_stats(&(RQ_CFQG(rq))->blkg,
+ rq_data_dir(rq), rq_is_sync(rq));
cfq_add_rq_rb(rq);
- blkiocg_update_io_add_stats(&(RQ_CFQG(rq))->blkg,
+ cfq_blkiocg_update_io_add_stats(&(RQ_CFQG(rq))->blkg,
&cfqq->cfqd->serving_group->blkg, rq_data_dir(rq),
rq_is_sync(rq));
}
@@ -1482,8 +1482,8 @@
cfq_del_rq_rb(rq);

cfqq->cfqd->rq_queued--;
- blkiocg_update_io_remove_stats(&(RQ_CFQG(rq))->blkg, rq_data_dir(rq),
- rq_is_sync(rq));
+ cfq_blkiocg_update_io_remove_stats(&(RQ_CFQG(rq))->blkg,
+ rq_data_dir(rq), rq_is_sync(rq));
if (rq_is_meta(rq)) {
WARN_ON(!cfqq->meta_pending);
cfqq->meta_pending--;
@@ -1518,8 +1518,8 @@
static void cfq_bio_merged(struct request_queue *q, struct request *req,
struct bio *bio)
{
- blkiocg_update_io_merged_stats(&(RQ_CFQG(req))->blkg, bio_data_dir(bio),
- cfq_bio_sync(bio));
+ cfq_blkiocg_update_io_merged_stats(&(RQ_CFQG(req))->blkg,
+ bio_data_dir(bio), cfq_bio_sync(bio));
}

static void
@@ -1539,8 +1539,8 @@
if (cfqq->next_rq == next)
cfqq->next_rq = rq;
cfq_remove_request(next);
- blkiocg_update_io_merged_stats(&(RQ_CFQG(rq))->blkg, rq_data_dir(next),
- rq_is_sync(next));
+ cfq_blkiocg_update_io_merged_stats(&(RQ_CFQG(rq))->blkg,
+ rq_data_dir(next), rq_is_sync(next));
}

static int cfq_allow_merge(struct request_queue *q, struct request *rq,
@@ -1571,7 +1571,7 @@
static inline void cfq_del_timer(struct cfq_data *cfqd, struct cfq_queue *cfqq)
{
del_timer(&cfqd->idle_slice_timer);
- blkiocg_update_idle_time_stats(&cfqq->cfqg->blkg);
+ cfq_blkiocg_update_idle_time_stats(&cfqq->cfqg->blkg);
}

static void __cfq_set_active_queue(struct cfq_data *cfqd,
@@ -1580,7 +1580,7 @@
if (cfqq) {
cfq_log_cfqq(cfqd, cfqq, "set_active wl_prio:%d wl_type:%d",
cfqd->serving_prio, cfqd->serving_type);
- blkiocg_update_avg_queue_size_stats(&cfqq->cfqg->blkg);
+ cfq_blkiocg_update_avg_queue_size_stats(&cfqq->cfqg->blkg);
cfqq->slice_start = 0;
cfqq->dispatch_start = jiffies;
cfqq->allocated_slice = 0;
@@ -1911,7 +1911,7 @@
sl = cfqd->cfq_slice_idle;

mod_timer(&cfqd->idle_slice_timer, jiffies + sl);
- blkiocg_update_set_idle_time_stats(&cfqq->cfqg->blkg);
+ cfq_blkiocg_update_set_idle_time_stats(&cfqq->cfqg->blkg);
cfq_log_cfqq(cfqd, cfqq, "arm_idle: %lu", sl);
}

@@ -1931,7 +1931,7 @@
elv_dispatch_sort(q, rq);

cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]++;
- blkiocg_update_dispatch_stats(&cfqq->cfqg->blkg, blk_rq_bytes(rq),
+ cfq_blkiocg_update_dispatch_stats(&cfqq->cfqg->blkg, blk_rq_bytes(rq),
rq_data_dir(rq), rq_is_sync(rq));
}

@@ -3248,7 +3248,7 @@
cfq_clear_cfqq_wait_request(cfqq);
__blk_run_queue(cfqd->queue);
} else {
- blkiocg_update_idle_time_stats(
+ cfq_blkiocg_update_idle_time_stats(
&cfqq->cfqg->blkg);
cfq_mark_cfqq_must_dispatch(cfqq);
}
@@ -3276,7 +3276,7 @@
rq_set_fifo_time(rq, jiffies + cfqd->cfq_fifo_expire[rq_is_sync(rq)]);
list_add_tail(&rq->queuelist, &cfqq->fifo);
cfq_add_rq_rb(rq);
- blkiocg_update_io_add_stats(&(RQ_CFQG(rq))->blkg,
+ cfq_blkiocg_update_io_add_stats(&(RQ_CFQG(rq))->blkg,
&cfqd->serving_group->blkg, rq_data_dir(rq),
rq_is_sync(rq));
cfq_rq_enqueued(cfqd, cfqq, rq);
@@ -3364,9 +3364,9 @@
WARN_ON(!cfqq->dispatched);
cfqd->rq_in_driver--;
cfqq->dispatched--;
- blkiocg_update_completion_stats(&cfqq->cfqg->blkg, rq_start_time_ns(rq),
- rq_io_start_time_ns(rq), rq_data_dir(rq),
- rq_is_sync(rq));
+ cfq_blkiocg_update_completion_stats(&cfqq->cfqg->blkg,
+ rq_start_time_ns(rq), rq_io_start_time_ns(rq),
+ rq_data_dir(rq), rq_is_sync(rq));

cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]--;

@@ -3730,7 +3730,7 @@

cfq_put_async_queues(cfqd);
cfq_release_cfq_groups(cfqd);
- blkiocg_del_blkio_group(&cfqd->root_group.blkg);
+ cfq_blkiocg_del_blkio_group(&cfqd->root_group.blkg);

spin_unlock_irq(q->queue_lock);

@@ -3791,15 +3791,15 @@
/* Give preference to root group over other groups */
cfqg->weight = 2*BLKIO_WEIGHT_DEFAULT;

-#ifdef CONFIG_CFQ_GROUP_IOSCHED
/*
* Take a reference to root group which we never drop. This is just
* to make sure that cfq_put_cfqg() does not try to kfree root group
*/
+#ifdef CONFIG_CFQ_GROUP_IOSCHED
atomic_set(&cfqg->ref, 1);
rcu_read_lock();
- blkiocg_add_blkio_group(&blkio_root_cgroup, &cfqg->blkg, (void *)cfqd,
- 0);
+ cfq_blkiocg_add_blkio_group(&blkio_root_cgroup, &cfqg->blkg,
+ (void *)cfqd, 0);
rcu_read_unlock();
#endif
/*
Index: linux-2.6/block/cfq.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/block/cfq.h 2010-06-12 09:12:42.497044544 -0400
@@ -0,0 +1,115 @@
+#ifndef _CFQ_H
+#define _CFQ_H
+#include "blk-cgroup.h"
+
+#ifdef CONFIG_CFQ_GROUP_IOSCHED
+static inline void cfq_blkiocg_update_io_add_stats(struct blkio_group *blkg,
+ struct blkio_group *curr_blkg, bool direction, bool sync)
+{
+ blkiocg_update_io_add_stats(blkg, curr_blkg, direction, sync);
+}
+
+static inline void cfq_blkiocg_update_dequeue_stats(struct blkio_group *blkg,
+ unsigned long dequeue)
+{
+ blkiocg_update_dequeue_stats(blkg, dequeue);
+}
+
+static inline void cfq_blkiocg_update_timeslice_used(struct blkio_group *blkg,
+ unsigned long time)
+{
+ blkiocg_update_timeslice_used(blkg, time);
+}
+
+static inline void cfq_blkiocg_set_start_empty_time(struct blkio_group *blkg)
+{
+ blkiocg_set_start_empty_time(blkg);
+}
+
+static inline void cfq_blkiocg_update_io_remove_stats(struct blkio_group *blkg,
+ bool direction, bool sync)
+{
+ blkiocg_update_io_remove_stats(blkg, direction, sync);
+}
+
+static inline void cfq_blkiocg_update_io_merged_stats(struct blkio_group *blkg,
+ bool direction, bool sync)
+{
+ blkiocg_update_io_merged_stats(blkg, direction, sync);
+}
+
+static inline void cfq_blkiocg_update_idle_time_stats(struct blkio_group *blkg)
+{
+ blkiocg_update_idle_time_stats(blkg);
+}
+
+static inline void
+cfq_blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg)
+{
+ blkiocg_update_avg_queue_size_stats(blkg);
+}
+
+static inline void
+cfq_blkiocg_update_set_idle_time_stats(struct blkio_group *blkg)
+{
+ blkiocg_update_set_idle_time_stats(blkg);
+}
+
+static inline void cfq_blkiocg_update_dispatch_stats(struct blkio_group *blkg,
+ uint64_t bytes, bool direction, bool sync)
+{
+ blkiocg_update_dispatch_stats(blkg, bytes, direction, sync);
+}
+
+static inline void cfq_blkiocg_update_completion_stats(struct blkio_group *blkg, uint64_t start_time, uint64_t io_start_time, bool direction, bool sync)
+{
+ blkiocg_update_completion_stats(blkg, start_time, io_start_time,
+ direction, sync);
+}
+
+static inline void cfq_blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
+ struct blkio_group *blkg, void *key, dev_t dev) {
+ blkiocg_add_blkio_group(blkcg, blkg, key, dev);
+}
+
+static inline int cfq_blkiocg_del_blkio_group(struct blkio_group *blkg)
+{
+ return blkiocg_del_blkio_group(blkg);
+}
+
+#else /* CFQ_GROUP_IOSCHED */
+static inline void cfq_blkiocg_update_io_add_stats(struct blkio_group *blkg,
+ struct blkio_group *curr_blkg, bool direction, bool sync) {}
+
+static inline void cfq_blkiocg_update_dequeue_stats(struct blkio_group *blkg,
+ unsigned long dequeue) {}
+
+static inline void cfq_blkiocg_update_timeslice_used(struct blkio_group *blkg,
+ unsigned long time) {}
+static inline void cfq_blkiocg_set_start_empty_time(struct blkio_group *blkg) {}
+static inline void cfq_blkiocg_update_io_remove_stats(struct blkio_group *blkg,
+ bool direction, bool sync) {}
+static inline void cfq_blkiocg_update_io_merged_stats(struct blkio_group *blkg,
+ bool direction, bool sync) {}
+static inline void cfq_blkiocg_update_idle_time_stats(struct blkio_group *blkg)
+{
+}
+static inline void
+cfq_blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg) {}
+
+static inline void
+cfq_blkiocg_update_set_idle_time_stats(struct blkio_group *blkg) {}
+
+static inline void cfq_blkiocg_update_dispatch_stats(struct blkio_group *blkg,
+ uint64_t bytes, bool direction, bool sync) {}
+static inline void cfq_blkiocg_update_completion_stats(struct blkio_group *blkg, uint64_t start_time, uint64_t io_start_time, bool direction, bool sync) {}
+
+static inline void cfq_blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
+ struct blkio_group *blkg, void *key, dev_t dev) {}
+static inline int cfq_blkiocg_del_blkio_group(struct blkio_group *blkg)
+{
+ return 0;
+}
+
+#endif /* CFQ_GROUP_IOSCHED */
+#endif
From: Andrew Morton on
On Wed, 9 Jun 2010 11:22:35 +0200
"Rafael J. Wysocki" <rjw(a)sisk.pl> wrote:

> On Wednesday 09 June 2010, Sedat Dilek wrote:
> > The patch from [1] is still missing.
> >
> > "cpufreq-call-nr_iowait_cpu-with-disabled-preemption.patch" from
> > Dmitry Monakhov
> >
> > Tested-by: Sedat Dilek <sedat.dilek(a)gmail.com>
> > Tested-by Maciej Rutecki <maciej.rutecki(a)gmail.com>
> >
> > I have already reported this issue on LKML [2] and cpufreq ML [3].
> >
> > - Sedat -
> >
> > [1] http://www.spinics.net/lists/cpufreq/msg01631.html
> > [2] http://lkml.org/lkml/2010/5/31/77
> > [3] http://www.spinics.net/lists/cpufreq/msg01637.html
>
> Thanks, added.

I just merged a different patch which should address this:


From: Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com>

Fix

BUG: using smp_processor_id() in preemptible [00000000] code: s2disk/3392
caller is nr_iowait_cpu+0xe/0x1e
Pid: 3392, comm: s2disk Not tainted 2.6.35-rc3-dbg-00106-ga75e02b #2
Call Trace:
[<c1184c55>] debug_smp_processor_id+0xa5/0xbc
[<c10282a5>] nr_iowait_cpu+0xe/0x1e
[<c104ab7c>] update_ts_time_stats+0x32/0x6c
[<c104ac73>] get_cpu_idle_time_us+0x36/0x58
[<c124229b>] get_cpu_idle_time+0x12/0x74
[<c1242963>] cpufreq_governor_dbs+0xc3/0x2dc
[<c1240437>] __cpufreq_governor+0x51/0x85
[<c1241190>] __cpufreq_set_policy+0x10c/0x13d
[<c12413d3>] cpufreq_add_dev_interface+0x212/0x233
[<c1241b1e>] ? handle_update+0x0/0xd
[<c1241a18>] cpufreq_add_dev+0x34b/0x35a
[<c103c973>] ? schedule_delayed_work_on+0x11/0x13
[<c12c14db>] cpufreq_cpu_callback+0x59/0x63
[<c1042f39>] notifier_call_chain+0x26/0x48
[<c1042f7d>] __raw_notifier_call_chain+0xe/0x10
[<c102efb9>] __cpu_notify+0x15/0x29
[<c102efda>] cpu_notify+0xd/0xf
[<c12bfb30>] _cpu_up+0xaf/0xd2
[<c12b3ad4>] enable_nonboot_cpus+0x3d/0x94
[<c1055eef>] hibernation_snapshot+0x104/0x1a2
[<c1058b49>] snapshot_ioctl+0x24b/0x53e
[<c1028ad1>] ? sub_preempt_count+0x7c/0x89
[<c10ab91d>] vfs_ioctl+0x2e/0x8c
[<c10588fe>] ? snapshot_ioctl+0x0/0x53e
[<c10ac2c7>] do_vfs_ioctl+0x42f/0x45a
[<c10a0ba5>] ? fsnotify_modify+0x4f/0x5a
[<c11e9dc3>] ? tty_write+0x0/0x1d0
[<c10a12d6>] ? vfs_write+0xa2/0xda
[<c10ac333>] sys_ioctl+0x41/0x62
[<c10027d3>] sysenter_do_call+0x12/0x2d

The initial fix was to use get_cpu/put_cpu in nr_iowait_cpu. However,
Arjan stated that "the bug is that it needs to be nr_iowait_cpu(int cpu)".

This patch introduces nr_iowait_cpu(int cpu) and converts its callers.

akpm: addresses about 30,000,000 different bug reports.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com>
Cc: Arjan van de Ven <arjan(a)infradead.org>
Cc: "Rafael J. Wysocki" <rjw(a)sisk.pl>
Cc: Maxim Levitsky <maximlevitsky(a)gmail.com>
Cc: Len Brown <len.brown(a)intel.com>
Cc: Pavel Machek <pavel(a)ucw.cz>
Cc: Jiri Slaby <jslaby(a)suse.cz>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

drivers/cpuidle/governors/menu.c | 10 ++++++++--
include/linux/sched.h | 2 +-
kernel/sched.c | 4 ++--
kernel/time/tick-sched.c | 4 +++-
4 files changed, 14 insertions(+), 6 deletions(-)

diff -puN drivers/cpuidle/governors/menu.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu drivers/cpuidle/governors/menu.c
--- a/drivers/cpuidle/governors/menu.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu
+++ a/drivers/cpuidle/governors/menu.c
@@ -137,15 +137,18 @@ static inline int which_bucket(unsigned
{
int bucket = 0;

+ int cpu = get_cpu();
/*
* We keep two groups of stats; one with no
* IO pending, one without.
* This allows us to calculate
* E(duration)|iowait
*/
- if (nr_iowait_cpu())
+ if (nr_iowait_cpu(cpu))
bucket = BUCKETS/2;

+ put_cpu();
+
if (duration < 10)
return bucket;
if (duration < 100)
@@ -169,13 +172,16 @@ static inline int which_bucket(unsigned
static inline int performance_multiplier(void)
{
int mult = 1;
+ int cpu = get_cpu();

/* for higher loadavg, we are more reluctant */

mult += 2 * get_loadavg();

/* for IO wait tasks (per cpu!) we add 5x each */
- mult += 10 * nr_iowait_cpu();
+ mult += 10 * nr_iowait_cpu(cpu);
+
+ put_cpu();

return mult;
}
diff -puN include/linux/sched.h~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu include/linux/sched.h
--- a/include/linux/sched.h~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu
+++ a/include/linux/sched.h
@@ -139,7 +139,7 @@ extern int nr_processes(void);
extern unsigned long nr_running(void);
extern unsigned long nr_uninterruptible(void);
extern unsigned long nr_iowait(void);
-extern unsigned long nr_iowait_cpu(void);
+extern unsigned long nr_iowait_cpu(int cpu);
extern unsigned long this_cpu_load(void);


diff -puN kernel/sched.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu kernel/sched.c
--- a/kernel/sched.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu
+++ a/kernel/sched.c
@@ -2864,9 +2864,9 @@ unsigned long nr_iowait(void)
return sum;
}

-unsigned long nr_iowait_cpu(void)
+unsigned long nr_iowait_cpu(int cpu)
{
- struct rq *this = this_rq();
+ struct rq *this = cpu_rq(cpu);
return atomic_read(&this->nr_iowait);
}

diff -puN kernel/time/tick-sched.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu kernel/time/tick-sched.c
--- a/kernel/time/tick-sched.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu
+++ a/kernel/time/tick-sched.c
@@ -159,10 +159,12 @@ update_ts_time_stats(struct tick_sched *
ktime_t delta;

if (ts->idle_active) {
+ int cpu = get_cpu();
delta = ktime_sub(now, ts->idle_entrytime);
ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
- if (nr_iowait_cpu() > 0)
+ if (nr_iowait_cpu(cpu) > 0)
ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
+ put_cpu();
ts->idle_entrytime = now;
}

_
