From: Arjan van de Ven on
On Thu, 17 Jun 2010 09:29:50 +0300
Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com> wrote:

> Fix
>
> BUG: using smp_processor_id() in preemptible [00000000] code:
> s2disk/3392 caller is nr_iowait_cpu+0xe/0x1e
> Pid: 3392, comm: s2disk Not tainted 2.6.35-rc3-dbg-00106-ga75e02b #2
> Call Trace:
> [<c1184c55>] debug_smp_processor_id+0xa5/0xbc
> [<c10282a5>] nr_iowait_cpu+0xe/0x1e
> [<c104ab7c>] update_ts_time_stats+0x32/0x6c
> [<c104ac73>] get_cpu_idle_time_us+0x36/0x58
> [<c124229b>] get_cpu_idle_time+0x12/0x74
> [<c1242963>] cpufreq_governor_dbs+0xc3/0x2dc
> [<c1240437>] __cpufreq_governor+0x51/0x85
> [<c1241190>] __cpufreq_set_policy+0x10c/0x13d
> [<c12413d3>] cpufreq_add_dev_interface+0x212/0x233
> [<c1241b1e>] ? handle_update+0x0/0xd
> [<c1241a18>] cpufreq_add_dev+0x34b/0x35a
> [<c103c973>] ? schedule_delayed_work_on+0x11/0x13
> [<c12c14db>] cpufreq_cpu_callback+0x59/0x63
> [<c1042f39>] notifier_call_chain+0x26/0x48
> [<c1042f7d>] __raw_notifier_call_chain+0xe/0x10
> [<c102efb9>] __cpu_notify+0x15/0x29
> [<c102efda>] cpu_notify+0xd/0xf
> [<c12bfb30>] _cpu_up+0xaf/0xd2
> [<c12b3ad4>] enable_nonboot_cpus+0x3d/0x94
> [<c1055eef>] hibernation_snapshot+0x104/0x1a2
> [<c1058b49>] snapshot_ioctl+0x24b/0x53e
> [<c1028ad1>] ? sub_preempt_count+0x7c/0x89
> [<c10ab91d>] vfs_ioctl+0x2e/0x8c
> [<c10588fe>] ? snapshot_ioctl+0x0/0x53e
> [<c10ac2c7>] do_vfs_ioctl+0x42f/0x45a
> [<c10a0ba5>] ? fsnotify_modify+0x4f/0x5a
> [<c11e9dc3>] ? tty_write+0x0/0x1d0
> [<c10a12d6>] ? vfs_write+0xa2/0xda
> [<c10ac333>] sys_ioctl+0x41/0x62
> [<c10027d3>] sysenter_do_call+0x12/0x2d
>
> The initial fix was to use get_cpu/put_cpu in nr_iowait_cpu. However,
> Arjan stated that "the bug is that it needs to be nr_iowait_cpu(int
> cpu)".
>
> This patch introduces nr_iowait_cpu(int cpu) and changes to its
> callers.
>
> Arjan also pointed out that we can't use get_cpu/put_cpu in
> update_ts_time_stats since we "pick the current cpu, rather than the
> one denoted by ts" in that case. To match given *ts and cpu denoted
> by *ts we use new field in the struct tick_sched: int cpu.
>
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com>

Acked-by: Arjan van de Ven <arjan(a)linux.intel.com>
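
For readers following along, here is a minimal userspace sketch (not kernel
code) of the distinction quoted above. Every name ending in _model is a
hypothetical stand-in: runqueues_model for the per-CPU runqueues,
current_cpu_model for what smp_processor_id() would return, and the two lookup
helpers for the old and new shapes of nr_iowait_cpu(). The sketch assumes the
statistics are gathered on one CPU on behalf of another, which is what the
trace above appears to show.

```c
#include <assert.h>

#define NR_CPUS_MODEL 4	/* assumption: small fixed CPU count for the model */

/* Hypothetical stand-in for the per-CPU runqueue's iowait counter. */
struct rq_model {
	int nr_iowait;
};

static struct rq_model runqueues_model[NR_CPUS_MODEL];

/* Hypothetical stand-in for smp_processor_id(): the CPU we happen to run on. */
static int current_cpu_model;

/* Old shape: implicitly reads the counter of whatever CPU the caller is on. */
static int nr_iowait_current_model(void)
{
	return runqueues_model[current_cpu_model].nr_iowait;
}

/* New shape: the caller names the CPU it wants to account for. */
static int nr_iowait_cpu_model(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	return runqueues_model[cpu].nr_iowait;
}

/*
 * A caller running on CPU 0 that gathers statistics for CPU `target`.
 * Returns 1 if the implicit and explicit lookups agree, 0 otherwise.
 */
static int lookups_agree_model(int target)
{
	current_cpu_model = 0;
	return nr_iowait_current_model() == nr_iowait_cpu_model(target);
}
```

If runqueues_model[2].nr_iowait is 3 and the caller sits on CPU 0, the implicit
lookup reads CPU 0's counter while nr_iowait_cpu_model(2) reads CPU 2's.
Wrapping the implicit lookup in get_cpu/put_cpu would silence the warning but
still read the wrong counter.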


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andrew Morton on
On Thu, 17 Jun 2010 09:29:50 +0300 Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com> wrote:

> Fix
>
> BUG: using smp_processor_id() in preemptible [00000000] code: s2disk/3392
> caller is nr_iowait_cpu+0xe/0x1e
> Pid: 3392, comm: s2disk Not tainted 2.6.35-rc3-dbg-00106-ga75e02b #2
> Call Trace:
> [<c1184c55>] debug_smp_processor_id+0xa5/0xbc
> [<c10282a5>] nr_iowait_cpu+0xe/0x1e
> [<c104ab7c>] update_ts_time_stats+0x32/0x6c
> [<c104ac73>] get_cpu_idle_time_us+0x36/0x58
> [<c124229b>] get_cpu_idle_time+0x12/0x74
> [<c1242963>] cpufreq_governor_dbs+0xc3/0x2dc
> [<c1240437>] __cpufreq_governor+0x51/0x85
> [<c1241190>] __cpufreq_set_policy+0x10c/0x13d
> [<c12413d3>] cpufreq_add_dev_interface+0x212/0x233
> [<c1241b1e>] ? handle_update+0x0/0xd
> [<c1241a18>] cpufreq_add_dev+0x34b/0x35a
> [<c103c973>] ? schedule_delayed_work_on+0x11/0x13
> [<c12c14db>] cpufreq_cpu_callback+0x59/0x63
> [<c1042f39>] notifier_call_chain+0x26/0x48
> [<c1042f7d>] __raw_notifier_call_chain+0xe/0x10
> [<c102efb9>] __cpu_notify+0x15/0x29
> [<c102efda>] cpu_notify+0xd/0xf
> [<c12bfb30>] _cpu_up+0xaf/0xd2
> [<c12b3ad4>] enable_nonboot_cpus+0x3d/0x94
> [<c1055eef>] hibernation_snapshot+0x104/0x1a2
> [<c1058b49>] snapshot_ioctl+0x24b/0x53e
> [<c1028ad1>] ? sub_preempt_count+0x7c/0x89
> [<c10ab91d>] vfs_ioctl+0x2e/0x8c
> [<c10588fe>] ? snapshot_ioctl+0x0/0x53e
> [<c10ac2c7>] do_vfs_ioctl+0x42f/0x45a
> [<c10a0ba5>] ? fsnotify_modify+0x4f/0x5a
> [<c11e9dc3>] ? tty_write+0x0/0x1d0
> [<c10a12d6>] ? vfs_write+0xa2/0xda
> [<c10ac333>] sys_ioctl+0x41/0x62
> [<c10027d3>] sysenter_do_call+0x12/0x2d
>
> The initial fix was to use get_cpu/put_cpu in nr_iowait_cpu. However,
> Arjan stated that "the bug is that it needs to be nr_iowait_cpu(int cpu)".
>
> This patch introduces nr_iowait_cpu(int cpu) and changes to its callers.
>
> Arjan also pointed out that we can't use get_cpu/put_cpu in update_ts_time_stats
> since we "pick the current cpu, rather than the one denoted by ts" in that case.
> To match given *ts and cpu denoted by *ts we use new field in the struct tick_sched: int cpu.
>
>
> ...
>
>  struct tick_sched *tick_get_tick_sched(int cpu)
>  {
> +	/*FIXME: Arjan van de Ven:
> +	can we do this bit once, when the ts structure gets initialized?*/
> +	per_cpu(tick_cpu_sched, cpu).cpu = cpu;
>  	return &per_cpu(tick_cpu_sched, cpu);
>  }

That's just weird. And by doing a write it does require that this
cacheline be probably-read and written back regularly, which is more
bus traffic.

It should be OK to initialise these guys with a for_each_possible_cpu()
loop in a new module_init() function in tick-sched.c - if someone runs
update_ts_time_stats() before the initcalls then conceivably the
`swapper' process's accounting will go a little bit wrong, but I doubt
it.

Still, it'd be better to do it earlier, I guess. tick_init() is called
super-early and that would be a good place. tick_init() is presently a
no-op if !CONFIG_GENERIC_CLOCKEVENTS, but all this code depends on
CONFIG_GENERIC_CLOCKEVENTS anyway.
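
The difference between storing .cpu on every lookup and initialising it once
can be sketched in userspace as follows. This is a model only: the _model
names are hypothetical, a plain array stands in for the per-CPU data, an
ordinary for loop stands in for for_each_possible_cpu(), and
cpu_field_writes merely counts stores so the two variants can be compared.

```c
#include <assert.h>

#define NR_CPUS_MODEL 4	/* assumption: small fixed CPU count for the model */

struct tick_sched_model {
	int cpu;
};

static struct tick_sched_model tick_cpu_sched_model[NR_CPUS_MODEL];
static unsigned long cpu_field_writes;	/* counts stores to .cpu */

/* Variant from the original patch: store .cpu on every lookup. */
static struct tick_sched_model *get_ts_write_each_time(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	tick_cpu_sched_model[cpu].cpu = cpu;
	cpu_field_writes++;
	return &tick_cpu_sched_model[cpu];
}

/* Variant suggested above: the lookup only reads. */
static struct tick_sched_model *get_ts_read_only(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	return &tick_cpu_sched_model[cpu];
}

/* One-time setup, standing in for a for_each_possible_cpu() loop. */
static void tick_sched_init_model(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		tick_cpu_sched_model[cpu].cpu = cpu;
		cpu_field_writes++;
	}
}
```

After tick_sched_init_model() the store count stays at NR_CPUS_MODEL no matter
how often get_ts_read_only() is called, whereas get_ts_write_each_time() adds
one store per call. The model says nothing about real cacheline behaviour; it
only shows where the writes happen.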

So how does this look? If "OK" then would you be able to test it please?


[ Sigh. The field tick_sched.cpu shouldn't even exist on
uniprocessor builds. Ifdeffing it away is trivial and a bit messy,
but it's still only a partial solution. Passing the `cpu' argument
to nr_iowait_cpu() will generate additional code, and it's unneeded
on uniprocessor builds.]
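
One way the ifdef might look, as a userspace sketch under stated assumptions:
MODEL_SMP is a hypothetical stand-in for CONFIG_SMP, and the accessor names
ts_cpu_model() and ts_set_cpu_model() are invented for illustration and are
not part of the posted patch. As noted above, this only removes the field; it
does not remove the extra `cpu' argument to nr_iowait_cpu().

```c
#include <assert.h>

/* Define MODEL_SMP (hypothetical stand-in for CONFIG_SMP) to keep the field. */

struct tick_sched_model {
	int idle_active;
#ifdef MODEL_SMP
	int cpu;		/* only present on multiprocessor builds */
#endif
};

/* Accessor hides the ifdef from callers; on UP the only CPU is 0. */
static inline int ts_cpu_model(const struct tick_sched_model *ts)
{
#ifdef MODEL_SMP
	return ts->cpu;
#else
	(void)ts;
	return 0;
#endif
}

static inline void ts_set_cpu_model(struct tick_sched_model *ts, int cpu)
{
#ifdef MODEL_SMP
	ts->cpu = cpu;
#else
	assert(cpu == 0);	/* a UP build has exactly one CPU */
	(void)cpu;
	(void)ts;
#endif
}
```

Callers would use ts_cpu_model(ts) wherever the patch uses ts->cpu, which
keeps the ifdefs in one place at the cost of two extra helpers.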


include/linux/tick.h | 1 +
kernel/time/tick-common.c | 1 +
kernel/time/tick-sched.c | 11 ++++++++---
3 files changed, 10 insertions(+), 3 deletions(-)

diff -puN include/linux/tick.h~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu-v4-fix include/linux/tick.h
--- a/include/linux/tick.h~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu-v4-fix
+++ a/include/linux/tick.h
@@ -71,6 +71,7 @@ struct tick_sched {
 };

 extern void __init tick_init(void);
+extern void __init tick_sched_init(void);
 extern int tick_is_oneshot_available(void);
 extern struct tick_device *tick_get_device(int cpu);

diff -puN kernel/time/tick-sched.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu-v4-fix kernel/time/tick-sched.c
--- a/kernel/time/tick-sched.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu-v4-fix
+++ a/kernel/time/tick-sched.c
@@ -38,9 +38,6 @@ static ktime_t last_jiffies_update;

 struct tick_sched *tick_get_tick_sched(int cpu)
 {
-	/*FIXME: Arjan van de Ven:
-	can we do this bit once, when the ts structure gets initialized?*/
-	per_cpu(tick_cpu_sched, cpu).cpu = cpu;
 	return &per_cpu(tick_cpu_sched, cpu);
 }

@@ -880,3 +877,11 @@ int tick_check_oneshot_change(int allow_
 	tick_nohz_switch_to_nohz();
 	return 0;
 }
+
+void __init tick_sched_init(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		per_cpu(tick_cpu_sched, cpu).cpu = cpu;
+}
diff -puN kernel/time/tick-common.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu-v4-fix kernel/time/tick-common.c
--- a/kernel/time/tick-common.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu-v4-fix
+++ a/kernel/time/tick-common.c
@@ -413,4 +413,5 @@ static struct notifier_block tick_notifi
 void __init tick_init(void)
 {
 	clockevents_register_notifier(&tick_notifier);
+	tick_sched_init();
 }
_

From: Andrew Morton on
On Wed, 16 Jun 2010 23:59:07 -0700 Andrew Morton <akpm(a)linux-foundation.org> wrote:

> So how does this look? If "OK" then would you be able to test it please?

I saw it first!


fix !CONFIG_TICK_ONESHOT

--- a/include/linux/tick.h~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu-v4-fix-fix
+++ a/include/linux/tick.h
@@ -71,7 +71,6 @@ struct tick_sched {
 };

 extern void __init tick_init(void);
-extern void __init tick_sched_init(void);
 extern int tick_is_oneshot_available(void);
 extern struct tick_device *tick_get_device(int cpu);

@@ -93,6 +92,9 @@ extern struct cpumask *tick_get_broadcas

 # ifdef CONFIG_TICK_ONESHOT
 extern struct cpumask *tick_get_broadcast_oneshot_mask(void);
+extern void __init tick_sched_init(void);
+# else
+static inline void tick_sched_init(void) { }
 # endif

 # endif /* BROADCAST */
_
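
The stub pattern used in this fix can be sketched in userspace like so.
MODEL_TICK_ONESHOT is a hypothetical stand-in for CONFIG_TICK_ONESHOT, and the
two flag variables exist only so the effect can be observed; none of the
_model names come from the kernel.

```c
#include <assert.h>

static int notifier_registered_model;
static int sched_state_initialised_model;

#ifdef MODEL_TICK_ONESHOT
/* Real implementation, built only when the feature is configured in. */
static void tick_sched_init_model(void)
{
	sched_state_initialised_model = 1;
}
#else
/* Empty stub: the caller compiles unchanged and the call does nothing. */
static inline void tick_sched_init_model(void) { }
#endif

/* Caller needs no ifdef of its own, mirroring tick_init() in the patch. */
static void tick_init_model(void)
{
	assert(!notifier_registered_model);	/* model: init runs once */
	notifier_registered_model = 1;
	tick_sched_init_model();
}
```

Built without MODEL_TICK_ONESHOT, tick_init_model() still links and runs, and
sched_state_initialised_model stays 0 because the stub has an empty body.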

From: Sergey Senozhatsky on
On (06/16/10 23:59), Andrew Morton wrote:
> [..] if someone runs
> update_ts_time_stats() before the initcalls then conceivably the
> `swapper' process's accounting will go a little bit wrong, but I doubt
> it.
>

That was the thing that scared me - update_ts_time_stats call before init.
Having ".cpu = cpu" in tick_get_tick_sched guarantees correct .cpu and...
and it sucks.

> Still, it'd be better to do it earlier, I guess. tick_init() is called
> super-early and that would be a good place. tick_init() is presently a
> no-op if !CONFIG_GENERIC_CLOCKEVENTS, but all this code depends on
> CONFIG_GENERIC_CLOCKEVENTS anyway.
>
> So how does this look? If "OK" then would you be able to test it please?
>
>
I'll test it in 2 hours. Thanks.


> [ Sigh. The field tick_sched.cpu shouldn't even exist on
> uniprocessor builds. Ifdeffing it away is trivial and a bit messy,
> but it's still only a partial solution. Passing the `cpu' argument
> to nr_iowait_cpu() will generate additional code, and it's unneeded
> on uniprocessor builds.]
>
>
You're right.


Sergey
From: Peter Zijlstra on
On Thu, 2010-06-17 at 09:29 +0300, Sergey Senozhatsky wrote:
> Fix
>
> BUG: using smp_processor_id() in preemptible [00000000] code: s2disk/3392

> The initial fix was to use get_cpu/put_cpu in nr_iowait_cpu. However,
> Arjan stated that "the bug is that it needs to be nr_iowait_cpu(int cpu)".
>
> This patch introduces nr_iowait_cpu(int cpu) and changes to its callers.
>
> Arjan also pointed out that we can't use get_cpu/put_cpu in update_ts_time_stats
> since we "pick the current cpu, rather than the one denoted by ts" in that case.
> To match given *ts and cpu denoted by *ts we use new field in the struct tick_sched: int cpu.


> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index b232ccc..db14691 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -51,6 +51,7 @@ struct tick_sched {
>  	unsigned long check_clocks;
>  	enum tick_nohz_mode nohz_mode;
>  	ktime_t idle_tick;
> +	int cpu;
>  	int inidle;
>  	int tick_stopped;
>  	unsigned long idle_jiffies;

> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 1d7b9bc..1907037 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -38,6 +38,9 @@ static ktime_t last_jiffies_update;
>
>  struct tick_sched *tick_get_tick_sched(int cpu)
>  {
> +	/*FIXME: Arjan van de Ven:
> +	can we do this bit once, when the ts structure gets initialized?*/
> +	per_cpu(tick_cpu_sched, cpu).cpu = cpu;
>  	return &per_cpu(tick_cpu_sched, cpu);
>  }

> @@ -161,7 +164,7 @@ update_ts_time_stats(struct tick_sched *ts, ktime_t now, u64 *last_update_time)
>  	if (ts->idle_active) {
>  		delta = ktime_sub(now, ts->idle_entrytime);
>  		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
> -		if (nr_iowait_cpu() > 0)
> +		if (nr_iowait_cpu(ts->cpu) > 0)
>  			ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
>  		ts->idle_entrytime = now;
>  	}


This all seems extremely silly, why not something like:

---
kernel/time/tick-sched.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 5f171f0..1363d3a 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -154,14 +154,14 @@ static void tick_nohz_update_jiffies(ktime_t now)
  * Updates the per cpu time idle statistics counters
  */
 static void
-update_ts_time_stats(struct tick_sched *ts, ktime_t now, u64 *last_update_time)
+update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_update_time)
 {
 	ktime_t delta;

 	if (ts->idle_active) {
 		delta = ktime_sub(now, ts->idle_entrytime);
 		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
-		if (nr_iowait_cpu() > 0)
+		if (nr_iowait_cpu(cpu) > 0)
 			ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
 		ts->idle_entrytime = now;
 	}
@@ -175,19 +175,19 @@ static void tick_nohz_stop_idle(int cpu, ktime_t now)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);

-	update_ts_time_stats(ts, now, NULL);
+	update_ts_time_stats(cpu, ts, now, NULL);
 	ts->idle_active = 0;

 	sched_clock_idle_wakeup_event(0);
 }

-static ktime_t tick_nohz_start_idle(struct tick_sched *ts)
+static ktime_t tick_nohz_start_idle(int cpu, struct tick_sched *ts)
 {
 	ktime_t now;

 	now = ktime_get();

-	update_ts_time_stats(ts, now, NULL);
+	update_ts_time_stats(cpu, ts, now, NULL);

 	ts->idle_entrytime = now;
 	ts->idle_active = 1;
@@ -216,7 +216,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
 	if (!tick_nohz_enabled)
 		return -1;

-	update_ts_time_stats(ts, ktime_get(), last_update_time);
+	update_ts_time_stats(cpu, ts, ktime_get(), last_update_time);

 	return ktime_to_us(ts->idle_sleeptime);
 }
@@ -242,7 +242,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
 	if (!tick_nohz_enabled)
 		return -1;

-	update_ts_time_stats(ts, ktime_get(), last_update_time);
+	update_ts_time_stats(cpu, ts, ktime_get(), last_update_time);

 	return ktime_to_us(ts->iowait_sleeptime);
 }
@@ -284,7 +284,7 @@ void tick_nohz_stop_sched_tick(int inidle)
 	 */
 	ts->inidle = 1;

-	now = tick_nohz_start_idle(ts);
+	now = tick_nohz_start_idle(cpu, ts);

 	/*
 	 * If this cpu is offline and it is the one which updates
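
The shape of this alternative, with cpu passed alongside ts instead of being
stored in it, can be modelled in userspace as below. This is a sketch under
assumptions, not the kernel code: plain long values stand in for ktime_t,
arrays stand in for per-CPU data, the last_update_time parameter is dropped
for brevity, and all _model names are hypothetical.

```c
#include <assert.h>

#define NR_CPUS_MODEL 2	/* assumption: small fixed CPU count for the model */

struct tick_sched_model {
	int idle_active;
	long idle_entrytime;
	long idle_sleeptime;
	long iowait_sleeptime;
};

static int nr_iowait_model[NR_CPUS_MODEL];
static struct tick_sched_model tick_cpu_sched_model[NR_CPUS_MODEL];

static int nr_iowait_cpu_model(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	return nr_iowait_model[cpu];
}

/* cpu travels with ts, so ts needs no field recording which CPU it belongs to. */
static void update_ts_time_stats_model(int cpu, struct tick_sched_model *ts,
				       long now)
{
	long delta;

	if (ts->idle_active) {
		delta = now - ts->idle_entrytime;
		ts->idle_sleeptime += delta;
		if (nr_iowait_cpu_model(cpu) > 0)
			ts->iowait_sleeptime += delta;
		ts->idle_entrytime = now;
	}
}

/* Reader in the style of get_cpu_iowait_time_us(): it already has cpu. */
static long get_cpu_iowait_time_model(int cpu, long now)
{
	struct tick_sched_model *ts;

	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	ts = &tick_cpu_sched_model[cpu];
	update_ts_time_stats_model(cpu, ts, now);
	return ts->iowait_sleeptime;
}
```

Every caller in the diff above already holds a cpu value when it looks up ts,
which is why threading it through as an argument needs no new field and no
initialisation step.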
