From: Peter Zijlstra on
On Thu, 2010-07-08 at 21:04 +0900, Norbert Preining wrote:

> Just one more point: searching a bit more on the net, I found the following
> patch (I forget who wrote it), which I merged into my current git tree:

> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index a878b53..f26efba 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -3248,6 +3248,9 @@ int select_nohz_load_balancer(int stop_tick)
> if (stop_tick) {
> cpu_rq(cpu)->in_nohz_recently = 1;
>
> + if (!mc_capable())
> + return 0;
> +
> if (!cpu_active(cpu)) {
> if (atomic_read(&nohz.load_balancer) != cpu)
> return 0;
> @@ -3297,6 +3300,9 @@ int select_nohz_load_balancer(int stop_tick)
> if (!cpumask_test_cpu(cpu, nohz.cpu_mask))
> return 0;
>
> + if (!mc_capable())
> + return 0;
> +
> cpumask_clear_cpu(cpu, nohz.cpu_mask);
>
> if (atomic_read(&nohz.load_balancer) == cpu)

Right, so that is a buggy patch; see the original discussion:
http://lkml.org/lkml/2010/4/26/249

> which looks better

The thing is, we didn't change that code recently; the patches that are
supposed to cure the nohz balancer are still pending (in -tip and
-next).

That said, we did frob something in the nohz code; does the patch
below cure anything?

---
kernel/time/tick-sched.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 813993b..9bc8029 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -325,7 +325,7 @@ void tick_nohz_stop_sched_tick(int inidle)
} while (read_seqretry(&xtime_lock, seq));

if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu) ||
- arch_needs_cpu(cpu) || nohz_ratelimit(cpu)) {
+ arch_needs_cpu(cpu) /* || nohz_ratelimit(cpu) */) {
next_jiffies = last_jiffies + 1;
delta_jiffies = 1;
} else {

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Norbert Preining on
On Thu, 08 Jul 2010, Peter Zijlstra wrote:
> Right, so that is a buggy patch; see the original discussion:
> http://lkml.org/lkml/2010/4/26/249

Well, to me it wasn't so clear that it was buggy *for my system*
(Core 2).

> That said, we did frob something in the nohz code; does the patch
> below cure anything?

Looks promising: after reverting the old patch, adding that one, building,
running and unplugging power, powertop has now been running for some time,
and it seems that we are back to a better situation:
Cn                Avg residency    P-states (frequencies)
C0 (cpu running)        ( 1.5%)    Turbo Mode   0.0%
C0                0.0ms ( 0.0%)    2.54 Ghz     0.0%
C1 mwait          0.0ms ( 0.0%)    1.60 Ghz     0.0%
C2 mwait          0.3ms ( 0.9%)    800 Mhz    100.0%
C6 mwait          8.5ms (97.6%)

Wakeups-from-idle per second : 139.9 interval: 15.0s
Power usage (ACPI estimate): 10.0W (8.8 hours) (long term: 1.7W,/50.9h)

Top causes for wakeups:
32.2% ( 46.1) [kernel scheduler] Load balancing tick
20.4% ( 29.1) [iwlagn] <interrupt>
12.6% ( 18.0) [extra timer interrupt]
6.5% ( 9.3) [ahci] <interrupt>
3.7% ( 5.3) [kernel core] hrtimer_start (tick_sched_timer)
3.5% ( 5.0) syndaemon
3.4% ( 4.8) [acpi] <interrupt>
2.3% ( 3.3) yarssr

With the brightness dimmed, power drops below 10W.

Thanks.

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TeX Live & Debian Developer
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
`I'm going to have a look.'
He glanced round at the others.
`Is no one going to say, "No you can't possibly, let me go
instead"?'
They all shook their heads.
`Oh well.'
--- Ford attempting to be heroic whilst being besieged by
--- Shooty and Bangbang.
--- Douglas Adams, The Hitchhiker's Guide to the Galaxy
From: Peter Zijlstra on
On Thu, 2010-07-08 at 21:46 +0900, Norbert Preining wrote:
> Looks promising: after reverting the old patch, adding that one, building,
> running and unplugging power, powertop has now been running for some time,
> and it seems that we are back to a better situation:

Hrmm, Mike, it seems you wrecked power usage..

So nohz_ratelimit() prevents us from entering NOHZ when the last attempt
was less than 1/2 a jiffy ago (fwiw: NSEC_PER_SEC/HZ == TICK_NSEC).

It's either the idle entry path or irq_exit() trying to enter the nohz
state; if we keep skipping it, that means we get enough interrupt activity
to render nohz useless anyway, so I'm not quite sure how this wrecks things..
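A minimal C sketch of the rate limit described above, assuming HZ=250 and using hypothetical names (struct nohz_state, nohz_ratelimit_sketch(), keep_tick()). It models the behaviour as described in this mail; it is not the kernel's actual nohz_ratelimit():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed configuration for illustration only; real kernels get HZ from Kconfig. */
#define HZ           250ULL
#define NSEC_PER_SEC 1000000000ULL
#define TICK_NSEC    (NSEC_PER_SEC / HZ)	/* one jiffy: 4,000,000 ns at HZ=250 */

/* Hypothetical per-CPU state: clock value at the last attempt to enter NOHZ. */
struct nohz_state {
	uint64_t nohz_stamp;
};

/*
 * Returns true (keep the tick) when the previous attempt was less than
 * half a jiffy ago. The stamp is refreshed on every call, so each attempt
 * is compared with the one immediately before it.
 */
static bool nohz_ratelimit_sketch(struct nohz_state *st, uint64_t now_ns)
{
	uint64_t diff;

	assert(now_ns >= st->nohz_stamp);	/* clock assumed monotonic */
	diff = now_ns - st->nohz_stamp;
	st->nohz_stamp = now_ns;
	return diff < (TICK_NSEC >> 1);
}

/*
 * Simplified form of the condition in tick_nohz_stop_sched_tick(): if some
 * other check already needs the CPU, or the rate limit triggers, the next
 * tick stays one jiffy out instead of being deferred.
 */
static bool keep_tick(bool other_needs_cpu, struct nohz_state *st, uint64_t now_ns)
{
	return other_needs_cpu || nohz_ratelimit_sketch(st, now_ns);
}
```

Under these assumptions half a jiffy is 2 ms, so attempts 1 ms apart keep the tick while attempts 9 ms apart let it stop. Putting the rate limiter last in the `||` mirrors the quoted tick-sched.c hunk: when another check already needs the CPU, the sketch never consults the limiter and its stamp is not refreshed.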
From: Arjan van de Ven on

> and for that matter, what is "[extra timer interrupt]", surely the
> timer hardware doesn't generate spurious interrupts?

"Extra timer interrupts" are the cases where we see the hardware timer
interrupt fire but have no software timer to account for it. This can
happen in two cases:
* a NO_HZ bug
* we are idle longer than the longest interval we can program the hw
timer for; without HPET this can happen.
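The second case can be illustrated with a small model. Assume, hypothetically, that the clock event device can be programmed at most max_delta_ns ahead; an idle period longer than that must be covered by several hardware expiries, and only the last one has a software timer behind it. The function names below are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: number of hardware expiries needed to cover an idle
 * period when the timer can be programmed at most max_delta_ns ahead.
 */
static uint64_t hw_timer_expiries(uint64_t idle_ns, uint64_t max_delta_ns)
{
	assert(max_delta_ns > 0);
	/* ceil(idle_ns / max_delta_ns), written so it cannot overflow */
	return idle_ns / max_delta_ns + (idle_ns % max_delta_ns != 0);
}

/*
 * Every expiry except the last fires with no software timer due, which is
 * what the description above calls an "extra timer interrupt".
 */
static uint64_t extra_timer_interrupts(uint64_t idle_ns, uint64_t max_delta_ns)
{
	uint64_t n = hw_timer_expiries(idle_ns, max_delta_ns);

	return n > 0 ? n - 1 : 0;
}
```

For example, a 2 s idle period with an assumed 0.5 s maximum interval needs four hardware expiries, three of which would count as extra timer interrupts.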



--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
From: Peter Zijlstra on
On Thu, 2010-07-08 at 15:23 +0200, Peter Zijlstra wrote:
> On Thu, 2010-07-08 at 21:46 +0900, Norbert Preining wrote:
> > Looks promising: after reverting the old patch, adding that one, building,
> > running and unplugging power, powertop has now been running for some time,
> > and it seems that we are back to a better situation:
>
> Hrmm, Mike, it seems you wrecked power usage..
>
> So nohz_ratelimit() prevents us from entering NOHZ when the last attempt
> was less than 1/2 a jiffy ago (fwiw: NSEC_PER_SEC/HZ == TICK_NSEC).
>
> It's either the idle entry path or irq_exit() trying to enter the nohz
> state; if we keep skipping it, that means we get enough interrupt activity
> to render nohz useless anyway, so I'm not quite sure how this wrecks things..

OK, so Arjan said the gain could come from tricking the idle governor
into not using deeper C-states. He also said he significantly fixed that
governor in .35.

Mike, could you re-run your netperf tests that showed the 10% throughput
gain? Hopefully the fixed governor will yield the same result and we can
kill off this ratelimit thing.
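To illustrate the governor effect Arjan describes, here is a deliberately simplified model (not the kernel's menu governor): the predicted idle time is bounded by the next timer event, which is at most one tick away while the tick keeps running, and the deepest C-state whose target residency fits that prediction is chosen. The C-state table, its residency values and the function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C-state table, sorted from shallowest to deepest. */
struct cstate {
	const char *name;
	uint64_t target_residency_us;	/* idle time needed for the state to pay off */
};

static const struct cstate states[] = {
	{ "C1", 1 },
	{ "C2", 500 },
	{ "C6", 5000 },
};

/* Next wakeup: the next software timer, or the next tick if it still runs. */
static uint64_t predicted_idle_us(uint64_t next_timer_us, int tick_stopped,
				  uint64_t tick_us)
{
	if (!tick_stopped && tick_us < next_timer_us)
		return tick_us;
	return next_timer_us;
}

/*
 * Deepest state whose target residency fits the prediction; relies on the
 * table being sorted. State 0 is the fallback when nothing fits.
 */
static size_t pick_cstate(const struct cstate *tbl, size_t n,
			  uint64_t predicted_us)
{
	size_t best = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].target_residency_us <= predicted_us)
			best = i;
	return best;
}
```

With a 4000 us tick (HZ=250) and the next real timer 50 ms away, keeping the tick predicts 4000 us and selects C2, while stopping it predicts 50 ms and selects C6. Shallower states wake up faster, which is one plausible way that suppressing NOHZ could help a latency-sensitive benchmark such as netperf.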