From: Peter Zijlstra on
On Thu, 2010-07-01 at 16:12 -0700, Venkatesh Pallipadi wrote:
> commit 871e35b moved update_group_power() call in update_sg_lb_stats(),
> resulting in it being called for each group, even though it only updates
> the power of local group. As a result we have frequent redundant
> update_group_power() calls.
>
> Move it back under "if (local_group)" condition.
>
> This reduces the number of calls to update_group_power by a factor of 4
> on my 4 core in 4 NUMA nodes test system.

Hrm,.. so Gautham removed that because for things like the NO_HZ
balancer the initial balance_cpu == this_cpu constraint doesn't hold.

Not I don't think the local_group constraint holds for that either, so
the below would again break that..

Should we perhaps have a conditional on this_rq->nohz_balance_kick or
so?

> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -2359,8 +2359,11 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
> unsigned int balance_cpu = -1, first_idle_cpu = 0;
> unsigned long avg_load_per_task = 0;
>
> - if (local_group)
> + if (local_group) {
> balance_cpu = group_first_cpu(group);
> + update_group_power(sd, this_cpu);
> + }
> +
>
> /* Tally up the load of all CPUs in the group */
> max_cpu_load = 0;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Venkatesh Pallipadi on
On Fri, Jul 2, 2010 at 1:05 AM, Peter Zijlstra <peterz(a)infradead.org> wrote:
> On Thu, 2010-07-01 at 16:12 -0700, Venkatesh Pallipadi wrote:
>> commit 871e35b moved update_group_power() call in update_sg_lb_stats(),
>> resulting in it being called for each group, even though it only updates
>> the power of local group. As a result we have frequent redundant
>> update_group_power() calls.
>>
>> Move it back under "if (local_group)" condition.
>>
>> This reduces the number of calls to update_group_power by a factor of 4
>> on my 4 core in 4 NUMA nodes test system.
>
> Hrm,.. so Gautham removed that because for things like the NO_HZ
> balancer the initial balance_cpu == this_cpu constraint doesn't hold.
>
> Not I don't think the local_group constraint holds for that either, so
> the below would again break that..
>
> Should we perhaps have a conditional on this_rq->nohz_balance_kick or
> so?

The thing is that update_group_power is only updating the power of
local group (sd->groups).
It is getting called multiple times however for each group as
update_sd_lb_stats loops
through groups->next calling update_sg_lb_stats.
If we really want to update the power of non-local groups,
update_cpu_power has to change
to take a groups parameter and non this_cpu as arguments and may have
to access non-local
rq etc.

Thanks,
Venki
>
>> --- a/kernel/sched_fair.c
>> +++ b/kernel/sched_fair.c
>> @@ -2359,8 +2359,11 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
>> � � � unsigned int balance_cpu = -1, first_idle_cpu = 0;
>> � � � unsigned long avg_load_per_task = 0;
>>
>> - � � if (local_group)
>> + � � if (local_group) {
>> � � � � � � � balance_cpu = group_first_cpu(group);
>> + � � � � � � update_group_power(sd, this_cpu);
>> + � � }
>> +
>>
>> � � � /* Tally up the load of all CPUs in the group */
>> � � � max_cpu_load = 0;
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Fri, 2010-07-02 at 09:20 -0700, Venkatesh Pallipadi wrote:
> > Hrm,.. so Gautham removed that because for things like the NO_HZ
> > balancer the initial balance_cpu == this_cpu constraint doesn't hold.
> >
> > Not I don't think the local_group constraint holds for that either, so
> > the below would again break that..
> >
> > Should we perhaps have a conditional on this_rq->nohz_balance_kick or
> > so?
>
> The thing is that update_group_power is only updating the power of
> local group (sd->groups).

Not quite, see nohz_idle_balance(), that iterates idle_cpus_mask, and
calls rebalance_domains(balance_cpu, CPU_IDLE), which then does
for_each_domain(balance_cpu, sd)

So sd need not be local at all, and sd->group will be the group of which
balance_cpu is part.

> It is getting called multiple times however for each group as
> update_sd_lb_stats loops
> through groups->next calling update_sg_lb_stats.

Sure I see how that's happening and why you would want to avoid that, no
argument there.

> If we really want to update the power of non-local groups,
> update_cpu_power has to change
> to take a groups parameter and non this_cpu as arguments and may have
> to access non-local
> rq etc.

No, see above. All we need is to somehow allow nohz_idle_balance() to
update cpu_power as well.

So I think we want something like:

if (local_group) {
balance_cpu = group_first_cpu(group);
if (balance_cpu == this_cpu || nohz_balance)
update_group_power(sd, this_cpu);
}

Or am I totally missing something here?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Venkatesh Pallipadi on
My fault. I completely missed that code path, assuming this_cpu in
load_balance means this_cpu :(

How about the change below instead?

Signed-off-by: Venkatesh Pallipadi <venki(a)google.com>
---
kernel/sched_fair.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index a878b53..97f4534 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2406,8 +2406,6 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
return;
}

- update_group_power(sd, this_cpu);
-
/* Adjust by relative CPU power of the group */
sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;

@@ -2456,6 +2454,8 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
init_sd_power_savings_stats(sd, sds, idle);
load_idx = get_sd_load_idx(sd, idle);

+ update_group_power(sd, this_cpu);
+
do {
int local_group;

--
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Fri, 2010-07-02 at 09:56 -0700, Venkatesh Pallipadi wrote:
> How about the change below instead?
>
I think I've now managed to confuse myself too.. will try and reset my
brain and have a second look in a bit.. ;-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/