cpuidle: extend cpuidle and menu governor to handle dynamic states [Kernel]

From: Arjan van de Ven on 16 Jul 2010 00:10

On 7/15/2010 1:30 PM, Ai Li wrote:

I'm OK with the general idea, but have a few comments about the
implementation.
> Signed-off-by: Ai Li <aili(a)codeaurora.org>
> ---
> drivers/cpuidle/governors/menu.c | 59 +++++++++++++++++++++++++++++--------
> include/linux/cpuidle.h | 4 ++
> 2 files changed, 50 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 1b12870..b3854cc 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -271,6 +271,9 @@ static int menu_select(struct cpuidle_device *dev)
>
> detect_repeating_patterns(data);
>
> + if (dev->prepare)
> + dev->prepare(dev, data->predicted_us);
>

I don't like the idea of passing predicted_us here.
The states and their updates should be independent of how long we think
we'll be idle; it's up to the menu governor to then pick a good one, not
for the platform to muck with things based on this.

Also I would like the cpuidle code, not the governor, to call this
prepare function.
The need to call ->prepare is governor independent....
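
A minimal sketch of that split, assuming simplified stand-in structures rather than the real kernel definitions: the core idle path calls the optional prepare hook, without any predicted_us argument, and only then asks whichever governor is active to select a state. The names cpuidle_pick_state, example_prepare, and last_state_select are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Simplified stand-ins for illustration; not the kernel's definitions. */
struct cpuidle_device {
	int state_count;
	int prepare_calls;	/* bookkeeping for this sketch only */
	/* Optional hook: lets the platform refresh its state table. */
	void (*prepare)(struct cpuidle_device *dev);
};

struct cpuidle_governor {
	int (*select)(struct cpuidle_device *dev);
};

/* Hypothetical platform hook: pretend one more state became usable. */
static void example_prepare(struct cpuidle_device *dev)
{
	dev->prepare_calls++;
	dev->state_count = 3;
}

/* Hypothetical governor: always picks the deepest listed state. */
static int last_state_select(struct cpuidle_device *dev)
{
	return dev->state_count - 1;
}

/* Core idle path: refresh the states first, then ask the governor. */
static int cpuidle_pick_state(struct cpuidle_device *dev,
			      const struct cpuidle_governor *gov)
{
	if (dev->prepare)
		dev->prepare(dev);	/* note: no predicted_us argument */

	return gov->select(dev);
}
```

Because the hook is invoked from the core path, any governor sees the refreshed state table without having to know that prepare exists.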

> + if (dev->compare_power) {

I'm not a big fan of this as a flag; either we always do this, which I
can understand, or we sort things, which is also fine with me.
Making it conditional like this... not a fan.
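
For illustration, the "always do this" option could look roughly like the sketch below: among the states that fit the predicted idle time and the latency requirement, pick the one with the lowest power_usage, regardless of where it sits in the table. This is a sketch under assumptions, not the menu governor's actual code; struct idle_state and select_lowest_power are hypothetical names, and state 0 is assumed to be always usable.

```c
/* Hypothetical, simplified idle state description. */
struct idle_state {
	unsigned int exit_latency;	/* microseconds */
	unsigned int target_residency;	/* microseconds */
	int power_usage;		/* milliwatts */
};

/*
 * Return the index of the lowest-power state that fits both the
 * predicted idle time and the latency requirement.  The table does
 * not need to be sorted, so dynamically changing states are fine.
 */
static int select_lowest_power(const struct idle_state *s, int n,
			       unsigned int predicted_us,
			       unsigned int latency_req)
{
	int best = 0;	/* assumption: state 0 is always usable */
	int i;

	for (i = 1; i < n; i++) {
		if (s[i].target_residency > predicted_us)
			continue;	/* idle period too short to pay off */
		if (s[i].exit_latency > latency_req)
			continue;	/* wakeup would be too slow */
		if (s[i].power_usage < s[best].power_usage)
			best = i;
	}
	return best;
}
```

The alternative mentioned above, keeping the table sorted, would let the loop simply take the last eligible entry instead of comparing power values.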

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Arjan van de Ven on 16 Jul 2010 13:30

On 7/16/2010 10:25 AM, Ai Li wrote:
>>> + if (dev->prepare)
>>> + dev->prepare(dev, data->predicted_us);
>>>
>>>
>> I don't like the idea of passing predicted_us here.
>> the states and their updates should be independent of how long we
>> think we'll be idle;
>>
> The power_usage value, total or average, would depend on how long the
> predicted idle period is. On our SoCs, a cpuidle state has three
> stages: entry stage, low power stage, and exit stage. Entry and exit
> stages consume more power than the low power stage but have fixed
> durations, irrespective of how long the idle period is. As the
> predicted idle period changes, the entry and exit durations stay the
> same but the low power duration changes, resulting in different total
> or average power for the idle period.
>

the power value in the structure should represent ONLY the power level
during the low power stage.
And this should be independent of total duration.

all other power is taken into account in terms of break even point/etc...
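
To make the two positions concrete, here is a small sketch of the three-stage model from the quoted text. The struct, the function name, and all numbers are assumptions made up for illustration. It shows that the average power over the whole idle period moves with the idle duration, while the low-power-stage value (low_mw), which is what the reply says the field should hold, stays constant.

```c
/* Hypothetical three-stage description of one idle state. */
struct staged_state {
	unsigned int entry_us;	/* fixed entry duration, microseconds */
	unsigned int exit_us;	/* fixed exit duration, microseconds */
	unsigned int entry_mw;	/* power during entry, milliwatts */
	unsigned int exit_mw;	/* power during exit, milliwatts */
	unsigned int low_mw;	/* power during the low power stage */
};

/*
 * Average power in mW over an idle period of idle_us (integer
 * division, rounded down).  Returns -1 if the period is too short
 * to contain the entry and exit stages.
 */
static long avg_power_mw(const struct staged_state *s, unsigned int idle_us)
{
	unsigned int fixed_us = s->entry_us + s->exit_us;
	unsigned long long energy;	/* mW * us = nanojoules */

	if (idle_us == 0 || idle_us < fixed_us)
		return -1;

	energy = (unsigned long long)s->entry_us * s->entry_mw +
		 (unsigned long long)s->exit_us * s->exit_mw +
		 (unsigned long long)(idle_us - fixed_us) * s->low_mw;

	return (long)(energy / idle_us);
}
```

With entry and exit stages of 100 us at 50 mW each and a 10 mW low power stage, the average works out to 50 mW for a 200 us idle period and 18 mW for a 1000 us one, approaching 10 mW as the period grows.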

> One of the concerns I have is backwards compatibility. As far as I
> know, none of the current cpuidle drivers use the power_usage field.
> If we always do compare_power, those drivers would break until
> someone with technical device knowledge updates the drivers to specify
> power... I could derive fake power_usage numbers by default, using
> the cstate index position. That seems kind of hacky but it would
> remove the need for the compare_power flag and retain the current
> behavior when cpuidle drivers do not provide their own power numbers.
>

I'm fine with this approach actually; if someone does not fill it in, we
fake data that makes it valid... better than getting complex code.
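
One way the fallback discussed above might look, as a sketch only: if the driver did not supply power numbers, derive placeholder values from the state index so that deeper states always compare as lower power. The power_specified flag, the struct layouts, and fill_default_power are hypothetical names for this illustration. Using negative values is an assumption that keeps the fake numbers from colliding with real, non-negative ones.

```c
#define MAX_STATES 8

/* Hypothetical, simplified structures for illustration. */
struct idle_state {
	int power_usage;	/* milliwatts, or a fake ordering value */
};

struct idle_device {
	int state_count;
	int power_specified;	/* nonzero if the driver filled in power */
	struct idle_state states[MAX_STATES];
};

/*
 * If the driver gave no power numbers, fake strictly decreasing
 * values (-1, -2, ...) so that a power comparison reproduces the
 * existing "deeper index means lower power" behaviour.
 */
static void fill_default_power(struct idle_device *dev)
{
	int i;

	if (dev->power_specified)
		return;

	for (i = 0; i < dev->state_count && i < MAX_STATES; i++)
		dev->states[i].power_usage = -1 - i;
}
```

With this in place the selection code can always compare power_usage and needs no compare_power flag.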

From: Arjan van de Ven on 16 Jul 2010 15:40

On 7/16/2010 12:19 PM, Ai Li wrote:
>> the power value in the structure should represent ONLY the power
>> level during the low power stage.
>> And this should be independent of total duration.
>> all other power is taken into account in terms of break even
>> point/etc...
>>
> With static cstates, determining the break even point is
> straightforward: compare the power numbers of state Cn and Cn-1, since
> the states are ordered in increasing order of latency and power.
> With dynamic cstates, Cn-1 may not be a valid state to compare any
> more, for example, because Cn-1's latency may have become too high.
> It seems the driver would need to know which cstate the governor would
> compare Cn to, and that would break the design philosophy of driver +
> governor. The break even point does not seem to have a transitive
> property, where the governor can calculate Cn vs Cn-2 from some
> arithmetic combination of Cn vs Cn-1 and Cn-1 vs Cn-2 values. On the
> other hand, if the power_usage field also includes the entry and exit
> stages, then the driver does not need to know whether it should
> calculate break even point for Cn vs Cn-1, or Cn vs Cn-2, etc.
>

that's nice in theory.
in practice though, this is all noise compared to some of the accuracy
in the predictions.

break even generally is done against C1 only (since C1 is assumed to
always be there)....
yes it'd be nice to also have it against Cx in a matrix form, but that
is a level of complexity that hasn't been worth it.
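
As a rough illustration of a break even computed against C1 only, the sketch below uses a simple energy model that is an assumption of this example, not something stated in the thread: staying in C1 for T microseconds costs c1_mw * T, while entering Cn costs a fixed entry plus exit energy (fixed_nj, spent over fixed_us) plus cn_mw for the remainder. The function name and parameters are hypothetical.

```c
/*
 * Smallest idle time, in microseconds, for which entering Cn uses no
 * more energy than staying in C1.  Solves
 *     c1_mw * T = fixed_nj + cn_mw * (T - fixed_us)
 * for T and rounds up.  Returns -1 if Cn never pays off.
 * Units: mW * us = nJ.
 */
static long break_even_us(unsigned int c1_mw, unsigned int cn_mw,
			  unsigned int fixed_us,
			  unsigned long long fixed_nj)
{
	long long num, den, t;

	if (cn_mw >= c1_mw)
		return -1;	/* Cn saves nothing over C1 */

	num = (long long)fixed_nj - (long long)cn_mw * fixed_us;
	den = (long long)c1_mw - cn_mw;

	if (num <= 0)
		return fixed_us;	/* limited only by the transition time */

	t = (num + den - 1) / den;	/* ceiling division */

	/* The idle period can never be shorter than entry plus exit. */
	return t < fixed_us ? (long)fixed_us : (long)t;
}
```

For example, with C1 at 100 mW, Cn at 10 mW, and 200 us of entry plus exit costing 30000 nJ, the break even comes out at 312 us. A driver could precompute one such value per state, which avoids the Cn vs Cn-1 matrix discussed in the quoted text.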

Note that the prediction is.... a prediction. I can show you data on how
well it does (now that it's much better in 2.6.35-rc), but it's still
"50% of the time we're within a factor of two of actual",
not "we're 90% of the time within 10%".
