From: Venkatesh Pallipadi on
On Wed, Jul 21, 2010 at 2:31 PM, Len Brown <lenb(a)kernel.org> wrote:
> From: Len Brown <len.brown(a)intel.com>
>
> The BIOS exports deep C-states on modern Intel processors
> as "C3-type" to satisfy various legacy Operating Systems.
>
> However, the hardware actually supports C2-type, and does
> not require the extra costs of C3-type.
>
> One of the costs is to check the BM_STS (Bus Master Status)
> bit before entering C3, and instead choose a shallower C-state
> if there was "recent bus master activity".
>
> We have found a number of systems in the field that erroneously
> set BM_STS and prevent entry into deep C-states.
> Re-define BIOS presented C3-type states as C2-type states
> on modern processors to avoid this issue.
>
> If a device in the system really does want to prevent use
> of a deep C-state, its Linux driver should register its
> constraints via pm_qos_add_request().
>
> https://bugzilla.kernel.org/show_bug.cgi?id=15886
>

I agree with the intent, but I think it's cleaner to keep all arch model
checks in arch/x86/kernel/acpi/cstate.c.

Thanks,
Venki
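
To illustrate the suggestion above, here is a minimal userspace sketch of
a table-driven model check that could live behind a single arch-side
predicate. The names (struct cpu_id, enum cpu_vendor,
arch_c3type_is_really_c2type) are hypothetical stand-ins, not kernel
interfaces; only the model numbers come from the quoted patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's vendor IDs and boot_cpu_data. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_OTHER };

struct cpu_id {
	enum cpu_vendor vendor;
	unsigned int family;
	unsigned int model;
};

/* Family 6 models copied from the switch statement in the quoted patch. */
static const unsigned int c2_only_models[] = {
	0x1A, 0x1E, 0x1F, 0x2E, 0x2F, 0x25, 0x2C, 0x2A, 0x2D,
};

/* Arch-side predicate: all model knowledge stays in this one place. */
bool arch_c3type_is_really_c2type(const struct cpu_id *cpu)
{
	size_t i;

	if (cpu == NULL || cpu->vendor != VENDOR_INTEL || cpu->family != 6)
		return false;

	for (i = 0; i < sizeof(c2_only_models) / sizeof(c2_only_models[0]); i++)
		if (cpu->model == c2_only_models[i])
			return true;

	return false;
}
```

With this split, the generic idle code would only call the predicate and
would never need to see vendor, family, or model numbers itself.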

> Signed-off-by: Len Brown <len.brown(a)intel.com>
> ---
>  drivers/acpi/processor_idle.c |   38 ++++++++++++++++++++++++++++++++++++++
>  1 files changed, 38 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index b1b3856..14d1a0c 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -607,6 +607,38 @@ static void acpi_processor_power_verify_c3(struct acpi_processor *pr,
>  	return;
>  }
>
> +/*
> + * Modern Intel processors support only ACPI C2-type C-states.
> + * But the BIOS tends to report its deepest C-state as C3-type
> + * to satisfy various old operating systems.  We can skip
> + * C3 OS overhead by treating the deep-states as C2-type.
> + * Also, we can avoid checking BM_STS, which on some systems
> + * erroneously prevents entry into C3-type states.
> + */
> +static int acpi_c3type_is_really_c2type(void) {
> +
> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> +		return 0;
> +
> +	if (boot_cpu_data.x86 != 6)
> +		return 0;
> +
> +	switch(boot_cpu_data.x86_model) {
> +	case 0x1A:	/* Core i7, Xeon 5500 series */
> +	case 0x1E:	/* Core i7 and i5 Processor */
> +	case 0x1F:	/* Core i7 and i5 Processor */
> +	case 0x2E:	/* NHM-EX Xeon */
> +	case 0x2F:	/* WSM-EX Xeon */
> +	case 0x25:	/* WSM */
> +	case 0x2C:	/* WSM */
> +	case 0x2A:	/* SNB */
> +	case 0x2D:	/* SNB Xeon */
> +		return 1;
> +	default:
> +		return 0;
> +	}
> +}
> +
>  static int acpi_processor_power_verify(struct acpi_processor *pr)
>  {
>  	unsigned int i;
> @@ -617,6 +649,12 @@ static int acpi_processor_power_verify(struct acpi_processor *pr)
>  	for (i = 1; i < ACPI_PROCESSOR_MAX_POWER && i <= max_cstate; i++) {
>  		struct acpi_processor_cx *cx = &pr->power.states[i];
>
> +		if ((cx->type == ACPI_STATE_C3)
> +			&& acpi_c3type_is_really_c2type()) {
> +				ACPI_DEBUG_PRINT((ACPI_DB_INFO, "Redefining C3-type to C2\n"));
> +				cx->type = ACPI_STATE_C2;
> +		}
> +
>  		switch (cx->type) {
>  		case ACPI_STATE_C1:
>  			cx->valid = 1;
> --
> 1.7.2.rc3.43.g24e7a
>
>
>
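
The effect the patch description relies on can be modelled in a few lines
of userspace C. This is only a sketch under stated assumptions: the
struct, the enum, and both function names are hypothetical, and the
fallback rule (drop to the deepest shallower non-C3-type state) is a
simplification of what the ACPI idle driver actually does.

```c
#include <stdbool.h>

enum cstate_type { CSTATE_C1 = 1, CSTATE_C2 = 2, CSTATE_C3 = 3 };

struct cstate {
	enum cstate_type type;
};

/*
 * Return the index of the state that would be entered.
 * states[] is ordered shallow to deep and states[0] is assumed to be
 * C1-type. Only C3-type states are subject to the BM_STS check.
 */
int pick_state(const struct cstate *states, int target, bool bm_sts)
{
	int i;

	if (states[target].type != CSTATE_C3 || !bm_sts)
		return target;

	/* "Recent bus master activity": fall back to a shallower state. */
	for (i = target - 1; i >= 0; i--)
		if (states[i].type != CSTATE_C3)
			return i;

	return 0;
}

/* What the patch does on known models: relabel C3-type states as C2-type. */
void retype_c3_as_c2(struct cstate *states, int count)
{
	int i;

	for (i = 0; i < count; i++)
		if (states[i].type == CSTATE_C3)
			states[i].type = CSTATE_C2;
}
```

In this model a stuck BM_STS bit demotes every C3-type request, while the
same request goes through untouched once the state is relabelled C2-type.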
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andi Kleen on
> If a device in the system really does want to prevent use
> of a deep C-state, its Linux driver should register its
> constraints via pm_qos_add_request().

I reviewed the patch and it looks good to me.

I would suggest adding a command line option for this too,
in case someone wants to run an older kernel on a new system
that your patch does not know about yet.

-Andi
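
One possible shape for such an option, sketched in userspace C. The
option name "c3_as_c2=" and every identifier below are hypothetical;
nothing here is an existing kernel parameter. The idea is a tri-state:
unset means trust the built-in model list, while an explicit value
overrides it in either direction.

```c
#include <stdbool.h>
#include <string.h>

enum c3_override {
	C3_OVERRIDE_AUTO,	/* no option given: use the model list */
	C3_OVERRIDE_FORCE_C2,	/* "c3_as_c2=1": always relabel */
	C3_OVERRIDE_KEEP_C3,	/* "c3_as_c2=0": never relabel */
};

/* Parse the value of the hypothetical "c3_as_c2=" boot option. */
enum c3_override parse_c3_override(const char *arg)
{
	if (arg == NULL)
		return C3_OVERRIDE_AUTO;
	if (strcmp(arg, "1") == 0)
		return C3_OVERRIDE_FORCE_C2;
	if (strcmp(arg, "0") == 0)
		return C3_OVERRIDE_KEEP_C3;

	/* Unrecognised values fall back to the default behaviour. */
	return C3_OVERRIDE_AUTO;
}

/* Combine the override with the result of the built-in model check. */
bool treat_c3_as_c2(enum c3_override override, bool model_is_known)
{
	switch (override) {
	case C3_OVERRIDE_FORCE_C2:
		return true;
	case C3_OVERRIDE_KEEP_C3:
		return false;
	default:
		return model_is_known;
	}
}
```

An older kernel booted on an unlisted processor could then be given
"c3_as_c2=1" by hand instead of waiting for a model-list update.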

From: Len Brown on
This patch is kaput.

As detailed in the bug report
https://bugzilla.kernel.org/show_bug.cgi?id=15886
we should be able to fix some of these boxes
by paying attention to an ACPI flag we didn't
realize existed until yesterday.

I'll follow up with a new patch today.

However, we'll still have issues with systems
like the HP DL360 G6 which explicitly set the
flag to ask for BM_STS checking and configure
the chipset such that BM_STS is active.
That may require a BIOS fix, or we may
have to run intel_idle on that box --
since intel_idle ignores BM_STS always
and instead relies on drivers to use pm_qos
to register device latency constraints.

thanks,
Len Brown, Intel Open Source Technology Center
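
For readers unfamiliar with the pm_qos approach mentioned above, here is
a rough userspace model of how latency constraints can bound the choice
of idle state. It does not use the kernel's pm_qos interface; the types,
function names, and latency figures are all illustrative assumptions.

```c
#include <limits.h>
#include <stddef.h>

struct idle_state {
	const char *name;
	unsigned int exit_latency_us;
};

/*
 * Smallest latency tolerance among all registered requests.
 * With no requests, nothing constrains the choice (UINT_MAX).
 */
unsigned int effective_latency_limit_us(const unsigned int *requests_us,
					size_t count)
{
	unsigned int limit = UINT_MAX;
	size_t i;

	for (i = 0; i < count; i++)
		if (requests_us[i] < limit)
			limit = requests_us[i];

	return limit;
}

/*
 * Deepest state whose exit latency fits within the limit.
 * states[] is ordered shallow to deep; index 0 is always permitted.
 */
size_t deepest_allowed_state(const struct idle_state *states, size_t count,
			     unsigned int limit_us)
{
	size_t i, best = 0;

	for (i = 1; i < count; i++)
		if (states[i].exit_latency_us <= limit_us)
			best = i;

	return best;
}
```

Under this model a device that cannot tolerate a deep state's wake-up
latency registers a small tolerance, and the deep state is skipped
without any need to sample BM_STS.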

From: Iain on
Len Brown wrote:
> However, we'll still have issues with systems
> like the HP DL360 G6 which explicitly set the
> flag to ask for BM_STS checking and configure
> the chipset such that BM_STS is active.
> That may require a BIOS fix, or we may
> have to run intel_idle on that box --
> since intel_idle ignores BM_STS always
> and instead relies on drivers to use pm_qos
> to register device latency constraints.

I'm curious as to why you see a problem with the DL380G6 as the one I have here happily sits in C6 when idle.

Your turbostat utility shows:

CPU  GHz   TSC   %c0   %c1   %c3   %c6    %pc3  %pc6
avg  1.64  2.27  0.16  0.12  0.00  99.71  0.00  90.15

and powertop has results like:

Cn                Avg residency       P-states (frequencies)
C0 (cpu running)        ( 0,1%)       Turbo Mode     0,0%
polling           0,0ms ( 0,0%)         2,27 Ghz     0,0%
C1 mwait          0,1ms ( 0,0%)         2,13 Ghz     0,0%
C2 mwait          1,0ms ( 0,0%)         2,00 Ghz     0,0%
C3 mwait         90,4ms (99,9%)         1,60 Ghz   100,0%

This is with v2.6.35-rc5-176-gcd5b8f8 and using acpi_idle. I deliberately
disabled intel_idle to test; using intel_idle gives almost identical
results.

Looking at bug 15886, the Access Size 0x03 entries you mentioned are all
0x01 on this system. I've also uploaded the acpidump from this DL380G6 to
that bug so that you can check that I haven't just looked in the wrong
place.

Did the first acpidump come from a system with the 'HP Power Regulator'
setting in the BIOS set to OS Control mode? My system is set this way and
it seems to work as expected.
The other settings for this option appear to be designed to override OS
power management controls. For example, the description of the 'Static
High Performance' option suggests it will somehow force the CPU to operate
in the highest performance mode all of the time: "HP Static High
Performance Mode: Processors will run in their maximum power/performance
state at all times regardless of the OS power management policy".

If this does turn out to be as simple as a BIOS setting, should we really
be trying to work around what may be a legitimate decision by the server's
admin?

Iain
From: Iain on
Iain wrote:
> Len Brown wrote:
>> However, we'll still have issues with systems
>> like the HP DL360 G6 which explicitly set the
>
> I'm curious as to why you see a problem with the DL380G6 as the one I have here happily sits in C6 when idle.

Please ignore me, apologies for the noise.

I just noticed that the problem system was a DL360 and mine is a DL380.
After a long day spent working with both 360s and 380s, I don't seem to
be able to tell them apart anymore.

Iain