From: Len Brown on
Please look over and test this patch set.
(If you test linux-next, you already have it)

There are a few simple patches, leading up to a new intel_idle driver.

Note that you can get the patch series as a single patch here:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/idle/patches/2.6.34/idle-test-2.6.34.diff.gz

or pull from this git branch
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6.git idle-test

Both are vs 2.6.34.

Why is it good to have a native intel_idle driver?

Basically, we think we can do better than ACPI.
Indeed, on my (production-level, commercially available) Nehalem desktop
the ACPI tables are broken and an ACPI OS idles at 100W. With this
driver the box idles at 85W.

Thanks,
-Len

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Thomas Renninger on
On Thursday 27 May 2010 04:42:23 Len Brown wrote:
> Please look over and test this patch set.
> (If you test linux-next, you already have it)
>
> There are a few simple patches, leading up to a new intel_idle driver.
>
> Note that you can get the patch series as a single patch here:
> http://ftp.kernel.org/pub/linux/kernel/people/lenb/idle/patches/2.6.34/idle-test-2.6.34.diff.gz
>
> or pull from this git branch
> git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6.git idle-test
>
> Both are vs 2.6.34.
>
> Why is it good to have a native intel_idle driver?
>
> Basically, we think we can do better than ACPI.
Why exactly? Is there any info missing in the ACPI tables?
Or is this just to be more independent from OEMs?

> Indeed, on my (production-level, commercially available) Nehalem desktop
> the ACPI tables are broken and an ACPI OS idles at 100W. With this
> driver the box idles at 85W.
What exactly was broken there?

IMO this is a step backward.
CPUfreq runs rather well on nearly every machine supporting it without
tons of static frequency tables in kernel. Even powernow-k8 might get merged
into acpi-cpufreq.

Intel set up a huge ACPI API for this and now it's not used anymore?!?
Will these parts get obsoleted in a future spec?
While for C-states there are not that many static entries needed, another
drawback could be that OEMs will disable/hide C-states on purpose.

Using ACPI table based C-states by default and using intel_idle.enable=1
or similar for workarounds sounds safer.
At least as long as the driver is experimental.

Does Windows use ACPI C-state info for idle?

Thanks,

Thomas
From: Len Brown on
> > ... we think we can do better than ACPI.

> Why exactly? Is there any info missing in the ACPI tables?
> Or is this just to be more independent from OEMs?

ACPI has a few fundamental flaws here. One is that it reports
exit latency instead of break-even power duration.
The other is that it requires a BIOS writer to
get the tables right.

Both of these are fatal flaws.
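To make the first flaw concrete, here is a minimal C sketch (not intel_idle code) of choosing an idle state by break-even residency rather than by exit latency. The struct, the function names, the "scale latency by a factor" guess, and the numbers used below are all hypothetical. The only point is that exit latency and break-even residency are different quantities, so a table that reports only the former forces the OS to guess the latter.

```c
#include <stddef.h>

/* Hypothetical description of one idle state. */
struct idle_state {
	const char *name;
	unsigned int exit_latency_us;     /* wake-up time: what ACPI reports */
	unsigned int target_residency_us; /* break-even idle time: not reported by ACPI */
};

/*
 * Return the deepest state (states[] is ordered shallow to deep) whose
 * break-even residency fits in the predicted idle period. Entering a
 * state for less than its break-even time costs more energy than it saves.
 * Index 0 is assumed to always be usable.
 */
size_t pick_by_residency(const struct idle_state *states, size_t n,
			 unsigned int predicted_us)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if (states[i].target_residency_us <= predicted_us)
			best = i;
	return best;
}

/*
 * With only exit latency available, the OS has to guess the break-even
 * time. Scaling latency by a fixed factor is an illustrative assumption.
 */
unsigned int guess_residency(unsigned int exit_latency_us, unsigned int factor)
{
	return exit_latency_us * factor;
}
```

For example, with illustrative states C1 {2 us exit, 4 us break-even}, C3 {20 us, 80 us} and C6 {200 us, 800 us}, a predicted 100 us idle period selects C3 and a 50 us period falls back to C1. If only the 20 us exit latency were known and an assumed factor of 2 were applied, the guessed break-even of 40 us would let a 50 us period select C3, even though by the (hypothetical) true break-even of 80 us that choice costs energy.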

There are also more subtle problems, such as bogus ACPI implementations
that map LAPIC-breaking C-states to ACPI C2, forcing Linux to assume
that the LAPIC is always broken in C2, which is erroneous.

I'll be speaking on this topic at length at Linuxcon this summer.

> > Indeed, on my (production-level, commercially available) Nehalem desktop
> > the ACPI tables are broken and an ACPI OS idles at 100W. With this
> > driver the box idles at 85W.

> What exactly was broken there?

Dell's BIOS developer botched a bug fix immediately before the system
went to market and disabled support for all ACPI C-states except C1.
After several months of shipping systems, they were still unable
to ship them with a fixed BIOS.

Of course, besides a 15% idle power hit, the other effect of that BIOS issue
was to disable all Turbo frequencies, which is a somewhat important
feature on a Core i7 desktop...

> IMO this is a step backward.

I don't dispute your right to have an opinion:-)

> CPUfreq runs rather well on nearly every machine supporting it without
> tons of static frequency tables in kernel. Even powernow-k8 might get merged
> into acpi-cpufreq.

There are a couple of important differences between cpufreq and idle
state enumeration. p-states are per-bin within each model.
Idle states not only span bins within a model, they span multiple
models which span multiple years. Note also that the idle tables are
validated at run time via CPUID.MWAIT, which means the same
table can be used for multiple parts: the parts themselves
know which states they have, and they can tell us.

So I don't expect a proliferation of idle tables in intel_idle.
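The run-time validation described above can be sketched roughly as follows. This is a minimal illustration, not intel_idle's code. It assumes the Intel SDM encoding: an MWAIT hint's bits 7:4 select the target C-state (0 meaning C1), bits 3:0 select a sub-state, and CPUID leaf 5 EDX holds one 4-bit count of supported MWAIT sub-states per C-state, with C0 in bits 3:0. The struct, the table entries, and the EDX value are hypothetical, and executing CPUID itself is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry in a table shared across several CPU models. */
struct mwait_state {
	const char *name;
	uint32_t hint;	/* MWAIT hint: bits 7:4 target C-state, bits 3:0 sub-state */
};

/* Does the CPU (described by CPUID.05H:EDX) support this MWAIT hint? */
int mwait_hint_supported(uint32_t cpuid5_edx, uint32_t hint)
{
	unsigned int cstate = ((hint >> 4) & 0xF) + 1;	/* 0 -> C1, 1 -> C2, ... */
	unsigned int substate = hint & 0xF;
	unsigned int nsub;

	if (cstate > 7)		/* EDX only has nibbles for C0..C7 */
		return 0;
	nsub = (cpuid5_edx >> (cstate * 4)) & 0xF;	/* sub-states for this C-state */
	return substate < nsub;
}

/* Copy the entries this CPU actually supports into out[]; return the count. */
size_t filter_states(const struct mwait_state *table, size_t n,
		     uint32_t cpuid5_edx, struct mwait_state *out)
{
	size_t k = 0;

	for (size_t i = 0; i < n; i++)
		if (mwait_hint_supported(cpuid5_edx, table[i].hint))
			out[k++] = table[i];
	return k;
}
```

With the hypothetical EDX value 0x00002220 (two sub-states each for C1, C2 and C3, none for C4), a shared table containing hints 0x00, 0x10 and 0x30 shrinks to its first two entries on that part, so one table can serve processors that support different subsets of states.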

I do expect to tune some of the latencies based on some of
the information that Intel instructs BIOS writers to convey
but that they fail to convey. In particular, the actual latencies
and power break-even points of the same model in different
configurations are actually different. I've not seen a single
BIOS get that part right.

I expect a new table to cover Sandy Bridge plus the generation after it.

> Intel set up a huge ACPI API for this and now it's not used anymore?!?
> Will these parts get obsoleted in a future spec?

Both p-states and c-states will be moving to a more native enumeration
method - but there will still be BIOS ACPI support wrapping that
enumeration as long as somebody wants to run a legacy ACPI OS that
knows nothing else.

> While for C-states there are not that many static entries needed, another
> drawback could be that OEMs will disable/hide C-states on purpose.

Yes, there is a real possibility that a system has a device in it
that malfunctions when a deep C-state is used. On Linux, we
invented PM_QOS to address exactly this problem.

The number of devices that require PM_QOS constraints is still quite small.
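A rough model of how a latency constraint keeps such a device working: every outstanding request names a maximum acceptable wake-up latency, the effective limit is the strictest (smallest) one, and the idle code skips any state whose exit latency exceeds it. This sketch models only that logic. The function names, the default value, and the latencies are hypothetical, and it does not use the kernel's PM_QOS API.

```c
#include <stddef.h>
#include <stdint.h>

/* Effective constraint: the strictest (smallest) outstanding request. */
int32_t qos_effective_latency(const int32_t *requests, size_t n,
			      int32_t default_us)
{
	int32_t v = default_us;	/* no requests -> default applies */

	for (size_t i = 0; i < n; i++)
		if (requests[i] < v)
			v = requests[i];
	return v;
}

/*
 * Deepest state (exit_latency_us[] ordered shallow to deep) whose exit
 * latency fits within the limit. Index 0 is assumed to always be usable.
 */
size_t deepest_allowed(const uint32_t *exit_latency_us, size_t n,
		       int32_t limit_us)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if (limit_us >= 0 && exit_latency_us[i] <= (uint32_t)limit_us)
			best = i;
	return best;
}
```

With outstanding requests of 500, 50 and 1000 us, the effective limit is 50 us, so of states with exit latencies of 1, 20 and 200 us only the first two remain eligible. The deep state becomes usable again once the 50 us request is dropped.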

> Using ACPI table based C-states by default and using intel_idle.enable=1
> or similar for workarounds sounds safer.
> At least as long as the driver is experimental.

I plan to remove the EXPERIMENTAL tag in one release.

> Does Windows use ACPI C-state info for idle?

Yes, Windows uses ACPI.
On the Dell above, that is why Linux consumes 15% less idle power
and why Linux can take advantage of turbo mode and Windows cannot.

cheers,
Len Brown, Intel Open Source Technology Center
From: Thomas Renninger on
On Friday 28 May 2010 02:59:07 Len Brown wrote:
> > > ... we think we can do better than ACPI.
>
> > Why exactly? Is there any info missing in the ACPI tables?
> > Or is this just to be more independent from OEMs?
>
> ACPI has a few fundamental flaws here. One is that it reports
> exit latency instead of break-even power duration.
> The other is that it requires a BIOS writer to
> get the tables right.
This is a general ACPI problem...

> > Using ACPI table based C-states by default and using
> > intel_idle.enable=1
> > or similar for workarounds sounds safer.
> > At least as long as the driver is experimental.
>
> I plan to remove the EXPERIMENTAL tag in one release.
>
> > Does Windows use ACPI C-state info for idle?

> Yes, Windows uses ACPI.
> On the Dell above, that is why Linux consumes 15% less idle power
> and why Linux can take advantage of turbo mode and Windows cannot.
You have always advocated staying Windows compatible...
Now we are taking the untested path.
Let's see how many machines will break...

Thanks for clarifications,

Thomas
From: Len Brown on
> > Yes, Windows uses ACPI.
> > On the Dell above, that is why Linux consumes 15% less idle power
> > and why Linux can take advantage of turbo mode and Windows cannot.

> You have always advocated staying Windows compatible...
> Now we are taking the untested path.
> Let's see how many machines will break...
>
> Thanks for clarifications,

Don't get me wrong, Linux' ACPI "compatibility" is a huge asset.

However, there are opportunities for Linux to do better than Windows,
and I believe this is one of them. Yes, when we take the risk of
doing something that Windows does not do, there will always be
the possibility that we will fail.

cheers,
-Len Brown, Intel Open Source Technology Center