From: Matthew Garrett on
On Tue, May 25, 2010 at 01:43:26AM -0400, Len Brown wrote:

> I'm told by the hardware guys that BM_STS is _not_ always
> a NOP, and so we're not supposed to simply ignore it on C3 --
> though it should be extremely rare that we see it set.
> If it is ever set, it should go on and off depending on
> activity on some latency sensitive device, like out on the LPC.
> It may be possible for the BIOS writer to configure the chipset
> so that BM_STS is enabled always, presumably to accommodate
> some latency sensitive device -- or maybe by mistake.

On some hardware we've seen BM_STS be enabled approximately 50% of the
time without any obvious cause.

--
Matthew Garrett | mjg59(a)srcf.ucam.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Matthew Garrett on
On Tue, May 25, 2010 at 01:59:39PM +0800, Yu, Luming wrote:

> On some platforms like NHM-EX, I was told that it's a NOP,
> but I might have been given wrong information when I wrote that patch.
>
> IIRC, the ACPI spec just says it's optional.

Implementing it is optional, but the spec implies that it should be used
if it's present.

--
Matthew Garrett | mjg59(a)srcf.ucam.org
From: Matthew Garrett on
On the other hand, the relevant section of the spec is:

"OSPM uses the BM_STS bit to determine the power state to enter when
considering a transition to or from the C2/C3 power state. The BM_STS is
an optional bit that indicates when bus masters are active. OSPM uses
this bit to determine the policy between the C2 and C3 power states: a
lot of bus master activity demotes the CPU power state to the C2 (or C1
if C2 is not supported), no bus master activity promotes the CPU power
state to the C3 power state. OSPM keeps a running history of the BM_STS
bit to determine CPU power state policy."

while the description of the bit itself is:

"This is the bus master status bit. This bit is set any time a system
bus master requests the system bus, and can only be cleared by writing a
“1” to this bit position. Notice that this bit reflects bus master
activity, not CPU activity (this bit monitors any bus master that can
cause an incoherent cache for a processor in the C3 state when the bus
master performs a memory transaction)."

which implies that as long as you don't have any cache coherency
concerns, it's acceptable (if potentially suboptimal) to enter C3 even
if the bit is set.

--
Matthew Garrett | mjg59(a)srcf.ucam.org
From: Len Brown on
On Tue, 25 May 2010, Matthew Garrett wrote:

> On the other hand, the relevant section of the spec is:
>
> "OSPM uses the BM_STS bit to determine the power state to enter when
> considering a transition to or from the C2/C3 power state. The BM_STS is
> an optional bit that indicates when bus masters are active. OSPM uses
> this bit to determine the policy between the C2 and C3 power states: a
> lot of bus master activity demotes the CPU power state to the C2 (or C1
> if C2 is not supported), no bus master activity promotes the CPU power
> state to the C3 power state. OSPM keeps a running history of the BM_STS
> bit to determine CPU power state policy."
>
> while the description of the bit itself is:
>
> "This is the bus master status bit. This bit is set any time a system
> bus master requests the system bus, and can only be cleared by writing a
> “1” to this bit position. Notice that this bit reflects bus master
> activity, not CPU activity (this bit monitors any bus master that can
> cause an incoherent cache for a processor in the C3 state when the bus
> master performs a memory transaction)."
>
> which implies that as long as you don't have any cache coherency
> concerns, it's acceptable (if potentially suboptimal) to enter C3 even
> if the bit is set.

As I wrote, the HW people tell me that implication is usually correct,
but there exist cases where it is incorrect. (Of course, the way
it is supposed to work is that when BM_STS is not meaningful,
it always returns zero.)

The ACPI spec talks about BM_STS being set by traffic that is incoherent
with the frozen cache of C3, requiring a wake up of the processor
from C3 to snoop the traffic. It was written 10 years before the
hardware started automatically snooping the L3 when the processor was off,
and before the hardware learned how to automatically flush the cache
to get into deep C-states. So the description is stale, but the
underlying issue is unchanged. There exist devices which cannot
handle the wakeup latency of some deep C-states. The BM_STS bit
is a chip-set bit that the BIOS writer can use to prevent the OS
from using the deep C-states when those devices are active.

I'm told that the cases in question are some legacy devices
hanging off the LPC bus, which should be rare. More interesting
is isochronous traffic over some 1394 controllers -- though
I don't know if Linux runs into that. If we do, one option
would be to ignore BM_STS, but to use pm_qos to disable the
deep c-state when needed -- a mechanism we've used for
several devices in the past.
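
For readers who have not used pm_qos: the paragraph above is about in-kernel drivers, but the same CPU latency constraint is also exposed to userspace through /dev/cpu_dma_latency, which is easier to show in a self-contained form. The sketch below assumes that interface as described in the kernel's PM QoS documentation: a process writes a 32-bit latency limit in microseconds, and the request stays in force for as long as the file descriptor is held open. The function names and the 20 us figure in the usage note are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Ask for a maximum CPU wakeup latency by writing a 32-bit value to the
 * given PM QoS device node (normally /dev/cpu_dma_latency).
 * Returns the open fd that holds the request, or -1 on error.
 */
static int hold_cpu_latency_request(const char *path, int32_t max_latency_us)
{
	int fd;

	assert(path != NULL);

	fd = open(path, O_WRONLY);
	if (fd < 0) {
		perror("open");
		return -1;
	}
	if (write(fd, &max_latency_us, sizeof(max_latency_us)) !=
	    (ssize_t)sizeof(max_latency_us)) {
		perror("write");
		close(fd);
		return -1;
	}
	return fd;
}

/* Closing the fd drops the request, so deep C-states are allowed again. */
static void drop_cpu_latency_request(int fd)
{
	if (fd >= 0)
		close(fd);
}
```

A latency sensitive application would call `hold_cpu_latency_request("/dev/cpu_dma_latency", 20)` before starting its traffic and `drop_cpu_latency_request()` afterwards. This typically needs root, and which C-states end up excluded depends on the exit latencies the platform advertises.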

I believe that the BIOS writer also has the option to keep
BM_STS set always. However, that doesn't make sense to me
as it would be simpler to just disable the C-state in _CST
on that platform.

So if we see a Nehalem system that has BM_STS *always* set,
even when no devices are active in the system, my guess is
that the BIOS mis-configured the chip-set and we should
ignore that bit. If BM_STS is changing at run time, then
that is a more interesting situation, and we should endeavor
to find what device activity is changing it.
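
The distinction drawn above can be written down as a small classifier. This is only a sketch of the reasoning, assuming BM_STS has been sampled (and cleared) at regular intervals on an otherwise idle system; the enum, the function names and the sampling scheme are hypothetical and not taken from any existing kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum bm_sts_pattern {
	BM_STS_NEVER_SET,	/* bit reads zero in every sample */
	BM_STS_ALWAYS_SET,	/* bit is set in every sample */
	BM_STS_TOGGLING		/* bit changes at run time */
};

/* Classify a series of BM_STS samples taken while the system is idle. */
static enum bm_sts_pattern classify_bm_sts(const bool *samples, size_t n)
{
	size_t set = 0;

	assert(samples != NULL && n > 0);

	for (size_t i = 0; i < n; i++)
		if (samples[i])
			set++;

	if (set == 0)
		return BM_STS_NEVER_SET;
	if (set == n)
		return BM_STS_ALWAYS_SET;
	return BM_STS_TOGGLING;
}

/*
 * Always set on an idle system: assume a chipset misconfiguration and
 * ignore the bit. Otherwise keep honoring it; a toggling bit is the case
 * that deserves a closer look at device activity.
 */
static bool should_honor_bm_sts(enum bm_sts_pattern pattern)
{
	return pattern != BM_STS_ALWAYS_SET;
}
```

A bit that is set roughly 50% of the time, as reported earlier in the thread, would come out as `BM_STS_TOGGLING` here, which is the case where the device responsible still has to be found.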

Len Brown, Intel Open Source Technology Center
From: Len Brown on
> > I'm told by the hardware guys that BM_STS is _not_ always
> > a NOP, and so we're not supposed to simply ignore it on C3 --
> > though it should be extremely rare that we see it set.
> > If it is ever set, it should go on and off depending on
> > activity on some latency sensitive device, like out on the LPC.
> > It may be possible for the BIOS writer to configure the chipset
> > so that BM_STS is enabled always, presumably to accommodate
> > some latency sensitive device -- or maybe by mistake.
>
> On some hardware we've seen BM_STS be enabled approximately 50% of the
> time without any obvious cause.

Assuming it is modern hardware, please attach the acpidump and lspci -vv
output from that hardware to this bug report:

https://bugzilla.kernel.org/show_bug.cgi?id=15886

thanks,
-Len Brown, Intel Open Source Technology Center
