From: Matthew Garrett
On Tue, Jun 22, 2010 at 10:17:11AM -0700, Luis R. Rodriguez wrote:
> On Tue, Jun 22, 2010 at 9:52 AM, Matthew Garrett <mjg59(a)srcf.ucam.org> wrote:
> > Why would you only want to enable ASPM for one device?
>
> ASPM doesn't always work for all devices even if they do advertise
> ASPM capability so turning it on selectively by device is what I
> recommend since otherwise you may get hangs and you will then have to
> do the selective enabling.

Right, which we have to deal with by having drivers disable ASPM on
broken devices.

> Furthermore laptops tend to disable ASPM for cards not built-in to it,
> an example is Cardbus slots or internal PCI-E slots. This is often
> done because to enable ASPM for some cards you often need to tune the
> host controller in addition to enabling ASPM for the endpoint, so this
> will vary depending on vendor, chipset, and host controller
> combination. This is documentation that the OEM / ODM typically end up
> getting, but not end users.

Having looked into this, Windows will enable ASPM on external
controllers unless there's some reason for it not to - where that may be
either the appropriate bit in the FADT being set, the device not being
PCIe 1.1 or later, there being no _OSC method on the appropriate root
bridge or the _OSC method not giving it full control over PCIe, the
driver disabling ASPM or the device not advertising it in the first
place. Are you aware of any other cases where Windows will refuse to
enable ASPM?
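
A minimal sketch of that checklist, assuming each condition can be reduced to a boolean. The struct, its field names and os_may_enable_aspm() are hypothetical and exist only for illustration; they are not Windows or Linux interfaces.

```c
#include <stdbool.h>

/* Hypothetical summary of what the OS knows about one PCIe link.
 * Field names are illustrative only. */
struct aspm_inputs {
	bool fadt_forbids_aspm;       /* FADT bit asks the OS not to enable ASPM */
	bool device_is_pcie_1_1;      /* device is PCIe 1.1 or later */
	bool root_bridge_has_osc;     /* _OSC method exists on the root bridge */
	bool osc_grants_pcie_control; /* _OSC gave the OS full control over PCIe */
	bool driver_disabled_aspm;    /* the device driver opted out */
	bool device_advertises_aspm;  /* the device reports ASPM support */
};

/* Return true only when none of the vetoes listed above applies. */
static bool os_may_enable_aspm(const struct aspm_inputs *in)
{
	if (in->fadt_forbids_aspm)
		return false;
	if (!in->device_is_pcie_1_1)
		return false;
	if (!in->root_bridge_has_osc || !in->osc_grants_pcie_control)
		return false;
	if (in->driver_disabled_aspm)
		return false;
	return in->device_advertises_aspm;
}
```

Any single veto is enough to keep ASPM off; the order of the checks does not matter.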

--
Matthew Garrett | mjg59(a)srcf.ucam.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Luis R. Rodriguez
On Tue, Jun 22, 2010 at 10:25 AM, Matthew Garrett <mjg59(a)srcf.ucam.org> wrote:
> On Tue, Jun 22, 2010 at 10:17:11AM -0700, Luis R. Rodriguez wrote:
>> On Tue, Jun 22, 2010 at 9:52 AM, Matthew Garrett <mjg59(a)srcf.ucam.org> wrote:
>> > Why would you only want to enable ASPM for one device?
>>
>> ASPM doesn't always work for all devices even if they do advertise
>> ASPM capability so turning it on selectively by device is what I
>> recommend since otherwise you may get hangs and you will then have to
>> do the selective enabling.
>
> Right, which we have to deal with by having drivers disable ASPM on
> broken devices.

Agreed, but that assumes drivers are free of ASPM bugs, which I expect
to be false for video and 802.11, given that only a handful of vendors
actually get involved with their drivers upstream. The safe thing is of
course to just disable it, but if you are going to use pcie_aspm=force,
good luck!

>> Furthermore laptops tend to disable ASPM for cards not built-in to it,
>> an example is Cardbus slots or internal PCI-E slots. This is often
>> done because to enable ASPM for some cards you often need to tune the
>> host controller in addition to enabling ASPM for the endpoint, so this
>> will vary depending on vendor, chipset, and host controller
>> combination. This is documentation that the OEM / ODM typically end up
>> getting, but not end users.
>
> Having looked into this, Windows will enable ASPM on external
> controllers unless there's some reason for it not to - where that may be
> either the appropriate bit in the FADT being set, the device not being
> PCIe 1.1 or later, there being no _OSC method on the appropriate root
> bridge or the _OSC method not giving it full control over PCIe, the
> driver disabling ASPM or the device not advertising it in the first
> place.

I was unaware of all these root complex sanity checks on Windows,
thanks for sharing.

> Are you aware of any other cases where Windows will refuse to
> enable ASPM?

My point was not whether ASPM typically gets enabled on Windows vs.
Linux. My point was that for some endpoint devices you may have to
tweak the root complex to get ASPM working properly, that these tweaks
*are* implemented in the BIOS by the ODM / OEM for those devices, and
that the documentation for such tweaks is not typically public. So, if
you are like me and cannot stand the internal 802.11 card in your
laptop and want to replace it with something else, you are stuck either
hoping such BIOS tweaks are not required, or figuring out what the
tweaks are yourself and applying them to the root complex through
userspace *prior* to enabling ASPM for the endpoint, also through
userspace.

I suspect these tweaks will go away as the industry produces cards
with both L1 and L0s enabled all the time (devices being produced
today), but for devices caught in the period in between, when it was
not settled whether L0s would be *required* (last 2 years), I suspect
we'll run into these issues.

Luis
From: Matthew Garrett
On Tue, Jun 22, 2010 at 10:40:15AM -0700, Luis R. Rodriguez wrote:
> On Tue, Jun 22, 2010 at 10:25 AM, Matthew Garrett <mjg59(a)srcf.ucam.org> wrote:
> > Right, which we have to deal with by having drivers disable ASPM on
> > broken devices.
>
> Agreed, but that assumes drivers are free of ASPM bugs, which I expect
> to be false for video and 802.11, given that only a handful of vendors
> actually get involved with their drivers upstream. The safe thing is of
> course to just disable it, but if you are going to use pcie_aspm=force,
> good luck!

People who use "force" deserve whatever they get, but "powersave" really
ought to work. Fedora's defaulted to that for a while now - we've hit
issues with aacraid, but that's pretty much it in terms of cases where
the heuristics don't work. Maxim's problems wouldn't be triggered
because CONFIG_PCIE_ASPM disables it on pre-1.1 devices regardless of
the BIOS setup.

> > Having looked into this, Windows will enable ASPM on external
> > controllers unless there's some reason for it not to - where that may be
> > either the appropriate bit in the FADT being set, the device not being
> > PCIe 1.1 or later, there being no _OSC method on the appropriate root
> > bridge or the _OSC method not giving it full control over PCIe, the
> > driver disabling ASPM or the device not advertising it in the first
> > place.
>
> I was unaware of all these root complex sanity checks on Windows,
> thanks for sharing.

With the patch I've just sent, they should all be used by Linux as
well.

> I suspect these tweaks will go away as the industry produces cards
> with both L1 and L0s enabled all the time (devices being produced
> today), but for devices caught in that middle of time between whether
> or not L0s would be *required* (last 2 years) I suspect we'll run
> into these issues.

If the same problems would appear under Windows then it's not a problem
that I'm hugely concerned about as yet - we'll wait a bit longer and
then change the ASPM defaults to be more aggressive under Linux, and if
it turns out to be a significant problem in the real world we'll have to
reconsider it. But I don't think we should be depending on userspace
bashing hardware registers in order to be able to enable power
management.

--
Matthew Garrett | mjg59(a)srcf.ucam.org
From: Luis R. Rodriguez
On Tue, Jun 22, 2010 at 10:50 AM, Matthew Garrett <mjg59(a)srcf.ucam.org> wrote:
> On Tue, Jun 22, 2010 at 10:40:15AM -0700, Luis R. Rodriguez wrote:
>> On Tue, Jun 22, 2010 at 10:25 AM, Matthew Garrett <mjg59(a)srcf.ucam.org> wrote:
>> > Right, which we have to deal with by having drivers disable ASPM on
>> > broken devices.
>>
>> Agreed, but that assumes drivers are free of ASPM bugs, which I expect
>> to be false for video and 802.11, given that only a handful of vendors
>> actually get involved with their drivers upstream. The safe thing is of
>> course to just disable it, but if you are going to use pcie_aspm=force,
>> good luck!
>
> People who use "force" deserve whatever they get,

Heh, this whole patch and thread was started because Jussi tested
ath5k with pcie_aspm=force (on a pre-PCIe 1.1 device (?)). I have
been trying to explain all along why this is a terrible idea, to the
point that we should probably just remove that code from the kernel.
Hence my side rants and explanations to justify my reasoning.

> but "powersave" really ought to work.

Interesting, as per Documentation/kernel-parameters.txt we have:

    pcie_aspm=  [PCIE] Forcibly enable or disable PCIe Active State Power
                Management.
        off     Disable ASPM.
        force   Enable ASPM even on devices that claim not to support it.
                WARNING: Forcing ASPM on may cause system lockups.

I was unaware of a "powersave" option to the pcie_aspm kernel
parameter. In fact:

static int __init pcie_aspm_disable(char *str)
{
	if (!strcmp(str, "off")) {
		aspm_disabled = 1;
		printk(KERN_INFO "PCIe ASPM is disabled\n");
	} else if (!strcmp(str, "force")) {
		aspm_force = 1;
		printk(KERN_INFO "PCIe ASPM is forcedly enabled\n");
	}
	return 1;
}

__setup("pcie_aspm=", pcie_aspm_disable);

Where is "powersave"?

I do see a "powersave" but it's an ASPM policy string and it implies
you want to enable L1 and L0s when the device's LinkCap supports it,
see pcie_config_aspm_link() and its users. So in other words powersave
seems to imply you are using pcie_aspm=force, no?
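
To make "enable L1 and L0s when the device's LinkCap supports it" concrete, here is a rough sketch of the register arithmetic. It is not pcie_config_aspm_link() itself: the macro and function names are made up for illustration, and it looks at only one end of the link. It assumes the PCIe layout where ASPM Support sits in Link Capabilities bits 11:10 and ASPM Control in Link Control bits 1:0.

```c
#include <stdint.h>

/* Illustrative names; the values follow the PCIe register layout. */
#define LNKCAP_ASPM_SHIFT 10   /* ASPM Support: Link Capabilities bits 11:10 */
#define LNKCAP_ASPM_MASK  0x3u
#define LNKCTL_ASPM_MASK  0x3u /* ASPM Control: Link Control bits 1:0 */
#define ASPM_L0S          0x1u /* bit 0: L0s */
#define ASPM_L1           0x2u /* bit 1: L1 */

/* Which ASPM states does this Link Capabilities value advertise? */
static uint16_t aspm_supported(uint32_t lnkcap)
{
	return (uint16_t)((lnkcap >> LNKCAP_ASPM_SHIFT) & LNKCAP_ASPM_MASK);
}

/* Compute a new Link Control value that enables exactly the advertised
 * states and leaves every other Link Control bit untouched. */
static uint16_t lnkctl_enable_supported(uint32_t lnkcap, uint16_t lnkctl)
{
	return (uint16_t)((lnkctl & ~LNKCTL_ASPM_MASK) | aspm_supported(lnkcap));
}
```

Nothing here overrides what the device advertises, which is where it would differ from forcing.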

> Fedora's defaulted to that for a while now - we've hit
> issues with aacraid, but that's pretty much it in terms of cases where
> the heuristics don't work. Maxim's problems wouldn't be triggered
> because CONFIG_PCIE_ASPM disables it on pre-1.1 devices regardless of
> the BIOS setup.

I don't expect all distributions to have CONFIG_PCIE_ASPM enabled. In
fact I was unaware that this sanity check was part of
CONFIG_PCIE_ASPM, and I recommend we consider just enabling the sanity
check all the time. That CONFIG_PCIE_ASPM is even an option seems
confusing to me: apart from this sanity check, the only other useful
thing I see in it is the forcing of ASPM settings, and as I noted I
think pcie_aspm=force is pretty dangerous.

>> > Having looked into this, Windows will enable ASPM on external
>> > controllers unless there's some reason for it not to - where that may be
>> > either the appropriate bit in the FADT being set, the device not being
>> > PCIe 1.1 or later, there being no _OSC method on the appropriate root
>> > bridge or the _OSC method not giving it full control over PCIe, the
>> > driver disabling ASPM or the device not advertising it in the first
>> > place.
>>
>> I was unaware of all these root complex sanity checks on Windows,
>> thanks for sharing.
>
> With the patch I've just sent, they should also all be used for Linux as
> well.

Oh nice! It'll be part of 2.6.36?

>> I suspect these tweaks will go away as the industry produces cards
>> with both L1 and L0s enabled all the time (devices being produced
>> today), but for devices caught in that middle of time between whether
>> or not L0s would be *required*  (last 2 years) I suspect we'll run
>> into these issues.
>
> If the same problems would appear under Windows then it's not a problem
> that I'm hugely concerned about as yet

Yes, these issues would also occur on Windows. But I should note that
people working on their own BIOSes also have to take these things into
more serious consideration.

> - we'll wait a bit longer and
> then change the ASPM defaults to be more aggressive under Linux, and if
> it turns out to be a significant problem in the real world we'll have to
> reconsider it.

The problem is the tweaks in question are device specific. I can see
if I can get you concrete examples.

> But I don't think we should be depending on userspace
> bashing hardware registers in order to be able to enable power
> management.

Me neither, ASPM should just work with default settings, which is why
I also do not like that the sanity check on the PCIE 1.1 spec is done
through CONFIG_PCIE_ASPM, it makes no sense given that ASPM *will*
work even if you do not have CONFIG_PCIE_ASPM but the device has
functional ASPM.

I do think we should be depending on userspace to do development
testing and forcing ASPM on, because the only other alternative is
pcie_aspm=force and as noted this is just not a good idea unless you
*seriously* know what you are doing.

Luis
From: Matthew Garrett
On Tue, Jun 22, 2010 at 11:28:20AM -0700, Luis R. Rodriguez wrote:
> On Tue, Jun 22, 2010 at 10:50 AM, Matthew Garrett <mjg59(a)srcf.ucam.org> wrote:
> > People who use "force" deserve whatever they get,
>
> Heh, this whole patch and thread was started because Jussi tested
> ath5k with pcie_aspm=force (on a pre-PCIe 1.1 device (?)). I have
> been trying to explain all along why this is a terrible idea, to the
> point that we should probably just remove that code from the kernel.
> Hence my side rants and explanations to justify my reasoning.

Well, there are two things here. If you use force then you might get
inappropriate ASPM. But if your BIOS enables ASPM on an old device, then
booting *without* CONFIG_PCIE_ASPM will leave it turned on, and booting
*with* CONFIG_PCIE_ASPM will turn it off. The Kconfig description is
confusing - reality is that CONFIG_PCIE_ASPM enables logic that allows
the kernel to modify the BIOS default, and disabling it makes the
assumption that your BIOS did something sensible.

> Where is "powersave"?
>
> I do see a "powersave" but it's an ASPM policy string and it implies
> you want to enable L1 and L0s when the device's LinkCap supports it,
> see pcie_config_aspm_link() and its users. So in other words powersave
> seems to imply you are using pcie_aspm=force, no?

No. pcie_aspm=force will enable ASPM even if (a) the device is pre-1.1,
(b) the firmware has the FADT flag set to tell you not to and (c) the
firmware doesn't grant control via _OSC. The powersave policy will
enable ASPM even if the BIOS didn't, but only if something else doesn't
tell us not to.
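
As a rough model of the cases described in this mail (not the kernel's actual code path; every name below is hypothetical), the outcome for one link could be sketched like this. It assumes that a firmware veto via the FADT flag or _OSC means "leave the BIOS setting alone", which is my reading rather than something the patch is guaranteed to do.

```c
#include <stdbool.h>

enum aspm_policy { POLICY_DEFAULT, POLICY_POWERSAVE };

/* Hypothetical facts about one link, as seen at boot. */
struct link_facts {
	bool bios_enabled_aspm;   /* firmware left ASPM switched on */
	bool pre_1_1_device;      /* device predates PCIe 1.1 */
	bool fadt_says_hands_off; /* FADT flag tells the OS not to enable ASPM */
	bool osc_grants_control;  /* _OSC granted control over PCIe */
};

/* Does ASPM end up enabled on this link once the kernel is done? */
static bool aspm_ends_up_enabled(bool config_pcie_aspm, bool force,
				 enum aspm_policy policy,
				 const struct link_facts *f)
{
	if (!config_pcie_aspm)
		return f->bios_enabled_aspm; /* kernel assumes the BIOS was sensible */
	if (force)
		return true;                 /* pcie_aspm=force skips every check */
	if (f->pre_1_1_device)
		return false;                /* turned off regardless of the BIOS */
	if (f->fadt_says_hands_off || !f->osc_grants_control)
		return f->bios_enabled_aspm; /* assumption: firmware setting kept */
	if (policy == POLICY_POWERSAVE)
		return true;                 /* enabled even if the BIOS did not */
	return f->bios_enabled_aspm;         /* default policy keeps the BIOS choice */
}
```

The point of the ordering is that force is the only input that can override a veto; powersave is only consulted once nothing has objected.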

> > Fedora's defaulted to that for a while now - we've hit
> > issues with aacraid, but that's pretty much it in terms of cases where
> > the heuristics don't work. Maxim's problems wouldn't be triggered
> > because CONFIG_PCIE_ASPM disables it on pre-1.1 devices regardless of
> > the BIOS setup.
>
> I don't expect all distributions to have CONFIG_PCIE_ASPM enabled, in
> fact I was unaware of this sanity check being included as part of
> CONFIG_PCIE_ASPM, I recommend we consider just enabling the sanity
> check all the time. The fact that CONFIG_PCIE_ASPM is even an option
> seems confusing to me given that apart from this sanity check the only
> other thing that I see useful in it is the forcing of ASPM settings
> and as I noted I think pcie_aspm=force is pretty dangerous.

You're right, it shouldn't be an option. It's vital for making sure that
ASPM is disabled when it should be. I'd be happy with pcie_aspm=force
tainting the kernel.

> > With the patch I've just sent, they should also all be used for Linux as
> > well.
>
> Oh nice! It'll be part of 2.6.36?

As long as Jesse picks it up.

> > If the same problems would appear under Windows then it's not a problem
> > that I'm hugely concerned about as yet
>
> Yes, these issues would also be part of Windows. But should also note
> this also means for those people working on their own BIOSes it means
> you also have to take these things into more serious consideration.

There's a standardised mechanism for BIOS authors to tell us not to
touch their ASPM configuration, and people that ignore that really do
deserve to have things break.

> Me neither, ASPM should just work with default settings, which is why
> I also do not like that the sanity check on the PCIE 1.1 spec is done
> through CONFIG_PCIE_ASPM, it makes no sense given that ASPM *will*
> work even if you do not have CONFIG_PCIE_ASPM but the device has
> functional ASPM.

I agree. I'll send a patch that moves it under CONFIG_EMBEDDED and
defaults to on.

> I do think we should be depending on userspace to do development
> testing and forcing ASPM on, because the only other alternative is
> pcie_aspm=force and as noted this is just not a good idea unless you
> *seriously* know what you are doing.

If you set the powersave policy and ASPM doesn't get enabled, then
that's because we've got a really strong belief that ASPM shouldn't be
enabled. Is your concern just that pcie_aspm=force is too easy for users
to get at?

--
Matthew Garrett | mjg59(a)srcf.ucam.org