From: Jens Axboe on
On Wed, Oct 28 2009, Alex Chiang wrote:
> * Jens Axboe <jens.axboe(a)oracle.com>:
> >
> > acpiphp: enable_slot - physical_slot = 1
> > power_on_slot
> > no _PS0
> > no _PS0
> > no _PS0
> > no _PS0
> > no _PS0
> > no _PS0
> > no _PS0
> > no _PS0
>
> One final thought -- your DSDT doesn't provide any power methods
> such as _PS[0-3] (I grepped your DSDT so basing my statement on
> more than just the output above), and without those, I'm pretty
> sure that there's no way for the OS to communicate to the BIOS
> that we want to power those slots on.
>
> So, something funky is going on with your BIOS. This isn't some
> weird proto board or something, is it? ;)

It's pre-production, but not a prototype. I'll take it up with the
vendor.
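
As an illustration of the kind of check described above (grepping a DSDT
for power methods), here is a minimal Python sketch. It assumes the DSDT
has already been disassembled to ASL text, for example with iasl -d, and
that methods are declared in the usual "Method (_PS0, ...)" form. The
sample text and the function name are invented for the example and are
not taken from the machine discussed here.

```python
import re

# Matches ASL declarations such as "Method (_PS0, 0, NotSerialized)".
# Only _PS0.._PS3 are matched; other objects like _PSC or _PSW are ignored.
PS_METHOD = re.compile(r"\bMethod\s*\(\s*(_PS[0-3])\b")


def find_power_methods(dsl_text):
    """Return the sorted, de-duplicated _PSx method names found in ASL text."""
    return sorted(set(PS_METHOD.findall(dsl_text)))


# Hypothetical ASL fragment (made up for illustration) with no _PSx methods.
sample = """
Device (SLT1)
{
    Method (_STA, 0, NotSerialized) { Return (0x0F) }
    Method (_EJ0, 1, NotSerialized) { }
}
"""

print(find_power_methods(sample))
```

An empty result for the whole disassembled table would match the
"no _PS0" messages quoted above.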

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jens Axboe on
On Thu, Oct 29 2009, Kenji Kaneshige wrote:
> Jens Axboe wrote:
>> On Thu, Oct 29 2009, Kenji Kaneshige wrote:
>>> Jens Axboe wrote:
>>>> On Wed, Oct 28 2009, Kenji Kaneshige wrote:
>>>>> Jens Axboe wrote:
>>>>>> On Tue, Oct 27 2009, Kenji Kaneshige wrote:
>>>>>>> Jens Axboe wrote:
>>>>>>>> On Tue, Oct 20 2009, Alex Chiang wrote:
>>>>>>>>> * Jens Axboe <jens.axboe(a)oracle.com>:
>>>>>>>>>> On Tue, Oct 13 2009, Alex Chiang wrote:
>>>>>>>>>>>>> Can you modprobe acpiphp with debug=1? And send the output?
>>>>>>>>>>>> acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
>>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
>>>>>>>>>>>> acpiphp: Slot [1] registered
>>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
>>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
>>>>>>>>>>>> acpiphp: Slot [2] registered
>>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
>>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
>>>>>>>>>>>> acpiphp: Slot [6] registered
>>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
>>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
>>>>>>>>>>>> acpiphp: Slot [7] registered
>>>>>>>>>>>> acpiphp_glue: Bus 0000:87 has 1 slot
>>>>>>>>>>>> acpiphp_glue: Bus 0000:84 has 1 slot
>>>>>>>>>>>> acpiphp_glue: Bus 0000:0b has 1 slot
>>>>>>>>>>>> acpiphp_glue: Bus 0000:08 has 1 slot
>>>>>>>>>>>> acpiphp_glue: Total 4 slots
>>>>>>>>>>> You mentioned in another mail that you echoed 1 into the various
>>>>>>>>>>> slots' power files.
>>>>>>>>>>>
>>>>>>>>>>> Did you do that after modprobing acpiphp with debug=1?
>>>>>>>>>>>
>>>>>>>>>>> If so, there should be debug output when you try and turn them
>>>>>>>>>>> on.
>>>>>>>>>> It produces:
>>>>>>>>>>
>>>>>>>>>> acpiphp: enable_slot - physical_slot = 1
>>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>>> acpiphp: enable_slot - physical_slot = 2
>>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>>> acpiphp: enable_slot - physical_slot = 6
>>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>>> acpiphp: enable_slot - physical_slot = 7
>>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>> Hm, so for some reason, firmware on your machine is telling us
>>>>>>>>> that it doesn't think cards are present and/or enabled.
>>>>>>>>>
>>>>>>>>> Unfortunately, I don't know why your firmware would be saying
>>>>>>>>> that. We could add some more debug printks to see what firmware
>>>>>>>>> thinks about your system... Or we could just wait and see what
>>>>>>>>> happens after you get your hardware replaced.
>>>>>>>> New board, the exact same thing happens.
>>>>>>>>
>>>>>>>>>> I have a card in one of the slots only this time.
>>>>>>>>>>
>>>>>>>>>>> Also, quick dummy check, you are trying to power on populated
>>>>>>>>>>> slots, right? :)
>>>>>>>>>> Yes :-)
>>>>>>>>>>
>>>>>>>>>>> Can you send the output of lspci -vv? And I'd like the output of
>>>>>>>>>>> lspci -vt as well... Both before and after loading acpiphp
>>>>>>>>>>> please.
>>>>>>>>>> Sent privately.
>>>>>>>>> No difference in before and after. Odd.
>>>>>>>>>
>>>>>>>>> If you want to poke us again after your hardware swap, please do
>>>>>>>>> so. Sorry for being not so helpful. :-/
>>>>>>>> Poke :-)
>>>>>>>>
>>>>>>>> One more thing I tried was pushing the power button on the slot
>>>>>>>> manually. With acpiphp, I get the same messages as above. Using pciehp,
>>>>>>>> I get the same power fault bit interrupt storm. So no difference from
>>>>>>>> using the sysfs interface or doing it on the box side, doesn't work
>>>>>>>> either way.
>>>>>>>>
>>>>>>> I'd like to confirm power fault interrupt storm, just in case.
>>>>>>> Could you get /proc/interrupts information after power fault
>>>>>>> problem happens and send it to me?
>>>>>> The box pretty much hangs when I try to power on a slot with pciehp, so
>>>>>> it's not easy to do... It doesn't hang with acpiphp, but doesn't work
>>>>>> either (see previous reply to Alex).
>>>>>>
>>>>> Could you try the attached debugging patch? With this patch, the power
>>>>> fault interrupt should be disabled after 100 power faults are detected
>>>>> (I hope so). You can get /proc/interrupts after that.
>>>> Here is the output of doing the power on with that patch applied.
>>>>
>>>> pciehp 0000:00:05.0:pcie04: enable_slot: physical_slot = 1
>>>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 77b
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>>>> pciehp 0000:00:05.0:pcie04: pciehp_power_on_slot: SLOTCTRL a8 write cmd 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: Power fault interrupt received
>>>> pciehp 0000:00:05.0:pcie04: Power fault on Slot(1)
>>>> pciehp 0000:00:05.0:pcie04: Power fault bit 0 set
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>> pciehp 0000:00:05.0:pcie04: Data Link Layer Link Active not set in 1000 msec
>>>> pciehp 0000:00:05.0:pcie04: pciehp_check_link_status: lnk_status = 1001
>>>> pciehp 0000:00:05.0:pcie04: Link Training Error occurs
>>>> pciehp 0000:00:05.0:pcie04: Failed to check link status
>>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>>>> pciehp 0000:00:05.0:pcie04: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
>>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>>>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 779
>>>> pciehp 0000:00:05.0:pcie04: pciehp_get_attention_status: SLOTCTRL a8, value read 779
>>>>
>>> From the console log, it seems that my debug patch worked as I expected
>>> (power fault event interrupts were disabled after 100 power fault events).
>>> But for some reason, /proc/interrupts indicates only 5 interrupts of
>>> pciehp. Just in case, did you get /proc/interrupts after doing power on?
>>
>> Nope, it was captured post the power on attempt and the above log dump.
>>
>
> Can I confirm that? (sorry for my poor English skill)
>
> The /proc/interrupts output was captured *before* the power on attempt and
> the log. Correct?

No, the /proc/interrupts output was captured AFTER the power on attempt
and the log capture shown above.
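
For readers of the archive, the register values in the log above can be
decoded with a short Python sketch. It assumes that the values printed
after "SLOTCTRL a8" are hexadecimal Slot Control contents and that
intr_loc carries Slot Status event bits, which is how the pciehp debug
output appears to work. The bit positions follow the PCI Express Slot
Control and Slot Status register layout, and the short names mirror the
PCI_EXP_SLTCTL_* and PCI_EXP_SLTSTA_* constants in Linux. The helper
names are made up for illustration.

```python
# Slot Control single-bit fields (mask, short name).
SLTCTL_FLAGS = [
    (0x0001, "ABPE"),    # Attention Button Pressed Enable
    (0x0002, "PFDE"),    # Power Fault Detected Enable
    (0x0004, "MRLSCE"),  # MRL Sensor Changed Enable
    (0x0008, "PDCE"),    # Presence Detect Changed Enable
    (0x0010, "CCIE"),    # Command Completed Interrupt Enable
    (0x0020, "HPIE"),    # Hot-Plug Interrupt Enable
    (0x0400, "PCC"),     # Power Controller Control (1 = power off)
    (0x0800, "EIC"),     # Electromechanical Interlock Control
    (0x1000, "DLLSCE"),  # Data Link Layer State Changed Enable
]

# Slot Status event bits (assumed to be what intr_loc reports).
SLTSTA_EVENTS = [
    (0x0001, "ABP"),    # Attention Button Pressed
    (0x0002, "PFD"),    # Power Fault Detected
    (0x0004, "MRLSC"),  # MRL Sensor Changed
    (0x0008, "PDC"),    # Presence Detect Changed
    (0x0010, "CC"),     # Command Completed
]

# Two-bit indicator encoding shared by the attention and power indicators.
INDICATOR = {0: "reserved", 1: "on", 2: "blink", 3: "off"}


def decode_flags(value, table):
    """Return the names of all single-bit fields set in value."""
    return [name for mask, name in table if value & mask]


def decode_sltctl(value):
    """Split a Slot Control value into flags and indicator states."""
    return {
        "flags": decode_flags(value, SLTCTL_FLAGS),
        "attention_indicator": INDICATOR[(value >> 6) & 0x3],
        "power_indicator": INDICATOR[(value >> 8) & 0x3],
    }


before = decode_sltctl(0x77B)  # first SLOTCTRL read in the log
after = decode_sltctl(0x779)   # last SLOTCTRL read in the log
changed = sorted(set(before["flags"]) ^ set(after["flags"]))

print("before:", before)
print("after: ", after)
print("changed flags:", changed)
print("intr_loc 12:", decode_flags(0x12, SLTSTA_EVENTS))
```

Under those assumptions, the only flag that differs between the first
read (77b) and the last (779) is PFDE, which fits the debug patch having
turned the power fault interrupt off, and intr_loc 12 is a power fault
together with a command completion.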

--
Jens Axboe

From: Kenji Kaneshige on
Jens Axboe wrote:
> On Thu, Oct 29 2009, Kenji Kaneshige wrote:
>> Jens Axboe wrote:
>>> On Wed, Oct 28 2009, Kenji Kaneshige wrote:
>>>> Jens Axboe wrote:
>>>>> On Tue, Oct 27 2009, Kenji Kaneshige wrote:
>>>>>> Jens Axboe wrote:
>>>>>>> On Tue, Oct 20 2009, Alex Chiang wrote:
>>>>>>>> * Jens Axboe <jens.axboe(a)oracle.com>:
>>>>>>>>> On Tue, Oct 13 2009, Alex Chiang wrote:
>>>>>>>>>>>> Can you modprobe acpiphp with debug=1? And send the output?
>>>>>>>>>>> acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
>>>>>>>>>>> acpiphp: Slot [1] registered
>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
>>>>>>>>>>> acpiphp: Slot [2] registered
>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
>>>>>>>>>>> acpiphp: Slot [6] registered
>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
>>>>>>>>>>> acpiphp: Slot [7] registered
>>>>>>>>>>> acpiphp_glue: Bus 0000:87 has 1 slot
>>>>>>>>>>> acpiphp_glue: Bus 0000:84 has 1 slot
>>>>>>>>>>> acpiphp_glue: Bus 0000:0b has 1 slot
>>>>>>>>>>> acpiphp_glue: Bus 0000:08 has 1 slot
>>>>>>>>>>> acpiphp_glue: Total 4 slots
>>>>>>>>>> You mentioned in another mail that you echoed 1 into the various
>>>>>>>>>> slots' power files.
>>>>>>>>>>
>>>>>>>>>> Did you do that after modprobing acpiphp with debug=1?
>>>>>>>>>>
>>>>>>>>>> If so, there should be debug output when you try and turn them
>>>>>>>>>> on.
>>>>>>>>> It produces:
>>>>>>>>>
>>>>>>>>> acpiphp: enable_slot - physical_slot = 1
>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>> acpiphp: enable_slot - physical_slot = 2
>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>> acpiphp: enable_slot - physical_slot = 6
>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>> acpiphp: enable_slot - physical_slot = 7
>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>> Hm, so for some reason, firmware on your machine is telling us
>>>>>>>> that it doesn't think cards are present and/or enabled.
>>>>>>>>
>>>>>>>> Unfortunately, I don't know why your firmware would be saying
>>>>>>>> that. We could add some more debug printks to see what firmware
>>>>>>>> thinks about your system... Or we could just wait and see what
>>>>>>>> happens after you get your hardware replaced.
>>>>>>> New board, the exact same thing happens.
>>>>>>>
>>>>>>>>> I have a card in one of the slots only this time.
>>>>>>>>>
>>>>>>>>>> Also, quick dummy check, you are trying to power on populated
>>>>>>>>>> slots, right? :)
>>>>>>>>> Yes :-)
>>>>>>>>>
>>>>>>>>>> Can you send the output of lspci -vv? And I'd like the output of
>>>>>>>>>> lspci -vt as well... Both before and after loading acpiphp
>>>>>>>>>> please.
>>>>>>>>> Sent privately.
>>>>>>>> No difference in before and after. Odd.
>>>>>>>>
>>>>>>>> If you want to poke us again after your hardware swap, please do
>>>>>>>> so. Sorry for being not so helpful. :-/
>>>>>>> Poke :-)
>>>>>>>
>>>>>>> One more thing I tried was pushing the power button on the slot
>>>>>>> manually. With acpiphp, I get the same messages as above. Using pciehp,
>>>>>>> I get the same power fault bit interrupt storm. So no difference from
>>>>>>> using the sysfs interface or doing it on the box side, doesn't work
>>>>>>> either way.
>>>>>>>
>>>>>> I'd like to confirm power fault interrupt storm, just in case.
>>>>>> Could you get /proc/interrupts information after power fault
>>>>>> problem happens and send it to me?
>>>>> The box pretty much hangs when I try to power on a slot with pciehp, so
>>>>> it's not easy to do... It doesn't hang with acpiphp, but doesn't work
>>>>> either (see previous reply to Alex).
>>>>>
>>>> Could you try the attached debugging patch? With this patch, the power
>>>> fault interrupt should be disabled after 100 power faults are detected
>>>> (I hope so). You can get /proc/interrupts after that.
>>> Here is the output of doing the power on with that patch applied.
>>>
>>> pciehp 0000:00:05.0:pcie04: enable_slot: physical_slot = 1
>>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 77b
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>>> pciehp 0000:00:05.0:pcie04: pciehp_power_on_slot: SLOTCTRL a8 write cmd 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: Power fault interrupt received
>>> pciehp 0000:00:05.0:pcie04: Power fault on Slot(1)
>>> pciehp 0000:00:05.0:pcie04: Power fault bit 0 set
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>> pciehp 0000:00:05.0:pcie04: Data Link Layer Link Active not set in 1000 msec
>>> pciehp 0000:00:05.0:pcie04: pciehp_check_link_status: lnk_status = 1001
>>> pciehp 0000:00:05.0:pcie04: Link Training Error occurs
>>> pciehp 0000:00:05.0:pcie04: Failed to check link status
>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>>> pciehp 0000:00:05.0:pcie04: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 779
>>> pciehp 0000:00:05.0:pcie04: pciehp_get_attention_status: SLOTCTRL a8, value read 779
>>>
>> From the console log, it seems that my debug patch worked as I expected
>> (power fault event interrupts were disabled after 100 power fault events).
>> But for some reason, /proc/interrupts indicates only 5 interrupts of
>> pciehp. Just in case, did you get /proc/interrupts after doing power on?
>
> Nope, it was captured post the power on attempt and the above log dump.
>

Can I confirm that? (sorry for my poor English skill)

The /proc/interrupts output was captured *before* the power on attempt and
the log. Correct?

Thanks,
Kenji Kaneshige





From: Jens Axboe on

Just a note for the archives - after chatting with Alex on irc about
this issue and trying other cards, the likely suspect seems to be the
specific card used and/or the firmware on that card. Hotplug works
otherwise, just not with that card at least.
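
As a rough cross-check, the "lnk_status = 1001" value logged earlier in
this thread can be decoded with a small Python sketch. It assumes the
value is the hexadecimal content of the PCI Express Link Status register;
the masks follow that register's layout (matching the PCI_EXP_LNKSTA_*
constants in Linux), and the function name is invented for the example.

```python
LNKSTA_SPEED_MASK = 0x000F  # Current Link Speed (1 = 2.5 GT/s)
LNKSTA_WIDTH_MASK = 0x03F0  # Negotiated Link Width
LNKSTA_TRAINING = 0x0800    # Link Training in progress
LNKSTA_SLOT_CLOCK = 0x1000  # Slot Clock Configuration
LNKSTA_DLL_ACTIVE = 0x2000  # Data Link Layer Link Active


def decode_lnksta(value):
    """Split a Link Status value into its main fields."""
    return {
        "speed_code": value & LNKSTA_SPEED_MASK,
        "width": (value & LNKSTA_WIDTH_MASK) >> 4,
        "training": bool(value & LNKSTA_TRAINING),
        "slot_clock": bool(value & LNKSTA_SLOT_CLOCK),
        "dll_link_active": bool(value & LNKSTA_DLL_ACTIVE),
    }


status = decode_lnksta(0x1001)  # value taken from the pciehp log
print(status)
```

Under these assumptions, 1001 decodes to a negotiated width of 0 with
Data Link Layer Link Active clear, meaning the link never came up, which
would be consistent with the card or its firmware being the suspect.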

--
Jens Axboe

From: Kenji Kaneshige on
Jens Axboe wrote:
> On Thu, Oct 29 2009, Kenji Kaneshige wrote:
>> Jens Axboe wrote:
>>> On Thu, Oct 29 2009, Kenji Kaneshige wrote:
>>>> Jens Axboe wrote:
>>>>> On Wed, Oct 28 2009, Kenji Kaneshige wrote:
>>>>>> Jens Axboe wrote:
>>>>>>> On Tue, Oct 27 2009, Kenji Kaneshige wrote:
>>>>>>>> Jens Axboe wrote:
>>>>>>>>> On Tue, Oct 20 2009, Alex Chiang wrote:
>>>>>>>>>> * Jens Axboe <jens.axboe(a)oracle.com>:
>>>>>>>>>>> On Tue, Oct 13 2009, Alex Chiang wrote:
>>>>>>>>>>>>>> Can you modprobe acpiphp with debug=1? And send the output?
>>>>>>>>>>>>> acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>>>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:05.0
>>>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:08:00
>>>>>>>>>>>>> acpiphp: Slot [1] registered
>>>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:00:07.0
>>>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 2 at PCI 0000:0b:00
>>>>>>>>>>>>> acpiphp: Slot [2] registered
>>>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:07.0
>>>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:84:00
>>>>>>>>>>>>> acpiphp: Slot [6] registered
>>>>>>>>>>>>> acpiphp_glue: found PCI-to-PCI bridge at PCI 0000:80:09.0
>>>>>>>>>>>>> acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:87:00
>>>>>>>>>>>>> acpiphp: Slot [7] registered
>>>>>>>>>>>>> acpiphp_glue: Bus 0000:87 has 1 slot
>>>>>>>>>>>>> acpiphp_glue: Bus 0000:84 has 1 slot
>>>>>>>>>>>>> acpiphp_glue: Bus 0000:0b has 1 slot
>>>>>>>>>>>>> acpiphp_glue: Bus 0000:08 has 1 slot
>>>>>>>>>>>>> acpiphp_glue: Total 4 slots
>>>>>>>>>>>> You mentioned in another mail that you echoed 1 into the various
>>>>>>>>>>>> slots' power files.
>>>>>>>>>>>>
>>>>>>>>>>>> Did you do that after modprobing acpiphp with debug=1?
>>>>>>>>>>>>
>>>>>>>>>>>> If so, there should be debug output when you try and turn them
>>>>>>>>>>>> on.
>>>>>>>>>>> It produces:
>>>>>>>>>>>
>>>>>>>>>>> acpiphp: enable_slot - physical_slot = 1
>>>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>>>> acpiphp: enable_slot - physical_slot = 2
>>>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>>>> acpiphp: enable_slot - physical_slot = 6
>>>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>>>> acpiphp: enable_slot - physical_slot = 7
>>>>>>>>>>> acpiphp_glue: acpiphp_enable_slot: Slot status is not ACPI_STA_ALL
>>>>>>>>>> Hm, so for some reason, firmware on your machine is telling us
>>>>>>>>>> that it doesn't think cards are present and/or enabled.
>>>>>>>>>>
>>>>>>>>>> Unfortunately, I don't know why your firmware would be saying
>>>>>>>>>> that. We could add some more debug printks to see what firmware
>>>>>>>>>> thinks about your system... Or we could just wait and see what
>>>>>>>>>> happens after you get your hardware replaced.
>>>>>>>>> New board, the exact same thing happens.
>>>>>>>>>
>>>>>>>>>>> I have a card in one of the slots only this time.
>>>>>>>>>>>
>>>>>>>>>>>> Also, quick dummy check, you are trying to power on populated
>>>>>>>>>>>> slots, right? :)
>>>>>>>>>>> Yes :-)
>>>>>>>>>>>
>>>>>>>>>>>> Can you send the output of lspci -vv? And I'd like the output of
>>>>>>>>>>>> lspci -vt as well... Both before and after loading acpiphp
>>>>>>>>>>>> please.
>>>>>>>>>>> Sent privately.
>>>>>>>>>> No difference in before and after. Odd.
>>>>>>>>>>
>>>>>>>>>> If you want to poke us again after your hardware swap, please do
>>>>>>>>>> so. Sorry for being not so helpful. :-/
>>>>>>>>> Poke :-)
>>>>>>>>>
>>>>>>>>> One more thing I tried was pushing the power button on the slot
>>>>>>>>> manually. With acpiphp, I get the same messages as above. Using pciehp,
>>>>>>>>> I get the same power fault bit interrupt storm. So no difference from
>>>>>>>>> using the sysfs interface or doing it on the box side, doesn't work
>>>>>>>>> either way.
>>>>>>>>>
>>>>>>>> I'd like to confirm power fault interrupt storm, just in case.
>>>>>>>> Could you get /proc/interrupts information after power fault
>>>>>>>> problem happens and send it to me?
>>>>>>> The box pretty much hangs when I try to power on a slot with pciehp, so
>>>>>>> it's not easy to do... It doesn't hang with acpiphp, but doesn't work
>>>>>>> either (see previous reply to Alex).
>>>>>>>
>>>>>> Could you try the attached debugging patch? With this patch, the power
>>>>>> fault interrupt should be disabled after 100 power faults are detected
>>>>>> (I hope so). You can get /proc/interrupts after that.
>>>>> Here is the output of doing the power on with that patch applied.
>>>>>
>>>>> pciehp 0000:00:05.0:pcie04: enable_slot: physical_slot = 1
>>>>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 77b
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>>>>> pciehp 0000:00:05.0:pcie04: pciehp_power_on_slot: SLOTCTRL a8 write cmd 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 10
>>>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: Power fault interrupt received
>>>>> pciehp 0000:00:05.0:pcie04: Power fault on Slot(1)
>>>>> pciehp 0000:00:05.0:pcie04: Power fault bit 0 set
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 2
>>>>> pciehp 0000:00:05.0:pcie04: Data Link Layer Link Active not set in 1000 msec
>>>>> pciehp 0000:00:05.0:pcie04: pciehp_check_link_status: lnk_status = 1001
>>>>> pciehp 0000:00:05.0:pcie04: Link Training Error occurs
>>>>> pciehp 0000:00:05.0:pcie04: Failed to check link status
>>>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>>>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>>>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>>>>> pciehp 0000:00:05.0:pcie04: pcie_isr: intr_loc 12
>>>>> pciehp 0000:00:05.0:pcie04: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
>>>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>>>> pciehp 0000:00:05.0:pcie04: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>>>>> pciehp 0000:00:05.0:pcie04: Command not completed in 1000 msec
>>>>> pciehp 0000:00:05.0:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
>>>>> pciehp 0000:00:05.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 779
>>>>> pciehp 0000:00:05.0:pcie04: pciehp_get_attention_status: SLOTCTRL a8, value read 779
>>>>>
>>>> From the console log, it seems that my debug patch worked as I expected
>>>> (power fault event interrupts were disabled after 100 power fault events).
>>>> But for some reason, /proc/interrupts indicates only 5 interrupts for
>>>> pciehp. Just in case, did you get /proc/interrupts after doing power on?
>>> Nope, it was captured post the power on attempt and the above log dump.
>>>
>> Can I confirm that? (sorry for my poor English skill)
>>
>> The /proc/interrupts output was captured *before* the power on attempt and the log.
>> Correct?
>
> No, the /proc/interrupts output was captured AFTER the power on attempt
> and the log capture shown above.
>

Thank you very much for confirmation.

The pciehp driver has code that calls the interrupt service routine
internally. So I would like to confirm whether the "pcie_isr: intr_loc 2"
message storm is caused by interrupts, not by an internal loop.
Unfortunately, I could not confirm it from /proc/interrupts.
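
For reference, the check I wanted to make with /proc/interrupts can be
sketched as below. This is only a minimal sketch: the function names and
the SAMPLE text are made up for illustration and are not output from your
machine. It assumes the usual layout of /proc/interrupts (IRQ number,
one count per CPU, then the interrupt type and handler names).

```python
def pciehp_irq_counts(interrupts_text):
    """Return {irq: count summed over all CPUs} for lines naming pciehp."""
    counts = {}
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Skip the "CPU0 CPU1 ..." header and any empty line.
        if not fields or not fields[0].endswith(":"):
            continue
        # Shared IRQs list handlers separated by ", ", so drop the commas.
        names = [f.rstrip(",") for f in fields[1:]]
        if "pciehp" not in names:
            continue
        total = 0
        for field in fields[1:]:
            if not field.isdigit():
                break  # per-CPU counters end where the type name starts
            total += int(field)
        counts[fields[0].rstrip(":")] = total
    return counts


def irq_delta(before, after):
    """Per-IRQ increase between two snapshots taken with pciehp_irq_counts()."""
    return {irq: after[irq] - before.get(irq, 0) for irq in after}


# Hypothetical sample, NOT taken from the machine discussed in this thread.
SAMPLE = """\
           CPU0       CPU1
  0:        125          0   IO-APIC-edge      timer
 55:          5          0   PCI-MSI-edge      pciehp
 56:          0          0   PCI-MSI-edge      pciehp
"""

print(pciehp_irq_counts(SAMPLE))
```

Taking one snapshot before the power on attempt and one after it, and
passing both to irq_delta(), would show how many pciehp interrupts were
actually counted during the attempt.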

According to the current handle_edge_irq() implementation, there seem
to be some cases where interrupts are not counted in /proc/interrupts
(I think it could happen when the next interrupt arrives while the first
interrupt is being handled). In addition, what I did in the debug patch
was just disable power fault interrupts by touching the hardware register,
and it actually worked just fine. So the "pcie_isr: intr_loc 2" storm
should be caused by a power fault interrupt storm.
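
To make the "intr_loc" values in the log easier to read, here is a small
sketch that decodes them. It assumes that intr_loc is printed in hex and
that it holds the event bits of the PCI Express Slot Status register
(the bit positions below are taken from the PCI Express specification).
The helper names are made up for illustration.

```python
# Event bits in the low byte of the PCI Express Slot Status register.
SLOT_STATUS_EVENT_BITS = [
    (0x0001, "attention button pressed"),
    (0x0002, "power fault detected"),
    (0x0004, "MRL sensor changed"),
    (0x0008, "presence detect changed"),
    (0x0010, "command completed"),
]


def decode_intr_loc(value):
    """Return the names of the event bits set in an intr_loc value."""
    return [name for mask, name in SLOT_STATUS_EVENT_BITS if value & mask]


def parse_intr_loc(line):
    """Extract the intr_loc value from a 'pcie_isr: intr_loc N' log line.

    Assumes the value is printed in hex without a 0x prefix.
    """
    return int(line.rsplit("intr_loc", 1)[1], 16)


for text in ("pcie_isr: intr_loc 10",
             "pcie_isr: intr_loc 2",
             "pcie_isr: intr_loc 12"):
    print(text, "->", decode_intr_loc(parse_intr_loc(text)))
```

Under these assumptions, "intr_loc 2" is the power fault detected bit
alone, "intr_loc 10" is command completed, and "intr_loc 12" is both.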

I made two patches to prevent the pciehp problems from happening on your
machine. Could you try those patches one by one?

- pciehp-fix-power-fault-interrupt-storm-problem.patch
This patch is against 2.6.32-rc5. It fixes the power fault
interrupt storm problem. Could you check that the interrupt
storm no longer happens with this patch?

- pci-hotplug-fix-oshp-evaluation.patch
This patch is against 2.6.32-rc5. It fixes the slot
mis-detection problem. According to the debug information you
sent me before, your system seems to expect that at least slot
1 and slot 2 are NOT handled by pciehp. With this patch, those
slots will not be detected by pciehp.

In both tests, could you use the pciehp_debug option and send me
the console log?
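
If the console log becomes very long again, it is fine to collapse runs of
identical lines before sending it. A minimal sketch follows, assuming the
log lines carry no timestamps (as in the log quoted above); the function
name is made up for illustration.

```python
from itertools import groupby


def collapse_repeats(lines):
    """Collapse each run of identical consecutive lines into one line."""
    out = []
    for text, group in groupby(line.rstrip("\n") for line in lines):
        n = sum(1 for _ in group)
        out.append(text if n == 1 else f"{text}  [repeated {n} times]")
    return out


log = [
    "pcie_isr: intr_loc 10",
    "pcie_isr: intr_loc 2",
    "pcie_isr: intr_loc 2",
    "pcie_isr: intr_loc 2",
    "Command not completed in 1000 msec",
]
for line in collapse_repeats(log):
    print(line)
```

The repeat count is kept, so the number of "intr_loc 2" messages can still
be read from the collapsed log.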

Thanks,
Kenji Kaneshige