From: Jesse Barnes on
On Tue, 13 Jul 2010 18:17:18 -0700
Ben Greear <greearb(a)candelatech.com> wrote:

> On 07/13/2010 05:36 PM, Ben Greear wrote:
> > We're seeing boot failures on multiple machines, running FC8 and
> > F11. I bisected on an FC8 32-bit system. Newer hardware works,
> > but these older ones do not.
> >
> > A console log of the hang is found later in this email.
> >
> > Please let me know if you would like any additional information,
> > and I will be happy to test patches.
> >
> > The same failure happens in 2.6.34.1, so the fix does not appear to
> > be in the stable tree yet.
>
>
> I added some printks to the offending code. It seems the problem
> is that the fixed_bar_cap method in arch/x86/pci/mrst.c loops forever:
>
> # Endless loop of this spewing to console...
>
> pcie_cap: 268435456Checking vendor..
> pos after shift: 256
> Before read..
> pcie_cap: 268435456Checking vendor..
> pos after shift: 256
> Before read..
> pcie_cap: 268435456Checking vendor..
> pos after shift: 256
> Before read..
> pcie_cap: 268435456Checking vendor..
> pos after shift: 256
> Before read..
> pcie_cap: 268435456Checking vendor..
> pos after shift: 256
> Before read..
> pcie_cap: 268435456Checking vendor..
>
>
> static int fixed_bar_cap(struct pci_bus *bus, unsigned int devfn)
> {
> int pos;
> u32 pcie_cap = 0, cap_data;
> printk("fixed_bar_cap, bus: %p devfn: %u\n", bus, devfn);
> pos = PCIE_CAP_OFFSET;
> while (pos) {
> printk("Before read..\n");
> if (raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
> devfn, pos, 4, &pcie_cap))
> return 0;
> printk("pcie_cap: %u", pcie_cap);
>
> if (pcie_cap == 0xffffffff)
> return 0;
>
> printk("Checking vendor..\n");
> if (PCI_EXT_CAP_ID(pcie_cap) == PCI_EXT_CAP_ID_VNDR) {
> printk("reading domain_nr\n");
> raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
> devfn, pos + 4, 4, &cap_data);
> printk("cap_data: %u\n", cap_data);
> if ((cap_data & 0xffff) == PCIE_VNDR_CAP_ID_FIXED_BAR)
> return pos;
> }
>
> pos = pcie_cap >> 20;
> printk("pos after shift: %i\n", pos);
> }
>
> printk("Returning from fixed_bar_cap\n");
> return 0;
> }
>
>

I thought a related bug was fixed already; the code should be returning
all zeros for non-existent BAR reads.

--
Jesse Barnes, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jesse Barnes on
On Wed, 14 Jul 2010 11:52:44 -0700
"H. Peter Anvin" <hpa(a)zytor.com> wrote:

> So I suggest the following changes...
>
> On 07/13/2010 07:19 PM, Jesse Barnes wrote:
> >>
> >> static int fixed_bar_cap(struct pci_bus *bus, unsigned int devfn)
> >> {
> >> int pos;
> >> u32 pcie_cap = 0, cap_data;
> >> printk("fixed_bar_cap, bus: %p devfn: %u\n", bus, devfn);
> >> pos = PCIE_CAP_OFFSET;
> >> while (pos) {
>
> while (pos >= PCIE_CAP_OFFSET) {
>
> >> printk("Before read..\n");
> >> if (raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
> >> devfn, pos, 4, &pcie_cap))
> >> return 0;
> >> printk("pcie_cap: %u", pcie_cap);
> >>
> - if (pcie_cap == 0xffffffff)
> - return 0;
>
> + if (PCI_EXT_CAP_ID(pcie_cap) == 0x0000 ||
> + PCI_EXT_CAP_ID(pcie_cap) == 0xffff)
> + break;
>
> >> printk("Checking vendor..\n");
>
> >> if (PCI_EXT_CAP_ID(pcie_cap) == PCI_EXT_CAP_ID_VNDR) {
> >> printk("reading domain_nr\n");
> >> raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
> >> devfn, pos + 4, 4, &cap_data);
> >> printk("cap_data: %u\n", cap_data);
> >> if ((cap_data & 0xffff) == PCIE_VNDR_CAP_ID_FIXED_BAR)
> >> return pos;
> >> }
> >>
> >> pos = pcie_cap >> 20;
>
> pos = (pcie_cap >> 20) & 0xffc;
>
> >> printk("pos after shift: %i\n", pos);
> >> }
> >>
> >> printk("Returning from fixed_bar_cap\n");
> >> return 0;
> >> }
> >>
> >>
> >
> > I thought a related bug was fixed already; the code should be returning
> > all zeros for non-existent BAR reads.

Changes look ok to me, though I'd prefer not hitting this code at all
on non-Moorestown if at all possible.

--
Jesse Barnes, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Pan, Jacob jun on


>-----Original Message-----
>From: Jesse Barnes [mailto:jbarnes(a)virtuousgeek.org]
>Sent: Wednesday, July 14, 2010 11:59 AM
>To: H. Peter Anvin
>Cc: Ben Greear; linux-kernel; Pan, Jacob jun
>Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected:
>de08e2c26
>
>On Wed, 14 Jul 2010 11:52:44 -0700
>"H. Peter Anvin" <hpa(a)zytor.com> wrote:
>
>> So I suggest the following changes...
>>
>> On 07/13/2010 07:19 PM, Jesse Barnes wrote:
>> >>
>> >> static int fixed_bar_cap(struct pci_bus *bus, unsigned int devfn)
>> >> {
>> >> int pos;
>> >> u32 pcie_cap = 0, cap_data;
>> >> printk("fixed_bar_cap, bus: %p devfn: %u\n", bus, devfn);
>> >> pos = PCIE_CAP_OFFSET;
>> >> while (pos) {
>>
>> while (pos >= PCIE_CAP_OFFSET) {
>>
>> >> printk("Before read..\n");
>> >> if (raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
>> >> devfn, pos, 4, &pcie_cap))
>> >> return 0;
>> >> printk("pcie_cap: %u", pcie_cap);
>> >>
>> - if (pcie_cap == 0xffffffff)
>> - return 0;
>>
>> + if (PCI_EXT_CAP_ID(pcie_cap) == 0x0000 ||
>> + PCI_EXT_CAP_ID(pcie_cap) == 0xffff)
>> + break;
>>
>> >> printk("Checking vendor..\n");
>>
>> >> if (PCI_EXT_CAP_ID(pcie_cap) == PCI_EXT_CAP_ID_VNDR) {
>> >> printk("reading domain_nr\n");
>> >> raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
>> >> devfn, pos + 4, 4, &cap_data);
>> >> printk("cap_data: %u\n", cap_data);
>> >> if ((cap_data & 0xffff) == PCIE_VNDR_CAP_ID_FIXED_BAR)
>> >> return pos;
>> >> }
>> >>
>> >> pos = pcie_cap >> 20;
>>
>> pos = (pcie_cap >> 20) & 0xffc;
>>
>> >> printk("pos after shift: %i\n", pos);
>> >> }
>> >>
>> >> printk("Returning from fixed_bar_cap\n");
>> >> return 0;
>> >> }
>> >>
>> >>
>> >
>> > I thought a related bug was fixed already; the code should be
>returning
>> > all zeros for non-existent BAR reads.
>
>Changes look ok to me, though I'd prefer not hitting this code at all
>on non-Moorestown if at all possible.

Can we use PCI_EXT_CAP_NEXT to replace this line?
pos = (pcie_cap >> 20) & 0xffc
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Pan, Jacob jun on
HPA,
If you have not done it already, I can generate a patch against tip tree
and do some testing on it today.

Thanks,

Jacob

>-----Original Message-----
>From: Ben Greear [mailto:greearb(a)candelatech.com]
>Sent: Thursday, July 15, 2010 9:38 AM
>To: H. Peter Anvin
>Cc: Pan, Jacob jun; Jesse Barnes; linux-kernel
>Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected:
>de08e2c26
>
>On 07/14/2010 12:55 PM, H. Peter Anvin wrote:
>> On 07/14/2010 12:01 PM, Pan, Jacob jun wrote:
>>>
>>> Can we use PCI_EXT_CAP_NEXT to replace this line?
>>> pos = (pcie_cap>> 20)& 0xffc
>>>
>>
>> Presumably, I haven't checked the definition of that macro, but that
>> would be the logical semantics.
>>
>> -hpa
>
>Are one of you guys going to submit a fix upstream (and hopefully
>to stable)? Since you guys actually understand the code, probably
>best that you do it....
>
>Thanks,
>Ben
>
>--
>Ben Greear <greearb(a)candelatech.com>
>Candela Technologies Inc http://www.candelatech.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/