From: Bjorn Helgaas on
[If you haven't been following this bug, the report is at [3].]

Here's a theory. I'm not an expert in HyperTransport, so maybe somebody
who knows HyperTransport and/or VIA chipsets can validate or refute it.

This is based on the _HyperTransport I/O Link Specification_, rev 3.10b [1],
and the _BIOS and Kernel Developer's Guide (BKDG) for AMD Family 10h
Processors_ [2].

In a nutshell, I think the problem is that amd_bus.c treats a
HyperTransport (HT) host bridge as though it were a PCI host bridge. In
particular, when an HT chain contains more than one PCI host bridge, the
HT host bridge apertures encompass all the PCI host bridges, but
amd_bus.c mistakenly assigns all those resources to one PCI host bridge.

From a software point of view, HyperTransport is similar but not
identical to PCI. It is possible to make native HyperTransport
peripheral devices, but PCI devices must be attached via a
HyperTransport-to-PCI bridge [1, sec 4.1].

A PCI host bridge has a platform-specific non-PCI connection, e.g., a
front-side bus, on the primary (upstream) side and a PCI bus on the
secondary (downstream) side. Note that in the HyperTransport spec,
"host bridge" refers to the interface from the host, e.g., CPU cores, to
a HyperTransport chain. This HyperTransport host bridge has a
HyperTransport link on the secondary side, *not* a PCI bus.

A HyperTransport-to-PCI bridge is one kind of PCI host bridge, because
the primary side is HyperTransport and the secondary side is PCI.

Graham's machine contains one HT host bridge leading to an HT chain, and
it has PCI devices on buses 00, 02, 03, 06, and 80. In addition, the HT
host bridge configuration registers appear at device 18 (hex) in bus 00
configuration space, though they are not actually PCI functions. PCI
buses 02, 03, and 06 are reachable from bus 00 via the PCI-to-PCI
bridges at 00:03.3, 00:03.2, and 00:02.0, respectively.

However, there are no PCI-to-PCI bridges that lead to bus 00 or bus 80,
so the HT chain must contain two separate PCI host bridges that lead to
them.
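
The reasoning above can be sketched in C: any populated bus that is not the secondary bus of some PCI-to-PCI bridge must sit behind its own PCI host bridge. This is only an illustration. The struct, function, and table names are hypothetical (they do not come from amd_bus.c), and the table just restates the topology described for Graham's machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A PCI-to-PCI bridge, reduced to the two buses it connects. */
struct p2p_bridge {
	uint8_t primary;	/* bus the bridge device sits on */
	uint8_t secondary;	/* bus directly below the bridge */
};

/*
 * Collect every populated bus that is not the secondary bus of any
 * PCI-to-PCI bridge.  Each such bus needs its own PCI host bridge.
 * Returns the number of root buses written to roots[].
 */
static size_t find_root_buses(const uint8_t *buses, size_t nbuses,
			      const struct p2p_bridge *bridges, size_t nbridges,
			      uint8_t *roots, size_t max_roots)
{
	size_t i, j, n = 0;

	assert(buses && roots);
	for (i = 0; i < nbuses; i++) {
		bool reachable = false;

		for (j = 0; j < nbridges; j++) {
			if (bridges[j].secondary == buses[i]) {
				reachable = true;
				break;
			}
		}
		if (!reachable && n < max_roots)
			roots[n++] = buses[i];
	}
	return n;
}

/* Topology of Graham's machine as described in the text. */
static const uint8_t graham_buses[] = { 0x00, 0x02, 0x03, 0x06, 0x80 };
static const struct p2p_bridge graham_bridges[] = {
	{ 0x00, 0x02 },	/* bridge at 00:03.3 */
	{ 0x00, 0x03 },	/* bridge at 00:03.2 */
	{ 0x00, 0x06 },	/* bridge at 00:02.0 */
};
```

Calling find_root_buses() on these two tables leaves buses 00 and 80 as roots, which is why the chain must contain two PCI host bridges.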

Now, here's the problem: amd_bus.c reads the HT host bridge configuration
and learns that it routes buses 00-ff and the related address space,
including the following range, down the HT chain at node 0, link 0:

[mem 0x80000000-0xfcffffffff]

That makes sense, because both PCI host bridges are on that HT chain, so
the HT host bridge has to forward all that address space. The problem
is that amd_bus.c assumes there's only one PCI host bridge on the
chain, so it assigns *all* that address space to PCI bus 00.

This doesn't work because parts of that address space belong to bus 80,
not bus 00, and we can't reach bus 80 from PCI bus 00. In particular,
we know that at least the following address space is routed to bus 80,
because the 80:01.0 device does work at this address, which is in the
middle of the range we found above:

[mem 0xfebfc000-0xfebfffff]

(Note that we can reach bus 80 from the HT chain, but the HT chain is
outside the PCI domain, even though some of the HT registers appear in
PCI bus 00 config space. We need a second PCI host bridge from the HT
chain to PCI bus 80.)
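
As a quick check of the two ranges quoted above, here is a minimal sketch of an inclusive range-containment test. The type and function names are made up for the example. The constants are the aperture that amd_bus.c assigns to bus 00 and the range where the 80:01.0 device works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An address range with inclusive bounds, like a struct resource. */
struct mem_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* True if "inner" lies entirely within "outer". */
static bool range_contains(struct mem_range outer, struct mem_range inner)
{
	assert(outer.start <= outer.end && inner.start <= inner.end);
	return outer.start <= inner.start && inner.end <= outer.end;
}

/* Aperture that amd_bus.c assigns to PCI bus 00. */
static const struct mem_range ht_aperture = { 0x80000000ULL, 0xfcffffffffULL };

/* Range where the 80:01.0 device is known to work. */
static const struct mem_range bus80_range = { 0xfebfc000ULL, 0xfebfffffULL };
```

range_contains(ht_aperture, bus80_range) is true, so an address that belongs to bus 80 ends up inside the single window handed to bus 00.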

The HT spec does suggest that an HT/PCI host bridge should implement a
HyperTransport Bridge Header [1, sec 7.4]. This header would make the
HT/PCI host bridge look just like a PCI-to-PCI bridge, with the usual
primary/secondary/subordinate bus numbers, memory, prefetchable memory,
and I/O port apertures, etc.

If all the HT/PCI host bridges on a chain were implemented this way, I
think it probably would work to pretend the HT host bridge is a PCI host
bridge. But this sort of implementation is apparently not universal.
The VIA chipset in Graham's machine doesn't do it that way, and the
Serverworks HT-2100 chipset in the HP DL785 doesn't either.


[1] http://www.hypertransport.org/docs/twgdocs/HTC20051222-0046-0033_changes.pdf
[2] http://support.amd.com/us/Embedded_TechDocs/31116-Public-GH-BKDG_3-28_5-28-09.pdf
[3] https://bugzilla.kernel.org/show_bug.cgi?id=16007
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Yinghai Lu on
On Fri, Jun 11, 2010 at 2:49 PM, Bjorn Helgaas <bjorn.helgaas(a)hp.com> wrote:
> [If you haven't been following this bug, the report is at [3].]
>
> Here's a theory.  I'm not an expert in HyperTransport, so maybe somebody
> who knows HyperTransport and/or VIA chipsets can validate or refute it.
>
> This is based on the _HyperTransport I/O Link Specification_, rev 3.10b [1],
> and the _BIOS and Kernel Developer's Guide (BKDG) for AMD Family 10h
> Processors_ [2].
>
> In a nutshell, I think the problem is that amd_bus.c treats a
> HyperTransport (HT) host bridge as though it were a PCI host bridge.  In
> particular, when an HT chain contains more than one PCI host bridge, the
> HT host bridge apertures encompass all the PCI host bridges, but
> amd_bus.c mistakenly assigns all those resources to one PCI host bridge.

I don't think so. That system has only one HT chain.

May 19 23:20:33 ocham kernel: pci 0000:00:18.1 config space:
May 19 23:20:33 ocham kernel: 00: 22 10 01 11 00 00 00 00 00 00 00 06 00 00 80 00
May 19 23:20:33 ocham kernel: 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 40: 03 00 00 00 00 00 7f 00 00 00 00 00 01 00 00 00
May 19 23:20:33 ocham kernel: 50: 00 00 00 00 02 00 00 00 00 00 00 00 03 00 00 00
May 19 23:20:33 ocham kernel: 60: 00 00 00 00 04 00 00 00 00 00 00 00 05 00 00 00
May 19 23:20:33 ocham kernel: 70: 00 00 00 00 06 00 00 00 00 00 00 00 07 00 00 00
May 19 23:20:33 ocham kernel: 80: 03 00 e0 00 80 ff ef 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: b0: 03 0a 00 00 00 0b 00 00 03 00 80 00 00 ff ff 00
May 19 23:20:33 ocham kernel: c0: 13 10 00 00 00 f0 ff 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: e0: 03 00 00 ff 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The register at 0xe0 reads ff 00 00 03 (0xff000003), which means that all
PCI bus numbers (00-ff) are routed to node 0, link 0.
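
For reference, the dword at offset 0xe0 in the dump above (bytes 03 00 00 ff) can be decoded roughly as follows. This is a sketch, not the amd_bus.c code. The bit positions are my reading of the configuration map register layout in the Family 10h BKDG (read/write enable in bits 1:0, destination node in bits 6:4, destination link in bits 9:8, bus base in bits 23:16, bus limit in bits 31:24) and should be checked against the BKDG before relying on them. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded fields of one configuration map register (assumed layout). */
struct config_map {
	bool    enabled;	/* read enable and write enable both set */
	uint8_t bus_base;	/* first bus number routed */
	uint8_t bus_limit;	/* last bus number routed */
	uint8_t dst_node;	/* destination node */
	uint8_t dst_link;	/* destination link on that node */
};

/* Assemble a little-endian dword from four config-space dump bytes. */
static uint32_t le32(const uint8_t *b)
{
	assert(b != NULL);
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Split a configuration map register into its fields. */
static struct config_map decode_config_map(uint32_t reg)
{
	struct config_map m;

	m.enabled   = (reg & 0x3) == 0x3;
	m.dst_node  = (reg >> 4) & 0x07;
	m.dst_link  = (reg >> 8) & 0x03;
	m.bus_base  = (reg >> 16) & 0xff;
	m.bus_limit = (reg >> 24) & 0xff;
	return m;
}

/* The four bytes at offset 0xe0 of the 00:18.1 dump. */
static const uint8_t reg_e0_bytes[4] = { 0x03, 0x00, 0x00, 0xff };
```

With these bytes, le32() gives 0xff000003 and the decode yields an enabled entry covering buses 00-ff with node 0 and link 0 as the destination.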

That VIA chip has a design problem that produces one orphan device:

May 19 23:20:33 ocham kernel: pci 0000:80:01.0 config space:
May 19 23:20:33 ocham kernel: 00: 06 11 88 32 06 00 10 00 10 00 03 04 10 00 00 00
May 19 23:20:33 ocham kernel: 10: 04 c0 bf fe 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 88 08
May 19 23:20:33 ocham kernel: 30: 00 00 00 00 50 00 00 00 00 00 00 00 0b 01 00 00
May 19 23:20:33 ocham kernel: 40: 00 30 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 50: 01 60 42 c8 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 60: 05 70 80 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 70: 10 00 91 00 00 00 00 00 00 00 30 00 00 00 00 00
May 19 23:20:33 ocham kernel: 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
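
The memory address of the 80:01.0 device can be read out of the dump above: BAR0 lives at config offset 0x10, where the bytes are 04 c0 bf fe followed by four zero bytes. A minimal sketch of the standard PCI memory BAR decode is below. The struct and function names are hypothetical. The BAR size cannot be recovered from a static dump, so only the base address and type bits are decoded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded memory BAR (base address and type bits only). */
struct mem_bar {
	bool     is_64bit;	/* type field (bits 2:1) == 10b */
	bool     prefetchable;	/* bit 3 */
	uint64_t base;
};

/* Assemble a little-endian dword from four config-space dump bytes. */
static uint32_t le32(const uint8_t *b)
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/*
 * Decode the memory BAR starting at p (8 bytes must be readable).
 * Returns false if the BAR is an I/O BAR (bit 0 set).
 */
static bool decode_mem_bar(const uint8_t *p, struct mem_bar *bar)
{
	uint32_t lo;

	assert(p != NULL && bar != NULL);
	lo = le32(p);
	if (lo & 0x1)
		return false;

	bar->is_64bit     = ((lo >> 1) & 0x3) == 0x2;
	bar->prefetchable = (lo & 0x8) != 0;
	bar->base         = lo & 0xfffffff0u;
	if (bar->is_64bit)
		bar->base |= (uint64_t)le32(p + 4) << 32;
	return true;
}

/* Config offsets 0x10-0x17 of the 80:01.0 dump. */
static const uint8_t bar0_bytes[8] = {
	0x04, 0xc0, 0xbf, 0xfe, 0x00, 0x00, 0x00, 0x00,
};
```

For these bytes the result is a 64-bit, non-prefetchable memory BAR based at 0xfebfc000, the start of the [mem 0xfebfc000-0xfebfffff] range mentioned earlier in the thread.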

YH
From: Yinghai Lu on

Please check whether this patch works around the problem.

Thanks

Yinghai Lu

[PATCH] x86, pci: handle fallout pci devices with peer root bus

Signed-off-by: Yinghai Lu <yinghai(a)kernel.org>

---
arch/x86/pci/bus_numa.c | 4 +++-
kernel/resource.c | 2 +-
2 files changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/pci/bus_numa.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/bus_numa.c
+++ linux-2.6/arch/x86/pci/bus_numa.c
@@ -22,7 +22,8 @@ void x86_pci_root_bus_res_quirks(struct
 		return;

 	for (i = 0; i < pci_root_num; i++) {
-		if (pci_root_info[i].bus_min == b->number)
+		if (pci_root_info[i].bus_min <= b->number &&
+		    pci_root_info[i].bus_max >= b->number)
 			break;
 	}

@@ -37,6 +38,7 @@ void x86_pci_root_bus_res_quirks(struct
 	for (j = 0; j < info->res_num; j++) {
 		struct resource *res;
 		struct resource *root;
+		struct resource *tmp;

 		res = &info->res[j];
 		pci_bus_add_resource(b, res, 0);
Index: linux-2.6/kernel/resource.c
===================================================================
--- linux-2.6.orig/kernel/resource.c
+++ linux-2.6/kernel/resource.c
@@ -451,7 +451,7 @@ static struct resource * __insert_resour
 		if (!first)
 			return first;

-		if (first == parent)
+		if (first == parent || first == new)
 			return first;

 		if ((first->start > new->start) || (first->end < new->end))
From: Bjorn Helgaas on
On Friday, June 11, 2010 05:06:49 pm Yinghai Lu wrote:
>
> please check if this one workaround the problem
>
> Thanks
>
> Yinghai Lu
>
> [PATCH] x86, pci: handle fallout pci devices with peer root bus
>
> Signed-off-by: Yinghai Lu <yinghai(a)kernel.org>

This patch apparently does cover up the problem, but it fails on
so many levels:

- incomprehensible summary
- no changelog
- no bugzilla pointer
- unrelated junk in patch ("tmp")
- completely unexplained change to generic resource.c
- no indication that we understand the root cause

> ---
> arch/x86/pci/bus_numa.c | 4 +++-
> kernel/resource.c | 2 +-
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/arch/x86/pci/bus_numa.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/pci/bus_numa.c
> +++ linux-2.6/arch/x86/pci/bus_numa.c
> @@ -22,7 +22,8 @@ void x86_pci_root_bus_res_quirks(struct
>  		return;
>
>  	for (i = 0; i < pci_root_num; i++) {
> -		if (pci_root_info[i].bus_min == b->number)
> +		if (pci_root_info[i].bus_min <= b->number &&
> +		    pci_root_info[i].bus_max >= b->number)
>  			break;
>  	}
>
> @@ -37,6 +38,7 @@ void x86_pci_root_bus_res_quirks(struct
>  	for (j = 0; j < info->res_num; j++) {
>  		struct resource *res;
>  		struct resource *root;
> +		struct resource *tmp;
>
>  		res = &info->res[j];
>  		pci_bus_add_resource(b, res, 0);
> Index: linux-2.6/kernel/resource.c
> ===================================================================
> --- linux-2.6.orig/kernel/resource.c
> +++ linux-2.6/kernel/resource.c
> @@ -451,7 +451,7 @@ static struct resource * __insert_resour
>  		if (!first)
>  			return first;
>
> -		if (first == parent)
> +		if (first == parent || first == new)
>  			return first;
>
>  		if ((first->start > new->start) || (first->end < new->end))
>
From: Bjorn Helgaas on
I think the best long-term fix is to always enable "pci=use_crs",
regardless of the BIOS date (currently we only do it for 2008 and
newer). System designers and BIOS writers expect the OS to pay
attention to that information, and indications are that Windows
does use it, so I think we will ultimately be better off if we
use the expected, best-tested path.

However, we have at least one known Linux issue (bug #16228) when
_CRS is enabled, so I'm hesitant to enable it unconditionally at
least until that is resolved.

In the short term, I think we should apply Graham's quirk from
comment #8, which enables pci=use_crs just for his system.

Here's my response to Yinghai's patches. ACPI gives us these resources:
pci_root PNP0A03:00: host bridge window [mem 0x80000000-0xff37ffff] (bus 00)
pci_root PNP0A08:00: host bridge window [mem 0xfebfc000-0xfebfffff] (bus 80)

Yinghai's patch (comment #17, with a v2 posted to the list but not in
the bugzilla), gives us these resources:
pci_bus 0000:00: resource 5 [mem 0x80000000-0xfcffffffff]
pci_bus 0000:80: resource 5 [mem 0x80000000-0xfcffffffff]

I think it's just a bad idea to assign the same range to both buses,
especially when the BIOS is telling us what we should be using.
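
To illustrate why both buses end up with the same range, here is a minimal sketch of the two lookup strategies. It assumes, as in this system, a single root-info entry covering buses 00-ff. The struct and function names are hypothetical simplifications of the bus_numa.c loop, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one pci_root_info entry. */
struct root_info {
	uint8_t bus_min;
	uint8_t bus_max;
};

/* Original behaviour: match only the first bus of an entry. */
static int find_root_exact(const struct root_info *info, size_t n, uint8_t bus)
{
	size_t i;

	assert(info != NULL);
	for (i = 0; i < n; i++)
		if (info[i].bus_min == bus)
			return (int)i;
	return -1;
}

/* Patched behaviour: match any bus inside the entry's bus range. */
static int find_root_range(const struct root_info *info, size_t n, uint8_t bus)
{
	size_t i;

	assert(info != NULL);
	for (i = 0; i < n; i++)
		if (info[i].bus_min <= bus && info[i].bus_max >= bus)
			return (int)i;
	return -1;
}

/* One HT chain routing buses 00-ff, as on this system. */
static const struct root_info one_chain[] = { { 0x00, 0xff } };
```

With the exact match, bus 80 finds no entry and gets no resources from this path. With the range match, bus 80 resolves to the same entry as bus 00, so both root buses are handed the identical window.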

I also think it's a mistake to mess with the resource code to deal
with this specific case. A change like that makes resource.c hard
to understand and maintain in the future.