From: David Woodhouse on
On Wed, 2010-06-30 at 22:44 +0100, Williams, Dan J wrote:
> I don't see a way around this beyond blacklisting this (platform, vt-d
> setting, driver) combination. Is there a quirk infrastructure for this
> sort of problem?

Yeah, kind of. If the IOAT PCI device _always_ has its own IOMMU, we
could have a quirk for it which says it must _never_ be matched by a
catch-all IOMMU. That would probably solve it?

--
David Woodhouse Open Source Technology Centre
David.Woodhouse(a)intel.com Intel Corporation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Dan Williams on
On 6/30/2010 11:21 PM, Woodhouse, David wrote:
> On Wed, 2010-06-30 at 22:44 +0100, Williams, Dan J wrote:
>> I don't see a way around this beyond blacklisting this (platform, vt-d
>> setting, driver) combination. Is there a quirk infrastructure for this
>> sort of problem?
>
> Yeah, kind of. If the IOAT PCI device _always_ has its own IOMMU, we
> could have a quirk for it which says it must _never_ be matched by a
> catch-all IOMMU. That would probably solve it?
>

This version of the device only exists on the 5400 chipset and always
has its own iommu, but since other platforms get the DMAR entry right I
think this hammer is too big? Wouldn't this break VT-d operation on
non-busted platforms?

Alternatively I can just catch this failure earlier in the init process
and fail the driver load with a grumble printk about broken bios...
instead of the current BUG_ON() that is meant to catch runtime catastrophes.

--
Dan
From: David Woodhouse on
On Thu, 2010-07-01 at 07:51 +0100, Williams, Dan J wrote:
> On 6/30/2010 11:21 PM, Woodhouse, David wrote:
> > On Wed, 2010-06-30 at 22:44 +0100, Williams, Dan J wrote:
> >> I don't see a way around this beyond blacklisting this (platform, vt-d
> >> setting, driver) combination. Is there a quirk infrastructure for this
> >> sort of problem?
> >
> > Yeah, kind of. If the IOAT PCI device _always_ has its own IOMMU, we
> > could have a quirk for it which says it must _never_ be matched by a
> > catch-all IOMMU. That would probably solve it?
> >
>
> This version of the device only exists on the 5400 chipset and always
> has its own iommu, but since other platforms get the DMAR entry right I
> think this hammer is too big? Wouldn't this break VT-d operation on
> non-busted platforms?

That just means we have to get the quirk right. Does 'this version' of
the device have its own PCI ID? We can always fall back to checking the
ID of the device at 0000:00:00.0 to check which chipset we're on.

> Alternatively I can just catch this failure earlier in the init process
> and fail the driver load with a grumble printk about broken bios...
> instead of the current BUG_ON() that is meant to catch runtime catastrophes.

Please use WARN_TAINT(TAINT_FIRMWARE_WORKAROUND). That way the
statistics end up in kerneloops.org and we have found that extremely
useful when LARTing the offending vendors.

--
David Woodhouse Open Source Technology Centre
David.Woodhouse(a)intel.com Intel Corporation

From: Dan Williams on
On 7/1/2010 12:12 AM, Woodhouse, David wrote:
> On Thu, 2010-07-01 at 07:51 +0100, Williams, Dan J wrote:
>> This version of the device only exists on the 5400 chipset and always
>> has its own iommu, but since other platforms get the DMAR entry right I
>> think this hammer is too big? Wouldn't this break VT-d operation on
>> non-busted platforms?
>
> That just means we have to get the quirk right. Does 'this version' of
> the device have its own PCI ID? We can always fall back to checking the
> ID of the device at 0000:00:00.0 to check which chipset we're on.
>

PCI_DEVICE_ID_INTEL_IOAT_SNB only exists on this chipset and to date
only "MacPro3,1" platforms have this problem.

>> Alternatively I can just catch this failure earlier in the init process
>> and fail the driver load with a grumble printk about broken bios...
>> instead of the current BUG_ON() that is meant to catch runtime catastrophes.
>
> Please use WARN_TAINT(TAINT_FIRMWARE_WORKAROUND). That way the
> statistics end up in kerneloops.org and we have found that extremely
> useful when LARTing the offending vendors.
>

Good to know, thanks.

--
Dan
From: David Woodhouse on
On Thu, 2010-07-01 at 08:26 +0100, Williams, Dan J wrote:
> On 7/1/2010 12:12 AM, Woodhouse, David wrote:
> > On Thu, 2010-07-01 at 07:51 +0100, Williams, Dan J wrote:
> >> This version of the device only exists on the 5400 chipset and always
> >> has its own iommu, but since other platforms get the DMAR entry right I
> >> think this hammer is too big? Wouldn't this break VT-d operation on
> >> non-busted platforms?
> >
> > That just means we have to get the quirk right. Does 'this version' of
> > the device have its own PCI ID? We can always fall back to checking the
> > ID of the device at 0000:00:00.0 to check which chipset we're on.
> >
>
> PCI_DEVICE_ID_INTEL_IOAT_SNB only exists on this chipset

Something like this, then?

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index 0a19708..24ac178 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -543,8 +543,20 @@ dmar_find_matched_drhd_unit(struct pci_dev *dev)
 				    header);
 
 		if (dmaru->include_all &&
-		    drhd->segment == pci_domain_nr(dev->bus))
+		    drhd->segment == pci_domain_nr(dev->bus)) {
+			/* We know that this device on this chipset has its own
+			   IOMMU. If we find it under the catch-all IOMMU, then
+			   the BIOS is lying to us. Hope that the IOMMU for
+			   this device is actually disabled, and it needs no
+			   translation... */
+			if (dev->vendor == PCI_VENDOR_ID_INTEL &&
+			    dev->device == PCI_DEVICE_ID_INTEL_IOAT_SNB) {
+				WARN_TAINT_ONCE(1, TAINT_FIRMWARE_WORKAROUND,
+						"BIOS wrongly included I/OAT device under catch-all VT-d unit\n");
+				return NULL;
+			}
 			return dmaru;
+		}
 
 		if (dmar_pci_device_match(dmaru->devices,
 					  dmaru->devices_cnt, dev))
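
For readers who want to poke at the lookup order outside the kernel, here is
a rough user-space model of what the patch above does. It is only a sketch:
the struct names, find_matched_unit(), the firmware_bug flag and the numeric
device ID are hypothetical stand-ins chosen for illustration, not the real
dmar.c types or pci_ids.h values. The flag takes the place of
WARN_TAINT_ONCE(), which has no user-space equivalent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative placeholders, not taken from the kernel headers. */
#define MODEL_VENDOR_INTEL    0x8086
#define MODEL_DEVICE_IOAT_SNB 0x0001 /* placeholder for PCI_DEVICE_ID_INTEL_IOAT_SNB */

/* Minimal stand-in for struct pci_dev. */
struct model_dev {
	int segment;      /* PCI domain the device sits in */
	uint16_t vendor;
	uint16_t device;
};

/* Minimal stand-in for a DRHD (one VT-d unit) as described by the BIOS. */
struct model_unit {
	int segment;
	bool include_all;                  /* catch-all unit for the segment */
	const struct model_dev **devices;  /* explicitly listed devices */
	size_t devices_cnt;
};

/* True if the unit explicitly lists the device in its scope. */
static bool unit_lists_device(const struct model_unit *u,
			      const struct model_dev *dev)
{
	for (size_t i = 0; i < u->devices_cnt; i++)
		if (u->devices[i] == dev)
			return true;
	return false;
}

/*
 * Walk the units in table order, as the patched function does. For each
 * unit, the catch-all test comes first, then the explicit device list.
 * If the quirked device would land under a catch-all unit, report a
 * firmware bug and return NULL (meaning "no translation unit").
 */
static const struct model_unit *
find_matched_unit(const struct model_unit *units, size_t n,
		  const struct model_dev *dev, bool *firmware_bug)
{
	assert(dev != NULL && firmware_bug != NULL);

	for (size_t i = 0; i < n; i++) {
		const struct model_unit *u = &units[i];

		if (u->include_all && u->segment == dev->segment) {
			if (dev->vendor == MODEL_VENDOR_INTEL &&
			    dev->device == MODEL_DEVICE_IOAT_SNB) {
				*firmware_bug = true;
				return NULL;
			}
			return u;
		}

		if (unit_lists_device(u, dev))
			return u;
	}

	return NULL;
}
```

With a table that lists the I/OAT device under its own unit ahead of the
catch-all, the lookup returns that unit and the flag stays clear. With a
table that contains only a catch-all unit, the I/OAT lookup returns NULL and
sets the flag, while any other device in the segment still matches the
catch-all as before.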


--
David Woodhouse Open Source Technology Centre
David.Woodhouse(a)intel.com Intel Corporation
