From: Eric W. Biederman on
Dave Airlie <airlied(a)gmail.com> writes:

>>
>> the kernel is using mptable, and the system has an mcp55, so how come
>> we get irq 35?
>> I assume we should only have ioapic irqs 0 - 23 ...
>>
>> Can you send out boot logs with "debug apic=debug pci=routeirq" for
>> both 2.6.32 and 2.6.35?
>
> Okay, el6log is from a RHEL6 2.6.32 kernel, but it should give a good
> baseline. The 2.6.35 kernel oopses even earlier with all those options;
> its log is in the second attachment.

It appears we have a smoking gun:

For some reason setup_IO_APIC_irqs thinks we have at least 2 io_apics,
but we have only set up 1 io_apic. Since io_apics need a kmap entry,
accessing an apic that hasn't been set up will definitely cause a
page fault. It sounds like something is stomping on nr_ioapics.

From: 2.6.35-debuglog
IOAPIC[0]: apic_id 8, version 17, address 0xfec00000, GSI 0-23
.....
IOAPIC[1]: Set routing entry (0-16 -> 0x51 -> IRQ 16 Mode:1 Active:1)

Can we get your System.map of the failing kernel (so we can see what
is close to nr_ioapics), and could you add a print statement in
arch/x86/kernel/apic/io_apic:setup_IO_APIC_irqs to print nr_ioapics?

I would be surprised if drm changes could have affected this.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Eric W. Biederman on
Dave Airlie <airlied(a)gmail.com> writes:

> On Tue, Aug 3, 2010 at 1:26 PM, Eric W. Biederman <ebiederm(a)xmission.com> wrote:
>> Dave Airlie <airlied(a)gmail.com> writes:

>>> Okay, el6log is from a RHEL6 2.6.32 kernel, but it should give a good
>>> baseline. The 2.6.35 kernel oopses even earlier with all those options;
>>> its log is in the second attachment.
>>
>> It appears we have a smoking gun:
>>
>> For some reason setup_IO_APIC_irqs thinks we have at least 2 io_apics,
>> but we have only set up 1 io_apic.  Since io_apics need a kmap entry,
>> accessing an apic that hasn't been set up will definitely cause a
>> page fault.  It sounds like something is stomping on nr_ioapics.
>>
>> From: 2.6.35-debuglog
>> IOAPIC[0]: apic_id 8, version 17, address 0xfec00000, GSI 0-23
>> ....
>> IOAPIC[1]: Set routing entry (0-16 -> 0x51 -> IRQ 16 Mode:1 Active:1)
>>
>> Can we get your System.map of the failing kernel (so we can see what
>> is close to nr_ioapics), and could you add a print statement in
>> arch/x86/kernel/apic/io_apic:setup_IO_APIC_irqs to print nr_ioapics?
>>
>> I would be surprised if drm changes could have affected this.
>>
>
> Okay, from my debug addition it still seems to have only one ioapic.

Thanks. I goofed reading that code. I saw setup_IO_APIC_irq and made
the incorrect leap to concluding that we came from setup_IO_APIC_irqs,
when in fact we are coming from io_apic_set_pci_routing.

So let's see if I can figure out why we are getting a bad apic_id.

For that I need to track back to pirq_enable_irq, which leads
me to IO_APIC_get_PCI_irq_vector. The likely candidate is that we
simply are not finding the apicid that is present in the mp_irqs
entry that we decided to return. The patch below should add
appropriate debugging and fix the lookup.

The real difference appears to be that ACPI is disabled here, while it
is enabled in your reference kernel.

Dave can you verify this fixes the oops for you?

It would be nice if we didn't crash early in boot even without
acpi present.

Eric

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index e41ed24..e824e14 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1067,7 +1067,7 @@ static int pin_2_irq(int idx, int apic, int pin)
 int IO_APIC_get_PCI_irq_vector(int bus, int slot, int pin,
 				struct io_apic_irq_attr *irq_attr)
 {
-	int apic, i, best_guess = -1;
+	int i, best_guess = -1;
 
 	apic_printk(APIC_DEBUG,
 		    "querying PCI -> IRQ mapping bus:%d, slot:%d, pin:%d.\n",
@@ -1080,16 +1080,29 @@ int IO_APIC_get_PCI_irq_vector(int bus, int slot, int pin,
 	for (i = 0; i < mp_irq_entries; i++) {
 		int lbus = mp_irqs[i].srcbus;
 
-		for (apic = 0; apic < nr_ioapics; apic++)
-			if (mp_ioapics[apic].apicid == mp_irqs[i].dstapic ||
-			    mp_irqs[i].dstapic == MP_APIC_ALL)
-				break;
-
 		if (!test_bit(lbus, mp_bus_not_pci) &&
 		    !mp_irqs[i].irqtype &&
 		    (bus == lbus) &&
 		    (slot == ((mp_irqs[i].srcbusirq >> 2) & 0x1f))) {
-			int irq = pin_2_irq(i, apic, mp_irqs[i].dstirq);
+			int apic;
+			int irq;
+
+			/* Lookup the ioapic by id */
+			for (apic = 0; apic < nr_ioapics; apic++)
+				if (mp_ioapics[apic].apicid == mp_irqs[i].dstapic ||
+				    mp_irqs[i].dstapic == MP_APIC_ALL)
+					break;
+
+			/* Verify we found the ioapic */
+			if (apic >= nr_ioapics) {
+				printk(KERN_ERR
+				       "%02x:%02x.%c: APIC_ID %u pin: %u not found BIOS bug?\n",
+				       bus, slot, 'A' + pin - 1,
+				       mp_irqs[i].dstapic, mp_irqs[i].dstirq);
+				continue;
+			}
+
+			irq = pin_2_irq(i, apic, mp_irqs[i].dstirq);
 
 			if (!(apic || IO_APIC_IRQ(irq)))
 				continue;
@@ -1099,7 +1112,8 @@ int IO_APIC_get_PCI_irq_vector(int bus, int slot, int pin,
 						     mp_irqs[i].dstirq,
 						     irq_trigger(i),
 						     irq_polarity(i));
-				return irq;
+				best_guess = irq;
+				goto out;
 			}
 			/*
 			 * Use the first all-but-pin matching entry as a
@@ -1114,6 +1128,12 @@ int IO_APIC_get_PCI_irq_vector(int bus, int slot, int pin,
 			}
 		}
 	}
+out:
+	if (best_guess >= 0)
+		apic_printk(APIC_DEBUG,
+			    "%02x:%02x.%c: IRQ %u IOAPIC: %u pin: %u\n",
+			    bus, slot, 'A' + pin - 1,
+			    best_guess, irq_attr->ioapic, irq_attr->ioapic_pin);
 	return best_guess;
 }
 EXPORT_SYMBOL(IO_APIC_get_PCI_irq_vector);
From: Yinghai Lu on
On 08/02/2010 08:13 PM, Eric W. Biederman wrote:
> Yinghai Lu <yinghai(a)kernel.org> writes:
>
>> On 08/02/2010 06:32 PM, Yinghai Lu wrote:
>>> On 08/02/2010 04:17 PM, Dave Airlie wrote:
>>>>>
>>>>> the kernel is using mptable, and the system has an mcp55, so how come
>>>>> we get irq 35?
>>>>> I assume we should only have ioapic irqs 0 - 23 ...
>>>>>
>>>>> Can you send out boot logs with "debug apic=debug pci=routeirq" for
>>>>> both 2.6.32 and 2.6.35?
>>>>
>>>> Okay, el6log is from a RHEL6 2.6.32 kernel, but it should give a good
>>>> baseline. The 2.6.35 kernel oopses even earlier with all those options;
>>>> its log is in the second attachment.
>>>
>>
>
> This patch is wrong and there is no reason to even suspect it will
> affect this problem. At best this patch will trade one set of bugs
> for another, because on at least some platforms we have always done
> something like this. Having an irq 35 is odd and certainly a result of
> recent changes, but in this case it doesn't look like it has anything
> to do with the problem.
>
> Nacked-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
>
>> Please use this one instead; I forgot to run quilt refresh before sending it.
>>
>> [PATCH -v2] x86: fix pin_2_irq mapping
>>
>> We should not twist the gsi-to-irq mapping if ACPI is not used.
>>
>> -v2: remove unused irq_to_gsi()
>>
>> Signed-off-by: Yinghai Lu <yinghai(a)kernel.org>
>>
>> ---
>> arch/x86/include/asm/io_apic.h | 10 ++++++++++
>> arch/x86/kernel/acpi/boot.c | 4 ++--
>> arch/x86/kernel/apic/io_apic.c | 5 +----
>> 3 files changed, 13 insertions(+), 6 deletions(-)
>>
>> Index: linux-2.6/arch/x86/include/asm/io_apic.h
>> ===================================================================
>> --- linux-2.6.orig/arch/x86/include/asm/io_apic.h
>> +++ linux-2.6/arch/x86/include/asm/io_apic.h
>> @@ -185,6 +185,16 @@ int mp_find_ioapic_pin(int ioapic, u32 g
>>  void __init mp_register_ioapic(int id, u32 address, u32 gsi_base);
>>  extern void __init pre_init_apic_IRQ0(void);
>>
>> +#ifdef CONFIG_ACPI
>> +unsigned int gsi_to_irq(unsigned int gsi);
>> +u32 irq_to_gsi(int irq);
>> +#else
>> +static inline unsigned int gsi_to_irq(unsigned int gsi)
>> +{
>> +	return gsi;
>> +}
>> +#endif
>> +
>>  #else /* !CONFIG_X86_IO_APIC */
>>
>>  #define io_apic_assign_pci_irqs 0
>> Index: linux-2.6/arch/x86/kernel/acpi/boot.c
>> ===================================================================
>> --- linux-2.6.orig/arch/x86/kernel/acpi/boot.c
>> +++ linux-2.6/arch/x86/kernel/acpi/boot.c
>> @@ -100,7 +100,7 @@ static u32 isa_irq_to_gsi[NR_IRQS_LEGACY
>>  	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
>>  };
>>
>> -static unsigned int gsi_to_irq(unsigned int gsi)
>> +unsigned int gsi_to_irq(unsigned int gsi)
>>  {
>>  	unsigned int irq = gsi + NR_IRQS_LEGACY;
>>  	unsigned int i;
>> @@ -123,7 +123,7 @@ static unsigned int gsi_to_irq(unsigned
>>  	return irq;
>>  }
>>
>> -static u32 irq_to_gsi(int irq)
>> +u32 irq_to_gsi(int irq)
>>  {
>>  	unsigned int gsi;
>>
>> Index: linux-2.6/arch/x86/kernel/apic/io_apic.c
>> ===================================================================
>> --- linux-2.6.orig/arch/x86/kernel/apic/io_apic.c
>> +++ linux-2.6/arch/x86/kernel/apic/io_apic.c
>> @@ -1029,10 +1029,7 @@ static int pin_2_irq(int idx, int apic,
>>  	} else {
>>  		u32 gsi = mp_gsi_routing[apic].gsi_base + pin;
>>
>> -		if (gsi >= NR_IRQS_LEGACY)
>> -			irq = gsi;
>> -		else
>> -			irq = gsi_top + gsi;
>> +		irq = gsi_to_irq(gsi);
>>  	}
>>
>>  #ifdef CONFIG_X86_32

What is the point of making irq = gsi_top + gsi when the mptable is used instead of ACPI?

Yinghai
From: Yinghai Lu on
On 08/03/2010 01:00 AM, Eric W. Biederman wrote:
> Yinghai Lu <yinghai(a)kernel.org> writes:
>
>>>> Index: linux-2.6/arch/x86/kernel/apic/io_apic.c
>>>> ===================================================================
>>>> --- linux-2.6.orig/arch/x86/kernel/apic/io_apic.c
>>>> +++ linux-2.6/arch/x86/kernel/apic/io_apic.c
>>>> @@ -1029,10 +1029,7 @@ static int pin_2_irq(int idx, int apic,
>>>>  	} else {
>>>>  		u32 gsi = mp_gsi_routing[apic].gsi_base + pin;
>>>>
>>>> -		if (gsi >= NR_IRQS_LEGACY)
>>>> -			irq = gsi;
>>>> -		else
>>>> -			irq = gsi_top + gsi;
>>>> +		irq = gsi_to_irq(gsi);
>>>>  	}
>>>>
>>>>  #ifdef CONFIG_X86_32
>>
>> What is the point of making irq = gsi_top + gsi when the mptable is used instead of ACPI?
>
> Because it is only a convention that, when mptables are used, pins 0-15
> of the first ioapic are the ISA irqs. This thread witnessed a pci irq
> that came in on a pin < 16 and was not an ISA irq. The truly rare and
> exotic case would be for the ISA irqs to be outside the first 16
> ioapic pins, but the es7000 did exactly that.

On nvidia chipsets, if ACPI is enabled, external pci devices use ioapic pins 16 to 23.

If the mptable is used, external pci devices will not use pins 16 to 23..., and a lot of devices will share the same pin.

Yinghai
From: Eric W. Biederman on
Yinghai Lu <yinghai(a)kernel.org> writes:

>>> Index: linux-2.6/arch/x86/kernel/apic/io_apic.c
>>> ===================================================================
>>> --- linux-2.6.orig/arch/x86/kernel/apic/io_apic.c
>>> +++ linux-2.6/arch/x86/kernel/apic/io_apic.c
>>> @@ -1029,10 +1029,7 @@ static int pin_2_irq(int idx, int apic,
>>>  	} else {
>>>  		u32 gsi = mp_gsi_routing[apic].gsi_base + pin;
>>>
>>> -		if (gsi >= NR_IRQS_LEGACY)
>>> -			irq = gsi;
>>> -		else
>>> -			irq = gsi_top + gsi;
>>> +		irq = gsi_to_irq(gsi);
>>>  	}
>>>
>>>  #ifdef CONFIG_X86_32
>
> What is the point of making irq = gsi_top + gsi when the mptable is used instead of ACPI?

Because it is only a convention that, when mptables are used, pins 0-15
of the first ioapic are the ISA irqs. This thread witnessed a pci irq
that came in on a pin < 16 and was not an ISA irq. The truly rare and
exotic case would be for the ISA irqs to be outside the first 16
ioapic pins, but the es7000 did exactly that.

Eric