From: Stephen Rothwell on
Hi all,

My Power7 boot test panicked like this: (next-20100722)

QLogic Fibre Channel HBA Driver: 8.03.03-k0
qla2xxx 0002:01:00.2: enabling device (0144 -> 0146)
qla2xxx 0002:01:00.2: Found an ISP8001, irq 35, iobase 0xd000080080014000
------------[ cut here ]------------
kernel BUG at drivers/pci/msi.c:205!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=128 NUMA pSeries
last sysfs file: /sys/devices/virtual/tty/ptyz8/uevent
Modules linked in: qla2xxx(+)
NIP: c0000000002fcd54 LR: c000000000048d9c CTR: 0000000000000001
REGS: c00000000278aff0 TRAP: 0700 Not tainted (2.6.35-rc5-autokern1-next-20100721)
MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 28422488 XER: 20000008
TASK = c000000002008000[2226] 'modprobe' THREAD: c000000002788000 CPU: 12
GPR00: 0000000000000001 c00000000278b270 c0000000009a36d0 c0000000009b8900
GPR04: c00000000278b2e8 ffffffffffffffff 0000000000000000 0000000000020000
GPR08: 00000000000033e7 c00000000a38b280 0000000000000000 0000000000000000
GPR12: 0000000088422488 c00000000f331800 00000fff921750a0 0000000000000000
GPR16: 0000000010033110 00000000100334b8 0000000000000000 0000000000000000
GPR20: d000080080018000 0000000000022225 c0000000009f7bb4 0000000000010200
GPR24: 000000002000020d 0000000000000025 c00000000278b2e0 c00000000278b2e8
GPR28: 0000000000000001 c00000000d0ac5f8 c000000000af8f00 c00000000a38b280
NIP [c0000000002fcd54] .read_msi_msg_desc+0x24/0x3c
LR [c000000000048d9c] .rtas_setup_msi_irqs+0x1d8/0x254
Call Trace:
[c00000000278b270] [c000000000048d9c] .rtas_setup_msi_irqs+0x1d8/0x254 (unreliable)
[c00000000278b360] [c00000000002a9cc] .arch_setup_msi_irqs+0x34/0x4c
[c00000000278b3e0] [c0000000002fd3fc] .pci_enable_msix+0x49c/0x4ac
[c00000000278b4c0] [d0000000001a5e30] .qla2x00_request_irqs+0x158/0x5b4 [qla2xxx]
[c00000000278b580] [d0000000001cb41c] .qla2x00_probe_one+0xeac/0x63b0 [qla2xxx]
[c00000000278b6f0] [c0000000002f5c4c] .local_pci_probe+0x34/0x48
[c00000000278b760] [c0000000002f6078] .pci_device_probe+0xe8/0x130
[c00000000278b810] [c00000000036e648] .driver_probe_device+0xdc/0x1a4
[c00000000278b8a0] [c00000000036e7a4] .__driver_attach+0x94/0xd8
[c00000000278b930] [c00000000036dabc] .bus_for_each_dev+0x7c/0xe0
[c00000000278b9e0] [c00000000036e410] .driver_attach+0x28/0x40
[c00000000278ba60] [c00000000036d134] .bus_add_driver+0x144/0x310
[c00000000278bb10] [c00000000036ec28] .driver_register+0xd8/0x198
[c00000000278bbb0] [c0000000002f63d0] .__pci_register_driver+0x60/0x10c
[c00000000278bc50] [d0000000001ca520] .qla2x00_module_init+0x150/0x1a0 [qla2xxx]
[c00000000278bce0] [c00000000000947c] .do_one_initcall+0x80/0x1a8
[c00000000278bd90] [c0000000000a4364] .SyS_init_module+0xd8/0x244
[c00000000278be30] [c00000000000852c] syscall_exit+0x0/0x40
Instruction dump:
ebe1fff8 7c0803a6 4e800020 e9230028 81490030 80090034 81690038 7d400378
7c005b78 7c000034 5400d97e 78000020 <0b000000> 81690038 e8090030 91640008
---[ end trace f67a78811ed47c60 ]---
udevd-work[1379]: '/sbin/modprobe -b pci:v00001077d00008001sv00001077sd0000017Fbc0Csc04i00' unexpected exit with status 0x0005

That line number is this:

	BUG_ON(!(entry->msg.address_hi | entry->msg.address_lo |
		 entry->msg.data));

in read_msi_msg_desc(). That BUG_ON was added by commit
2ca1af9aa3285c6a5f103ed31ad09f7399fc65d7 ("PCI: MSI: Remove unsafe and
unnecessary hardware access") from the pci tree.
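
To spell out what that condition tests: the three fields are OR-ed together, so the BUG_ON fires only when address_hi, address_lo and data are all zero, i.e. when nothing has been stored in the descriptor yet. A minimal user-space sketch follows; the struct is a simplified stand-in for the kernel's struct msi_msg, and msi_msg_is_unset() is a hypothetical helper name, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct msi_msg (assumption:
 * only the three fields named in the BUG_ON are modelled). */
struct msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Hypothetical helper: evaluates the same expression as the BUG_ON
 * argument. It is true only when all three fields are zero. */
static bool msi_msg_is_unset(const struct msi_msg *msg)
{
	return !(msg->address_hi | msg->address_lo | msg->data);
}
```

With this model, a zero-initialised message reports as unset, while a message with any non-zero field does not.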

--
Cheers,
Stephen Rothwell sfr(a)canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
From: Ben Hutchings on
On Fri, 2010-07-23 at 10:22 +1000, Stephen Rothwell wrote:
> Hi all,
>
> My Power7 boot test panicked like this: (next-20100722)
>
> QLogic Fibre Channel HBA Driver: 8.03.03-k0
> qla2xxx 0002:01:00.2: enabling device (0144 -> 0146)
> qla2xxx 0002:01:00.2: Found an ISP8001, irq 35, iobase 0xd000080080014000
> ------------[ cut here ]------------
> kernel BUG at drivers/pci/msi.c:205!
[...]
> Call Trace:
> [c00000000278b270] [c000000000048d9c] .rtas_setup_msi_irqs+0x1d8/0x254 (unreliable)
> [c00000000278b360] [c00000000002a9cc] .arch_setup_msi_irqs+0x34/0x4c
> [c00000000278b3e0] [c0000000002fd3fc] .pci_enable_msix+0x49c/0x4ac
[...]
> That line number is this:
>
> 	BUG_ON(!(entry->msg.address_hi | entry->msg.address_lo |
> 		 entry->msg.data));
>
> in read_msi_msg_desc(). That BUG_ON was added by commit
> 2ca1af9aa3285c6a5f103ed31ad09f7399fc65d7 ("PCI: MSI: Remove unsafe and
> unnecessary hardware access") from the pci tree.

I wanted to assert that read_msi_msg_desc() is only used to update
MSI/MSI-X descriptors that have already been generated by Linux. It
looks like you found an exception.

We could make read_msi_msg() fall back to reading from the hardware, but
I think that what the pSeries code is trying to do - save an MSI message
generated by firmware - is different from what the other callers want.
Instead we could add:

void save_msi_msg(unsigned int irq)
{
	struct irq_desc *desc = irq_to_desc(irq);
	struct msi_desc *entry = get_irq_desc_msi(desc);
	struct msi_msg *msg = &entry->msg;

	/* ...followed by the old implementation of read_msi_msg_desc() */
}

Possibly conditional on something like CONFIG_ARCH_NEEDS_SAVE_MSI_MSG.

Ben.

--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

From: Michael Ellerman on
On Fri, 2010-07-23 at 02:19 +0100, Ben Hutchings wrote:
> On Fri, 2010-07-23 at 10:22 +1000, Stephen Rothwell wrote:
> > Hi all,
> >
> > My Power7 boot test panicked like this: (next-20100722)
> >
> > QLogic Fibre Channel HBA Driver: 8.03.03-k0
> > qla2xxx 0002:01:00.2: enabling device (0144 -> 0146)
> > qla2xxx 0002:01:00.2: Found an ISP8001, irq 35, iobase 0xd000080080014000
> > ------------[ cut here ]------------
> > kernel BUG at drivers/pci/msi.c:205!
> [...]
> > Call Trace:
> > [c00000000278b270] [c000000000048d9c] .rtas_setup_msi_irqs+0x1d8/0x254 (unreliable)
> > [c00000000278b360] [c00000000002a9cc] .arch_setup_msi_irqs+0x34/0x4c
> > [c00000000278b3e0] [c0000000002fd3fc] .pci_enable_msix+0x49c/0x4ac
> [...]
> > That line number is this:
> >
> > 	BUG_ON(!(entry->msg.address_hi | entry->msg.address_lo |
> > 		 entry->msg.data));
> >
> > in read_msi_msg_desc(). That BUG_ON was added by commit
> > 2ca1af9aa3285c6a5f103ed31ad09f7399fc65d7 ("PCI: MSI: Remove unsafe and
> > unnecessary hardware access") from the pci tree.
>
> I wanted to assert that read_msi_msg_desc() is only used to update
> MSI/MSI-X descriptors that have already been generated by Linux. It
> looks like you found an exception.
>
> We could make read_msi_msg() fall back to reading from the hardware, but
> I think that what the pSeries code is trying to do - save an MSI message
> generated by firmware - is different from what the other callers want.
> Instead we could add:
>
> void save_msi_msg(unsigned int irq)
> {
> 	struct irq_desc *desc = irq_to_desc(irq);
> 	struct msi_desc *entry = get_irq_desc_msi(desc);
> 	struct msi_msg *msg = &entry->msg;
>
> 	/* ...followed by the old implementation of read_msi_msg_desc() */
> }
>
> Possibly conditional on something like CONFIG_ARCH_NEEDS_SAVE_MSI_MSG.

Maybe.

But then you end up with read_msi_msg(), which doesn't actually read
anything, which I think is confusing. I'd rather read_msi_msg() read the
message, from the device, and we have another routine which returns the
previously saved msg from the msi_desc.

cheers
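
The split discussed in this thread (one routine that reads the message from the device, one that returns the copy saved in the descriptor) can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: struct fake_device, struct msi_desc_model and all model_* function names are hypothetical, and the "device" is just a struct in memory rather than PCI config space or an MSI-X table.

```c
#include <stdint.h>

/* Simplified stand-in for the kernel's struct msi_msg. */
struct msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Hypothetical device model: holds whatever firmware or the OS
 * programmed into the MSI registers. */
struct fake_device {
	struct msi_msg regs;
};

/* Hypothetical, reduced model of an MSI descriptor: a device pointer
 * plus the cached message. */
struct msi_desc_model {
	struct fake_device *dev;
	struct msi_msg msg;
};

/* Reads the message from the (modelled) device, as the name suggests. */
static void model_read_msi_msg(const struct msi_desc_model *entry,
			       struct msi_msg *msg)
{
	*msg = entry->dev->regs;
}

/* Returns the previously saved message without touching the device. */
static void model_get_cached_msi_msg(const struct msi_desc_model *entry,
				     struct msi_msg *msg)
{
	*msg = entry->msg;
}

/* What the pSeries case appears to need: capture a message that
 * firmware already programmed and store it in the descriptor. */
static void model_save_msi_msg(struct msi_desc_model *entry)
{
	model_read_msi_msg(entry, &entry->msg);
}
```

In this model the cached copy stays all-zero until model_save_msi_msg() has run, which corresponds to the state the BUG_ON tripped over; after saving, the cached accessor returns the firmware-programmed values.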