From: H. Peter Anvin on
On 07/28/2010 11:10 AM, James Bottomley wrote:
>
> So I don't understand the problem. Proper shutdown of the old kernel
> will halt all the DMA engines (by design ... we can't have DMA ongoing
> if the next action might be power off). The only case I know where DMA
> engines may be active is the crash kernel case.
>

I'm not sure I fully understand the exact problem, either; not being
familiar with this putative "logging" facility of the Qlogic devices.
My point was largely that if a device causes failures because of the
choice of the allocation order, then we have a much bigger problem and
papering over it by trying to muck with the allocation order is just wrong.

This logging facility of Qlogic is DMA, no more, no less. It needs to
be shut down on an "overwrite" kexec, where we replace one kernel with
another, as opposed to a crash dump kexec, where we use a reserved chunk
of virgin memory. What I don't know/understand at the moment is if
there is something "special" about this particular logging facility,
e.g. if the Qlogic card ignores the bus mastering control bit -- which
would be reckless but I can see someone having the bright idea to do that.
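
To make the "bus mastering control bit" concrete: in PCI configuration space
the 16-bit command register sits at offset 0x04, and bit 2 (Bus Master Enable)
gates whether the device may initiate DMA. Below is a minimal user-space sketch
in C, assuming a mock configuration space. The mock_* names are hypothetical,
and this is not the kernel's or the qla2xxx driver's code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Register offset and bit as defined by the PCI Local Bus specification. */
#define PCI_COMMAND        0x04   /* 16-bit command register */
#define PCI_COMMAND_MASTER 0x0004 /* Bus Master Enable */

/* Hypothetical stand-in for a device's 256-byte configuration space. */
struct mock_pci_dev {
	uint8_t config[256];
};

/* Configuration space is little-endian. */
static uint16_t mock_read_config_word(const struct mock_pci_dev *dev,
				      unsigned int off)
{
	return (uint16_t)(dev->config[off] | (dev->config[off + 1] << 8));
}

static void mock_write_config_word(struct mock_pci_dev *dev,
				   unsigned int off, uint16_t val)
{
	dev->config[off] = (uint8_t)(val & 0xff);
	dev->config[off + 1] = (uint8_t)(val >> 8);
}

/* Clear Bus Master Enable and leave every other command bit untouched. */
static void mock_clear_master(struct mock_pci_dev *dev)
{
	uint16_t cmd = mock_read_config_word(dev, PCI_COMMAND);

	if (cmd & PCI_COMMAND_MASTER)
		mock_write_config_word(dev, PCI_COMMAND,
				       (uint16_t)(cmd & ~PCI_COMMAND_MASTER));
}

/* A compliant device may only initiate DMA while the bit is set. */
static bool mock_may_dma(const struct mock_pci_dev *dev)
{
	return (mock_read_config_word(dev, PCI_COMMAND) & PCI_COMMAND_MASTER) != 0;
}
```

A device that kept writing to memory after this bit was cleared would be the
"reckless" case described above.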

Yinghai, do you have any more detail, or know who would? Also copying
the Qlogic Infinipath maintainer email...

-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Yinghai Lu on
On 07/28/2010 11:30 AM, H. Peter Anvin wrote:
> On 07/28/2010 11:10 AM, James Bottomley wrote:
>>
>> So I don't understand the problem. Proper shutdown of the old kernel
>> will halt all the DMA engines (by design ... we can't have DMA ongoing
>> if the next action might be power off). The only case I know where DMA
>> engines may be active is the crash kernel case.
>>
>
> I'm not sure I fully understand the exact problem, either; not being
> familiar with this putative "logging" facility of the Qlogic devices.
> My point was largely that if a device causes failures because of the
> choice of the allocation order, then we have a much bigger problem and
> papering over it by trying to muck with the allocation order is just wrong.
>
> This logging facility of Qlogic is DMA, no more, no less. It needs to
> be shut down on an "overwrite" kexec, where we replace one kernel with
> another, as opposed to a crash dump kexec, where we use a reserved chunk
> of virgin memory. What I don't know/understand at the moment is if
> there is something "special" about this particular logging facility,
> e.g. if the Qlogic card ignores the bus mastering control bit -- which
> would be reckless but I can see someone having the bright idea to do that.
>
> Yinghai, do you have any more detail, or know who would? Also copying
> the Qlogic Infinipath maintainer email...

When I was debugging memblock on x86, I found a strange crash with high/low allocation.
Then I used kexec with "memtest" on the command line, and the early memtest did find
some bad memory.

Then I added more printing of the EFT physical address in the first kernel,
and it does show that the range is used by the qla driver in the first kernel.
I built all the needed drivers into the kernel so that I can PXE-boot it on all test platforms easily.
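
The memtest step works because a pattern test writes a known value to memory
and reads it back, so a device that is still doing DMA into that range shows
up as "bad" memory. Below is a minimal user-space sketch in C of that idea,
assuming a callback stands in for the stray DMA. The names pattern_test and
stray_dma are hypothetical, and this is not the kernel's early memtest code:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for whatever modifies memory between the write and the check. */
typedef void (*disturb_fn)(uint64_t *buf, size_t words);

/*
 * Fill the buffer with `pattern`, let `disturb` run (if any), then
 * return how many words no longer hold the pattern.
 */
static size_t pattern_test(uint64_t *buf, size_t words, uint64_t pattern,
			   disturb_fn disturb)
{
	size_t bad = 0;
	size_t i;

	for (i = 0; i < words; i++)
		buf[i] = pattern;

	if (disturb)
		disturb(buf, words);

	for (i = 0; i < words; i++)
		if (buf[i] != pattern)
			bad++;

	return bad;
}

/* Simulates a device that still writes a trace entry into the buffer. */
static void stray_dma(uint64_t *buf, size_t words)
{
	if (words > 3)
		buf[3] = 0xdeadbeefULL;
}
```

With no disturbance the test reports no bad words. With stray_dma it reports
exactly the one word the simulated device overwrote.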

Thanks

Yinghai


---
drivers/scsi/qla2xxx/qla_init.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux-2.6/drivers/scsi/qla2xxx/qla_init.c
===================================================================
--- linux-2.6.orig/drivers/scsi/qla2xxx/qla_init.c
+++ linux-2.6/drivers/scsi/qla2xxx/qla_init.c
@@ -1327,8 +1327,8 @@ qla2x00_alloc_fw_dump(scsi_qla_host_t *v
 				goto try_eft;
 			}
 
-		qla_printk(KERN_INFO, ha, "Allocated (%d KB) for FCE...\n",
-		    FCE_SIZE / 1024);
+		qla_printk(KERN_INFO, ha, "Allocated (%d KB) at %p for FCE...\n",
+		    FCE_SIZE / 1024, tc);
 
 		fce_size = sizeof(struct qla2xxx_fce_chain) + FCE_SIZE;
 		ha->flags.fce_enabled = 1;
@@ -1354,8 +1354,8 @@ try_eft:
 			goto cont_alloc;
 		}
 
-		qla_printk(KERN_INFO, ha, "Allocated (%d KB) for EFT...\n",
-		    EFT_SIZE / 1024);
+		qla_printk(KERN_INFO, ha, "Allocated (%d KB) at %p for EFT...\n",
+		    EFT_SIZE / 1024, tc);
 
 		eft_size = EFT_SIZE;
 		ha->eft_dma = tc_dma;
@@ -1383,8 +1383,8 @@ cont_alloc:
 		}
 		return;
 	}
-	qla_printk(KERN_INFO, ha, "Allocated (%d KB) for firmware dump...\n",
-	    dump_size / 1024);
+	qla_printk(KERN_INFO, ha, "Allocated (%d KB) at %p for firmware dump...\n",
+	    dump_size / 1024, ha->fw_dump);
 
 	ha->fw_dump_len = dump_size;
 	ha->fw_dump->signature[0] = 'Q';
From: H. Peter Anvin on
On 07/28/2010 12:27 PM, Yinghai Lu wrote:
>>
>> Yinghai, do you have any more detail, or know who would? Also copying
>> the Qlogic Infinipath maintainer email...
>
> When I was debugging memblock on x86, I found a strange crash with high/low allocation.
> Then I used kexec with "memtest" on the command line, and the early memtest did find
> some bad memory.
>
> Then I added more printing of the EFT physical address in the first kernel,
> and it does show that the range is used by the qla driver in the first kernel.
> I built all the needed drivers into the kernel so that I can PXE-boot it on all test platforms easily.
>

[Cc: Andrew Vasquez, who seems to have written the offending code,
checkin df613b96077cee826b14089ae6e75eeabf71faa3.]

The question is still open why this particular DMA activity was not shut
down before the kexec. I'm not familiar with how non-crashdump kexec
idles the hardware, but it obviously had better do so.

-hpa
From: Ralph Campbell on
On Wed, 2010-07-28 at 11:30 -0700, H. Peter Anvin wrote:
> On 07/28/2010 11:10 AM, James Bottomley wrote:
> >
> > So I don't understand the problem. Proper shutdown of the old kernel
> > will halt all the DMA engines (by design ... we can't have DMA ongoing
> > if the next action might be power off). The only case I know where DMA
> > engines may be active is the crash kernel case.
> >
>
> I'm not sure I fully understand the exact problem, either; not being
> familiar with this putative "logging" facility of the Qlogic devices.
> My point was largely that if a device causes failures because of the
> choice of the allocation order, then we have a much bigger problem and
> papering over it by trying to muck with the allocation order is just wrong.
>
> This logging facility of Qlogic is DMA, no more, no less. It needs to
> be shut down on an "overwrite" kexec, where we replace one kernel with
> another, as opposed to a crash dump kexec, where we use a reserved chunk
> of virgin memory. What I don't know/understand at the moment is if
> there is something "special" about this particular logging facility,
> e.g. if the Qlogic card ignores the bus mastering control bit -- which
> would be reckless but I can see someone having the bright idea to do that.
>
> Yinghai, do you have any more detail, or know who would? Also copying
> the Qlogic Infinipath maintainer email...
>
> -hpa

I read the messages in this thread but I don't understand what the
problem is. Something to do with logging, DMA and crash dumps but
it also sounds like the original discussion may be confused about
how the Infiniband HCA cards work.

Can someone summarize what is going on...

From: H. Peter Anvin on
On 07/28/2010 03:58 PM, Ralph Campbell wrote:
>
> I read the messages in this thread but I don't understand what the
> problem is. Something to do with logging, DMA and crash dumps but
> it also sounds like the original discussion may be confused about
> how the Infiniband HCA cards work.
>
> Can someone summarize what is going on...
>

Sorry, I was confused... this had to do with the qla driver, not Infinipath.

-hpa