From: Benjamin Herrenschmidt on
On Tue, 2010-07-27 at 23:08 -0700, Yinghai Lu wrote:
>
> For example:
> with high/low allocation, going from the first kernel to a kexec'd
> second kernel always works fine, except on systems with a Qlogic card,
> because the Qlogic card uses main RAM as EFT etc. for the card's FW
> log trace. The second kernel has no idea that this RAM was used by the
> first kernel for that purpose, and the card still uses it between the
> two kernels. The second kernel could crash if it tries to use that RAM.
>
> low/high allocation seems to be safe; the second kernel manages to
> boot fine.

No, it works 'by chance'. You need kexec to somehow mark those regions
as reserved. I don't know how x86 does those things; on architectures
using the flat device-tree, we have added a concept of "reserve map" to
the flat device tree blob to mark that kind of region.
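To make the "reserve map" idea concrete: the flattened device tree blob carries a memory reservation block, a list of (address, size) pairs terminated by an all-zero entry, and early boot code must not allocate from those ranges. Below is a minimal C sketch of that check, assuming a simplified in-memory copy of the map; `struct resv_range` and `hits_reserve_map()` are hypothetical names, not the kernel's libfdt or memblock API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reserve-map entry: [base, base + size) must not be handed out. */
struct resv_range {
	uint64_t base;
	uint64_t size;
};

/* True if [a_base, a_base + a_size) and [b_base, b_base + b_size) intersect. */
bool ranges_overlap(uint64_t a_base, uint64_t a_size,
		    uint64_t b_base, uint64_t b_size)
{
	return a_base < b_base + b_size && b_base < a_base + a_size;
}

/*
 * Return true if a candidate allocation collides with any entry in the
 * reserve map passed on from the previous kernel or firmware.
 */
bool hits_reserve_map(const struct resv_range *map, size_t n,
		      uint64_t base, uint64_t size)
{
	assert(map != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (ranges_overlap(base, size, map[i].base, map[i].size))
			return true;
	return false;
}
```

An early allocator would call a check like this on every candidate range and skip any that collide, so a buffer the card is still writing into is never reused, regardless of allocation order.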

Also, because you mark your new function as weak but not the one that's
actually used by memblock_alloc(), allocation will still end up being
top-down. If you want to switch to bottom-up, make the internal function
weak, not the wrapper.
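Ben's point about weak symbols is about which function actually decides the search direction. The sketch below assumes a simplified allocator with a single free range: `find_area()` is the internal search that every allocation goes through, so it is the one that would need `__attribute__((weak))` for an architecture to swap in a bottom-up policy from another file. `early_alloc()` stands in for memblock_alloc(). All names are hypothetical; this is not memblock's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical free range [start, end) that the early allocator may use. */
struct free_range {
	uint64_t start;
	uint64_t end;
};

/* Round x down / up to a power-of-two alignment a. */
static uint64_t align_down(uint64_t x, uint64_t a) { return x & ~(a - 1); }
static uint64_t align_up(uint64_t x, uint64_t a) { return (x + a - 1) & ~(a - 1); }

/*
 * Internal search routine. Every allocation goes through it, so this is
 * the symbol that must be weak if an architecture is to override the
 * policy. Default here: top-down. Returns 0 on failure (0 is assumed
 * never to be a valid result).
 */
__attribute__((weak))
uint64_t find_area(const struct free_range *r, uint64_t size, uint64_t align)
{
	if (r->end < size)
		return 0;
	uint64_t base = align_down(r->end - size, align);
	return base >= r->start ? base : 0;
}

/* Bottom-up alternative an architecture could provide as its find_area(). */
uint64_t find_area_bottom_up(const struct free_range *r, uint64_t size,
			     uint64_t align)
{
	uint64_t base = align_up(r->start, align);
	return (base >= r->start && base <= r->end && r->end - base >= size) ?
		base : 0;
}

/*
 * Stand-in for memblock_alloc(): it calls find_area() directly, so
 * overriding some other weak wrapper would never change its behaviour.
 */
uint64_t early_alloc(const struct free_range *r, uint64_t size, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return find_area(r, size, align);
}
```

With the range [0x1000, 0x10000), a 4 KiB request lands at 0xF000 through the default top-down search, and at 0x1000 with the bottom-up variant.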

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: H. Peter Anvin on
On 07/28/2010 12:12 AM, Yinghai Lu wrote:
>
> The problem is that later, if a user hits this, it will be called a
> "regression" after bisecting to the memblock/x86 changes, because
> low/high did work before.
>
> BTW, Qlogic's design of saving the log in main RAM is not a good one;
> they may save a few cents on RAM on the card.
>
> Other vendors seem to put the log/trace in RAM on the card.
>

It's broken NOW. The only reason it's not exploding is by accident.
The fact that you knew about the problem and had the notion of working
around it instead of fixing the root cause by either fixing or
blacklisting the broken driver is disgusting beyond belief.

Either the driver needs to mark the memory off-limits in the e820 map
passed to kexec, or, better yet, it should have its DMA disabled across
kexec. I'm truly appalled.
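hpa's first option, marking the firmware's buffer off-limits in the e820 map handed to the kexec'd kernel, amounts to splitting a RAM entry so that the buffer becomes a reserved entry. A minimal sketch, assuming a small fixed-size table; `struct mem_map`, `map_add()` and `map_reserve()` are hypothetical and not the kernel's e820 code, though the type values 1 (usable RAM) and 2 (reserved) follow the e820 convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror the e820 type codes: 1 = usable RAM, 2 = reserved. */
enum range_type { RANGE_RAM = 1, RANGE_RESERVED = 2 };

struct map_entry {
	uint64_t addr;
	uint64_t size;
	enum range_type type;
};

#define MAP_MAX 32

struct mem_map {
	struct map_entry e[MAP_MAX];
	size_t n;
};

/* Append an entry; returns 0 on success, -1 if the table is full. */
int map_add(struct mem_map *m, uint64_t addr, uint64_t size, enum range_type t)
{
	if (m->n == MAP_MAX)
		return -1;
	m->e[m->n++] = (struct map_entry){ addr, size, t };
	return 0;
}

/*
 * Mark [addr, addr + size) as reserved wherever it overlaps RAM. The
 * overlapping piece is converted in place; leftover RAM before and after
 * it is appended as new entries. On failure the map may be partially
 * updated.
 */
int map_reserve(struct mem_map *m, uint64_t addr, uint64_t size)
{
	assert(size > 0 && addr + size > addr); /* no empty or wrapping range */
	uint64_t end = addr + size;
	size_t n = m->n; /* only scan the entries that existed on entry */

	for (size_t i = 0; i < n; i++) {
		struct map_entry *e = &m->e[i];
		uint64_t e_start = e->addr, e_end = e->addr + e->size;

		if (e->type != RANGE_RAM || end <= e_start || addr >= e_end)
			continue;

		uint64_t lo = addr > e_start ? addr : e_start;
		uint64_t hi = end < e_end ? end : e_end;

		e->addr = lo;
		e->size = hi - lo;
		e->type = RANGE_RESERVED;
		if (lo > e_start && map_add(m, e_start, lo - e_start, RANGE_RAM))
			return -1;
		if (e_end > hi && map_add(m, hi, e_end - hi, RANGE_RAM))
			return -1;
	}
	return 0;
}

/* Total bytes of one type, e.g. how much RAM the next kernel would see. */
uint64_t map_total(const struct mem_map *m, enum range_type t)
{
	uint64_t sum = 0;
	for (size_t i = 0; i < m->n; i++)
		if (m->e[i].type == t)
			sum += m->e[i].size;
	return sum;
}
```

Entries are appended rather than kept sorted, which keeps the sketch short; a real map would normally be sorted and merged afterwards.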

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

From: James Bottomley on
On Tue, 2010-07-27 at 23:38 -0700, H. Peter Anvin wrote:
> On 07/27/2010 11:08 PM, Yinghai Lu wrote:
> >
> > For example:
> > with high/low allocation, going from the first kernel to a kexec'd second kernel always works fine, except on systems with a Qlogic card,
> > because the Qlogic card uses main RAM as EFT etc. for the card's FW log trace. The second kernel has no idea that this RAM
> > was used by the first kernel for that purpose, and the card still uses it between the two kernels.
> > The second kernel could crash if it tries to use that RAM.
> >
>
> Uhm, no. That's a bug in the Qlogic driver not shutting the card down
> cleanly. Hacking around that in memory allocation order is braindamaged
> in the extreme. kexec *cannot* be safe in any way if we don't shut down
> pending DMA, and what you describe above is DMA.

That's not the kexec for crash dump requirement as it was communicated
to us. We were specifically told that the shutdown routines *may* not
be called before booting the kexec kernel and thus we have to take
action to stop the DMA engines in the init routines so the kexec kernel
can halt all in-progress DMA as it boots. This implies that kexec must
be able to cope with in-progress DMA.
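James's requirement, that a driver must stop any DMA left running by a previous kernel in its init path rather than relying on a shutdown routine having run, can be sketched as "reset first, then reprogram". The device below is simulated in plain C; the register bits, `struct fake_hba` and `hba_probe()` are invented for illustration and do not correspond to any real Qlogic interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical control-register bits for an imaginary HBA. */
#define CTRL_DMA_ENABLE (1u << 0)
#define CTRL_FW_TRACE   (1u << 1)
#define CTRL_RESET      (1u << 31)

/* Simulated device state; on real hardware these would be MMIO registers. */
struct fake_hba {
	uint32_t ctrl;
	uint64_t trace_buf; /* bus address the firmware writes its trace into */
};

/* Device model: a reset stops DMA and forgets the old trace buffer. */
void hba_write_ctrl(struct fake_hba *h, uint32_t val)
{
	if (val & CTRL_RESET) {
		h->ctrl = 0;
		h->trace_buf = 0;
	} else {
		h->ctrl = val;
	}
}

/* True if the device could still be writing into host memory. */
bool hba_dma_active(const struct fake_hba *h)
{
	return (h->ctrl & CTRL_DMA_ENABLE) && h->trace_buf != 0;
}

/*
 * Init path: never assume the device is idle, because a previous kernel
 * (crashed or not) may have left its DMA engine running.
 */
void hba_probe(struct fake_hba *h, uint64_t new_trace_buf)
{
	assert(new_trace_buf != 0);
	/* 1. Quiesce first, so stale DMA into the old kernel's memory stops. */
	hba_write_ctrl(h, CTRL_RESET);
	/* 2. Only then point the firmware at memory this kernel owns. */
	h->trace_buf = new_trace_buf;
	hba_write_ctrl(h, CTRL_DMA_ENABLE | CTRL_FW_TRACE);
}
```

The ordering is the point: reprogramming the buffer address before the reset would leave a window in which the firmware still targets the old kernel's memory.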

James


From: H. Peter Anvin on
On 07/28/2010 10:02 AM, James Bottomley wrote:
> That's not the kexec for crash dump requirement as it was communicated
> to us. We were specifically told that the shutdown routines *may* not
> be called before booting the kexec kernel and thus we have to take
> action to stop the DMA engines in the init routines so the kexec kernel
> can halt all in-progress DMA as it boots. This implies that kexec must
> be able to cope with in-progress DMA.
>

kexec for crash dump is a special case: for crash dump, there is a chunk
of memory pre-reserved for the crash kernel, and that is the *only*
memory that the crash kernel will use. In other words, everything else
is reserved memory as far as the crash kernel is concerned. As such, it
should not be affected; there may be DMA still pending to the main
kernel's memory area, of course, but as far as the crash kernel is
concerned, that should just be input data.

If allocation order somehow matters for the *crash kernel*, then we have
even more fundamental problems...

Obviously, if there is DMA going on to the crash kernel reserved region
then all bets are off, but at that point the system is so screwed anyway
that it shouldn't matter.
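hpa's argument reduces to an overlap test against the region reserved for the crash kernel (for example via the `crashkernel=size@offset` boot parameter): a stray DMA write outside that region only changes old-kernel memory, which is dump input, while a write inside it corrupts memory the crash kernel runs in. A minimal sketch with hypothetical names; it assumes ranges do not wrap around the end of the address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Region reserved by the first kernel for the crash kernel. */
struct crash_region {
	uint64_t base;
	uint64_t size;
};

enum dma_verdict {
	DMA_HARMLESS_INPUT, /* lands in old-kernel memory: just data for the dump */
	DMA_CORRUPTS_CRASH, /* lands in memory the crash kernel actually runs in */
};

/* Classify an in-flight DMA write [addr, addr + len) from the crash kernel's view. */
enum dma_verdict classify_dma(const struct crash_region *cr,
			      uint64_t addr, uint64_t len)
{
	assert(len > 0 && cr->size > 0);
	bool overlap = addr < cr->base + cr->size && cr->base < addr + len;
	return overlap ? DMA_CORRUPTS_CRASH : DMA_HARMLESS_INPUT;
}
```

Since the crash kernel only ever allocates inside its own region, the order in which it searches that region does not change this verdict.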

-hpa

From: James Bottomley on
On Wed, 2010-07-28 at 10:53 -0700, H. Peter Anvin wrote:
> kexec for crash dump is a special case: for crash dump, there is a chunk
> of memory pre-reserved for the crash kernel, and that is the *only*
> memory that the crash kernel will use. In other words, everything else
> is reserved memory as far as the crash kernel is concerned. As such, it
> should not be affected; there may be DMA still pending to the main
> kernel's memory area, of course, but as far as the crash kernel is
> concerned, that should just be input data.
>
> If allocation order somehow matters for the *crash kernel*, then we have
> even more fundamental problems...
>
> Obviously, if there is DMA going on to the crash kernel reserved region
> then all bets are off, but at that point the system is so screwed anyway
> that it shouldn't matter.

So I don't understand the problem. Proper shutdown of the old kernel
will halt all the DMA engines (by design ... we can't have DMA ongoing
if the next action might be power off). The only case I know where DMA
engines may be active is the crash kernel case.
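The distinction James draws, where an orderly handoff runs every driver's shutdown hook and so halts DMA, while the crash path runs none, can be modeled as follows. This is a toy driver model with hypothetical names, not the kernel's device core or kexec code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver model: each device may provide a shutdown hook. */
struct device {
	const char *name;
	bool dma_running;
	void (*shutdown)(struct device *dev);
};

/* Example hook; a real one would halt the device's DMA engine. */
void generic_shutdown(struct device *dev)
{
	dev->dma_running = false;
}

/* Orderly reboot or kexec: every device gets its shutdown hook called. */
void orderly_handoff(struct device *devs, size_t n)
{
	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (devs[i].shutdown)
			devs[i].shutdown(&devs[i]);
}

/* Crash path: the old kernel is not trusted, so no hooks run at all. */
void crash_handoff(struct device *devs, size_t n)
{
	(void)devs;
	(void)n;
}

/* Count devices that may still be doing DMA when the next kernel starts. */
size_t dma_still_running(const struct device *devs, size_t n)
{
	size_t count = 0;
	for (size_t i = 0; i < n; i++)
		count += devs[i].dma_running;
	return count;
}
```

Only the crash path leaves engines running, which is why the init-time quiescing described earlier in the thread matters mainly for kdump.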

James

