From: Yinghai Lu on
Jens Axboe wrote:
> On Tue, Dec 15 2009, Jens Axboe wrote:
>> On Tue, Dec 15 2009, Jens Axboe wrote:
>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>> Jens Axboe wrote:
>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>>>> Jens Axboe wrote:
>>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>>>>>> Jens Axboe wrote:
>>>>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
>>>>>>>>>> [PATCH] x86/pci: intel ioh bus num reg accessing fix
>>>>>>>>>>
>>>>>>>>>> the register is above 0x100, so if mmconf is not enabled, we need to skip it
>>>>>>>>> This works, it kexecs kernels fine. But since 2.6.32 doesn't have the
>>>>>>>>> mmconf problem to begin with, are we now just working around the issue?
>>>>>>>>> SRAT still reports issues, numa doesn't work.
>>>>>>>> that patch will make it bulletproof... we need it.
>>>>>>>>
>>>>>>>> we also still need to figure out why the memmap ranges are not passed properly.
>>>>>>>>
>>>>>>>> do you mean that with 2.6.32 kexec'ing 2.6.32, mmconf and numa work in the
>>>>>>>> second kernel?
>>>>>>> Yes, 2.6.32 booted and 2.6.32 kexec'ed works just fine, no SRAT
>>>>>>> complaints and NUMA works fine.
>>>>>> do you need
>>>>>> memmap=62G@4G
>>>>>> in this case?
>>>>> Yes, I've needed that always.
>>>> good,
>>>>
>>>> can you enable the debug option in kexec to see why kexec cannot pass
>>>> the whole set of 38(?) ranges to the second kernel?
>>> Not getting any output so far, -d doesn't do much. Poking around in the
>>> source...
>> OK, cold boot and kexec 2.0.1 gets all 39 ranges passed properly to
>> kexec'ed kernels. Since the older kexec stopped at range 30 (31 ranges
>> total), that smells like just a kexec bug. Retesting -git...
>
> Current -git works fine when all the ranges are passed correctly. So, I
> think, the only existing regression is the SRAT issue.

did you change node_shift?

YH
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Markus Trippelsdorf on
On Tue, Dec 15, 2009 at 11:04:55AM -0800, Yinghai Lu wrote:
> Jens Axboe wrote:
> > On Tue, Dec 15 2009, Yinghai Lu wrote:
> >> [ 13.018720] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> >>
> >> [ 13.100724] [Firmware Bug]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
> >
> > On a "normal" non-kexec boot, I get:
> >
> > [ 12.173583] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> > [ 12.184075] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
> > [ 12.216874] PCI: Using configuration type 1 for base access
> >
>
> can you run the following script in the first kernel?
>
> cd /sys/firmware/memmap
> for dir in * ; do
>     start=$(cat "$dir/start")
>     end=$(cat "$dir/end")
>     type=$(cat "$dir/type")
>     # "end" is inclusive in sysfs; print an exclusive end address
>     printf "%016x-%016x (%s)\n" "$start" "$((end + 1))" "$type" >> /tmp/memmap.txt
> done
>
> and send out /tmp/memmap.txt
>
> what is your kexec tools version? could be too old?

I have the same symptoms on my machine, but the underlying cause must be
different. I once reverted all Radeon-related changes since 2.6.32 and
kexec started working again.

The full dmesg and the output of the script are attached.

kexec-tools 2.0.1 released 13th August 2009
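
For anyone repeating the memmap dump requested above, here is a rough Python
equivalent of the shell loop. It is only a sketch: the function name
read_memmap and the sorted, counted output are illustrative additions, not
part of the original script. It assumes each entry under
/sys/firmware/memmap has start, end and type files, with an inclusive end
address, and it prints the number of ranges so the count can be compared
with what the kexec'ed kernel sees.

```python
import os


def read_memmap(root="/sys/firmware/memmap"):
    """Return (start, end_exclusive, type) tuples sorted by start address."""
    entries = []
    for name in os.listdir(root):
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            continue
        with open(os.path.join(path, "start")) as f:
            start = int(f.read().strip(), 16)
        with open(os.path.join(path, "end")) as f:
            # sysfs reports an inclusive end address
            end = int(f.read().strip(), 16) + 1
        with open(os.path.join(path, "type")) as f:
            kind = f.read().strip()
        entries.append((start, end, kind))
    return sorted(entries)


if __name__ == "__main__":
    root = "/sys/firmware/memmap"
    if os.path.isdir(root):
        ranges = read_memmap(root)
        for start, end, kind in ranges:
            print(f"{start:016x}-{end:016x} ({kind})")
        print(f"{len(ranges)} ranges total")
```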

--
Markus
From: Jens Axboe on
On Tue, Dec 15 2009, Yinghai Lu wrote:
> did you change node_shift?

Yes:

CONFIG_NODES_SHIFT=6

What I don't get is that 2.6.32 and -git print the same PXM map, and in
both cases it's totalling exactly 64G. Yet it says:

SRAT: PXMs only cover 49035MB of your 65419MB e820 RAM. Not used.
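
For reference, the 64G figure can be checked by summing the SRAT ranges
from the boot log that appears further down in this thread. A minimal
sketch, with the table copied by hand from the "SRAT: Node ... PXM ..."
lines; the variable names are only illustrative:

```python
GIB = 1 << 30

# (pxm, start, end) taken from the "SRAT: Node N PXM M start-end" log lines
srat_ranges = [
    (0, 0x0, 0x80000000),
    (0, 0x100000000, 0x480000000),
    (1, 0x480000000, 0x880000000),
    (2, 0x880000000, 0xc80000000),
    (3, 0xc80000000, 0x1080000000),
]

# Sum the bytes covered by each proximity domain
per_pxm_bytes = {}
for pxm, start, end in srat_ranges:
    per_pxm_bytes[pxm] = per_pxm_bytes.get(pxm, 0) + (end - start)

per_pxm_gib = {pxm: size / GIB for pxm, size in per_pxm_bytes.items()}
total_gib = sum(per_pxm_gib.values())

print(per_pxm_gib)  # 16 GiB in each of the four PXMs
print(total_gib)    # 64.0
```

So the SRAT table itself describes 64G spread evenly over four PXMs; the
shortfall in the message has to come from how the ranges are accounted
afterwards.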

--
Jens Axboe

From: Jens Axboe on
On Tue, Dec 15 2009, Yinghai Lu wrote:
> Jens Axboe wrote:
> > Clue:
> >
> > [ 0.000000] SRAT: Node 0 PXM 0 0-80000000
> > [ 0.000000] SRAT: Node 0 PXM 0 100000000-480000000
> > [ 0.000000] SRAT: Node 2 PXM 1 480000000-880000000
> > [ 0.000000] SRAT: Node 1 PXM 2 880000000-c80000000
> > [ 0.000000] SRAT: Node 3 PXM 3 c80000000-1080000000
> > [ 0.000000] NUMA: Using 31 for the hash shift.
> > [ 0.000000] pxm0: 0-480000 (4718592), absent 553990
> > [ 0.000000] pxm1: 880000-c80000 (4194304), absent 0
> > [ 0.000000] pxm2: 480000-880000 (4194304), absent 4194304
> > [ 0.000000] pxm3: c80000-1080000 (4194304), absent 0
> > [ 0.000000] SRAT: PXMs only cover 49035MB of your 65419MB e820 RAM. Not used.
> > [ 0.000000] SRAT: SRAT not used.
> >
>
> oh, I posted one patch last week,
>
> can you check it?

Sure, let me try it. I already found out that commit 8716273c is the
guilty one (x86: Export srat physical topology).

--
Jens Axboe

From: Jens Axboe on
On Tue, Dec 15 2009, Jens Axboe wrote:
> Yes:
>
> CONFIG_NODES_SHIFT=6
>
> What I don't get is that 2.6.32 and -git print the same PXM map, and in
> both cases it's totalling exactly 64G. Yet it says:
>
> SRAT: PXMs only cover 49035MB of your 65419MB e820 RAM. Not used.

Clue:

[ 0.000000] SRAT: Node 0 PXM 0 0-80000000
[ 0.000000] SRAT: Node 0 PXM 0 100000000-480000000
[ 0.000000] SRAT: Node 2 PXM 1 480000000-880000000
[ 0.000000] SRAT: Node 1 PXM 2 880000000-c80000000
[ 0.000000] SRAT: Node 3 PXM 3 c80000000-1080000000
[ 0.000000] NUMA: Using 31 for the hash shift.
[ 0.000000] pxm0: 0-480000 (4718592), absent 553990
[ 0.000000] pxm1: 880000-c80000 (4194304), absent 0
[ 0.000000] pxm2: 480000-880000 (4194304), absent 4194304
[ 0.000000] pxm3: c80000-1080000 (4194304), absent 0
[ 0.000000] SRAT: PXMs only cover 49035MB of your 65419MB e820 RAM. Not used.
[ 0.000000] SRAT: SRAT not used.

It's essentially disregarding pxm2, claiming all pages are absent.
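
The arithmetic is consistent with that reading. A minimal sketch, assuming
4 KiB pages and taking the page ranges and absent counts straight from the
pxmN lines above (the names are illustrative, this is not kernel code):
subtracting the absent pages reproduces the 49035MB in the message, and
adding pxm2's pages back gives 65419MB, the same figure the message reports
for e820 RAM.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

# label -> (start_pfn, end_pfn, absent_pages), copied from the "pxmN:" log lines
pxm_log = {
    0: (0x0, 0x480000, 553990),
    1: (0x880000, 0xc80000, 0),
    2: (0x480000, 0x880000, 4194304),
    3: (0xc80000, 0x1080000, 0),
}


def pages_to_mb(pages):
    """Convert a page count to whole megabytes, rounding down."""
    return pages * PAGE_SIZE // (1024 * 1024)


# Pages each entry actually contributes once absent pages are subtracted
covered_pages = sum(end - start - absent for start, end, absent in pxm_log.values())

# Pages spanned by pxm2, all of which the log reports as absent
pxm2_pages = pxm_log[2][1] - pxm_log[2][0]

print(pages_to_mb(covered_pages))               # 49035
print(pages_to_mb(covered_pages + pxm2_pages))  # 65419
```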

--
Jens Axboe
