From: Stephen Lord on

Hi,

I am having trouble getting any recent kernel to boot successfully
on one of my machines, a generic 2.6GHz P4 box with HT enabled
running an updated Fedora Core 3 distro. This is present in
2.6.12-rc6. It does not manifest itself with the Fedora Core
kernels which have identical initrd contents as far as the
init script and the set of modules included goes.

The problem manifests itself as various undefined symbols from
module loads. Here is the relevant section from the init script:

echo Starting udev
/sbin/udevstart
echo -n "/sbin/hotplug" > /proc/sys/kernel/hotplug
echo "Loading scsi_mod.ko module"
insmod /lib/scsi_mod.ko
echo "Loading sd_mod.ko module"
insmod /lib/sd_mod.ko
echo "Loading libata.ko module"
insmod /lib/libata.ko
echo "Loading ata_piix.ko module"
insmod /lib/ata_piix.ko
echo "Loading ieee1394.ko module"
insmod /lib/ieee1394.ko
echo "Loading ohci1394.ko module"
insmod /lib/ohci1394.ko
echo "Loading sbp2.ko module"
insmod /lib/sbp2.ko
echo "Loading dm-mod.ko module"
insmod /lib/dm-mod.ko
echo "Loading jbd.ko module"
insmod /lib/jbd.ko
echo "Loading ext3.ko module"
insmod /lib/ext3.ko
echo "Loading dm-mirror.ko module"
insmod /lib/dm-mirror.ko
echo "Loading dm-zero.ko module"
insmod /lib/dm-zero.ko
echo "Loading dm-snapshot.ko module"
insmod /lib/dm-snapshot.ko
/sbin/udevstart

The failures are different on different boots, sometimes the ata_piix
module cannot find symbols from libata, sometimes ext3 cannot find jbd
symbols, sometimes dm modules cannot find things from dm-mod, usually
it is a combination of these. End result is a panic when it cannot
find the root device.

From the behavior, it appears that a module load is returning
control to user space before the previous one has got its symbols
loaded.

The loadable module section of my config file looks like this:

CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_OBSOLETE_MODPARM=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y

Since the Fedora Core kernels work consistently, this is either
something about my config triggering this, or something
Red Hat has in their kernels - I suspect the former.

My module tools are "module-init-tools version 3.1-pre5",
which seems to be as far as Red Hat has updated their rpms.
I will try a later version of these to see if it makes
any difference.

Complete .config attached.

Steve

From: Andrew Morton on
Stephen Lord <lord(a)xfs.org> wrote:
>
> I am having trouble getting any recent kernel to boot successfully
> on one of my machines, a generic 2.6GHz P4 box with HT enabled
> running an updated Fedora Core 3 distro. This is present in
> 2.6.12-rc6. It does not manifest itself with the Fedora Core
> kernels which have identical initrd contents as far as the
> init script and the set of modules included goes.
>
> The problem manifests itself as various undefined symbols from
> module loads.

Peculiar. Module loading is all synchronous, isn't it?

> ...
> The failures are different on different boots, sometimes the ata_piix
> module cannot find symbols from libata, sometimes ext3 cannot find jbd
> symbols, sometimes dm modules cannot find things from dm-mod, usually
> it is a combination of these. End result is a panic when it cannot
> find the root device.
>
> From the behavior, it appears that a module load is returning
> control to user space before the previous one has got its symbols
> loaded.

I wonder if rather than the intermittency being time-based, it is
load-address-based? For example, suppose there's a bug in the symbol
lookup code?

Have you tried using a different gcc version?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Steve Lord on
Andrew Morton wrote:
> Stephen Lord <lord(a)xfs.org> wrote:
>
>> I am having trouble getting any recent kernel to boot successfully
>> on one of my machines, a generic 2.6GHz P4 box with HT enabled
>> running an updated Fedora Core 3 distro. This is present in
>> 2.6.12-rc6. It does not manifest itself with the Fedora Core
>> kernels which have identical initrd contents as far as the
>> init script and the set of modules included goes.
>>
>> The problem manifests itself as various undefined symbols from
>> module loads.
>
>
> Peculiar. Module loading is all synchronous, isn't it?


Hmm, now that I have found the code, yes it is. insmod itself appears
to do no fancy footwork either.

>
>
>>...
>> The failures are different on different boots, sometimes the ata_piix
>> module cannot find symbols from libata, sometimes ext3 cannot find jbd
>> symbols, sometimes dm modules cannot find things from dm-mod, usually
>> it is a combination of these. End result is a panic when it cannot
>> find the root device.
>>
>> From the behavior, it appears that a module load is returning
>> control to user space before the previous one has got its symbols
>> loaded.
>
>
> I wonder if rather than the intermittency being time-based, it is
> load-address-based? For example, suppose there's a bug in the symbol
> lookup code?
>
> Have you tried using a different gcc version?
>

I don't have one handy at the moment, and I am away from the machine
right now as well. I have been updating the machine using Red Hat's
update tools, so the compiler should be the same one I have here:

gcc (GCC) 3.4.3 20050227 (Red Hat 3.4.3-22.fc3)

That should also be a fairly common compiler variant.

I presume this is what Red Hat does their kernel builds with, so that
should be the same too. Shouldn't the memory map be pretty much
identical on each boot? Things are pretty deterministic at this
stage in the process, and the symbol match failures are not always
the same.

If this were a memory problem, it seems like I would see more random
oopses than this. I added more memory to the machine a month or so
back, and had to detune the BIOS settings a little to make it stable.
It would be odd for a 2.6.11 kernel to be rock solid and a 2.6.12-rc6
to fall over so quickly if that were the case.

I can play with the init script some and maybe dump out the symbol
table after an insmod.
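
A minimal sketch of what dumping the symbol table after an insmod
could look like. It assumes the kernel has CONFIG_KALLSYMS enabled so
that /proc/kallsyms exists, and that the initrd has a POSIX shell with
grep and awk, which the stock initrd may not provide. The helper names
are hypothetical and not part of any existing tool.

```sh
#!/bin/sh
# Hypothetical helpers for the initrd init script (names are made up here).

# Print the state column (Live, Loading or Unloading) that /proc/modules
# reports for module $1. $2 is the file to read, default /proc/modules.
module_state() {
    awk -v m="$1" '$1 == m { print $5 }' "${2:-/proc/modules}"
}

# Count the symbols tagged with "[$1]" in a kallsyms-style listing.
# $2 is the file to read, default /proc/kallsyms (needs CONFIG_KALLSYMS).
count_module_syms() {
    grep -c "\[$1]\$" "${2:-/proc/kallsyms}"
}

# Possible use right after a load (a loaded module is listed with
# underscores in its name, so dm-mod.ko shows up as dm_mod):
#   insmod /lib/dm-mod.ko
#   echo "dm_mod: state=$(module_state dm_mod) syms=$(count_module_syms dm_mod)"
```

If a module shows up as Live with a nonzero symbol count and the next
insmod still reports undefined symbols, that would point away from the
symbols simply not being registered yet.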

Steve
From: Stephen Lord on
Andrew Morton wrote:
> Stephen Lord <lord(a)xfs.org> wrote:
>
>> I am having trouble getting any recent kernel to boot successfully
>> on one of my machines, a generic 2.6GHz P4 box with HT enabled
>> running an updated Fedora Core 3 distro. This is present in
>> 2.6.12-rc6. It does not manifest itself with the Fedora Core
>> kernels which have identical initrd contents as far as the
>> init script and the set of modules included goes.
>>
>> The problem manifests itself as various undefined symbols from
>> module loads.
>
>
> Peculiar. Module loading is all synchronous, isn't it?
>

Well, things are getting more bizarre: adding sleeps between
module loads cures the problem with missing symbols. I then
run into a problem with device mapper/LVM, which seems to
have trouble setting up devices. In this section of
the init script:

umount /sys
echo Mounting root filesystem
mount -o defaults --ro -t ext3 /dev/root /sysroot
mount -t tmpfs --bind /dev /sysroot/dev
echo Switching to new root
switchroot /sysroot
umount /initrd/dev

The correct number of volumes is found, but a showlabels command
added to the init script fails to display them; it spits out
errors about readdir failures in /dev/Volume00.

The umount of /sys fails, the root mount fails and obviously, the
switchroot then fails.

I tried using the same config options as the Red Hat-supplied
kernel without any success; it still has module symbol
problems.

I am baffled, but it looks like it is not a symbol table problem.
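
For reference, a sketch of the sleep-between-loads variant of the init
script mentioned above. It assumes a shell with for loops and a sleep
command in the initrd. The function name and the INSMOD override are
hypothetical, added only so the loop can be dry-run.

```sh
#!/bin/sh
# Load each module from /lib in the given order, pausing between loads.
# $1 is the delay in seconds; remaining arguments are module file names.
# INSMOD is a hypothetical override so the loop can be dry-run with "echo".
load_modules() {
    delay=$1
    shift
    for mod in "$@"; do
        echo "Loading $mod module"
        ${INSMOD:-insmod} "/lib/$mod" || echo "insmod of $mod failed"
        sleep "$delay"
    done
}

# Possible use in place of the individual insmod lines:
#   load_modules 1 scsi_mod.ko sd_mod.ko libata.ko ata_piix.ko
```

A delay like this only works around the timing; it does not explain
why back-to-back loads fail.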

Steve

From: Pozsár Balázs on
On Fri, Jun 10, 2005 at 11:25:15AM -0700, Andrew Morton wrote:
> I wonder if rather than the intermittency being time-based, it is
> load-address-based? For example, suppose there's a bug in the symbol
> lookup code?

Just a data point: I hit the same problem with 2.6.12-rc5, using
gcc 3.3.4.
I think it's a time-based issue, because I was playing around with the
initscripts, and the bug shows up when there are lots of modprobes in a
short time.



--
pozsy