From: Stephen Lord on 10 Jun 2005 10:10

Hi,

I am having troubles getting any recent kernel to boot successfully
on one of my machines, a generic 2.6GHz P4 box with HT enabled
running an updated Fedora Core 3 distro. This is present in
2.6.12-rc6. It does not manifest itself with the Fedora Core
kernels which have identical initrd contents as far as the
init script and the set of modules included goes.

The problem manifests itself as various undefined symbols from
module loads. Here is the relevant section from the init script:

    echo Starting udev
    /sbin/udevstart
    echo -n "/sbin/hotplug" > /proc/sys/kernel/hotplug

    echo "Loading scsi_mod.ko module"
    insmod /lib/scsi_mod.ko
    echo "Loading sd_mod.ko module"
    insmod /lib/sd_mod.ko
    echo "Loading libata.ko module"
    insmod /lib/libata.ko
    echo "Loading ata_piix.ko module"
    insmod /lib/ata_piix.ko
    echo "Loading ieee1394.ko module"
    insmod /lib/ieee1394.ko
    echo "Loading ohci1394.ko module"
    insmod /lib/ohci1394.ko
    echo "Loading sbp2.ko module"
    insmod /lib/sbp2.ko
    echo "Loading dm-mod.ko module"
    insmod /lib/dm-mod.ko
    echo "Loading jbd.ko module"
    insmod /lib/jbd.ko
    echo "Loading ext3.ko module"
    insmod /lib/ext3.ko
    echo "Loading dm-mirror.ko module"
    insmod /lib/dm-mirror.ko
    echo "Loading dm-zero.ko module"
    insmod /lib/dm-zero.ko
    echo "Loading dm-snapshot.ko module"
    insmod /lib/dm-snapshot.ko
    /sbin/udevstart

The failures are different on different boots, sometimes the ata_piix
module cannot find symbols from libata, sometimes ext3 cannot find jbd
symbols, sometimes dm modules cannot find things from dm-mod, usually
it is a combination of these. End result is a panic when it cannot
find the root device.

From the behavior, it appears that a module load is returning
control to user space before the previous one has got its symbols
loaded.

The loadable module section of my config file looks like this:

    CONFIG_MODULES=y
    CONFIG_MODULE_UNLOAD=y
    # CONFIG_MODULE_FORCE_UNLOAD is not set
    CONFIG_OBSOLETE_MODPARM=y
    # CONFIG_MODVERSIONS is not set
    # CONFIG_MODULE_SRCVERSION_ALL is not set
    CONFIG_KMOD=y
    CONFIG_STOP_MACHINE=y

Since the fedora core kernels work consistently, this is either
something about my config triggering this, or something redhat has
in their kernels - I suspect the former.

My module tools are "module-init-tools version 3.1-pre5" which seems
to be as far as Redhat has updated their rpms to. I will try a later
version of these to see if it makes any difference.

Complete .config attached.

Steve
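The init script above runs every insmod unconditionally, so one failed load is followed by a cascade of dependent failures. As a debugging aid only, here is a minimal sketch that loads the same modules in order and stops at the first failure, which makes it easier to see which load goes wrong first. It assumes a POSIX sh with function support is available in the initrd (the stock interpreter may not provide one), and `load_in_order` and `LOADER` are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch: load modules in dependency order and stop at the first failure.
# LOADER is a hypothetical override so the loop can be dry-run with "echo".
LOADER="${LOADER:-insmod}"

load_in_order() {
    for mod in "$@"; do
        echo "Loading $mod module"
        # insmod is assumed to exit non-zero when the load fails
        if ! $LOADER "/lib/$mod"; then
            echo "FAILED: $mod (skipping the remaining modules)" >&2
            return 1
        fi
    done
}

# Same order as the init script in the post (left commented out so that
# sourcing this file does not touch the running kernel):
# load_in_order scsi_mod.ko sd_mod.ko libata.ko ata_piix.ko \
#     ieee1394.ko ohci1394.ko sbp2.ko dm-mod.ko jbd.ko ext3.ko \
#     dm-mirror.ko dm-zero.ko dm-snapshot.ko
```

Running it with `LOADER=echo` prints the insmod arguments without loading anything, which is a cheap way to check the ordering first.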
From: Andrew Morton on 10 Jun 2005 14:30

Stephen Lord <lord(a)xfs.org> wrote:
>
> I am having troubles getting any recent kernel to boot successfully
> on one of my machines, a generic 2.6GHz P4 box with HT enabled
> running an updated Fedora Core 3 distro. This is present in
> 2.6.12-rc6. It does not manifest itself with the Fedora Core
> kernels which have identical initrd contents as far as the
> init script and the set of modules included goes.
>
> The problem manifests itself as various undefined symbols from
> module loads.

Peculiar. Module loading is all synchronous, isn't it?

> ...
> The failures are different on different boots, sometimes the ata_piix
> module cannot find symbols from libata, sometimes ext3 cannot find jbd
> symbols, sometimes dm modules cannot find things from dm-mod, usually
> it is a combination of these. End result is a panic when it cannot
> find the root device.
>
> From the behavior, it appears that a module load is returning
> control to user space before the previous one has got its symbols
> loaded.

I wonder if rather than the intermittency being time-based, it is
load-address-based? For example, suppose there's a bug in the symbol
lookup code?

Have you tried using a different gcc version?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Steve Lord on 10 Jun 2005 15:20

Andrew Morton wrote:
> Stephen Lord <lord(a)xfs.org> wrote:
>
>>I am having troubles getting any recent kernel to boot successfully
>> on one of my machines, a generic 2.6GHz P4 box with HT enabled
>> running an updated Fedora Core 3 distro. This is present in
>> 2.6.12-rc6. It does not manifest itself with the Fedora Core
>> kernels which have identical initrd contents as far as the
>> init script and the set of modules included goes.
>>
>> The problem manifests itself as various undefined symbols from
>> module loads.
>
>
> Peculiar. Module loading is all synchronous, isn't it?

Hmm, now that I found the code, yes it is. insmod itself appears to
do no fancy foot work either.

>
>
>>...
>> The failures are different on different boots, sometimes the ata_piix
>> module cannot find symbols from libata, sometimes ext3 cannot find jbd
>> symbols, sometimes dm modules cannot find things from dm-mod, usually
>> it is a combination of these. End result is a panic when it cannot
>> find the root device.
>>
>> From the behavior, it appears that a module load is returning
>> control to user space before the previous one has got its symbols
>> loaded.
>
>
> I wonder if rather than the intermittency being time-based, it is
> load-address-based? For example, suppose there's a bug in the symbol
> lookup code?
>
> Have you tried using a different gcc version?
>

Don't have one handy at the moment, I am away from the machine
right now as well. I have been updating the machine using redhat's
update tools, so the compiler should be the same one I have here:

    gcc (GCC) 3.4.3 20050227 (Red Hat 3.4.3-22.fc3)

That should also be a fairly common compiler variant. I presume this
is what redhat does their kernel builds with, so that should be the
same too.

Shouldn't the memory map be pretty much identical on each boot?
Things are pretty deterministic at this stage in the process, and
the symbol match failures are not always the same.

If this was a memory problem it seems like I would see more random
oopses than this. I added more memory to the machine a month or so
back, and had to detune the bios settings a little to make it stable.
It would be odd that a 2.6.11 kernel was rock solid and a 2.6.12-rc6
falls over so quickly if that was the case.

I can play with the init script some and maybe dump out the symbol
table after an insmod.

Steve
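One way to dump a module's symbols after an insmod, as suggested above, is to filter /proc/kallsyms by the bracketed module name at the end of each line. The sketch below assumes the kernel is built with CONFIG_KALLSYMS and that awk is available in the initrd. `module_symbols` is a hypothetical helper name. Note that the kernel reports module names with underscores, so dm-mod.ko is expected to show up as dm_mod.

```shell
#!/bin/sh
# module_symbols MODNAME [FILE]
# Print the symbol names that FILE attributes to MODNAME.
# FILE defaults to /proc/kallsyms, where module-owned lines look like:
#   f8a5c000 t journal_start   [jbd]
module_symbols() {
    awk -v m="[$1]" '$NF == m { print $3 }' "${2:-/proc/kallsyms}"
}

# Possible use in the init script (commented out here):
# insmod /lib/jbd.ko
# module_symbols jbd | head
```

If the list for jbd is empty immediately after `insmod /lib/jbd.ko` returns but populated a moment later, that would point at timing; if it is always populated, the lookup side looks more suspect.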
From: Stephen Lord on 10 Jun 2005 23:40

Andrew Morton wrote:
> Stephen Lord <lord(a)xfs.org> wrote:
>
>>I am having troubles getting any recent kernel to boot successfully
>> on one of my machines, a generic 2.6GHz P4 box with HT enabled
>> running an updated Fedora Core 3 distro. This is present in
>> 2.6.12-rc6. It does not manifest itself with the Fedora Core
>> kernels which have identical initrd contents as far as the
>> init script and the set of modules included goes.
>>
>> The problem manifests itself as various undefined symbols from
>> module loads.
>
>
> Peculiar. Module loading is all synchronous, isn't it?
>

Well, things are getting more bizarre: adding sleeps between
module loads cures the problem with missing symbols.

I then run into a problem with device mapper/lvm, which seems to be
having problems setting up devices. In this section of the init
script:

    umount /sys
    echo Mounting root filesystem
    mount -o defaults --ro -t ext3 /dev/root /sysroot
    mount -t tmpfs --bind /dev /sysroot/dev
    echo Switching to new root
    switchroot /sysroot
    umount /initrd/dev

The correct number of volumes are found, but adding a showlabels
command to the init script fails to display them; it spits out
errors about readdir failures in /dev/Volume00.

The umount of /sys fails, the root mount fails and, obviously, the
switchroot then fails.

I tried using the same config options as the redhat supplied kernel
without any success; this still has module symbol problems.

I am baffled, but it looks like it is not a symbol table problem.

Steve
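Since fixed sleeps between loads hide the symptom, a slightly more informative experiment might be to poll /proc/modules and record whether a module is ever seen in a state other than Live after insmod has returned. This is only a sketch: it assumes the 2.6 /proc/modules layout (name, size, use count, users, state, address) and that awk and sleep exist in the initrd. `module_state` and `wait_for_live` are hypothetical helper names.

```shell
#!/bin/sh
# module_state MODNAME [FILE]
# Print the state column (Live, Loading or Unloading) for MODNAME.
# FILE defaults to /proc/modules, whose lines look like:
#   jbd 62616 1 ext3, Live 0xf8a5c000
module_state() {
    awk -v m="$1" '$1 == m { print $5 }' "${2:-/proc/modules}"
}

# wait_for_live MODNAME [TRIES] [FILE]
# Poll once per second, up to TRIES times (default 5), until MODNAME is Live.
# Returns 0 once it is Live, 1 if it never gets there.
wait_for_live() {
    tries="${2:-5}"
    while [ "$tries" -gt 0 ]; do
        [ "$(module_state "$1" "$3")" = "Live" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Possible use in the init script (commented out here):
# insmod /lib/jbd.ko
# wait_for_live jbd || echo "jbd not Live after insmod returned"
```

If module loading really is synchronous, the first poll should always succeed, so any retry that is actually needed would itself be a useful data point.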
From: Pozsár Balázs on 11 Jun 2005 05:00

On Fri, Jun 10, 2005 at 11:25:15AM -0700, Andrew Morton wrote:
> I wonder if rather than the intermittency being time-based, it is
> load-address-based? For example, suppose there's a bug in the symbol
> lookup code?

Just a data point: I met the same problem with 2.6.12-rc5, using gcc
3.3.4. I think it's a time-based issue, because I was playing around
with the initscripts, and the bug shows up when there are lots of
modprobes in a short time.

--
pozsy