From: Daniel J Blueman on 18 Jun 2008 22:40

Hi Ian,

On 17 Jun, 17:10, Ian Soboroff <isoboroff(a)gmail.com> wrote:
> I have a server that hosts some large XFS filesystems and serves them
> out over NFS. Every so often I get the following Oops, and then the
> machine locks hard with blinky keyboard lights. ("Every so often" == I
> can't reproduce this reliably. It comes up about once a week, we've
> seen it three times.)
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip:
> 00000000
> *pde = 355bf001
> Oops: 0000 [#1]
> SMP
> Modules linked in: nfs nfsd exportfs lockd nfs_acl md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc button battery ac ohci_hcd tg3 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod aacraid aic7xxx sd_mod scsi_mod
> CPU: 0
> EIP: 0060:[<00000000>] Not tainted VLI
> EFLAGS: 00010282 (2.6.9-67.0.15.ELirsmp)
> EIP is at 0x0
> eax: e1c86c30 ebx: c04ba260 ecx: 00000000 edx: d820304c
> esi: d820304c edi: f6ecbf00 ebp: 00000000 esp: f6ecbee4
> ds: 007b es: 007b ss: 0068
> Process nfsd (pid: 4339, threadinfo=f6ecb000 task=f6c470b0)
> Stack: c0168c5f e1c86c30 ffffffff f5f96090 60229cac cc751afc c0168cd3 60229cac
>        00000008 f5f96088 e1c86ca0 e1c86ca0 e1c86c30 cc751afc f5f95004 f8bcee28
>        f5f96088 f7e6ba00 f7d351c0 f7e6ba00 f8b2b46a f5f95800 f5f95000 f5f951d4
> Call Trace:
> [<c0168c5f>] __lookup_hash+0x70/0x89
> [<c0168cd3>] lookup_one_len+0x54/0x63
> [<f8bcee28>] nfsd_lookup+0x321/0x3ad [nfsd]
> [<f8b2b46a>] svcauth_unix_set_client+0xa7/0xb5 [sunrpc]
> [<f8bd6b49>] nfsd3_proc_lookup+0xa9/0xb3 [nfsd]
> [<f8bd8b37>] nfs3svc_decode_diropargs+0x0/0xfa [nfsd]
> [<f8bcc681>] nfsd_dispatch+0xba/0x16d [nfsd]
> [<f8b2862d>] svc_process+0x444/0x6f3 [sunrpc]
> [<f8bcc45a>] nfsd+0x1cc/0x339 [nfsd]
> [<f8bcc28e>] nfsd+0x0/0x339 [nfsd]
> [<c01041f5>] kernel_thread_helper+0x5/0xb
> Code: Bad EIP value.
> <0>Fatal exception: panic in 5 seconds

Have 4KB stacks been disabled? You can check the config file for
CONFIG_4KSTACKS.

It may also be worth feeding that into the bugzilla entry, to
eliminate one possibility, as 'bad EIP value' looks suspiciously like
stack corruption.

Daniel

> This machine is running RHEL4, using the stock kernel but with XFS
> enabled. I would have reported it to Redhat instead, but in googling
> around found a nearly identical kernel bugzilla report:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=7809
>
> In there, the bug reporter has tracked the Oops to __lookup_hash() in
> fs/namei.c, and includes a patch which basically just takes care to not
> dereference inode->i_op->lookup without checking it first.
>
> I looked at the latest fs/namei.c via gitweb and it's the same code. So
> here I am reporting it here, where more knowledgeable and responsive
> people lurk anyway.
>
> Is this a NFS problem, or an XFS one? (Since XFS is common in both my
> report and in the bugzilla one... I'm not sure whether the 'inode' in
> question is NFS or from the underlying filesystem).
>
> Is the bugzilla report's patch papering over a real problem, or does it
> fix a real possible null-pointer case in __lookup_hash?
>
> Thanks,
> Ian
-- 
Daniel J Blueman
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
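
The config check suggested above can be scripted. The following is a minimal sketch, not something posted in the thread: the helper name config_option_enabled() is hypothetical, and it assumes the build configuration is available as a plain-text file (on many distributions something like /boot/config-<release>, which is an assumption about the system, not a fact from the report).

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan a kernel config file for "<option>=y".
 * Returns 1 if the option is built in, 0 if no such line exists
 * (for example "# CONFIG_4KSTACKS is not set"), and -1 if the file
 * cannot be opened.
 */
int config_option_enabled(const char *path, const char *option)
{
    char line[512];
    char wanted[256];
    int found = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;

    /* Build the exact line we expect, e.g. "CONFIG_4KSTACKS=y". */
    snprintf(wanted, sizeof wanted, "%s=y", option);

    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (strcmp(line, wanted) == 0) {
            found = 1;
            break;
        }
    }

    fclose(fp);
    return found;
}
```

Calling config_option_enabled(path, "CONFIG_4KSTACKS") with the path of the running kernel's config file returns 1 when 4KB stacks are enabled.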
From: Ian Soboroff on 23 Jun 2008 13:10

"Daniel J Blueman" <daniel.blueman(a)gmail.com> writes:

> Have 4KB stacks been disabled? You can check the config file for
> CONFIG_4KSTACKS.

This kernel has 4KSTACKS enabled.

> It may also be worth feeding that into the bugzilla entry, to
> eliminate one possibility, as 'bad EIP value' looks suspiciously like
> stack corruption.

Ok, will do. Although that bugzilla entry is from 2007 and no one seems
to have looked at it at all...

Ian
From: Daniel J Blueman on 23 Jun 2008 14:20

On Mon, Jun 23, 2008 at 5:47 PM, Ian Soboroff <isoboroff(a)gmail.com> wrote:
> The following message is a courtesy copy of an article
> that has been posted to gmane.linux.kernel as well.
>
> "Daniel J Blueman" <daniel.blueman(a)gmail.com> writes:
>
>> Have 4KB stacks been disabled? You can check the config file for
>> CONFIG_4KSTACKS.
>
> This kernel has 4KSTACKS enabled.

There is a chance that you've overrun the 4KB stack. Can you retest with
CONFIG_4KSTACKS disabled perhaps?

>> It may also be worth feeding that into the bugzilla entry, to
>> eliminate one possibility, as 'bad EIP value' looks suspiciously like
>> stack corruption.
>
> Ok, will do. Although that bugzilla entry is from 2007 and no one seems
> to have looked at it at all...
-- 
Daniel J Blueman
From: Ian on 23 Jun 2008 14:40

On Mon, Jun 23, 2008 at 2:17 PM, Daniel J Blueman <daniel.blueman(a)gmail.com> wrote:
> There is a chance that you've overrun the 4KB stack. Can you retest with
> CONFIG_4KSTACKS disabled perhaps?

Testing is hard as the oops is not easily reproducible, but I'll prepare
a non-4KSTACKS kernel so that I can boot to it if we oops again.

I'm still interested to hear from someone if the patch in bugzilla is
good for catching a real error case, or if it's papering over a larger
problem (for example a stack overrun).

Ian
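
For readers following the question about the bugzilla patch, the kind of guard it is described as adding can be modelled in user space. This is only an illustrative sketch and does not reproduce the actual patch or the kernel's fs/namei.c: the toy_* types, guarded_lookup(), and the choice of -ENOTDIR as the error are all assumptions made for the example. It shows why calling through a NULL lookup pointer would be consistent with "EIP is at 0x0", and how a check before the call avoids the jump. It does not settle whether the NULL pointer comes from a legitimate case or from corruption elsewhere, such as a stack overrun.

```c
#include <stddef.h>
#include <errno.h>

struct toy_inode;

/* Simplified stand-in for the kernel's table of inode operations. */
struct toy_inode_ops {
    int (*lookup)(struct toy_inode *dir, const char *name);
};

/* Simplified stand-in for an inode: only the operations pointer. */
struct toy_inode {
    const struct toy_inode_ops *i_op;
};

/* Example lookup implementation for a directory-like inode. */
int toy_dir_lookup(struct toy_inode *dir, const char *name)
{
    (void)dir;
    (void)name;
    return 0;   /* pretend the name was found */
}

/*
 * Check every pointer on the path to the call before using it.
 * Without the last test, a NULL lookup member would make the call
 * jump to address 0.
 */
int guarded_lookup(struct toy_inode *dir, const char *name)
{
    if (dir == NULL || dir->i_op == NULL || dir->i_op->lookup == NULL)
        return -ENOTDIR;   /* assumed error code for this sketch */

    return dir->i_op->lookup(dir, name);
}
```

With an operations table whose lookup member is NULL, guarded_lookup() returns -ENOTDIR instead of faulting; with toy_dir_lookup installed it forwards the call.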