From: Eric Sesterhenn on 5 May 2008 06:10

* Eric Sesterhenn (snakebyte(a)gmx.de) wrote:
> hi,
>
> running the strace_test from ltp 20080229 (ltp.sf.net) gives me
> two different oopses, so far i was not able to pinpoint to a specific
> testcase (probably because the strace uses the rng to decide what fails
> and what not) one oops is in iret_exc(), the other in __copy_from_user_ll()
> The oopses don't happen with 2.6.24 so this appears to be a regression, i am starting
> a git-bisect, but this might take some time
>
> Here is a full dmesg with the oopses happening, it usually takes less than a minute
> until they trigger, strangely they don't appear in /var/log/messages, only on the netconsole

[ first part of dmesg snipped ]

> [ 194.539607] BUG: unable to handle kernel NULL pointer dereference at 00000000
> [ 194.539919] IP: [<c03eb611>] __copy_from_user_ll+0x61/0xe0
> [ 194.540150] Oops: 0003 [#1] PREEMPT DEBUG_PAGEALLOC
> [ 194.540432] Modules linked in: nfsd exportfs
> [ 194.540788]
> [ 194.540949] Pid: 7340, comm: strace Tainted: G W (2.6.25-06679-g0ff5ce7 #30)
> [ 194.541105] EIP: 0060:[<c03eb611>] EFLAGS: 00010216 CPU: 0
> [ 194.541187] EIP is at __copy_from_user_ll+0x61/0xe0
> [ 194.541187] EAX: 00000000 EBX: 00000073 ECX: 00000200 EDX: 00000000
> [ 194.541187] ESI: 00000073 EDI: 00000000 EBP: cbf23e98 ESP: cbf23e90
> [ 194.541187] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [ 194.541187] Process strace (pid: 7340, ti=cbf23000 task=caa5af40 task.ti=cbf23000)
> [ 194.541187] Stack: 00000000 caa5af40 cbf23f1c c010a599 00000003 00000000 00000000 cbf23ed4
> [ 194.541187] c8001dd8 00000046 00000002 00000001 c0138f7a 00000002 00000000 cbf37694
> [ 194.541187] caa5af40 00000046 00000002 00000000 cbf37694 caa5af40 00000002 00000000
> [ 194.541187] Call Trace:
> [ 194.541187] [<c010a599>] ? restore_i387+0x89/0x190
> [ 194.541187] [<c0138f7a>] ? remove_wait_queue+0x1a/0x40
> [ 194.541187] [<c0103067>] ? restore_sigcontext+0x1c7/0x1f0
> [ 194.541187] [<c01032e9>] ? sys_sigreturn+0xe9/0x190
> [ 194.541187] [<c012fdd8>] ? sys_rt_sigaction+0x68/0x90
> [ 194.541187] [<c0103d7d>] ? sysenter_past_esp+0x6a/0xb1
> [ 194.541187] =======================
> [ 194.541187] Code: c1 e9 02 83 e0 03 f3 a5 89 c1 f3 a4 8b 34 24 89 c8 8b 7c 24 04 89 ec 5d c3 90 8b 46 20 83 f9 43 76 04 8b 46 40 90 8b 06 8b 56 04 <89> 07 89 57 04 8b 46 08 8b 56 0c 89 47 08 89 57 0c 8b 46 10 8b
> [ 194.541187] EIP: [<c03eb611>] __copy_from_user_ll+0x61/0xe0 SS:ESP 0068:cbf23e90
> [ 194.594334] ---[ end trace a7919e7f17c0a725 ]---
> [ 204.592336] BUG: unable to handle kernel NULL pointer dereference at 00000000
> [ 204.592678] IP: [<c03eb611>] __copy_from_user_ll+0x61/0xe0
> [ 204.592880] *pde = 0a94d067 *pte = 09ffb065
> [ 204.593140] Oops: 0003 [#2] PREEMPT DEBUG_PAGEALLOC
> [ 204.593423] Modules linked in: nfsd exportfs
> [ 204.593780]
> [ 204.593941] Pid: 7638, comm: strace Tainted: G D W (2.6.25-06679-g0ff5ce7 #30)
> [ 204.594097] EIP: 0060:[<c03eb611>] EFLAGS: 00010216 CPU: 0
> [ 204.594215] EIP is at __copy_from_user_ll+0x61/0xe0
> [ 204.594328] EAX: 00000000 EBX: 00000074 ECX: 00000200 EDX: 00000000
> [ 204.594506] ESI: 00000074 EDI: 00000000 EBP: cc82de98 ESP: cc82de90
> [ 204.594625] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [ 204.594645] Process strace (pid: 7638, ti=cc82d000 task=ca9bbf00 task.ti=cc82d000)
> [ 204.594645] Stack: 00000000 ca9bbf00 cc82df1c c010a599 00000004 00000000 00000000 cc82ded4
> [ 204.594645] ca98acd8 00000046 00000002 00000001 c0138f7a 00000002 00000000 cbfceb94
> [ 204.594645] ca9bbf00 00000046 00000002 00000000 cbfceb94 ca9bbf00 00000002 00000000
> [ 204.594645] Call Trace:
> [ 204.594645] [<c010a599>] ? restore_i387+0x89/0x190
> [ 204.594645] [<c0138f7a>] ? remove_wait_queue+0x1a/0x40
> [ 204.594645] [<c0103067>] ? restore_sigcontext+0x1c7/0x1f0
> [ 204.594645] [<c01032e9>] ? sys_sigreturn+0xe9/0x190
> [ 204.594645] [<c012fdd8>] ? sys_rt_sigaction+0x68/0x90
> [ 204.594645] [<c0103d7d>] ? sysenter_past_esp+0x6a/0xb1
> [ 204.594645] =======================
> [ 204.594645] Code: c1 e9 02 83 e0 03 f3 a5 89 c1 f3 a4 8b 34 24 89 c8 8b 7c 24 04 89 ec 5d c3 90 8b 46 20 83 f9 43 76 04 8b 46 40 90 8b 06 8b 56 04 <89> 07 89 57 04 8b 46 08 8b 56 0c 89 47 08 89 57 0c 8b 46 10 8b
> [ 204.594645] EIP: [<c03eb611>] __copy_from_user_ll+0x61/0xe0 SS:ESP 0068:cc82de90
> [ 204.785891] ---[ end trace a7919e7f17c0a725 ]---
> [ 207.866195] BUG: unable to handle kernel NULL pointer dereference at 00000000
> [ 207.866555] IP: [<c067b1b5>] iret_exc+0x605/0x992
> [ 207.866753] *pde = 0aac3067 *pte = 00000000
> [ 207.867013] Oops: 0002 [#3] PREEMPT DEBUG_PAGEALLOC
> [ 207.867296] Modules linked in: nfsd exportfs
> [ 207.867653]
> [ 207.867815] Pid: 7897, comm: strace Tainted: G D W (2.6.25-06679-g0ff5ce7 #30)
> [ 207.867973] EIP: 0060:[<c067b1b5>] EFLAGS: 00010246 CPU: 0
> [ 207.868091] EIP is at iret_exc+0x605/0x992
> [ 207.868201] EAX: 00000000 EBX: 00000077 ECX: 00000200 EDX: 00000077
> [ 207.868379] ESI: 00000077 EDI: 00000000 EBP: ccbaae98 ESP: ccbaae88
> [ 207.868498] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [ 207.868521] Process strace (pid: 7897, ti=ccbaa000 task=cbf7cec0 task.ti=ccbaa000)
> [ 207.868521] Stack: 00000077 00000200 00000000 cbf7cec0 ccbaaf1c c010a599 00000007 00000000
> [ 207.868521] 00000000 ccbaaed4 c8000ef8 00000046 00000002 00000001 c0138f7a 00000002
> [ 207.868521] 00000000 cbf57c14 cbf7cec0 00000046 00000002 00000000 cbf57c14 cbf7cec0
> [ 207.868521] Call Trace:
> [ 207.868521] [<c010a599>] ? restore_i387+0x89/0x190
> [ 207.868521] [<c0138f7a>] ? remove_wait_queue+0x1a/0x40
> [ 207.868521] [<c0103067>] ? restore_sigcontext+0x1c7/0x1f0
> [ 207.868521] [<c01032e9>] ? sys_sigreturn+0xe9/0x190
> [ 207.868521] [<c012fdd8>] ? sys_rt_sigaction+0x68/0x90
> [ 207.868521] [<c0103d7d>] ? sysenter_past_esp+0x6a/0xb1
> [ 207.868521] =======================
> [ 207.868521] Code: ff 01 c1 e9 00 04 d7 ff 8d 0c 88 e9 f8 03 d7 ff 01 c1 eb 03 8d 0c 88 51 50 31 c0 f3 aa 58 59 e9 44 04 d7 ff 8d 0c 88 51 50 31 c0 <f3> aa 58 59 e9 c9 04 d7 ff 01 c1 e9 0d 05 d7 ff 8d 0c 88 e9 05
> [ 207.868521] EIP: [<c067b1b5>] iret_exc+0x605/0x992 SS:ESP 0068:ccbaae88
> [ 208.002448] ---[ end trace a7919e7f17c0a725 ]---

after some bisecting i found commit
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=aa283f49276e7d840a40fb01eee6de97eaa7e012;hp=61c4628b538608c1a85211ed8438136adfeb9a95
to be guilty. After reverting this manually (didn't revert cleanly)
i was unable to reproduce the oopses

Greetings, Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

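As a reading aid for the three oopses above: on x86 the number after "Oops:" is the page-fault error code, and its low three bits say whether the page was present, whether the access was a write, and whether the CPU was in user mode. The sketch below decodes only those three bits. The struct and function names are made up for illustration and are not kernel APIs.

```c
#include <stdio.h>

/* Low three bits of the x86 page-fault error code. */
#define PF_PROT  0x1u  /* 0: page not present, 1: protection violation on a present page */
#define PF_WRITE 0x2u  /* 0: read access,      1: write access */
#define PF_USER  0x4u  /* 0: CPU in kernel mode, 1: CPU in user mode */

/* Hypothetical container for the decoded bits. */
struct pf_info {
    int present;  /* 1 if the faulting page was present */
    int write;    /* 1 if the faulting access was a write */
    int user;     /* 1 if the fault happened in user mode */
};

/* Split an error code such as 0x0003 into its three flags. */
struct pf_info decode_pf_error(unsigned int code)
{
    struct pf_info info;
    info.present = (code & PF_PROT)  ? 1 : 0;
    info.write   = (code & PF_WRITE) ? 1 : 0;
    info.user    = (code & PF_USER)  ? 1 : 0;
    return info;
}

/* Print a one-line description of an error code. */
void print_pf_error(unsigned int code)
{
    struct pf_info info = decode_pf_error(code);
    printf("Oops: %04x = %s, %s page, %s mode\n",
           code,
           info.write ? "write" : "read",
           info.present ? "present" : "not-present",
           info.user ? "user" : "kernel");
}
```

Read this way, 0003 is a kernel-mode write to a present page (a protection fault) and 0002 is a kernel-mode write to a page that is not present. That agrees with the *pte values printed for oopses #2 and #3: 09ffb065 has the present bit set and the writable bit clear, while 00000000 is not present at all.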
From: Frederik Deweerdt on 5 May 2008 14:10

On Mon, May 05, 2008 at 12:00:08PM +0200, Eric Sesterhenn wrote:
> * Eric Sesterhenn (snakebyte(a)gmx.de) wrote:
> > hi,
> >
> > running the strace_test from ltp 20080229 (ltp.sf.net) gives me
> > two different oopses, so far i was not able to pinpoint to a specific
> > testcase (probably because the strace uses the rng to decide what fails
> > and what not) one oops is in iret_exc(), the other in __copy_from_user_ll()
> > The oopses don't happen with 2.6.24 so this appears to be a regression, i am starting
> > a git-bisect, but this might take some time
> >
> [...]
>
> after some bisecting i found commit
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=aa283f49276e7d840a40fb01eee6de97eaa7e012;hp=61c4628b538608c1a85211ed8438136adfeb9a95
> to be guilty. After reverting this manually (didn't revert cleanly)

Hi Eric,

This appears to be caused by init_fpu() missing from the
restore_sigcontext->restore_i387->restore_fpu_checking code path.

I believe that moving the init_fpu() call from math_state_restore to
restore_fpu_checking should fix the problem?

Regards,
Frederik

Signed-off-by: Frederik Deweerdt <frederik.deweerdt(a)gmail.com>

diff --git a/arch/x86/kernel/traps_64.c b/arch/x86/kernel/traps_64.c
index 8069073..5b1af48 100644
--- a/arch/x86/kernel/traps_64.c
+++ b/arch/x86/kernel/traps_64.c
@@ -1142,22 +1142,6 @@ asmlinkage void math_state_restore(void)
 {
 	struct task_struct *me = current;
 
-	if (!used_math()) {
-		local_irq_enable();
-		/*
-		 * does a slab alloc which can sleep
-		 */
-		if (init_fpu(me)) {
-			/*
-			 * ran out of memory!
-			 */
-			do_group_exit(SIGKILL);
-			return;
-		}
-		local_irq_disable();
-	}
-
-	clts();			/* Allow maths ops (or we recurse) */
 	restore_fpu_checking(&me->thread.xstate->fxsave);
 	task_thread_info(me)->status |= TS_USEDFPU;
 	me->fpu_counter++;
diff --git a/include/asm-x86/i387.h b/include/asm-x86/i387.h
index da2adb4..bf1cabe 100644
--- a/include/asm-x86/i387.h
+++ b/include/asm-x86/i387.h
@@ -47,7 +47,20 @@ static inline void tolerant_fwait(void)
 
 static inline int restore_fpu_checking(struct i387_fxsave_struct *fx)
 {
-	int err;
+	int err = -1;
+
+	if (!used_math()) {
+		local_irq_enable();
+		/*
+		 * does a slab alloc which can sleep
+		 */
+		if (init_fpu(current))
+			return err;
+		local_irq_disable();
+	}
+
+	clts();			/* Allow maths ops (or we recurse) */
 
 	asm volatile("1: rex64/fxrstor (%[fx])\n\t"
 		     "2:\n"

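A user-space sketch of the failure mode Frederik describes, assuming (as the "does a slab alloc" comment in the patch suggests) that the FPU save area is allocated lazily on first use. Every name here (toy_task, toy_init_fpu, toy_restore_from_frame) is a hypothetical stand-in rather than kernel code, and the 512-byte size is chosen only because it appears to match the 00000200 count in ECX in the oopses.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_XSTATE_SIZE 512  /* assumed save-area size, used only as a buffer length */

/* Toy stand-in for a task: xstate stays NULL until the FPU is first used. */
struct toy_task {
    int used_math;          /* models used_math() */
    unsigned char *xstate;  /* models the lazily allocated save area */
};

/* Models init_fpu(): allocate the save area on first use. Returns 0 on success. */
int toy_init_fpu(struct toy_task *t)
{
    if (t->used_math)
        return 0;
    t->xstate = calloc(1, TOY_XSTATE_SIZE);
    if (t->xstate == NULL)
        return -1;
    t->used_math = 1;
    return 0;
}

/*
 * Models the sigreturn path: copy a caller-supplied frame into the save area.
 * The guard is the point of the sketch. Without it, a task that never used
 * the FPU would reach memcpy() with a NULL destination.
 */
int toy_restore_from_frame(struct toy_task *t, const unsigned char *frame)
{
    if (!t->used_math && toy_init_fpu(t) != 0)
        return -1;
    memcpy(t->xstate, frame, TOY_XSTATE_SIZE);
    return 0;
}
```

If the guard in toy_restore_from_frame is removed, a fresh task reaches the copy with a NULL destination, which is the user-space analogue of the write to address 00000000 (EDI: 00000000) in the traces.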
From: Ingo Molnar on 6 May 2008 08:20

* Frederik Deweerdt <deweerdt(a)free.fr> wrote:

> Hi Eric,
>
> This appears to be caused by init_fpu() missing from the
> restore_sigcontext->restore_i387->restore_fpu_checking code path.
>
> I believe that moving the init_fpu() call from math_state_restore to
> restore_fpu_checking should fix the problem?

thanks Eric and Frederik for tracking this down. Eric, does Frederik's
patch fix the problem for you?

	Ingo

From: Eric Sesterhenn on 6 May 2008 11:10

* Ingo Molnar (mingo(a)elte.hu) wrote:
>
> * Frederik Deweerdt <deweerdt(a)free.fr> wrote:
>
> > Hi Eric,
> >
> > This appears to be caused by init_fpu() missing from the
> > restore_sigcontext->restore_i387->restore_fpu_checking code path.
> >
> > I believe that moving the init_fpu() call from math_state_restore to
> > restore_fpu_checking should fix the problem?
>
> thanks Eric and Frederik for tracking this down. Eric, does Frederik's
> patch fix the problem for you?

didn't have a chance to test this yet, I'll be back home tomorrow
and test Frederik's patch.

Greetings, Eric

From: Eric Sesterhenn on 7 May 2008 06:40

* Ingo Molnar (mingo(a)elte.hu) wrote:
>
> * Frederik Deweerdt <deweerdt(a)free.fr> wrote:
>
> > Hi Eric,
> >
> > This appears to be caused by init_fpu() missing from the
> > restore_sigcontext->restore_i387->restore_fpu_checking code path.
> >
> > I believe that moving the init_fpu() call from math_state_restore to
> > restore_fpu_checking should fix the problem?

sadly the patch does not work for me :( I still get the oopses.
I am running this on a 32-bit CPU, so arch/x86/kernel/traps_64.c
doesn't get compiled. I tried removing the same part from traps_32.c
but then the kernel oopses before netconsole is active.

Greetings, Eric