From: Sachin Sant on
While executing libhugetlbfs tests against 2.6.35-rc2 on
an x86_64 box, I came across the following GPF:

general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
CPU 3
Modules linked in: ipv6 mperf fuse loop dm_mod sr_mod cdrom usb_storage sg i2c_piix4 rtc_cmos bnx2 k8temp pcspkr serio_raw mptctl i2c_core rtc_core rtc_lib shpchp button pci_hotplug usbhid hid ohci_hcd ehci_hcd sd_mod crc_t10dif usbcore edd ext3 jbd fan thermal processor thermal_sys hwmon mptsas mptscsih mptbase scsi_transport_sas scsi_mod

Pid: 20232, comm: autotest Not tainted 2.6.35-rc2-autotest #1 Server Blade/BladeCenter LS21 -[79716AA]-
RIP: 0010:[<ffffffff813968ca>] [<ffffffff813968ca>] _raw_spin_lock+0x9/0x20
RSP: 0018:ffff880126e43d88 EFLAGS: 00010202
RAX: 0000000000010000 RBX: 0720072007200720 RCX: 0000000000000000
RDX: 0000000000000011 RSI: ffff8801293a7470 RDI: 0720072007200720
RBP: ffff880126e43d88 R08: ffff8801279df270 R09: 09f911029d74e35b
R10: 09f911029d74e35b R11: dead000000100100 R12: ffff8801278cae00
R13: 0720072007200710 R14: ffff8801297e71f8 R15: 0000000000000000
FS: 00007f461d6866f0(0000) GS:ffff880006180000(0000) knlGS:0000000055731b00
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f461d45a7b8 CR3: 0000000001713000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process autotest (pid: 20232, threadinfo ffff880126e42000, task ffff8801297e4190)
Stack:
ffff880126e43db8 ffffffff810f6b80 ffff8801297ae858 ffff8801297e7190
<0> ffff8801297e7190 00007f461940e000 ffff880126e43e08 ffffffff810f025e
<0> 00000000ffffffff 0000000000000000 ffff88000618d690 ffff88000618d690
Call Trace:
[<ffffffff810f6b80>] unlink_anon_vmas+0x37/0xf2
[<ffffffff810f025e>] free_pgtables+0x5f/0xc9
[<ffffffff810f1ac1>] exit_mmap+0xe6/0x141
[<ffffffff81064a6d>] mmput+0x39/0xdb
[<ffffffff81068b4b>] exit_mm+0x119/0x126
[<ffffffff8106a3bb>] do_exit+0x225/0x721
[<ffffffff8106a928>] do_group_exit+0x71/0x9a
[<ffffffff8106a963>] sys_exit_group+0x12/0x16
[<ffffffff8102896b>] system_call_fastpath+0x16/0x1b
Code: c2 c1 c0 10 39 c2 8d 90 00 00 01 00 75 04 f0 0f b1 17 0f 94 c2 0f b6 c2 85 c0 c9 0f 95 c0 0f b6 c0 c3 55 b8 00 00 01 00 48 89 e5 <f0> 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 0f b7 17 eb f5
RIP [<ffffffff813968ca>] _raw_spin_lock+0x9/0x20
RSP <ffff880126e43d88>
---[ end trace 844bcf9372ef8fa1 ]---
Clocksource tsc unstable (delta = 4398037966381 ns)
Fixing recursive fault but reboot is needed!

Previous snapshot release (2.6.35-rc1-git5 6c5de280b6..) was good.
I am using version 2.8 of libhugetlbfs tests from
http://sourceforge.net/projects/libhugetlbfs/files/

thanks
-Sachin


--

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
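
A note on the register dump in the report above: the faulting bytes marked in the Code: line (f0 0f c1 07) decode to a locked xadd through RDI, and RDI holds 0720072007200720. On x86_64, an access through a non-canonical address typically raises a general protection fault rather than a page fault, which would fit the oops. The sketch below checks that property. It assumes 48-bit virtual addresses (4-level paging), which is not stated in the thread, and the function name is made up for illustration.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86_64 virtual address.

    Assumption: va_bits implemented address bits (48 for 4-level paging).
    An address is canonical when bits 63..(va_bits-1) are all equal.
    """
    top = addr >> (va_bits - 1)               # bits 63..47 for va_bits=48
    all_ones = (1 << (64 - va_bits + 1)) - 1  # 17 one-bits for va_bits=48
    return top == 0 or top == all_ones


rdi = 0x0720072007200720  # RDI from the oops: the lock pointer being used
rsi = 0xffff8801293a7470  # RSI from the oops: an ordinary kernel pointer

print(hex(rdi), is_canonical(rdi))
print(hex(rsi), is_canonical(rsi))
```

Under that assumption the RDI value fails the check while the other kernel pointers in the dump pass it, so the pointer handed to _raw_spin_lock looks like overwritten data rather than a stale but valid address.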
From: Mel Gorman on
On Sun, Jun 06, 2010 at 09:38:16PM +0530, Sachin Sant wrote:
> While executing libhugetlbfs tests against 2.6.35-rc2 on
> a x86_64 box came across the following GPF
>
> general protection fault: 0000 [#1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
> CPU 3
> Modules linked in: ipv6 mperf fuse loop dm_mod sr_mod cdrom usb_storage sg i2c_piix4 rtc_cmos bnx2 k8temp pcspkr serio_raw mptctl i2c_core rtc_core rtc_lib shpchp button pci_hotplug usbhid hid ohci_hcd ehci_hcd sd_mod crc_t10dif usbcore edd ext3 jbd fan thermal processor thermal_sys hwmon mptsas mptscsih mptbase scsi_transport_sas scsi_mod
>
> Pid: 20232, comm: autotest Not tainted 2.6.35-rc2-autotest #1 Server Blade/BladeCenter LS21 -[79716AA]-
> RIP: 0010:[<ffffffff813968ca>] [<ffffffff813968ca>] _raw_spin_lock+0x9/0x20
> RSP: 0018:ffff880126e43d88 EFLAGS: 00010202
> RAX: 0000000000010000 RBX: 0720072007200720 RCX: 0000000000000000
> RDX: 0000000000000011 RSI: ffff8801293a7470 RDI: 0720072007200720
> RBP: ffff880126e43d88 R08: ffff8801279df270 R09: 09f911029d74e35b
> R10: 09f911029d74e35b R11: dead000000100100 R12: ffff8801278cae00
> R13: 0720072007200710 R14: ffff8801297e71f8 R15: 0000000000000000
> FS: 00007f461d6866f0(0000) GS:ffff880006180000(0000) knlGS:0000000055731b00
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f461d45a7b8 CR3: 0000000001713000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process autotest (pid: 20232, threadinfo ffff880126e42000, task ffff8801297e4190)
> Stack:
> ffff880126e43db8 ffffffff810f6b80 ffff8801297ae858 ffff8801297e7190
> <0> ffff8801297e7190 00007f461940e000 ffff880126e43e08 ffffffff810f025e
> <0> 00000000ffffffff 0000000000000000 ffff88000618d690 ffff88000618d690
> Call Trace:
> [<ffffffff810f6b80>] unlink_anon_vmas+0x37/0xf2
> [<ffffffff810f025e>] free_pgtables+0x5f/0xc9
> [<ffffffff810f1ac1>] exit_mmap+0xe6/0x141

While at first glance this looks like a general bug, it might still be
some oddity in hugetlbfs. Sachin, how reproducible is this? I just ran the
libhugetlbfs tests on x86-64 without any problems. Can you post your
.config please?

> [<ffffffff81064a6d>] mmput+0x39/0xdb
> [<ffffffff81068b4b>] exit_mm+0x119/0x126
> [<ffffffff8106a3bb>] do_exit+0x225/0x721
> [<ffffffff8106a928>] do_group_exit+0x71/0x9a
> [<ffffffff8106a963>] sys_exit_group+0x12/0x16
> [<ffffffff8102896b>] system_call_fastpath+0x16/0x1b
> Code: c2 c1 c0 10 39 c2 8d 90 00 00 01 00 75 04 f0 0f b1 17 0f 94 c2 0f b6 c2 85 c0 c9 0f 95 c0 0f b6 c0 c3 55 b8 00 00 01 00 48 89 e5 <f0> 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 0f b7 17 eb f5
> RIP [<ffffffff813968ca>] _raw_spin_lock+0x9/0x20
> RSP <ffff880126e43d88>
> ---[ end trace 844bcf9372ef8fa1 ]---
> Clocksource tsc unstable (delta = 4398037966381 ns)
> Fixing recursive fault but reboot is needed!
>
> Previous snapshot release (2.6.35-rc1-git5 6c5de280b6..) was good.
> I am using version 2.8 of libhugetlbfs tests from
> http://sourceforge.net/projects/libhugetlbfs/files/
>

This implies it might not be easily reproducible, because no commits
in that window affected anon_vma locking. I have the test running in a
loop to see if I can reproduce it.

Thanks

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
From: Sachin Sant on
Mel Gorman wrote:
> If the problem has gone away since 2.6.35-rc2, the most likely candidate fix
> patch is commit [386f40: Revert "tty: fix a little bug in scrup, vt.c"] which
> reverts the patch you previously identified as being a problem. The commit
> message also matches roughly what you are seeing with the 0x0720 patterns.
>
> Can you retest with 2.6.35-rc2 with commit 386f40 applied and see if it
> also fixes up your problem please?
>
I could not recreate this problem against 2.6.35-rc2 + commit 386f40.

Thanks
-Sachin

--

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------

From: Mel Gorman on
On Fri, Jun 11, 2010 at 11:02:33AM +0530, Sachin Sant wrote:
> Mel Gorman wrote:
>> If the problem has gone away since 2.6.35-rc2, the most likely candidate fix
>> patch is commit [386f40: Revert "tty: fix a little bug in scrup, vt.c"] which
>> reverts the patch you previously identified as being a problem. The commit
>> message also matches roughly what you are seeing with the 0x0720 patterns.
>>
>> Can you retest with 2.6.35-rc2 with commit 386f40 applied and see if it
>> also fixes up your problem please?
>>
> I could not recreate this problem against 2.6.35-rc2 + commit 386f40.
>

Great, I will consider this bug resolved then. Thanks for testing.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab