kmemleak, cpu usage jump out of nowhere [Kernel]

From: Zeno Davatz on 14 Jul 2010 02:20

Hi

I got a new Intel core-8 i7 processor.

I am on kernel uname -a

Linux zenogentoo 2.6.35-rc5 #97 SMP Tue Jul 13 16:13:25 CEST 2010 i686
Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz GenuineIntel GNU/Linux

Sometimes in the middle of nowhere all of a sudden all of my 8-cores
are at 100% CPU usage and my machine really lags and hangs and is not
useable anymore. Some random process just grabs a bunch of CPUs according
to htop.

dmesg tells me that

kmemleak: 38 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

I am attaching the file from /sys/kernel/debug/kmemleak

Let me know if you need anything else.

Best
Zeno

From: Pekka Enberg on 14 Jul 2010 04:10

On Wed, Jul 14, 2010 at 9:12 AM, Zeno Davatz <zdavatz(a)gmail.com> wrote:
> I got a new Intel core-8 i7 processor.
>
> I am on kernel uname -a
>
> Linux zenogentoo 2.6.35-rc5 #97 SMP Tue Jul 13 16:13:25 CEST 2010 i686
> Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz GenuineIntel GNU/Linux
>
> Sometimes in the middle of nowhere all of a sudden all of my 8-cores
> are at 100% CPU usage and my machine really lags and hangs and is not
> useable anymore. Some random process just grabs a bunch of CPUs according
> to htop.

Why did you enable CONFIG_DEBUG_KMEMLEAK? Memory leak scanning is
likely the source of these pauses.

> dmesg tells me that
>
> kmemleak: 38 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
>
> I am attaching the file from /sys/kernel/debug/kmemleak

Zeno, can you post your dmesg and .config, please?

We have a bunch of suspected leaks here. The first class of leaks is
related to reserve_region():

unreferenced object 0xf6d80740 (size 64):
comm "swapper", pid 1, jiffies 4294892590 (age 57258.752s)
hex dump (first 32 bytes):
00 00 ee c7 00 00 00 00 ff b7 ee c7 00 00 00 00 ................
7c 09 52 c1 00 00 00 80 00 f2 5e c1 20 ac 6f c1 |.R.......^. .o.
backtrace:
[<c145d4eb>] kmemleak_alloc+0x27/0x4d
[<c10ad53f>] kmem_cache_alloc+0xa3/0xd4
[<c163b782>] __reserve_region_with_split+0x29/0x149
[<c163b86a>] __reserve_region_with_split+0x111/0x149
[<c163b89a>] __reserve_region_with_split+0x141/0x149
[<c163b89a>] __reserve_region_with_split+0x141/0x149
[<c163b89a>] __reserve_region_with_split+0x141/0x149
[<c163b8de>] reserve_region_with_split+0x3c/0x4f
[<c162e307>] e820_reserve_resources_late+0xea/0x108
[<c16504e6>] pcibios_resource_survey+0x23/0x2a
[<c1652022>] pcibios_init+0x61/0x73
[<c165172b>] pci_subsys_init+0x43/0x48
[<c1001114>] do_one_initcall+0x27/0x178
[<c162b357>] kernel_init+0x129/0x1c7
[<c10238b6>] kernel_thread_helper+0x6/0x10
[<ffffffff>] 0xffffffff

unreferenced object 0xf6d232a0 (size 32):
comm "swapper", pid 1, jiffies 4294892601 (age 57258.708s)
hex dump (first 32 bytes):
70 6e 70 20 30 30 3a 30 31 00 d2 f6 fa 00 0b c1 pnp 00:01.......
00 00 00 00 04 aa dc f6 2c 00 00 00 01 00 00 00 ........,.......
backtrace:
[<c145d4eb>] kmemleak_alloc+0x27/0x4d
[<c10ad53f>] kmem_cache_alloc+0xa3/0xd4
[<c123040b>] reserve_range+0x3b/0x13f
[<c1230597>] system_pnp_probe+0x88/0xb0
[<c122b0f7>] pnp_device_probe+0x67/0xaf
[<c12d5246>] driver_probe_device+0x5b/0x148
[<c12d539a>] __driver_attach+0x67/0x69
[<c12d4c33>] bus_for_each_dev+0x46/0x64
[<c12d512c>] driver_attach+0x19/0x1b
[<c12d46f5>] bus_add_driver+0x17a/0x225
[<c12d55b8>] driver_register+0x65/0x110
[<c122af44>] pnp_register_driver+0x17/0x19
[<c1647a91>] pnp_system_init+0xd/0xf
[<c1001114>] do_one_initcall+0x27/0x178
[<c162b357>] kernel_init+0x129/0x1c7
[<c10238b6>] kernel_thread_helper+0x6/0x10

I scanned through both call sites briefly but didn't find anything obvious.

The second class of leaks seems to be related to kobjects:

unreferenced object 0xf6951920 (size 32):
comm "swapper", pid 1, jiffies 4294892614 (age 57258.656s)
hex dump (first 32 bytes):
63 70 75 69 64 6c 65 00 2f 76 69 72 74 75 61 6c cpuidle./virtual
2f 67 72 61 70 68 69 63 73 2f 66 62 63 6f 6e 00 /graphics/fbcon.
backtrace:
[<c11e33c6>] kvasprintf+0x2a/0x47
[<c11db5d7>] kobject_set_name_vargs+0x17/0x52
[<c11db629>] kobject_add_varg+0x17/0x41
[<c11db67a>] kobject_init_and_add+0x27/0x2d
[<c1389b0c>] cpuidle_add_sysfs+0x3e/0x56
[<c138944e>] __cpuidle_register_device+0xfb/0x116
[<c13895fc>] cpuidle_register_device+0x18/0x54
[<c1645397>] intel_idle_init+0x2b9/0x327
[<c1001114>] do_one_initcall+0x27/0x178
[<c162b357>] kernel_init+0x129/0x1c7
[<c10238b6>] kernel_thread_helper+0x6/0x10
[<ffffffff>] 0xffffffff

unreferenced object 0xf60045c0 (size 32):
comm "swapper", pid 1, jiffies 4294893885 (age 57253.572s)
hex dump (first 32 bytes):
30 00 64 4b bc a3 bc a3 80 f5 80 f5 a7 15 a7 15 0.dK............
34 07 34 07 69 4f 69 4f f4 47 f4 47 ef 27 ef 27 4.4.iOiO.G.G.'.'
backtrace:
[<c145d4eb>] kmemleak_alloc+0x27/0x4d
[<c10adb0c>] __kmalloc+0xd4/0x10d
[<c11e33c6>] kvasprintf+0x2a/0x47
[<c11db5d7>] kobject_set_name_vargs+0x17/0x52
[<c11db629>] kobject_add_varg+0x17/0x41
[<c11db6ac>] kobject_add+0x2c/0x54
[<c138ad14>] add_sysfs_fw_map_entry+0x43/0x7c
[<c164f00f>] memmap_init+0x16/0x30
[<c1001114>] do_one_initcall+0x27/0x178
[<c162b357>] kernel_init+0x129/0x1c7
[<c10238b6>] kernel_thread_helper+0x6/0x10
[<ffffffff>] 0xffffffff

The third class of leaks is related to drm_setversion():

unreferenced object 0xf6b10620 (size 32):
comm "X", pid 2268, jiffies 4294894722 (age 57250.228s)
hex dump (first 32 bytes):
6e 6f 75 76 65 61 75 40 70 63 69 3a 30 30 30 30 nouveau@pci:0000
3a 30 35 3a 30 30 2e 30 00 00 00 00 00 00 00 00 :05:00.0........
backtrace:
[<c145d4eb>] kmemleak_alloc+0x27/0x4d
[<c10adb0c>] __kmalloc+0xd4/0x10d
[<c125315e>] drm_setversion+0x140/0x1bf
[<c12514f2>] drm_ioctl+0x258/0x3d7
[<c10bdd42>] vfs_ioctl+0x27/0x9b
[<c10bdee2>] do_vfs_ioctl+0x66/0x54b
[<c10be3fa>] sys_ioctl+0x33/0x4f
[<c102339c>] sysenter_do_call+0x12/0x2c
[<ffffffff>] 0xffffffff

for which I wasn't able to find the allocation call-site. Maybe Zeno
has some out-of-tree DRM module?

The fourth class of leaks is related to per-CPU allocations in the block layer:

unreferenced object 0xf6681400 (size 1024):
comm "async/2", pid 1307, jiffies 4294894138 (age 57252.564s)
hex dump (first 32 bytes):
80 87 ff ff c4 ff ff ff c4 ff ff ff c4 ff ff ff ................
fc ff ff ff fc ff ff ff fc ff ff ff fc ff ff ff ................
backtrace:
[<c145d4eb>] kmemleak_alloc+0x27/0x4d
[<c10adb0c>] __kmalloc+0xd4/0x10d
[<c10ae982>] pcpu_mem_alloc+0x18/0x3a
[<c10af239>] pcpu_extend_area_map+0x1a/0xad
[<c10af578>] pcpu_alloc+0x2ac/0x82b
[<c10afb10>] __alloc_percpu+0xa/0xc
[<c11d4518>] alloc_disk_node+0x2e/0xbf
[<c11d45b6>] alloc_disk+0xd/0xf
[<c130260c>] sd_probe+0x54/0x298
[<c12d5246>] driver_probe_device+0x5b/0x148
[<c12d53ca>] __device_attach+0x2e/0x32
[<c12d49f3>] bus_for_each_drv+0x46/0x64
[<c12d5449>] device_attach+0x5c/0x60
[<c12d484d>] bus_probe_device+0x1a/0x30
[<c12d358a>] device_add+0x448/0x509
[<c12fb881>] scsi_sysfs_add_sdev+0x54/0x212

for which I didn't find anything obvious that could explain it.

I suspect most of the reports are false positives. Catalin, what do
you make of them?

Pekka
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
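
Note on the classification above: the classes Pekka lists can be recovered mechanically from the kmemleak dump by skipping the allocator frames at the top of each backtrace and counting the first real caller. Below is a minimal sketch of that grouping in Python. The helper name group_leaks and the ALLOCATOR_FRAMES list are assumptions made for illustration, chosen to fit the traces quoted in this thread; they are not part of kmemleak.

```python
import re
from collections import Counter

# Frames that belong to kmemleak or to generic allocation helpers.
# Skipping them leaves the first caller that actually asked for memory.
# This set is an assumption fitted to the traces quoted in this thread.
ALLOCATOR_FRAMES = {
    "kmemleak_alloc", "kmem_cache_alloc", "__kmalloc", "kvasprintf",
    "pcpu_mem_alloc", "pcpu_extend_area_map", "pcpu_alloc", "__alloc_percpu",
}

# Matches backtrace lines such as "[<c125315e>] drm_setversion+0x140/0x1bf".
# The terminating "[<ffffffff>] 0xffffffff" line has no symbol+offset part
# and is therefore ignored.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def group_leaks(report: str) -> Counter:
    """Count kmemleak entries by their first non-allocator backtrace frame."""
    counts = Counter()
    # Every entry in the dump starts with the words "unreferenced object".
    for entry in report.split("unreferenced object")[1:]:
        frames = FRAME_RE.findall(entry)
        caller = next((f for f in frames if f not in ALLOCATOR_FRAMES), "unknown")
        counts[caller] += 1
    return counts
```

Feeding it the contents of /sys/kernel/debug/kmemleak (readable only as root, with debugfs mounted) and printing counts.most_common() gives one line per call site, which makes it easier to see whether a handful of boot-time allocations account for most of the reports.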

From: Zeno Davatz on 14 Jul 2010 04:30

Dear Pekka

On Wed, Jul 14, 2010 at 10:05 AM, Pekka Enberg <penberg(a)cs.helsinki.fi> wrote:
> On Wed, Jul 14, 2010 at 9:12 AM, Zeno Davatz <zdavatz(a)gmail.com> wrote:
>> I got a new Intel core-8 i7 processor.
>>
>> I am on kernel uname -a
>>
>> Linux zenogentoo 2.6.35-rc5 #97 SMP Tue Jul 13 16:13:25 CEST 2010 i686
>> Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz GenuineIntel GNU/Linux
>>
>> Sometimes in the middle of nowhere all of a sudden all of my 8-cores
>> are at 100% CPU usage and my machine really lags and hangs and is not
>> useable anymore. Some random process just grabs a bunch of CPUs according
>> to htop.
>
> Why did you enable CONFIG_DEBUG_KMEMLEAK? Memory leak scanning is
> likely the source of these pauses.

Shall I disable that? I will do that and try again.

>> I am attaching the file from /sys/kernel/debug/kmemleak
>
> Zeno, can you post your dmesg and .config, please?

Sure, see attached files.

> The third class of leaks is related to drm_setversion():
>
> unreferenced object 0xf6b10620 (size 32):
> comm "X", pid 2268, jiffies 4294894722 (age 57250.228s)
> hex dump (first 32 bytes):
> 6e 6f 75 76 65 61 75 40 70 63 69 3a 30 30 30 30 nouveau@pci:0000
> 3a 30 35 3a 30 30 2e 30 00 00 00 00 00 00 00 00 :05:00.0........
> backtrace:
> [<c145d4eb>] kmemleak_alloc+0x27/0x4d
> [<c10adb0c>] __kmalloc+0xd4/0x10d
> [<c125315e>] drm_setversion+0x140/0x1bf
> [<c12514f2>] drm_ioctl+0x258/0x3d7
> [<c10bdd42>] vfs_ioctl+0x27/0x9b
> [<c10bdee2>] do_vfs_ioctl+0x66/0x54b
> [<c10be3fa>] sys_ioctl+0x33/0x4f
> [<c102339c>] sysenter_do_call+0x12/0x2c
> [<ffffffff>] 0xffffffff
>
> for which I wasn't able to find the allocation call-site. Maybe Zeno
> has some out-of-tree DRM module?

I am using the in-kernel nouveau driver, as I have an Nvidia graphics card.

05:00.0 VGA compatible controller: nVidia Corporation G98 [GeForce
8400 GS] (rev a1) (prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. Device 8321
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
Memory at d0000000 (64-bit, prefetchable) [size=256M]
Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
I/O ports at ec00 [size=128]
[virtual] Expansion ROM at fb000000 [disabled] [size=128K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Virtual Channel <?>
Capabilities: [128] Power Budgeting <?>
Capabilities: [600] Vendor Specific Information <?>
Kernel driver in use: nouveau

Best
Zeno
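
Note on disabling kmemleak: to test whether the scanning thread is behind the pauses, it should not be necessary to rebuild the kernel first. kmemleak accepts commands written to its debugfs file, and scan=off stops the automatic scanning thread. The sketch below assumes debugfs is mounted at /sys/kernel/debug and that it is run as root; the helper names are hypothetical.

```python
from pathlib import Path

# Control file used by kmemleak; assumes debugfs is mounted in the usual place.
KMEMLEAK_CONTROL = Path("/sys/kernel/debug/kmemleak")


def kmemleak_command(command: str, control: Path = KMEMLEAK_CONTROL) -> None:
    """Write a single command string to the kmemleak control file."""
    control.write_text(command + "\n")


def pause_scanning(control: Path = KMEMLEAK_CONTROL) -> None:
    """Stop the automatic scanning thread (the 'scan=off' command)."""
    kmemleak_command("scan=off", control)


def resume_scanning(control: Path = KMEMLEAK_CONTROL) -> None:
    """Start the automatic scanning thread again (the 'scan=on' command)."""
    kmemleak_command("scan=on", control)
```

If the CPU spikes continue after pause_scanning(), kmemleak is unlikely to be the cause, which would match Damien's report of the same symptom without CONFIG_DEBUG_KMEMLEAK.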

From: Pekka Enberg on 14 Jul 2010 04:40

Zeno Davatz wrote:
> On Wed, Jul 14, 2010 at 10:31 AM, Damien Wyart <damien.wyart(a)free.fr> wrote:
>
>>> On Wed, Jul 14, 2010 at 9:12 AM, Zeno Davatz <zdavatz(a)gmail.com> wrote:
>>>> I got a new Intel core-8 i7 processor.
>>>> I am on kernel uname -a
>>>> Linux zenogentoo 2.6.35-rc5 #97 SMP Tue Jul 13 16:13:25 CEST 2010 i686
>>>> Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz GenuineIntel GNU/Linux
>>>> Sometimes in the middle of nowhere all of a sudden all of my 8-cores
>>>> are at 100% CPU usage and my machine really lags and hangs and is not
>>>> useable anymore. Some random process just grabs a bunch of CPUs according
>>>> to htop.
>> * Pekka Enberg <penberg(a)cs.helsinki.fi> [2010-07-14 11:05]:
>>> Why did you enable CONFIG_DEBUG_KMEMLEAK? Memory leak scanning is
>>> likely the source of these pauses.
>> I am seeing the same problem with a Core i7 920 and 2.6.35-rc5, and I do
>> not have CONFIG_DEBUG_KMEMLEAK enabled, so I think this is not related.
>>
>> I do not see anything special in the logs, just the load becoming mad
>> and almost preventing ssh access. I've been seeing that since the first
>> 2.6.35 rc I tested (-rc2 or -rc3, I don't remember) and I did not have
>> time to report it before but I was surprised nobody else did. No problem
>> with 2.6.34 and 2.6.34.1.
>
> Same with me. The last build I tested was 2.6.34-rc7. No problems
> there. No CPU jumps out of nowhere.
>
> It is like any application all of a sudden uses 400% CPU i.e. htop.

Interesting. Let's CC some scheduler folks for help.

Pekka

From: Zeno Davatz on 14 Jul 2010 04:40

On Wed, Jul 14, 2010 at 10:31 AM, Damien Wyart <damien.wyart(a)free.fr> wrote:

>> On Wed, Jul 14, 2010 at 9:12 AM, Zeno Davatz <zdavatz(a)gmail.com> wrote:
>> > I got a new Intel core-8 i7 processor.
>
>> > I am on kernel uname -a
>
>> > Linux zenogentoo 2.6.35-rc5 #97 SMP Tue Jul 13 16:13:25 CEST 2010 i686
>> > Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz GenuineIntel GNU/Linux
>
>> > Sometimes in the middle of nowhere all of a sudden all of my 8-cores
>> > are at 100% CPU usage and my machine really lags and hangs and is not
>> > useable anymore. Some random process just grabs a bunch of CPUs according
>> > to htop.
>
> * Pekka Enberg <penberg(a)cs.helsinki.fi> [2010-07-14 11:05]:
>> Why did you enable CONFIG_DEBUG_KMEMLEAK? Memory leak scanning is
>> likely the source of these pauses.
>
> I am seeing the same problem with a Core i7 920 and 2.6.35-rc5, and I do
> not have CONFIG_DEBUG_KMEMLEAK enabled, so I think this is not related.
>
> I do not see anything special in the logs, just the load becoming mad
> and almost preventing ssh access. I've been seeing that since the first
> 2.6.35 rc I tested (-rc2 or -rc3, I don't remember) and I did not have
> time to report it before but I was surprised nobody else did. No problem
> with 2.6.34 and 2.6.34.1.

Same with me. The last build I tested was 2.6.34-rc7. No problems
there. No CPU jumps out of nowhere.

It is like any application all of a sudden uses 400% CPU i.e. htop.

Best
Zeno
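
Note on capturing the spikes: since nothing shows up in the logs, numbers recorded during a spike may help the scheduler folks. One option is to sample /proc/stat twice and compute how busy each CPU was in between. The following is a minimal sketch under that assumption; the function names are hypothetical, and only the first eight counters of each line are used (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time


def parse_proc_stat(text: str) -> dict:
    """Map 'cpuN' to (busy_ticks, total_ticks) from the text of /proc/stat."""
    samples = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep per-CPU lines ("cpu0", "cpu1", ...); skip the aggregate "cpu"
        # line and unrelated lines such as "intr" or "ctxt".
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Guest counters, if present, are already included in user/nice,
        # so only the first eight counters are summed.
        ticks = [int(value) for value in fields[1:9]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        total = sum(ticks)
        samples[fields[0]] = (total - idle, total)
    return samples


def busy_percent(before: dict, after: dict) -> dict:
    """Percentage of time each CPU was busy between two samples."""
    result = {}
    for cpu, (busy_after, total_after) in after.items():
        busy_before, total_before = before.get(cpu, (0, 0))
        elapsed = total_after - total_before
        result[cpu] = 100.0 * (busy_after - busy_before) / elapsed if elapsed > 0 else 0.0
    return result


def sample_cpu_usage(interval: float = 1.0, path: str = "/proc/stat") -> dict:
    """Read the stat file twice, `interval` seconds apart, and compare."""
    with open(path) as stat_file:
        before = parse_proc_stat(stat_file.read())
    time.sleep(interval)
    with open(path) as stat_file:
        after = parse_proc_stat(stat_file.read())
    return busy_percent(before, after)
```

Logging the result of sample_cpu_usage() once a second, together with the output of top in batch mode, would show whether all CPUs really saturate at once and which task is charged for it.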
