From: Thomas Renninger on
Hi,

Doing:
echo "file ec.c +p" >/sys/kernel/debug/dynamic_debug/control

I got x
RIP: 0010:[<ffffffff81251267>] [<ffffffff81251267>]
ddebug_change+0x87/0x240

More output is in the attachment.

Thanks for any hints,

Thomas
From: Jason Baron on
On Thu, Jul 01, 2010 at 05:44:19PM +0200, Thomas Renninger wrote:
>
> Hi,
>
> Doing:
> echo "file ec.c +p" >/sys/kernel/debug/dynamic_debug/control
>
> I got x
> RIP: 0010:[<ffffffff81251267>] [<ffffffff81251267>]
> ddebug_change+0x87/0x240
>
> More output is in the attachment.
>
> Thanks for any hints,
>
> Thomas

Hi Thomas,

I just tried the same command on 2.6.35-rc3, and it worked fine. I
noticed that the kernel your running is: "2.6.35-rc3-0.0.10.4cae135-default",
so perhaps there are some other changes there causing this problem? Can
you re-produce the bug on a purely upstream kernel?

thanks,

-Jason

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Thomas Renninger on
On Thursday 01 July 2010 18:26:01 Jason Baron wrote:
> On Thu, Jul 01, 2010 at 05:44:19PM +0200, Thomas Renninger wrote:
> >
> > Hi,
> >
> > Doing:
> > echo "file ec.c +p" >/sys/kernel/debug/dynamic_debug/control
> >
> > I got x
> > RIP: 0010:[<ffffffff81251267>] [<ffffffff81251267>]
> > ddebug_change+0x87/0x240
....
> I just tried the same command on 2.6.35-rc3, and it worked fine. I
> noticed that the kernel your running is: "2.6.35-rc3-0.0.10.4cae135-default",
> so perhaps there are some other changes there causing this problem? Can
> you re-produce the bug on a purely upstream kernel?
I am able to create another crash with plain 2.6.35-rc3 kernel.
Find attached my .config.
I again initiated the crash by:
echo "file ec.c +p" >/sys/kernel/debug/dynamic_debug/control

First I got a crash in strcmp without a backtrace.

BUG: unable to handle kernel NULL pointer dereference at (null)
[ 252.832036] IP: [<ffffffff8122b4f4>] strcmp+0x14/0x30
[ 252.832036] PGD 37a76067 PUD 7a60c067 PMD 0
[ 252.832036] Oops: 0000 [#1] SMP
[ 252.832036] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map

Thus I clustered lib/dynamic_debug.c with NULL pointer checks before
strcmp(), but couldn't find anything -> got other crashs.

Hmm, with attached config I can easily produce crashs, e.g.:
cat /sys/kernel/debug/dynamic_debug/control |wc
622 5683 64665

BUG: unable to handle kernel paging request at ffffffffa01c480f
[ 412.226289] IP: [<ffffffff8123b31b>] ddebug_proc_show+0x1b/0xa0
[ 412.226289] PGD 1806067 PUD 180a063 PMD 376e8067 PTE 0
[ 412.226289] Oops: 0000 [#1] SMP
[ 412.226289] last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
[ 412.226289] CPU 5
[ 412.226289] Modules linked in: autofs4 edd nfs lockd fscache nfs_acl auth_rpcgss sunrpc af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave
acpi_cpufreq mperf fuse reiserfs loop dm_mod igb i5400_edac ioatdma edac_core floppy sr_mod shpchp iTCO_wdt iTCO_vendor_support cdrom container button
pci_hotplug i2c_i801 i5k_amb serio_raw dca sg pcspkr ext4 jbd2 crc16 uhci_hcd radeon ttm drm_kms_helper ehci_hcd drm usbcore sd_mod i2c_algo_bit fan thermal
processor thermal_sys ata_generic ata_piix ahci libahci libata scsi_mod
[ 412.226289]
[ 412.226289] Pid: 3451, comm: cat Not tainted 2.6.35-rc3-vanilla #5 X7DWN/X7DW3
[ 412.226289] RIP: 0010:[<ffffffff8123b31b>] [<ffffffff8123b31b>] ddebug_proc_show+0x1b/0xa0
[ 412.226289] RSP: 0018:ffff880078761e68 EFLAGS: 00010286
[ 412.226289] RAX: ffffffff8163fe20 RBX: ffffffffa01c47e8 RCX: ffff8800370ec3e0
[ 412.226289] RDX: ffff8800370ec3e0 RSI: ffffffffa01c47e8 RDI: ffff880036e24740
[ 412.226289] RBP: ffff880078761e98 R08: 0000000000000001 R09: 000000000000ffff
[ 412.226289] R10: 0000000000000003 R11: 000000000000000a R12: ffff880036e24740
[ 412.226289] R13: 0000000000000ec7 R14: ffff880078761ed0 R15: 0000000000000000
[ 412.226289] FS: 00007f9f0982c700(0000) GS:ffff880001d40000(0000) knlGS:0000000000000000
[ 412.226289] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 412.226289] CR2: ffffffffa01c480f CR3: 0000000037a08000 CR4: 00000000000006e0
[ 412.226289] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 412.226289] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 412.226289] Process cat (pid: 3451, threadinfo ffff880078760000, task ffff880036536480)
[ 412.226289] Stack:
[ 412.226289] ffff880078761e78 ffffffff8123b3e7 000000000000002d ffff8800379cae40
[ 412.226289] <0> ffffffffa01c47e8 ffff880036e24740 ffff880078761f08 ffffffff8114eca9
[ 412.226289] <0> ffff880078761f48 0000000000001000 0000000000611000 ffff8800379cae40
[ 412.226289] Call Trace:
[ 412.226289] [<ffffffff8123b3e7>] ? ddebug_proc_start+0x27/0x100
[ 412.226289] [<ffffffff8114eca9>] seq_read+0x269/0x430
[ 412.226289] [<ffffffff81131ec8>] vfs_read+0xc8/0x1a0
[ 412.226289] [<ffffffff81132090>] sys_read+0x50/0x90
[ 412.226289] [<ffffffff81002e2b>] system_call_fastpath+0x16/0x1b
[ 412.226289] Code: ee ff 48 83 c4 08 5b c9 c3 0f 1f 80 00 00 00 00 55 48 89 e5 41 54 49 89 fc 53 48 89 f3 48 83 ec 20 48 83 fe 01 48 8b 57 60 74 75 <0f> b6 46
27 c6 45 e1 00 83 e0 01 83 f8 01 19 c0 83 e0 bd 83 c0
[ 412.226289] RIP [<ffffffff8123b31b>] ddebug_proc_show+0x1b/0xa0
[ 412.226289] RSP <ffff880078761e68>
[ 412.226289] CR2: ffffffffa01c480f


Here another try with verbose=1 (in dynamic_debug.c).
This time no backtrace again:

[ 222.788975] ddebug_proc_open: called
[ 222.797015] ddebug_proc_start: called m=ffff88015a100340 *pos=0
[ 222.809390] ddebug_proc_show: called m=ffff88015a100340 p=0000000000000001
[ 222.823733] ddebug_proc_next: called m=ffff88015a100340 p=0000000000000001 *pos=0
[ 222.839181] ddebug_proc_show: called m=ffff88015a100340 p=ffffffff81909820
[ 222.853423] ddebug_proc_next: called m=ffff88015a100340 p=ffffffff81909820 *pos=1
[ 222.868866] ddebug_proc_show: called m=ffff88015a100340 p=ffffffff81909848
[ 222.868879] ddebug_proc_next: called m=ffff88015a100340 p=ffffffff81909848 *pos=2
[ 222.868887] ddebug_proc_show: called m=ffff88015a100340 p=ffffffff81909870
[ 222.868896] ddebug_proc_next: called m=ffff88015a100340 p=ffffffff81909870 *pos=3
[ 222.868904] ddebug_proc_show: called m=ffff88015a100340 p=ffffffff81909898
[ 222.868913] ddebug_proc_next: called m=ffff88015a100340 p=ffffffff81909898 *pos=4
[ 222.868920] ddebug_proc_show: called m=ffff88015a100340 p=ffffffff819098c0
[ 222.868930] ddebug_proc_next: called m=ffff88015a100340 p=ffffffff819098c0 *pos=5
[ 222.868937] ddebug_proc_show: called m=ffff88015a100340 p=ffffffff819098e8
[ 222.868947] ddebug_proc_next: called m=ffff88015a10034

<- KDUMP/CRASH kernel gets loaded ->

[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 2.6.35-rc3-vanilla (trenn(a)ett) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #6 SMP Fri Jul 2 18:39:34 CEST 2010
[ 0.000000] Command line: root=/dev/sda7 console=tty0 console=ttyS0,57600 sysrq_always_enabled panic=100 ignore_loglevel resume=/dev/disk/by-id/ata-
WDC_WD1200JS-00NCB1_WD-WCANM5535606-part2 quiet vga=normal elevator=deadline sysrq=1 reset_devices irqpoll maxcpus=1 memmap=exactmap
memmap=640K(a)0K memmap=261484K(a)33408K elfco
From: Jason Baron on
On Fri, Jul 02, 2010 at 06:55:59PM +0200, Thomas Renninger wrote:
> On Thursday 01 July 2010 18:26:01 Jason Baron wrote:
> > On Thu, Jul 01, 2010 at 05:44:19PM +0200, Thomas Renninger wrote:
> > >
> > > Hi,
> > >
> > > Doing:
> > > echo "file ec.c +p" >/sys/kernel/debug/dynamic_debug/control
> > >
> > > I got x
> > > RIP: 0010:[<ffffffff81251267>] [<ffffffff81251267>]
> > > ddebug_change+0x87/0x240
> ...
> > I just tried the same command on 2.6.35-rc3, and it worked fine. I
> > noticed that the kernel your running is: "2.6.35-rc3-0.0.10.4cae135-default",
> > so perhaps there are some other changes there causing this problem? Can
> > you re-produce the bug on a purely upstream kernel?
> I am able to create another crash with plain 2.6.35-rc3 kernel.

Hi,

So dynamic debug was not properly unregistering debug calls when modules
were removed. Specifically, we were leaving around references to debug
statements in modules that were no longer loaded. Thus, we would end up
touching invalid parts of memory, leading to these panics. The missing
unregister calls, were during module loading races, and error
conditions, and that probably explains why we haven't seen them before.
Also, I didn't see this problem initially b/c I was not using modules.
Anyways, please try the patch below to verify that it resolves this
issue.

thanks,

-Jason

Make sure we properly call ddebug_remove_module() when a module fails to
load. In addition, pass the pointer to the "debug table", to both
ddebug_add_module(), and ddebug_remove_module() so that we can uniquely
identify each set of debug statements. In this way even modules with the
same name can be properly identified and removed.


Signed-off-by: Jason Baron <jbaron(a)redhat.com>
---
include/linux/dynamic_debug.h | 7 ++-----
include/linux/module.h | 4 ++++
kernel/module.c | 10 +++++++---
lib/dynamic_debug.c | 4 ++--
4 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index b3cd4de..a5c133e 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -40,7 +40,7 @@ int ddebug_add_module(struct _ddebug *tab, unsigned int n,
const char *modname);

#if defined(CONFIG_DYNAMIC_DEBUG)
-extern int ddebug_remove_module(char *mod_name);
+extern int ddebug_remove_module(struct _ddebug *tab, char *mod_name);

#define __dynamic_dbg_enabled(dd) ({ \
int __ret = 0; \
@@ -73,10 +73,7 @@ extern int ddebug_remove_module(char *mod_name);

#else

-static inline int ddebug_remove_module(char *mod)
-{
- return 0;
-}
+#define ddebug_remove_module(tab, name) do {} while (0)

#define dynamic_pr_debug(fmt, ...) \
do { if (0) printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__); } while (0)
diff --git a/include/linux/module.h b/include/linux/module.h
index 8a6b9fd..97ce090 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -387,6 +387,10 @@ struct module
ctor_fn_t *ctors;
unsigned int num_ctors;
#endif
+
+#ifdef CONFIG_DYNAMIC_DEBUG
+ struct _ddebug *ddebug;
+#endif
};
#ifndef MODULE_ARCH_INIT
#define MODULE_ARCH_INIT {}
diff --git a/kernel/module.c b/kernel/module.c
index 8c6b428..16bb044 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -787,7 +787,6 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,

/* Store the name of the last unloaded module for diagnostic purposes */
strlcpy(last_unloaded_module, mod->name, sizeof(last_unloaded_module));
- ddebug_remove_module(mod->name);

free_module(mod);
return 0;
@@ -1550,6 +1549,8 @@ static void free_module(struct module *mod)
remove_sect_attrs(mod);
mod_kobject_remove(mod);

+ ddebug_remove_module(mod->ddebug, mod->name);
+
/* Arch-specific cleanup. */
module_arch_cleanup(mod);

@@ -2053,12 +2054,14 @@ static inline void add_kallsyms(struct module *mod,
}
#endif /* CONFIG_KALLSYMS */

-static void dynamic_debug_setup(struct _ddebug *debug, unsigned int num)
+static void dynamic_debug_setup(struct _ddebug *debug, unsigned int num,
+ struct module *mod)
{
#ifdef CONFIG_DYNAMIC_DEBUG
if (ddebug_add_module(debug, num, debug->modname))
printk(KERN_ERR "dynamic debug error adding module: %s\n",
debug->modname);
+ mod->ddebug = debug;
#endif
}

@@ -2483,7 +2486,7 @@ static noinline struct module *load_module(void __user *umod,
debug = section_objs(hdr, sechdrs, secstrings, "__verbose",
sizeof(*debug), &num_debug);
if (debug)
- dynamic_debug_setup(debug, num_debug);
+ dynamic_debug_setup(debug, num_debug, mod);
}

err = module_finalize(hdr, sechdrs, mod);
@@ -2562,6 +2565,7 @@ static noinline struct module *load_module(void __user *umod,
synchronize_sched();
module_arch_cleanup(mod);
cleanup:
+ ddebug_remove_module(mod->ddebug, mod->name);
free_modinfo(mod);
module_unload_free(mod);
#if defined(CONFIG_MODULE_UNLOAD)
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 3df8eb1..7d66180 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -692,7 +692,7 @@ static void ddebug_table_free(struct ddebug_table *dt)
* Called in response to a module being unloaded. Removes
* any ddebug_table's which point at the module.
*/
-int ddebug_remove_module(char *mod_name)
+int ddebug_remove_module(struct _ddebug *tab, char *mod_name)
{
struct ddebug_table *dt, *nextdt;
int ret = -ENOENT;
@@ -703,7 +703,7 @@ int ddebug_remove_module(char *mod_name)

mutex_lock(&ddebug_lock);
list_for_each_entry_safe(dt, nextdt, &ddebug_tables, link) {
- if (!strcmp(dt->mod_name, mod_name)) {
+ if (!strcmp(dt->mod_name, mod_name) && (tab == dt->ddebugs)) {
ddebug_table_free(dt);
ret = 0;
}
--
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andrew Morton on
On Thu, 8 Jul 2010 17:39:28 -0400
Jason Baron <jbaron(a)redhat.com> wrote:

> Make sure we properly call ddebug_remove_module() when a module fails to
> load. In addition, pass the pointer to the "debug table", to both
> ddebug_add_module(), and ddebug_remove_module() so that we can uniquely
> identify each set of debug statements. In this way even modules with the
> same name can be properly identified and removed.
>
>
> Signed-off-by: Jason Baron <jbaron(a)redhat.com>

It'd be nice to track the Reported-by:s. And the Tested-by:s if/when
they arrive. SighIllDoIt.

The patch (almost) applies to 2.6.34. So are we missing a Cc:stable tag
as well?

Also, this patch's title is now "Re: Dynamic Debug broken on
2.6.35-rc3?" which isn't very good. I can invent a new title for it,
but that means we don't have a well-understood handle to refer to this
patch, and to perform searches with, etc.

Maybe a resend would be best..
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/