From: Ondrej Zary on
Hello,
I have problems debugging an oops. It happens when a Nexio USB touchscreen
(using my new code http://lkml.org/lkml/2009/11/25/568) is disconnected:

BUG: unable to handle kernel NULL pointer dereference at 00000048
IP: [<f7c38afd>] start_unlink_async+0xb2/0x160 [ehci_hcd]
*pde = 00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1b.0/sound/card0/controlC0/uevent
Modules linked in: uvesafb cn i915 drm i2c_algo_bit joydev usbtouchscreen loop snd_usb_audio snd_usb_lib snd_rawmidi snd_seq_device
snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd ftdi_sio soundcore snd_page_alloc
gspca_ov519 usblp usbhid hid usbserial gspca_main videodev rng_core v4l1_compat i2c_i801 i2c_core processor pcspkr psmouse
asus_atk0110 evdev serio_raw button ext3 jbd mbcache usb_storage sd_mod crc_t10dif ata_generic ata_piix libata scsi_mod
ide_pci_generic r8169 mii video output uhci_hcd intel_agp agpgart ehci_hcd ide_core usbcore nls_base thermal fan thermal_sys
Pid: 195, comm: khubd Not tainted (2.6.31 #1) B202
EIP: 0060:[<f7c38afd>] EFLAGS: 00010003 CPU: 0
EIP is at start_unlink_async+0xb2/0x160 [ehci_hcd]
EAX: 00000000 EBX: f648c8e8 ECX: 78bd7dee EDX: 78bd7dee
ESI: 00000000 EDI: f65fc080 EBP: 00010030 ESP: f65bfddc
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process khubd (pid: 195, ti=f65be000 task=f644e1c0 task.ti=f65be000)
Stack:
78bd7dee fffffffe f65fc080 f648c800 f648c8e8 f7c3ab29 f648c8f8 00000246
<0> 00000000 78bd7dee f7c3e278 f648c800 f605d840 fffffffe f7c977fc f6481800
<0> 78bd7dee 00000000 f605d840 00000246 fffffffe f7c9795d 78bd7dee f605d840
Call Trace:
[<f7c3ab29>] ? ehci_urb_dequeue+0x7c/0x11a [ehci_hcd]
[<f7c977fc>] ? unlink1+0xaa/0xc7 [usbcore]
[<f7c9795d>] ? usb_hcd_unlink_urb+0x57/0x84 [usbcore]
[<f7c98b28>] ? usb_kill_urb+0x40/0xbe [usbcore]
[<c1034ec2>] ? default_wake_function+0x0/0x2b
[<f7c99ff9>] ? usb_start_wait_urb+0x6e/0xb0 [usbcore]
[<f7c9a2cf>] ? usb_control_msg+0x10a/0x136 [usbcore]
[<f7c92e46>] ? hub_port_status+0x77/0xf7 [usbcore]
[<f7c95f9d>] ? hub_thread+0x56d/0xe14 [usbcore]
[<c1050003>] ? autoremove_wake_function+0x0/0x4f
[<f7c95a30>] ? hub_thread+0x0/0xe14 [usbcore]
[<c104fc73>] ? kthread+0x7a/0x7f
[<c104fbf9>] ? kthread+0x0/0x7f
[<c1004027>] ? kernel_thread_helper+0x7/0x10
Code: 00 fb e9 bb 00 00 00 c6 46 68 02 89 f0 e8 ee e8 ff ff 85 db 89 c7 89 43 18 75 06 68 c5 e4 c3 f7 e8 b4 5f 68 c9 50 8b 43 14 89 c6
<8b> 40 48 39 f8 75 f7 85 f6 75 0b 68 0c e5 c3 f7 e8 99 5f 68 c9
EIP: [<f7c38afd>] start_unlink_async+0xb2/0x160 [ehci_hcd] SS:ESP 0068:f65bfddc
CR2: 0000000000000048
---[ end trace 040b72a526aa0755 ]---


It does not happen every time - sometimes it survives the first disconnect.
I tried adding printk()s to the start_unlink_async function, and then the oops does not appear.
It looks like a race. It might be a bug in my code, but I'm not able to find it.

It also happens only when the touchscreen is connected through a hub:
Bus 001 Device 002: ID 2001:f103 D-Link Corp. [hex] DUB-H7 7-port USB 2.0 hub
When connected directly to the machine, it does not oops.

Tried decodecode:
Code: 00 fb e9 bb 00 00 00 c6 46 68 02 89 f0 e8 ee e8 ff ff 85 db 89 c7 89 43 18 75 06 68 c5 e4 c3 f7 e8 b4 5f 68 c9 50 8b 43 14 89 c6 <8b> 40 48 39 f8 75
f7 85 f6 75 0b 68 0c e5 c3 f7 e8 99 5f 68 c9
All code
========
0: 00 fb add %bh,%bl
2: e9 bb 00 00 00 jmp 0xc2
7: c6 46 68 02 movb $0x2,0x68(%esi)
b: 89 f0 mov %esi,%eax
d: e8 ee e8 ff ff call 0xffffe900
12: 85 db test %ebx,%ebx
14: 89 c7 mov %eax,%edi
16: 89 43 18 mov %eax,0x18(%ebx)
19: 75 06 jne 0x21
1b: 68 c5 e4 c3 f7 push $0xf7c3e4c5
20: e8 b4 5f 68 c9 call 0xc9685fd9
25: 50 push %eax
26: 8b 43 14 mov 0x14(%ebx),%eax
29: 89 c6 mov %eax,%esi
2b:* 8b 40 48 mov 0x48(%eax),%eax <-- trapping instruction
2e: 39 f8 cmp %edi,%eax
30: 75 f7 jne 0x29
32: 85 f6 test %esi,%esi
34: 75 0b jne 0x41
36: 68 0c e5 c3 f7 push $0xf7c3e50c
3b: e8 99 5f 68 c9 call 0xc9685fd9

Code starting with the faulting instruction
===========================================
0: 8b 40 48 mov 0x48(%eax),%eax
3: 39 f8 cmp %edi,%eax
5: 75 f7 jne 0xfffffffe
7: 85 f6 test %esi,%esi
9: 75 0b jne 0x16
b: 68 0c e5 c3 f7 push $0xf7c3e50c
10: e8 99 5f 68 c9 call 0xc9685fae

and "make drivers/usb/host/ehci-hcd.s" but I'm not able to find the above code in ehci-hcd.s.

What am I doing wrong?

--
Ondrej Zary
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Alan Stern on
On Fri, 27 Nov 2009, Ondrej Zary wrote:

> Hello,
> I have problems debugging an oops. It happens when a Nexio USB touchscreen
> (using my new code http://lkml.org/lkml/2009/11/25/568) is disconnected:
>
> BUG: unable to handle kernel NULL pointer dereference at 00000048
> IP: [<f7c38afd>] start_unlink_async+0xb2/0x160 [ehci_hcd]
....

> It does not happen every time - sometimes it survives the first disconnect.
> Tried adding printk()s to start_unlink_async function - and the oops does not appear.
> Looks like a race. It might be a bug in my code but I'm not able to find it.
>
> It also happens only when the touchscreen is connected through a hub:
> Bus 001 Device 002: ID 2001:f103 D-Link Corp. [hex] DUB-H7 7-port USB 2.0 hub
> When connected directly to the machine, it does not oops.

That's understandable, since the stack trace showed that the oops
occurred while the hub driver was running.

> Tried decodecode:
> Code: 00 fb e9 bb 00 00 00 c6 46 68 02 89 f0 e8 ee e8 ff ff 85 db 89 c7 89 43 18 75 06 68 c5 e4 c3 f7 e8 b4 5f 68 c9 50 8b 43 14 89 c6 <8b> 40 48 39 f8 75
> f7 85 f6 75 0b 68 0c e5 c3 f7 e8 99 5f 68 c9
> All code
> ========
> 0: 00 fb add %bh,%bl
> 2: e9 bb 00 00 00 jmp 0xc2
> 7: c6 46 68 02 movb $0x2,0x68(%esi)
> b: 89 f0 mov %esi,%eax
> d: e8 ee e8 ff ff call 0xffffe900
> 12: 85 db test %ebx,%ebx
> 14: 89 c7 mov %eax,%edi
> 16: 89 43 18 mov %eax,0x18(%ebx)
> 19: 75 06 jne 0x21
> 1b: 68 c5 e4 c3 f7 push $0xf7c3e4c5
> 20: e8 b4 5f 68 c9 call 0xc9685fd9
> 25: 50 push %eax
> 26: 8b 43 14 mov 0x14(%ebx),%eax
> 29: 89 c6 mov %eax,%esi
> 2b:* 8b 40 48 mov 0x48(%eax),%eax <-- trapping instruction
> 2e: 39 f8 cmp %edi,%eax
> 30: 75 f7 jne 0x29
> 32: 85 f6 test %esi,%esi
> 34: 75 0b jne 0x41
> 36: 68 0c e5 c3 f7 push $0xf7c3e50c
> 3b: e8 99 5f 68 c9 call 0xc9685fd9
>
> Code starting with the faulting instruction
> ===========================================
> 0: 8b 40 48 mov 0x48(%eax),%eax
> 3: 39 f8 cmp %edi,%eax
> 5: 75 f7 jne 0xfffffffe
> 7: 85 f6 test %esi,%esi
> 9: 75 0b jne 0x16
> b: 68 0c e5 c3 f7 push $0xf7c3e50c
> 10: e8 99 5f 68 c9 call 0xc9685fae
>
> and "make drivers/usb/host/ehci-hcd.s" but I'm not able to find the above code in ehci-hcd.s.
>
> What am I doing wrong?

With your disassembly? Nothing that I can see. You might be able to
locate the code in question by comparing the output above and the
contents of ehci-hcd.s with the output of "objdump -D
drivers/usb/host/ehci-hcd.o" -- search for the start of the
start_unlink_async() routine and go forward from there.

For what it's worth, your disassembly doesn't bear any relation to the
code for start_unlink_async() on my system.

As for what your driver is doing wrong... Perhaps it is writing to a
memory area after freeing it. Have you tried using usbmon to see
what's going on before the oops occurs?

Alan Stern

From: Ondrej Zary on
On Friday 27 November 2009, Alan Stern wrote:
> On Fri, 27 Nov 2009, Ondrej Zary wrote:
> > Hello,
> > I have problems debugging an oops. It happens when a Nexio USB touchscreen
> > (using my new code http://lkml.org/lkml/2009/11/25/568) is disconnected:
> >
> > BUG: unable to handle kernel NULL pointer dereference at 00000048
> > IP: [<f7c38afd>] start_unlink_async+0xb2/0x160 [ehci_hcd]
>
> ...
>
> > It does not happen every time - sometimes it survives the first
> > disconnect. Tried adding printk()s to start_unlink_async function - and
> > the oops does not appear. Looks like a race. It might be a bug in my code
> > but I'm not able to find it.
> >
> > It also happens only when the touchscreen is connected through a hub:
> > Bus 001 Device 002: ID 2001:f103 D-Link Corp. [hex] DUB-H7 7-port USB 2.0 hub
> > When connected directly to the machine, it does not oops.
>
> That's understandable, since the stack trace showed that the oops
> occurred while the hub driver was running.
>
> > Tried decodecode:
> > Code: 00 fb e9 bb 00 00 00 c6 46 68 02 89 f0 e8 ee e8 ff ff 85 db 89 c7
> > 89 43 18 75 06 68 c5 e4 c3 f7 e8 b4 5f 68 c9 50 8b 43 14 89 c6 <8b> 40 48
> > 39 f8 75 f7 85 f6 75 0b 68 0c e5 c3 f7 e8 99 5f 68 c9
> > All code
> > ========
> > 0: 00 fb add %bh,%bl
> > 2: e9 bb 00 00 00 jmp 0xc2
> > 7: c6 46 68 02 movb $0x2,0x68(%esi)
> > b: 89 f0 mov %esi,%eax
> > d: e8 ee e8 ff ff call 0xffffe900
> > 12: 85 db test %ebx,%ebx
> > 14: 89 c7 mov %eax,%edi
> > 16: 89 43 18 mov %eax,0x18(%ebx)
> > 19: 75 06 jne 0x21
> > 1b: 68 c5 e4 c3 f7 push $0xf7c3e4c5
> > 20: e8 b4 5f 68 c9 call 0xc9685fd9
> > 25: 50 push %eax
> > 26: 8b 43 14 mov 0x14(%ebx),%eax
> > 29: 89 c6 mov %eax,%esi
> > 2b:* 8b 40 48 mov 0x48(%eax),%eax <-- trapping instruction
> > 2e: 39 f8 cmp %edi,%eax
> > 30: 75 f7 jne 0x29
> > 32: 85 f6 test %esi,%esi
> > 34: 75 0b jne 0x41
> > 36: 68 0c e5 c3 f7 push $0xf7c3e50c
> > 3b: e8 99 5f 68 c9 call 0xc9685fd9
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: 8b 40 48 mov 0x48(%eax),%eax
> > 3: 39 f8 cmp %edi,%eax
> > 5: 75 f7 jne 0xfffffffe
> > 7: 85 f6 test %esi,%esi
> > 9: 75 0b jne 0x16
> > b: 68 0c e5 c3 f7 push $0xf7c3e50c
> > 10: e8 99 5f 68 c9 call 0xc9685fae
> >
> > and "make drivers/usb/host/ehci-hcd.s" but I'm not able to find the above
> > code in ehci-hcd.s.
> >
> > What am I doing wrong?
>
> With your disassembly? Nothing that I can see. You might be able to
> locate the code in question by comparing the output above and the
> contents of ehci-hcd.s with the output of "objdump -D
> drivers/usb/host/ehci-hcd.o" -- search for the start of the
> start_unlink_async() routine and go forward from there.

Thanks, found it there:
00001a4b <start_unlink_async>:
1a4b: 55 push %ebp
1a4c: 57 push %edi
1a4d: 56 push %esi
1a4e: 89 d6 mov %edx,%esi
1a50: 53 push %ebx
1a51: 89 c3 mov %eax,%ebx
1a53: 83 ec 04 sub $0x4,%esp
1a56: 65 a1 14 00 00 00 mov %gs:0x14,%eax
1a5c: 89 04 24 mov %eax,(%esp)
1a5f: 31 c0 xor %eax,%eax
1a61: 85 db test %ebx,%ebx
1a63: 75 0b jne 1a70 <start_unlink_async+0x25>
1a65: 68 57 01 00 00 push $0x157
1a6a: e8 fc ff ff ff call 1a6b <start_unlink_async+0x20>
1a6f: 58 pop %eax
1a70: 83 7b 04 00 cmpl $0x0,0x4(%ebx)
1a74: 75 0b jne 1a81 <start_unlink_async+0x36>
1a76: 68 91 01 00 00 push $0x191
1a7b: e8 fc ff ff ff call 1a7c <start_unlink_async+0x31>
1a80: 58 pop %eax
1a81: 85 f6 test %esi,%esi
1a83: 75 0b jne 1a90 <start_unlink_async+0x45>
1a85: 68 d1 01 00 00 push $0x1d1
1a8a: e8 fc ff ff ff call 1a8b <start_unlink_async+0x40>
1a8f: 58 pop %eax
1a90: 8b 43 04 mov 0x4(%ebx),%eax
1a93: 8b 28 mov (%eax),%ebp
1a95: 3b 73 14 cmp 0x14(%ebx),%esi
1a98: 75 3f jne 1ad9 <start_unlink_async+0x8e>
1a9a: 68 0b 02 00 00 push $0x20b
1a9f: e8 fc ff ff ff call 1aa0 <start_unlink_async+0x55>
1aa4: 83 7b fc 00 cmpl $0x0,-0x4(%ebx)
1aa8: 58 pop %eax
1aa9: 0f 84 e5 00 00 00 je 1b94 <start_unlink_async+0x149>
1aaf: 83 7b 18 00 cmpl $0x0,0x18(%ebx)
1ab3: 0f 85 db 00 00 00 jne 1b94 <start_unlink_async+0x149>
1ab9: 83 e5 df and $0xffffffdf,%ebp
1abc: 8b 43 04 mov 0x4(%ebx),%eax
1abf: 89 28 mov %ebp,(%eax)
1ac1: f0 83 04 24 00 lock addl $0x0,(%esp)
1ac6: 8d 83 08 01 00 00 lea 0x108(%ebx),%eax
1acc: f0 80 a3 08 01 00 00 lock andb $0xfb,0x108(%ebx)
1ad3: fb
1ad4: e9 bb 00 00 00 jmp 1b94 <start_unlink_async+0x149>
1ad9: c6 46 68 02 movb $0x2,0x68(%esi)
1add: 89 f0 mov %esi,%eax
1adf: e8 ee e8 ff ff call 3d2 <qh_get>
1ae4: 85 db test %ebx,%ebx
1ae6: 89 c7 mov %eax,%edi
1ae8: 89 43 18 mov %eax,0x18(%ebx)
1aeb: 75 0b jne 1af8 <start_unlink_async+0xad>
1aed: 68 d1 01 00 00 push $0x1d1
1af2: e8 fc ff ff ff call 1af3 <start_unlink_async+0xa8>
1af7: 58 pop %eax
1af8: 8b 43 14 mov 0x14(%ebx),%eax
1afb: 89 c6 mov %eax,%esi
==> 1afd: 8b 40 48 mov 0x48(%eax),%eax
1b00: 39 f8 cmp %edi,%eax
1b02: 75 f7 jne 1afb <start_unlink_async+0xb0>
1b04: 85 f6 test %esi,%esi
1b06: 75 0b jne 1b13 <start_unlink_async+0xc8>
1b08: 68 18 02 00 00 push $0x218
1b0d: e8 fc ff ff ff call 1b0e <start_unlink_async+0xc3>
1b12: 58 pop %eax
1b13: 8b 07 mov (%edi),%eax
1b15: 89 06 mov %eax,(%esi)
1b17: 8b 47 48 mov 0x48(%edi),%eax
1b1a: 89 46 48 mov %eax,0x48(%esi)
1b1d: f0 83 04 24 00 lock addl $0x0,(%esp)
1b22: f6 43 fc 01 testb $0x1,-0x4(%ebx)
1b26: 75 18 jne 1b40 <start_unlink_async+0xf5>
1b28: 8b 14 24 mov (%esp),%edx
1b2b: 65 33 15 14 00 00 00 xor %gs:0x14,%edx
1b32: 75 6c jne 1ba0 <start_unlink_async+0x155>
1b34: 5d pop %ebp
1b35: 89 d8 mov %ebx,%eax
1b37: 5b pop %ebx
1b38: 5e pop %esi
1b39: 5f pop %edi
1b3a: 5d pop %ebp
1b3b: e9 50 fe ff ff jmp 1990 <end_unlink_async>
1b40: 83 cd 40 or $0x40,%ebp
1b43: 8b 43 04 mov 0x4(%ebx),%eax
1b46: 89 28 mov %ebp,(%eax)
1b48: 8b 43 04 mov 0x4(%ebx),%eax
1b4b: 8b 00 mov (%eax),%eax
1b4d: 83 bb a8 00 00 00 00 cmpl $0x0,0xa8(%ebx)
1b54: 74 0f je 1b65 <start_unlink_async+0x11a>
1b56: ba ac 00 00 00 mov $0xac,%edx
1b5b: b8 33 02 00 00 mov $0x233,%eax
1b60: e8 fc ff ff ff call 1b61 <start_unlink_async+0x116>
1b65: b8 0a 00 00 00 mov $0xa,%eax
1b6a: 8b 35 00 00 00 00 mov 0x0,%esi
1b70: e8 fc ff ff ff call 1b71 <start_unlink_async+0x126>
1b75: 8b 14 24 mov (%esp),%edx
1b78: 65 33 15 14 00 00 00 xor %gs:0x14,%edx
1b7f: 75 1f jne 1ba0 <start_unlink_async+0x155>
1b81: 5f pop %edi
1b82: 8d 14 30 lea (%eax,%esi,1),%edx
1b85: 8d 83 a8 00 00 00 lea 0xa8(%ebx),%eax
1b8b: 5b pop %ebx
1b8c: 5e pop %esi
1b8d: 5f pop %edi
1b8e: 5d pop %ebp
1b8f: e9 fc ff ff ff jmp 1b90 <start_unlink_async+0x145>
1b94: 8b 04 24 mov (%esp),%eax
1b97: 65 33 05 14 00 00 00 xor %gs:0x14,%eax
1b9e: 74 05 je 1ba5 <start_unlink_async+0x15a>
1ba0: e8 fc ff ff ff call 1ba1 <start_unlink_async+0x156>
1ba5: 5e pop %esi
1ba6: 5b pop %ebx
1ba7: 5e pop %esi
1ba8: 5f pop %edi
1ba9: 5d pop %ebp
1baa: c3 ret
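
For readers following along, the mapping between the oops and the objdump listing can be checked by hand. The oops reports start_unlink_async+0xb2, and the listing places the function at 0x1a4b, so the faulting instruction should sit at 0x1a4b + 0xb2 = 0x1afd, which is the marked line. The "call" operands differ between the oops bytes and the object file most likely because the object file is not yet relocated: an "e8 fc ff ff ff" is a placeholder displacement of -4. The following is only a sketch of that arithmetic; the helper names are hypothetical and are not part of any kernel tool.

```c
#include <assert.h>
#include <stdint.h>

/* Object-file address for an oops location of the form symbol+offset. */
static uint32_t oops_to_object_addr(uint32_t symbol_start, uint32_t offset)
{
	return symbol_start + offset;
}

/*
 * Target of a 5-byte x86 "call rel32" (opcode 0xe8). The displacement is a
 * little-endian signed 32-bit value relative to the end of the instruction.
 */
static uint32_t call_rel32_target(uint32_t insn_addr, const uint8_t insn[5])
{
	uint32_t rel;

	assert(insn[0] == 0xe8);
	rel = (uint32_t)insn[1] | ((uint32_t)insn[2] << 8) |
	      ((uint32_t)insn[3] << 16) | ((uint32_t)insn[4] << 24);

	/* Unsigned wraparound makes a negative displacement subtract. */
	return insn_addr + 5 + rel;
}
```

With the bytes at 1adf (e8 ee e8 ff ff) this gives 0x3d2, the qh_get entry shown by objdump, while the placeholder bytes at 1a6a give 0x1a6b, matching the "call 1a6b" in the listing.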


It does not make much sense to me, but I think that it crashes inside this list
manipulation:

	prev = ehci->async;
	while (prev->qh_next.qh != qh)
		prev = prev->qh_next.qh;

	prev->hw_next = qh->hw_next;
	prev->qh_next = qh->qh_next;
	wmb ();

--
Ondrej Zary
From: Alan Stern on
On Mon, 30 Nov 2009, Ondrej Zary wrote:

> It does not make much sense to me, but I think that it crashes inside this list
> manipulation:
>
> prev = ehci->async;
> while (prev->qh_next.qh != qh)
> prev = prev->qh_next.qh;

Yes, it's crashing in the "while" test because prev is NULL. This
means the code is looking for qh in the async list but not finding it.
That's supposed to be impossible.
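
The link between "prev is NULL" and the reported fault address can be illustrated with a little arithmetic: the trapping instruction reads 0x48(%eax) with EAX = 0, and CR2 is 0x48, so the field being read sits 0x48 bytes into the structure. The sketch below uses a mock layout, chosen only so that qh_next lands at offset 0x48 as in the oops; the field names and counts are assumptions and are not copied from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mock_qh;

/* Stand-in for the driver's shadow-pointer union; only .qh is modelled. */
union mock_shadow {
	struct mock_qh *qh;
	void *ptr;
};

/* Mock layout: 17 hardware words plus a 32-bit DMA address precede qh_next. */
struct mock_qh {
	uint32_t hw_next;
	uint32_t hw_words[16];	/* remaining hardware-visible words (assumed) */
	uint32_t qh_dma;	/* assumes a 32-bit DMA address type */
	union mock_shadow qh_next;
};

_Static_assert(offsetof(struct mock_qh, qh_next) == 0x48,
	       "mock layout must match the offset seen in the oops");

/* Address touched when reading prev->qh_next for a given prev value. */
static uintptr_t qh_next_read_addr(uintptr_t prev)
{
	return prev + offsetof(struct mock_qh, qh_next);
}
```

Under these assumptions, qh_next_read_addr(0) yields 0x48, the same value the oops reports as the faulting address.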

The assembly code is peculiar because it includes stuff that isn't in
the source code! For example, right at this point (after the end of
the loop) there's a test to see whether prev is NULL. Where could that
have come from? Do you have any idea?

> prev->hw_next = qh->hw_next;
> prev->qh_next = qh->qh_next;
> wmb ();

These lines aren't reached.

Does this happen every time you disconnect the Nexio?

You can try patching that loop. If prev is NULL then print an error
message in the log, including the value of qh and the value of
ehci->async, and jump past the following three statements.

With that change the system shouldn't crash, although khubd might hang.
But we still need to find out how this could have happened. Try
collecting a usbmon trace while running the test; then let's compare
the usbmon output with the error messages in the log.

Alan Stern

From: Ondrej Zary on
On Monday 30 November 2009, Alan Stern wrote:
> On Mon, 30 Nov 2009, Ondrej Zary wrote:
> > It does not make much sense to me, but I think that it crashes inside this
> > list manipulation:
> >
> > prev = ehci->async;
> > while (prev->qh_next.qh != qh)
> > prev = prev->qh_next.qh;
>
> Yes, it's crashing in the "while" test because prev is NULL. This
> means the code is looking for qh in the async list but not finding it.
> That's supposed to be impossible.
>
> The assembly code is peculiar because it includes stuff that isn't in
> the source code! For example, right at this point (after the end of
> the loop) there's a test to see whether prev is NULL. Where could that
> have come from? Do you have any idea?

I'm not sure. I might have done something wrong and left it there from my previous
debugging attempt.

> > prev->hw_next = qh->hw_next;
> > prev->qh_next = qh->qh_next;
> > wmb ();
>
> These lines aren't reached.
>
> Does this happen every time you disconnect the Nexio?

The crash happens almost always when disconnecting the touchscreen.
When booted without X, it often survives the first disconnect.

> You can try patching that loop. If prev is NULL then print an error
> message in the log, including the value of qh and the value of
> ehci->async, and jump past the following three statements.
>
> With that change the system shouldn't crash, although khubd might hang.
> But we still need to find out how this could have happened. Try
> collecting a usbmon trace while running the test; then let's compare
> the usbmon output with the error messages in the log.

gcc version is: gcc (Debian 4.3.4-6) 4.3.4

I tried something like that before, but it did not help at all.
The check is not triggered and it still oopses. Now it looks like this:

	qh->qh_state = QH_STATE_UNLINK;
	ehci->reclaim = qh = qh_get (qh);

	prev = ehci->async;
	if (!prev) {
		printk("prev is NULL, qh=%p, ehci->async=%p\n", qh, ehci->async);
		goto after;
	}
	while (prev->qh_next.qh != qh) {
		if (!prev) {
			printk("prev is NULL, qh=%p, ehci->async=%p\n", qh, ehci->async);
			goto after;
		}
		prev = prev->qh_next.qh;
	}

	prev->hw_next = qh->hw_next;
	prev->qh_next = qh->qh_next;
	wmb ();
after:
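
One possible reason the check above never fires: inside the loop, prev is tested before it is advanced, so the value that has just become NULL is first seen by the while condition, which dereferences it. A minimal user-space sketch of a walk that tests prev before every dereference follows. The struct and function names are hypothetical stand-ins, not the driver's real types, and the memory barrier is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct mock_qh {
	struct mock_qh *qh_next;	/* software "next" link */
	unsigned int hw_next;		/* hardware link word */
};

/*
 * Unlink qh from the list starting at head. Returns 0 on success,
 * -1 if the walk reaches the end without finding qh.
 */
static int unlink_qh_checked(struct mock_qh *head, struct mock_qh *qh)
{
	struct mock_qh *prev = head;

	assert(qh != NULL);

	/* Test prev before every dereference, including the loop condition. */
	while (prev != NULL && prev->qh_next != qh)
		prev = prev->qh_next;

	if (prev == NULL) {
		fprintf(stderr, "qh %p not on list, head=%p\n",
			(void *)qh, (void *)head);
		return -1;
	}

	prev->hw_next = qh->hw_next;
	prev->qh_next = qh->qh_next;
	return 0;
}
```

In the driver, the same ordering would mean moving the NULL test into the while condition (or to the end of the loop body, after prev is advanced), so that the message is printed instead of the oops.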


objdump -D drivers/usb/host/ehci-hcd.o:

00002497 <start_unlink_async>:
2497: 57 push %edi
2498: 56 push %esi
2499: 53 push %ebx
249a: 89 c3 mov %eax,%ebx
249c: 83 ec 04 sub $0x4,%esp
249f: 65 a1 14 00 00 00 mov %gs:0x14,%eax
24a5: 89 04 24 mov %eax,(%esp)
24a8: 31 c0 xor %eax,%eax
24aa: 8b 43 04 mov 0x4(%ebx),%eax
24ad: 8b 38 mov (%eax),%edi
24af: 3b 53 14 cmp 0x14(%ebx),%edx
24b2: 75 34 jne 24e8 <start_unlink_async+0x51>
24b4: 83 7b fc 00 cmpl $0x0,-0x4(%ebx)
24b8: 0f 84 e6 00 00 00 je 25a4 <start_unlink_async+0x10d>
24be: 83 7b 18 00 cmpl $0x0,0x18(%ebx)
24c2: 0f 85 dc 00 00 00 jne 25a4 <start_unlink_async+0x10d>
24c8: 83 e7 df and $0xffffffdf,%edi
24cb: 8b 43 04 mov 0x4(%ebx),%eax
24ce: 89 38 mov %edi,(%eax)
24d0: f0 83 04 24 00 lock addl $0x0,(%esp)
24d5: 8d 83 08 01 00 00 lea 0x108(%ebx),%eax
24db: f0 80 a3 08 01 00 00 lock andb $0xfb,0x108(%ebx)
24e2: fb
24e3: e9 bc 00 00 00 jmp 25a4 <start_unlink_async+0x10d>
24e8: c6 42 68 02 movb $0x2,0x68(%edx)
24ec: 89 d0 mov %edx,%eax
24ee: e8 d6 e0 ff ff call 5c9 <qh_get>
24f3: 89 c1 mov %eax,%ecx
24f5: 89 43 18 mov %eax,0x18(%ebx)
24f8: 8b 43 14 mov 0x14(%ebx),%eax
24fb: 85 c0 test %eax,%eax
24fd: 89 c2 mov %eax,%edx
24ff: 75 1d jne 251e <start_unlink_async+0x87>
2501: 6a 00 push $0x0
2503: eb 09 jmp 250e <start_unlink_async+0x77>
2505: 85 d2 test %edx,%edx
2507: 74 04 je 250d <start_unlink_async+0x76>
2509: 89 f2 mov %esi,%edx
250b: eb 11 jmp 251e <start_unlink_async+0x87>
250d: 50 push %eax
250e: 51 push %ecx
250f: 68 53 01 00 00 push $0x153
2514: e8 fc ff ff ff call 2515 <start_unlink_async+0x7e>
2519: 83 c4 0c add $0xc,%esp
251c: eb 16 jmp 2534 <start_unlink_async+0x9d>
==> 251e: 8b 72 48 mov 0x48(%edx),%esi
2521: 39 ce cmp %ecx,%esi
2523: 75 e0 jne 2505 <start_unlink_async+0x6e>
2525: 8b 01 mov (%ecx),%eax
2527: 89 02 mov %eax,(%edx)
2529: 8b 41 48 mov 0x48(%ecx),%eax
252c: 89 42 48 mov %eax,0x48(%edx)
252f: f0 83 04 24 00 lock addl $0x0,(%esp)
2534: f6 43 fc 01 testb $0x1,-0x4(%ebx)
2538: 75 17 jne 2551 <start_unlink_async+0xba>
253a: 8b 14 24 mov (%esp),%edx
253d: 65 33 15 14 00 00 00 xor %gs:0x14,%edx
2544: 75 6a jne 25b0 <start_unlink_async+0x119>
2546: 5f pop %edi
2547: 89 d8 mov %ebx,%eax
2549: 5b pop %ebx
254a: 5e pop %esi
254b: 5f pop %edi
254c: e9 8b fe ff ff jmp 23dc <end_unlink_async>
2551: 83 cf 40 or $0x40,%edi
2554: 8b 43 04 mov 0x4(%ebx),%eax
2557: 89 38 mov %edi,(%eax)
2559: 8b 43 04 mov 0x4(%ebx),%eax
255c: 8b 00 mov (%eax),%eax
255e: 83 bb a8 00 00 00 00 cmpl $0x0,0xa8(%ebx)
2565: 74 0f je 2576 <start_unlink_async+0xdf>
2567: ba ac 00 00 00 mov $0xac,%edx
256c: b8 78 01 00 00 mov $0x178,%eax
2571: e8 fc ff ff ff call 2572 <start_unlink_async+0xdb>
2576: b8 0a 00 00 00 mov $0xa,%eax
257b: 8b 35 00 00 00 00 mov 0x0,%esi
2581: e8 fc ff ff ff call 2582 <start_unlink_async+0xeb>
2586: 8b 14 24 mov (%esp),%edx
2589: 65 33 15 14 00 00 00 xor %gs:0x14,%edx
2590: 75 1e jne 25b0 <start_unlink_async+0x119>
2592: 8d 14 30 lea (%eax,%esi,1),%edx
2595: 5e pop %esi
2596: 8d 83 a8 00 00 00 lea 0xa8(%ebx),%eax
259c: 5b pop %ebx
259d: 5e pop %esi
259e: 5f pop %edi
259f: e9 fc ff ff ff jmp 25a0 <start_unlink_async+0x109>
25a4: 8b 04 24 mov (%esp),%eax
25a7: 65 33 05 14 00 00 00 xor %gs:0x14,%eax
25ae: 74 05 je 25b5 <start_unlink_async+0x11e>
25b0: e8 fc ff ff ff call 25b1 <start_unlink_async+0x11a>
25b5: 5b pop %ebx
25b6: 5b pop %ebx
25b7: 5e pop %esi
25b8: 5f pop %edi
25b9: c3 ret


The decoded code from the oops is obviously modified (push at 1c, call at 21
and sfence at 3c):


All code
========
0: 89 c1 mov %eax,%ecx
2: 89 43 18 mov %eax,0x18(%ebx)
5: 8b 43 14 mov 0x14(%ebx),%eax
8: 85 c0 test %eax,%eax
a: 89 c2 mov %eax,%edx
c: 75 1d jne 0x2b
e: 6a 00 push $0x0
10: eb 09 jmp 0x1b
12: 85 d2 test %edx,%edx
14: 74 04 je 0x1a
16: 89 f2 mov %esi,%edx
18: eb 11 jmp 0x2b
1a: 50 push %eax
1b: 51 push %ecx
1c: 68 5f 7f d4 f7 push $0xf7d47f5f
21: e8 92 a5 57 c9 call 0xc957a5b8
26: 83 c4 0c add $0xc,%esp
29: eb 16 jmp 0x41
2b:* 8b 72 48 mov 0x48(%edx),%esi <-- trapping instruction
2e: 39 ce cmp %ecx,%esi
30: 75 e0 jne 0x12
32: 8b 01 mov (%ecx),%eax
34: 89 02 mov %eax,(%edx)
36: 8b 41 48 mov 0x48(%ecx),%eax
39: 89 42 48 mov %eax,0x48(%edx)
3c: 0f ae f8 sfence
3f: 89 .byte 0x89

Code starting with the faulting instruction
===========================================
0: 8b 72 48 mov 0x48(%edx),%esi
3: 39 ce cmp %ecx,%esi
5: 75 e0 jne 0xffffffe7
7: 8b 01 mov (%ecx),%eax
9: 89 02 mov %eax,(%edx)
b: 8b 41 48 mov 0x48(%ecx),%eax
e: 89 42 48 mov %eax,0x48(%edx)
11: 0f ae f8 sfence
14: 89 .byte 0x89



--
Ondrej Zary