From: Justin Piszcz on


On Tue, 30 Mar 2010, Alan Stern wrote:

> On Tue, 30 Mar 2010, Justin Piszcz wrote:
>
> Also, I'd like to see the contents of your /proc/interrupts. It looks
> like the OHCI controller shares an IRQ line with some other device.

Hi, you are correct:

$ cat /proc/interrupts
           CPU0       CPU1
  0:        127         32   IO-APIC-edge      timer
  1:          0          2   IO-APIC-edge      i8042
  7:          1          0   IO-APIC-edge
  9:          0          0   IO-APIC-fasteoi   acpi
 20:          0          3   IO-APIC-fasteoi   ehci_hcd:usb1
 22:          0          0   IO-APIC-fasteoi   sata_nv
 23:        216     134543   IO-APIC-fasteoi   sata_nv, ohci_hcd:usb2
 27:          0         68   PCI-MSI-edge      hda_intel
 28:       4722    1583395   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
LOC:    5414110    5415173   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
PND:          0          0   Performance pending work
RES:     766744     123073   Rescheduling interrupts
CAL:        113         25   Function call interrupts
TLB:       1014       1029   TLB shootdowns
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:         19         19   Machine check polls
ERR:          1
MIS:          0
$
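
For anyone who wants to spot shared lines without reading the table by eye,
here is a minimal Python sketch. It assumes the older /proc/interrupts layout
shown above (per-CPU counts, one interrupt-type column, then a comma-separated
handler list). The function name shared_irqs and the SAMPLE text are
illustrative only, and newer kernels print extra columns that this sketch does
not handle.

```python
# Sketch only: assumes the older /proc/interrupts layout shown in this
# message. shared_irqs() is a hypothetical helper, not an existing tool.
SAMPLE = """\
CPU0 CPU1
20: 0 3 IO-APIC-fasteoi ehci_hcd:usb1
22: 0 0 IO-APIC-fasteoi sata_nv
23: 216 134543 IO-APIC-fasteoi sata_nv, ohci_hcd:usb2
28: 4722 1583395 PCI-MSI-edge eth0
NMI: 0 0 Non-maskable interrupts
ERR: 1
"""


def shared_irqs(text):
    """Return {irq_number: [handler, ...]} for IRQs with more than one handler."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row names one column per CPU
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip NMI, LOC, ERR and other non-numeric rows
        fields = rest.split()
        # Skip the per-CPU counts and the interrupt-type column;
        # whatever remains is the comma-separated handler list.
        handler_text = " ".join(fields[ncpus + 1:])
        handlers = [h.strip() for h in handler_text.split(",") if h.strip()]
        if len(handlers) > 1:
            shared[int(label)] = handlers
    return shared


print(shared_irqs(SAMPLE))  # {23: ['sata_nv', 'ohci_hcd:usb2']}
```

On a live system the same function could be fed the contents of
/proc/interrupts read as text instead of SAMPLE.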
>
> Well, I'm making progress. Below is a new debugging patch to try in
> place of the first one. This time the dmesg log alone will be
> sufficient, no need for a usbmon trace. And the output should be a lot
> smaller, since the new patch doesn't print something every time an
> interrupt occurs, but rather only when you unplug the mouse.
>
> In fact, you might try unplugging the mouse while it still works and
> then plugging it back in. The difference between the debugging
> messages while everything is working and the same thing after the mouse
> fails should be informative.
Ok, I can try this as well.

>
> (By the way, these tests are meant to find out why your Xorg and khubd
> processes hang when the mouse fails, not for finding the original cause
> behind the mouse failure. That can be addressed later.)
The hang appears to occur only AFTER the mouse locks up: I press Ctrl-Alt-F1,
and X freezes after that.

> Some of those reports indicate that a BIOS update could fix the
> problem. Have you checked your BIOS version?
The BIOS is outdated; I will create a Windows boot CD and flash the BIOS
to the latest version. The hardware in question is a Dell Optiplex 740, and it
is running an older firmware version. The latest firmware is from late 2009
(2.2.4): O740-224.EXE, but it cannot be flashed from Linux, so I will test this
tomorrow: flash the latest BIOS, apply your latest patch, and see if it recurs.
I did check the diffs for the Dell BIOS updates; none mention a USB problem
like the one in the kernel bug post (earlier Dell system).

Justin.


>
> Alan Stern
>
>
>
> Index: usb-2.6/drivers/usb/host/ohci-hcd.c
> ===================================================================
> --- usb-2.6.orig/drivers/usb/host/ohci-hcd.c
> +++ usb-2.6/drivers/usb/host/ohci-hcd.c
> @@ -292,6 +292,8 @@ static int ohci_urb_dequeue(struct usb_h
> if (urb_priv) {
> if (urb_priv->ed->state == ED_OPER)
> start_ed_unlink (ohci, urb_priv->ed);
> + ohci_info(ohci, "start unlink urb %p, ed %p tick %u\n",
> + urb, urb_priv->ed, urb_priv->ed->tick);
> }
> } else {
> /*
> @@ -324,6 +326,9 @@ ohci_endpoint_disable (struct usb_hcd *h
>
> if (!ed)
> return;
> + ohci_info(ohci, "disable ed %p (#%02x) state %d%s\n",
> + ed, ep->desc.bEndpointAddress, ed->state,
> + list_empty(&ed->td_list) ? "" : " (has tds)");
>
> rescan:
> spin_lock_irqsave (&ohci->lock, flags);
> Index: usb-2.6/drivers/usb/host/ohci-q.c
> ===================================================================
> --- usb-2.6.orig/drivers/usb/host/ohci-q.c
> +++ usb-2.6/drivers/usb/host/ohci-q.c
> @@ -912,6 +912,9 @@ rescan_all:
> * frame counter wraps and EDs with partially retired TDs
> */
> if (likely (HC_IS_RUNNING(ohci_to_hcd(ohci)->state))) {
> + ohci_info(ohci, "finish_unlinks: tick %u, ed %p %u, %d\n",
> + tick, ed, ed->tick,
> + tick_before(tick, ed->tick));
> if (tick_before (tick, ed->tick)) {
> skip_ed:
> last = &ed->ed_next;
> @@ -928,6 +931,8 @@ skip_ed:
> TD_MASK;
>
> /* INTR_WDH may need to clean up first */
> + ohci_info(ohci, "dma %llx head %x\n",
> + (unsigned long long) td->td_dma, head);
> if (td->td_dma != head) {
> if (ed == ohci->ed_to_check)
> ohci->ed_to_check = NULL;
> @@ -990,6 +995,8 @@ rescan_this:
> /* HC may have partly processed this TD */
> td_done (ohci, urb, td);
> urb_priv->td_cnt++;
> + ohci_info(ohci, "td_cnt %d length %d\n",
> + urb_priv->td_cnt, urb_priv->length);
>
> /* if URB is done, clean up */
> if (urb_priv->td_cnt == urb_priv->length) {
>
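
The patch above logs the result of tick_before(). As a rough illustration of
why that value matters, here is a Python sketch of a wraparound-safe frame
comparison. It assumes tick_before() treats the difference of two frame
numbers as a signed 16-bit value, which is an assumption about ohci-q.c and
not something shown in the patch; to_s16 is a hypothetical helper.

```python
# Sketch only: models a wraparound-safe "is t1 earlier than t2" test on
# 16-bit frame counters. The signed 16-bit behaviour is an assumption.
def to_s16(value):
    """Interpret the low 16 bits of value as a signed 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def tick_before(t1, t2):
    """True if frame t1 comes before frame t2, allowing for counter wrap."""
    return to_s16(to_s16(t1) - to_s16(t2)) < 0


print(tick_before(5, 10))      # True: 5 is earlier than 10
print(tick_before(0xFFFE, 1))  # True: 0xFFFE is just before the wrap to 1
print(tick_before(1, 0xFFFE))  # False: 1 is just after the wrap
```

Under this model, an ED whose tick is still "ahead" of the current frame
would be skipped by the unlink scan until the frame counter catches up.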
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Justin Piszcz on


On Wed, 31 Mar 2010, Tiago Vignatti wrote:

> Justin Piszcz wrote:
>>
>>
>> On Thu, 25 Mar 2010, Justin Piszcz wrote:
>>
>>>
>>>
>>> On Thu, 25 Mar 2010, Justin Piszcz wrote:
>>>
>>> The same problem has been reported by another person; he says his entire
>>> system freezes, which it appears to do unless you can SSH into the box:
>>> http://www.openoffice.org/issues/show_bug.cgi?id=76797
>>>
>>> Look at his lspci listing.
>>> james(a)dv6105us:~$ lspci
>>> 00:00.0 RAM memory: nVidia Corporation C51 Host Bridge (rev a2)
>>> 00:00.1 RAM memory: nVidia Corporation C51 Memory Controller 0 (rev a2)
>>> 00:00.2 RAM memory: nVidia Corporation C51 Memory Controller 1 (rev a2)
>>>
>>> Here is mine:
>>> $ lspci
>>> 00:00.0 RAM memory: nVidia Corporation C51 Host Bridge (rev a2)
>>> 00:00.1 RAM memory: nVidia Corporation C51 Memory Controller 0 (rev a2)
>>> 00:00.2 RAM memory: nVidia Corporation C51 Memory Controller 1 (rev a2)
>>> 00:00.3 RAM memory: nVidia Corporation C51 Memory Controller 5 (rev a2)
>>>
>>> Looks like the bug may be in the USB subsystem for this chipset.
>>>
>>> Justin.
>>>
>>>
>>
>> Hi,
>>
>> And there it goes again *LOCK*
>> root 2190 0.5 1.5 37832 31424 tty7 Ds+ 09:00 0:12 /usr/bin/X :0 vt7 -nolisten tcp -auth /var/lib/xdm/authdir/authfiles/A:0-N5V00o
>
> Does running the X server with -nosilk help at all?

Hi,

Once I have done the BIOS update and the test Alan requested, if the problem
*STILL* persists, I can try this. Thanks.

# grep -i silken Xorg.0.log*
Xorg.0.log:(==) NV(0): Silken mouse enabled
Xorg.0.log.old:(==) NV(0): Silken mouse enabled

Justin.

From: Justin Piszcz on


On Wed, 31 Mar 2010, Justin Piszcz wrote:


Hi,

With the latest BIOS, I used the system today for 1-2 hours and could not
reproduce the crash (I had full debugging enabled and Alan's patch applied).
I will continue to test, because when you want it to crash it usually does
not, but the BIOS update may have fixed it.

Latest BIOS for Dell Optiplex 740: 2.2.4 (upgraded today from 2.0.12).

Justin.