s2ram slow (radeon) / failing (usb) [Kernel]

Prev: intel_txt: enable VMXON check with SMX in KVM
Next: SCSI: Support HW reset for EH and polling scheme for scsi device

From: Alan Stern on 5 May 2010 17:40

On Wed, 5 May 2010, Jiri Kosina wrote:

> > > Ok, I've been digging some further...
> > >
> > > The hid_device_probe properly returns -ENODEV, but:
> > >
> > > Call trace:
> > > [ 3228.866146] [<ffffffffa01a00e6>] hid_device_probe+0xd6/0x1f0 [hid]
> > > return -ENODEV
> > > [ 3228.874594] [<ffffffff8130995a>] driver_probe_device+0xaa/0x1d0
> > > calls inlined really_probe from drivers/base/dd.c
> > > which ALLWAYS returns 0:
> > > dd.c:147 /*
> > > 148 * Ignore errors returned by ->probe so that the next driver can try
> > > 149 * its luck.
> > > 150 */
> > > 151 ret = 0;
> > > and has on line 139 (under same failure label):
> > > dev->driver = NULL;
> > > [ 3228.882758] [<ffffffff81309b20>] ? __device_attach+0x0/0x50
> > > [ 3228.890555] [<ffffffff81309b6b>] __device_attach+0x4b/0x50
> > > lets 0 bubble up
> > > [ 3228.898272] [<ffffffff81308d28>] bus_for_each_drv+0x68/0x90
> > > lets 0 bubble up
> > > [ 3228.906080] [<ffffffff81309c3b>] device_attach+0x8b/0xa0
> > > lets 0 bubble up
> > > [ 3228.913603] [<ffffffff81308b15>] bus_probe_device+0x25/0x40
> > > returns void and does WARN_ON(device_attach() < 0)
> > > [ 3228.921356] [<ffffffff81307166>] device_add+0x3d6/0x610
> > > returns 0 here as there was no local error
> > > [ 3228.928772] [<ffffffffa019fc53>] hid_add_device+0x183/0x1e0 [hid]
> > > [ 3228.937098] [<ffffffffa01b4a77>] usbhid_probe+0x287/0x420 [usbhid]
> > > [ 3228.945535] [<ffffffffa005006d>] usb_probe_interface+0x14d/0x230 [usbcore]
> > > ...
> > >
> > > So IMHO in hid_add_device() we should also check for hdev->dev.driver
> > > when device_add() returns 0 and consider that one being NULL as a
> > > (possible) error.

Note that it is perfectly normal for devices to be registered on a bus
without a driver. Perhaps the usbhid core doesn't expect this, though,
or perhaps it doesn't make sense for HID devices. Regardless, I don't
see how this could cause the problem.

Earlier, Bruno said that the hang occurs in hid_cancel_delayed_stuff(),
presumably during one of its cancel_work_sync() calls, and presumably
because the workqueue has been frozen. But as far as I can tell,
cancel_work_sync() should work just fine if the workqueue has been
frozen. Maybe this should be investigated more closely.

Bruno, can you confirm that the hang occurs during one of those
cancel_work_sync() calls?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Bruno Prémont on 6 May 2010 13:50

On Wed, 05 May 2010 Alan Stern <stern(a)rowland.harvard.edu> wrote:
> On Wed, 5 May 2010, Jiri Kosina wrote:
> > > > Ok, I've been digging some further...
> > > >
> > > > The hid_device_probe properly returns -ENODEV, but:
> > > >
> > > > Call trace:
> > > > [ 3228.866146] [<ffffffffa01a00e6>] hid_device_probe+0xd6/0x1f0 [hid]
> > > > return -ENODEV
> > > > [ 3228.874594] [<ffffffff8130995a>] driver_probe_device+0xaa/0x1d0
> > > > calls inlined really_probe from drivers/base/dd.c
> > > > which ALLWAYS returns 0:
> > > > dd.c:147 /*
> > > > 148 * Ignore errors returned by ->probe so that the next driver can try
> > > > 149 * its luck.
> > > > 150 */
> > > > 151 ret = 0;
> > > > and has on line 139 (under same failure label):
> > > > dev->driver = NULL;
> > > > [ 3228.882758] [<ffffffff81309b20>] ? __device_attach+0x0/0x50
> > > > [ 3228.890555] [<ffffffff81309b6b>] __device_attach+0x4b/0x50
> > > > lets 0 bubble up
> > > > [ 3228.898272] [<ffffffff81308d28>] bus_for_each_drv+0x68/0x90
> > > > lets 0 bubble up
> > > > [ 3228.906080] [<ffffffff81309c3b>] device_attach+0x8b/0xa0
> > > > lets 0 bubble up
> > > > [ 3228.913603] [<ffffffff81308b15>] bus_probe_device+0x25/0x40
> > > > returns void and does WARN_ON(device_attach() < 0)
> > > > [ 3228.921356] [<ffffffff81307166>] device_add+0x3d6/0x610
> > > > returns 0 here as there was no local error
> > > > [ 3228.928772] [<ffffffffa019fc53>] hid_add_device+0x183/0x1e0 [hid]
> > > > [ 3228.937098] [<ffffffffa01b4a77>] usbhid_probe+0x287/0x420 [usbhid]
> > > > [ 3228.945535] [<ffffffffa005006d>] usb_probe_interface+0x14d/0x230 [usbcore]
> > > > ...
> > > >
> > > > So IMHO in hid_add_device() we should also check for hdev->dev.driver
> > > > when device_add() returns 0 and consider that one being NULL as a
> > > > (possible) error.
>
> Note that it is perfectly normal for devices to be registered on a bus
> without a driver. Perhaps the usbhid core doesn't expect this, though,
> or perhaps it doesn't make sense for HID devices. Regardless, I don't
> see how this could cause the problem.
>
> Earlier, Bruno said that the hang occurs in hid_cancel_delayed_stuff(),
> presumably during one of its cancel_work_sync() calls, and presumably
> because the workqueue has been frozen. But as far as I can tell,
> cancel_work_sync() should work just fine if the workqueue has been
> frozen. Maybe this should be investigated more closely.
>
> Bruno, can you confirm that the hang occurs during one of those
> cancel_work_sync() calls?

No, it's not one of the cancel_work_sync() that hangs but it's the
del_timer_sync() right before them that hangs!
(del_timer_sync() also hangs if I put it last, so the cancel_work_sync()
don't hang anything)

static void hid_cancel_delayed_stuff(struct usbhid_device *usbhid)
{
del_timer_sync(&usbhid->io_retry); /* this one never returns */
cancel_work_sync(&usbhid->restart_work);
cancel_work_sync(&usbhid->reset_work);
}

Thanks,
Bruno
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Alan Stern on 6 May 2010 14:50

On Thu, 6 May 2010, Bruno [UTF-8] Prémont wrote:

> > Bruno, can you confirm that the hang occurs during one of those
> > cancel_work_sync() calls?
>
> No, it's not one of the cancel_work_sync() that hangs but it's the
> del_timer_sync() right before them that hangs!
> (del_timer_sync() also hangs if I put it last, so the cancel_work_sync()
> don't hang anything)
>
> static void hid_cancel_delayed_stuff(struct usbhid_device *usbhid)
> {
> del_timer_sync(&usbhid->io_retry); /* this one never returns */
> cancel_work_sync(&usbhid->restart_work);
> cancel_work_sync(&usbhid->reset_work);
> }

Okay, I see what the problem is. In usbhid_start() there's a bunch of
statements initializing parts of the usbhid structure. When probing
fails those statements don't get executed, so the timer and workqueue
things aren't set up properly.

This patch should fix it.

Alan Stern

Index: usb-2.6/drivers/hid/usbhid/hid-core.c
===================================================================
--- usb-2.6.orig/drivers/hid/usbhid/hid-core.c
+++ usb-2.6/drivers/hid/usbhid/hid-core.c
@@ -998,13 +998,6 @@ static int usbhid_start(struct hid_devic
}
}

- init_waitqueue_head(&usbhid->wait);
- INIT_WORK(&usbhid->reset_work, hid_reset);
- INIT_WORK(&usbhid->restart_work, __usbhid_restart_queues);
- setup_timer(&usbhid->io_retry, hid_retry_timeout, (unsigned long) hid);
-
- spin_lock_init(&usbhid->lock);
-
usbhid->urbctrl = usb_alloc_urb(0, GFP_KERNEL);
if (!usbhid->urbctrl) {
ret = -ENOMEM;
@@ -1180,6 +1173,12 @@ static int usbhid_probe(struct usb_inter
usbhid->intf = intf;
usbhid->ifnum = interface->desc.bInterfaceNumber;

+ init_waitqueue_head(&usbhid->wait);
+ INIT_WORK(&usbhid->reset_work, hid_reset);
+ INIT_WORK(&usbhid->restart_work, __usbhid_restart_queues);
+ setup_timer(&usbhid->io_retry, hid_retry_timeout, (unsigned long) hid);
+ spin_lock_init(&usbhid->lock);
+
ret = hid_add_device(hid);
if (ret) {
if (ret != -ENODEV)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Bruno Prémont on 6 May 2010 17:00

On Thu, 06 May 2010 Alan Stern <stern(a)rowland.harvard.edu> wrote:
> On Thu, 6 May 2010, Bruno [UTF-8] Prémont wrote:
>
> > > Bruno, can you confirm that the hang occurs during one of those
> > > cancel_work_sync() calls?
> >
> > No, it's not one of the cancel_work_sync() that hangs but it's the
> > del_timer_sync() right before them that hangs!
> > (del_timer_sync() also hangs if I put it last, so the cancel_work_sync()
> > don't hang anything)
> >
> > static void hid_cancel_delayed_stuff(struct usbhid_device *usbhid)
> > {
> > del_timer_sync(&usbhid->io_retry); /* this one never returns */
> > cancel_work_sync(&usbhid->restart_work);
> > cancel_work_sync(&usbhid->reset_work);
> > }
>
> Okay, I see what the problem is. In usbhid_start() there's a bunch of
> statements initializing parts of the usbhid structure. When probing
> fails those statements don't get executed, so the timer and workqueue
> things aren't set up properly.
>
> This patch should fix it.

This very much reminds me the resume issue with the same keyboard on a
!CONFIG_SMP system back in February when the fix was to copy/move
usbhid->intf = intf; from usbhid_start() to usbhid_probe()!

Hopefully these are all those initializations that need to be taken
care of...

With this patch system now it passes "devices" level pm_test as well as
full suspend process, even multiple times in a row (though it's still
damn slow to resume IF no_console_suspend is passed to kernel - radeon
KMS?, but that's a new branch from start of this thread)

Reported-by: Bruno Prémont <bonbons(a)linux-vserver.org>
Tested-By: Bruno Prémont <bonbons(a)linux-vserver.org>

Now it can continue hunting the next issues preventing smooth S3
experience...

Thanks,
Bruno

> Index: usb-2.6/drivers/hid/usbhid/hid-core.c
> ===================================================================
> --- usb-2.6.orig/drivers/hid/usbhid/hid-core.c
> +++ usb-2.6/drivers/hid/usbhid/hid-core.c
> @@ -998,13 +998,6 @@ static int usbhid_start(struct hid_devic
> }
> }
>
> - init_waitqueue_head(&usbhid->wait);
> - INIT_WORK(&usbhid->reset_work, hid_reset);
> - INIT_WORK(&usbhid->restart_work, __usbhid_restart_queues);
> - setup_timer(&usbhid->io_retry, hid_retry_timeout, (unsigned long) hid);
> -
> - spin_lock_init(&usbhid->lock);
> -
> usbhid->urbctrl = usb_alloc_urb(0, GFP_KERNEL);
> if (!usbhid->urbctrl) {
> ret = -ENOMEM;
> @@ -1180,6 +1173,12 @@ static int usbhid_probe(struct usb_inter
> usbhid->intf = intf;
> usbhid->ifnum = interface->desc.bInterfaceNumber;
>
> + init_waitqueue_head(&usbhid->wait);
> + INIT_WORK(&usbhid->reset_work, hid_reset);
> + INIT_WORK(&usbhid->restart_work, __usbhid_restart_queues);
> + setup_timer(&usbhid->io_retry, hid_retry_timeout, (unsigned long) hid);
> + spin_lock_init(&usbhid->lock);
> +
> ret = hid_add_device(hid);
> if (ret) {
> if (ret != -ENODEV)
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Jiri Kosina on 7 May 2010 04:30

On Thu, 6 May 2010, Alan Stern wrote:

> On Thu, 6 May 2010, Bruno [UTF-8] Prémont wrote:
>
> > > Bruno, can you confirm that the hang occurs during one of those
> > > cancel_work_sync() calls?
> >
> > No, it's not one of the cancel_work_sync() that hangs but it's the
> > del_timer_sync() right before them that hangs!
> > (del_timer_sync() also hangs if I put it last, so the cancel_work_sync()
> > don't hang anything)
> >
> > static void hid_cancel_delayed_stuff(struct usbhid_device *usbhid)
> > {
> > del_timer_sync(&usbhid->io_retry); /* this one never returns */
> > cancel_work_sync(&usbhid->restart_work);
> > cancel_work_sync(&usbhid->reset_work);
> > }
>
> Okay, I see what the problem is. In usbhid_start() there's a bunch of
> statements initializing parts of the usbhid structure. When probing
> fails those statements don't get executed, so the timer and workqueue
> things aren't set up properly.
>
> This patch should fix it.

Indeed, good catch, thanks a lot Alan!

Could you please send to me with your Signed-off-by: line and short
changelog, so that I could apply it with proper credit being given to you?

And Bruno, thanks a lot for excellent testing efforts.

--
Jiri Kosina
SUSE Labs, Novell Inc.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First | Prev |
Pages: 1 2 3 4 5 6
Prev: intel_txt: enable VMXON check with SMX in KVM
Next: SCSI: Support HW reset for EH and polling scheme for scsi device