From: Alan Stern on
On Thu, 24 Jun 2010, Rafael J. Wysocki wrote:

> Ah, one piece is missing. Namely, the waiting /sys/power/wakeup_count reader
> needs to be woken up when events_in_progress goes down to zero.

It also needs to abort immediately if a signal is received.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Alan Stern on
On Thu, 24 Jun 2010, Rafael J. Wysocki wrote:

> > > And what happens if the device gets a second wakeup event before the timer
> > > for the first one expires?
> >
> > Good question. I don't have an answer to it at the moment, but it seems to
> > arise from using a single timer for all events.
> >
> > It looks like it's simpler to make pm_wakeup_event() allocate a timer for each
> > event and make the timer function remove it. That would cause suspend to
> > be blocked until the timer expires without a way to cancel it earlier, though.
>
> So, I decided to try this after all.
>
> Below is a new version of the patch. It introduces pm_stay_awake(dev) and
> pm_relax() that play the roles of the "old" pm_wakeup_begin() and
> pm_wakeup_end().
>
> pm_wakeup_event() now takes an extra timeout argument and uses it for
> deferred execution of pm_relax(). So, one can either use the
> pm_stay_awake(dev) / pm_relax() pair, or use pm_wakeup_event(dev, timeout)
> if the ending is under someone else's control.
>
> In addition to that, pm_get_wakeup_count() blocks until events_in_progress is
> zero.
>
> Please tell me what you think.

This is slightly different from the wakelock design. Each call to
pm_stay_awake() must be paired with a call to pm_relax(), allowing one
device to have multiple concurrent critical sections, whereas calls to
pm_wakeup_event() must not be paired with anything. With wakelocks,
you couldn't have multiple pending events for the same device. I'm not
sure which model is better in practice. No doubt the Android people
will prefer their way.

This requires you to define an explicit PCI_WAKEUP_COOLDOWN delay. I
think that's okay; I had to do something similar with USB and SCSI.
(And I still think it would be a good idea to prevent workqueue threads
from freezing until their queues are empty.)

Instead of allocating the work structures dynamically, would you be
better off using a memory pool?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Alan Stern on
On Thu, 24 Jun 2010, Rafael J. Wysocki wrote:

> > This is slightly different from the wakelock design. Each call to
> > pm_stay_awake() must be paired with a call to pm_relax(), allowing one
> > device to have multiple concurrent critical sections, whereas calls to
> > pm_wakeup_event() must not be paired with anything. With wakelocks,
> > you couldn't have multiple pending events for the same device.
>
> You could, but you needed to define multiple wakelocks for the same device for
> this purpose.

Yeah, okay, but who's going to do that?

> > I'm not sure which model is better in practice. No doubt the Android people
> > will prefer their way.
>
> I suppose so.

It may not make a significant difference in the end. You can always
emulate the wakelock approach by not calling pm_stay_awake() when you
know there is an earlier call still pending.

> > This requires you to define an explicit PCI_WAKEUP_COOLDOWN delay. I
> > think that's okay; I had to do something similar with USB and SCSI.
> > (And I still think it would be a good idea to prevent workqueue threads
> > from freezing until their queues are empty.)
>
> I guess you mean the freezable ones?

Yes. The unfreezable workqueue threads don't have to worry about
getting frozen while their queues are non-empty. :-)

> I'm not sure if that helps a lot, because
> new work items may still be added after the workqueue thread has been frozen.

That's not the point. If a wakeup handler queues a work item (for
example, by calling pm_request_resume) then it wouldn't need to guess a
timeout. The work item would be guaranteed to run before the system
could suspend again.

> > Instead of allocating the work structures dynamically, would you be
> > better off using a memory pool?
>
> Well, it would be kind of equivalent to defining my own slab cache for that,
> wouldn't it?

I suppose so. It would make the GFP_ATOMIC allocations a little more
reliable.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Alan Stern on
On Fri, 25 Jun 2010, Rafael J. Wysocki wrote:

> So, there it goes.
>
> I decided not to play with memory allocations at this point, because I really
> don't expect pm_wakeup_event() to be heavily used initially. If there are more
> users and it's called more frequently, we can always switch to using a separate
> slab cache.
>
> Hopefully, I haven't overlooked anything vitally important this time.
>
> Please tell me what you think.

Obviously comments still need to be added. Beyond that...

> --- /dev/null
> +++ linux-2.6/drivers/base/power/wakeup.c
> @@ -0,0 +1,143 @@
> +
> +#include <linux/device.h>
> +#include <linux/slab.h>
> +#include <linux/sched.h>
> +#include <linux/capability.h>
> +#include <linux/pm.h>
> +
> +bool events_check_enabled;
> +
> +static unsigned long event_count;
> +static unsigned long saved_event_count;
> +static unsigned long events_in_progress;
> +static spinlock_t events_lock;

Use static DEFINE_SPINLOCK(events_lock) instead.

> +static DECLARE_WAIT_QUEUE_HEAD(events_wait_queue);
> +
> +void pm_wakeup_events_init(void)
> +{
> + spin_lock_init(&events_lock);
> +}

Then this routine won't be needed.

> +unsigned long pm_dev_wakeup_count(struct device *dev)
> +{
> + unsigned long flags;
> + unsigned long count;
> +
> + spin_lock_irqsave(&events_lock, flags);
> + count = dev->power.wakeup_count;
> + spin_unlock_irqrestore(&events_lock, flags);
> + return count;
> +}

Are the spin_lock calls needed here? I doubt it.

> --- linux-2.6.orig/kernel/power/power.h
> +++ linux-2.6/kernel/power/power.h
> @@ -184,6 +184,15 @@ static inline void suspend_test_finish(c
> #ifdef CONFIG_PM_SLEEP
> /* kernel/power/main.c */
> extern int pm_notifier_call_chain(unsigned long val);
> +
> +/* drivers/base/power/wakeup.c */
> +extern bool events_check_enabled;
> +
> +extern void pm_wakeup_events_init(void);
> +extern bool pm_check_wakeup_events(void);
> +extern bool pm_check_wakeup_events_final(void);
> +extern bool pm_get_wakeup_count(unsigned long *count);
> +extern bool pm_save_wakeup_count(unsigned long count);
> #endif

This is unfortunate. These declarations belong in a file that can
also be #included by drivers/base/power/wakeup.c. Otherwise future
changes might cause type mismatches the compiler won't be able to
catch.

> @@ -511,18 +513,24 @@ int hibernation_platform_enter(void)
>
> local_irq_disable();
> sysdev_suspend(PMSG_HIBERNATE);
> + if (!pm_check_wakeup_events()) {
> + error = -EAGAIN;
> + goto Power_up;
> + }
> +
> hibernation_ops->enter();
> /* We should never get here */
> while (1);
>
> - /*
> - * We don't need to reenable the nonboot CPUs or resume consoles, since
> - * the system is going to be halted anyway.
> - */
> + Power_up:
> + sysdev_resume();
> + local_irq_enable();
> + enable_nonboot_cpus();
> +
> Platform_finish:
> hibernation_ops->finish();
>
> - dpm_suspend_noirq(PMSG_RESTORE);
> + dpm_resume_noirq(PMSG_RESTORE);

Is this a bug fix that crept in along with the other changes?

> --- linux-2.6.orig/drivers/pci/pci.h
> +++ linux-2.6/drivers/pci/pci.h
> @@ -6,6 +6,8 @@
> #define PCI_CFG_SPACE_SIZE 256
> #define PCI_CFG_SPACE_EXP_SIZE 4096
>
> +#define PCI_WAKEUP_COOLDOWN 100

This definition can go directly in pci.c, since it isn't used anywhere
else.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Alan Stern on
On Fri, 25 Jun 2010, Rafael J. Wysocki wrote:

> > That's not the point. If a wakeup handler queues a work item (for
> > example, by calling pm_request_resume) then it wouldn't need to guess a
> > timeout. The work item would be guaranteed to run before the system
> > could suspend again.
>
> You seem to be referring to the PM workqueue specifically. Perhaps it would be
> better to special-case it and stop it by adding a barrier work during suspend
> instead of just freezing? Then, it wouldn't need to be singlethread any more.

The barrier work would have to be queued to each CPU's thread. That
would be okay.

Hmm, it looks like wait_event_freezable() and
wait_event_freezable_timeout() could use similar changes: If the
condition is true then they shouldn't try to freeze the caller.

> Still, I think the timeout is necessary anyway in case the driver simply
> doesn't handle the event and user space needs time to catch up. Unfortunately,
> the PCI wakeup code doesn't know what happens next in advance.

That could all be handled by the lower driver. Still, a 100-ms timeout
isn't going to make a significant difference, since a suspend/resume
cycle will take a comparable length of time.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/