From: Rafael J. Wysocki on
On Saturday, August 07, 2010, Arve Hj�nnev�g wrote:
> 2010/8/6 Alan Stern <stern(a)rowland.harvard.edu>:
> > On Thu, 5 Aug 2010, Arve Hj�nnev�g wrote:
> >
> >> count, tells you how many times the wakelock was activated. If a
> >> wakelock prevented suspend for a long time a large count tells you it
> >> handled a lot of events while a small count tells you it took a long
> >> time to process the events, or the wakelock was not released properly.
> >
> > As noted, we already have this.
> >
>
> Almost. We have it when a device is passed in.

Sure. And what are the other cases (details, please)?

> >> expire_count, tells you how many times the timeout expired. For the
> >> input event wakelock in the android kernel (which has a timeout) an
> >> expire count that matches the count tells you that someone opened an
> >> input device but is not reading from it (this has happened several
> >> times).
> >
> > This is a little tricky. Rafael's model currently does not allow
> > wakeup events started by pm_wakeup_event() to be cancelled any way
> > other than by having their timer expire. This essentially means that
> > for some devices, expire_count will always be the same as count and for
> > others it will always be 0. To change this would require adding an
> > extra timer struct, which could be done (in fact, an earlier version of
> > the code included it). It would be nice if we could avoid the need.
> >
> > Does Android use any kernel-internal wakelocks both with a timer and
> > with active cancellation?
> >
>
> I don't know if they are all kernel-internal but these drivers appear
> to use timeouts and active cancellation on the same wakelock:
> wifi driver, mmc core, alarm driver, evdev (suspend blocker version
> removes the timeout).

You previously said you didn't need timeouted wakelocks in the kernel, so
I guess that was incorrect.

> >> wake_count, tells you that this is the first wakelock that was
> >> acquired in the resume path. This is currently less useful than I
> >> would like on the Nexus One since it is usually "SMD_RPCCALL" which
> >> does not tell me a lot.
> >
> > This could be done easily enough, but if it's not very useful then
> > there's no point.
> >
> It is useful there is no other way to tell what triggered a wakeup,
> but it would probably be better to just track wakeup interrupts/events
> elsewhere.
>
> >> active_since, tells you how long a a still active wakelock has been
> >> active. If someone activated a wakelock and never released it, it will
> >> be obvious here.
> >
> > Easily added. But you didn't mention any field saying whether the
> > wakelock is currently active. That could be added too (although it
> > would be racy -- but for detecting unreleased wakelocks you wouldn't
> > care).
> >
>
> These are the reported stats, not the fields in the stats structure.
> The wakelock code has an active flag. If we want to keep the
> pm_stay_wake nesting (which I would argue against), we would need an
> active count. It would also require a handle, which is a change Rafael
> said would not fly.
>
> >> total_time, total time the wake lock has been active. This one should
> >> be obvious.
> >
> > Also easily added.
> >
> Only with a handle passed to all the calls.

Well, I'm kind of tired of this "my solution is the only acceptable one"
mindset. IMHO, it's totally counter productive.

> >> sleep_time, total time the wake lock has been active when the screen was off.
> >
> > Not applicable to general systems. Is there anything like it that
> > _would_ apply in general?
> >
>
> The screen off is how it is used on android, the stats is keyed of
> what user space wrote to /sys/power/state. If "on" was written the
> sleep time is not updated.
>
> >> max_time, longest time the wakelock was active uninterrupted. This
> >> used less often, but the battery on a device was draining fast, but
> >> the problem went away before looking at the stats this will show if a
> >> wakelock was active for a long time.
> >
> > Again, easily added. The only drawback is that all these additions
> > will bloat the size of struct device. Of course, that's why you used
> > separately-allocated structures for your wakelocks. Maybe we can
> > change to do the same; it seems likely that the majority of device
> > structures won't ever be used for wakeup events.
> >
>
> Since many wakelocks are not associated with s struct device we need a
> separate object for this anyway.
>
> >> >> and I would prefer that the kernel interfaces would
> >> >> encourage drivers to block suspend until user space has consumed the
> >> >> event, which works for the android user space, instead of just long
> >> >> enough to work with a hypothetical user space power manager.
> >
> > Rafael doesn't _discourage_ drivers from doing this. However you have
> > to keep in mind that many kernel developers are accustomed to working
> > on systems (mostly PCs) with a different range of hardware devices from
> > embedded systems like your phones. With PCI devices(*), for example,
> > there's no clear point where a wakeup event gets handed off to
> > userspace.
> >
> > On the other hand, there's no reason the input layer shouldn't use
> > pm_stay_awake and pm_relax. It simply hasn't been implemented yet.
> ...
>
> The merged user space interface makes this unclear to me. When I first
> used suspend on android I had a power manager process that opened all
> the input devices and reset a screen off timeout every time there was
> an input event. If the input layer uses pm_stay_awake to block suspend
> when the queue is not empty, this will deadlock with the current
> interface since reading the wake count will block forever if an input
> event occurred right after the power manager decides to suspend.

No, in that case suspend will be aborted, IIUC.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rafael J. Wysocki on
On Saturday, August 07, 2010, Rafael J. Wysocki wrote:
> On Friday, August 06, 2010, Alan Stern wrote:
> > On Thu, 5 Aug 2010, Arve Hj�nnev�g wrote:
> ...
> > But what sorts of things qualify as wakeup events? Right now, the code
> > handles only events coming by way of the PME# signal (or its platform
> > equivalent). But that signal usually gets activated only when a PCI
> > device is in a low-power mode; if the device is at full power then it
> > simply generates an IRQ. It's the same event, but reported to the
> > kernel in a different way. So consider...
> >
> > Case 1: The system is suspending and the PCI device has already been
> > placed in D3hot when an event occurs. PME# is activated,
> > the wakeup event is reported, the suspend is aborted, and the
> > system won't try to suspend again for at least 100 ms. Good.
> >
> > Case 2: The system is running normally and the PCI device is at full
> > power when an event occurs. PME# isn't activated and
> > pm_wakeup_event doesn't get called. Then when the system
> > tries to suspend 25 ms later, there's nothing to prevent it
> > even though the event is still being processed. Bad.
> >
> > In case 2 the race has not been resolved. It seems to me that the
> > only proper solution is to call pm_wakeup_event for _every_ PCI
> > interrupt. This may be too much to add to a hot path, but what's the
> > alternative?
>
> Arguably not every PCI interrupt should be regarded as a wakeup event, so
> I think we can simply say in the cases when that's necessary the driver should
> be responsible for using pm_wakeup_event() or pm_stay_awake() / pm_relax() as
> appropriate.
>
> My patch only added it to the bus-level code which covered the PME-based
> wakeup events that _cannot_ be handled by device drivers.

Also please note that it depends a good deal on the definition of a "wakeup
event". Under the definition used when my patch was being developed, ie. that
wakeup events are the events that would wake up the system from a sleep state,
PCI interrupts cannot be wakeup events, unless the given device remains in the
full power state although the system has been suspended (standard PCI devices
are not allowed to generate signals except for PME from low-power states).

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rafael J. Wysocki on
On Saturday, August 07, 2010, david(a)lang.hm wrote:
> On Sat, 7 Aug 2010, Mark Brown wrote:
>
> > On Fri, Aug 06, 2010 at 04:35:59PM -0700, david(a)lang.hm wrote:
> >> On Fri, 6 Aug 2010, Paul E. McKenney wrote:
....
> What we want to have happen in an ideal world is
>
> when the storage isn't needed (between reads) the storage should shutdown
> to as low a power state as possible.
>
> when the CPU isn't needed (between decoding bursts) the CPU and as much of
> the system as possible (potentially including some banks of RAM) should
> shutdown to as low a power state as possible.

Unfortunately, the criteria for "not being needed" are not really
straightforward and one of the wakelocks' roles is to work around this issue.

> today there are two ways of this happening, via the idle approach (on
> everything except Android), or via suspend (on Android)
>
> Given that many platforms cannot go to into suspend while still playing
> audio, the idle approach is not going to be able to be eliminated (and in
> fact will be the most common approach to be used/deugged in terms of the
> types of platforms), it seems to me that there may be a significant amount
> of value in seeing if there is a way to change Android to use this
> approach as well instead of having two different systems competing to do
> the same job.

There is a fundamental obstacle to that, though. Namely, the Android
developers say that the idle-based approach doesn't lead to sufficient energy
savings due to periodic timers and "polling applications". Technically that
boils down to the interrupt sources that remain active in the idle-based case
and that are shut down during suspend. If you found a way to deactivate all of
them from the idle context in a non-racy fashion, that would probably satisfy
the Android's needs too.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: david on
On Sat, 7 Aug 2010, Rafael J. Wysocki wrote:

> On Saturday, August 07, 2010, david(a)lang.hm wrote:
>> On Sat, 7 Aug 2010, Mark Brown wrote:
>>
>>> On Fri, Aug 06, 2010 at 04:35:59PM -0700, david(a)lang.hm wrote:
>>>> On Fri, 6 Aug 2010, Paul E. McKenney wrote:
> ...
>> What we want to have happen in an ideal world is
>>
>> when the storage isn't needed (between reads) the storage should shutdown
>> to as low a power state as possible.
>>
>> when the CPU isn't needed (between decoding bursts) the CPU and as much of
>> the system as possible (potentially including some banks of RAM) should
>> shutdown to as low a power state as possible.
>
> Unfortunately, the criteria for "not being needed" are not really
> straightforward and one of the wakelocks' roles is to work around this issue.

if you can ignore the activity caused by the other "unimportant" processes
in the system, why is this much different then just the one process
running, in which case standard power management sleeps work pretty well.

>> today there are two ways of this happening, via the idle approach (on
>> everything except Android), or via suspend (on Android)
>>
>> Given that many platforms cannot go to into suspend while still playing
>> audio, the idle approach is not going to be able to be eliminated (and in
>> fact will be the most common approach to be used/deugged in terms of the
>> types of platforms), it seems to me that there may be a significant amount
>> of value in seeing if there is a way to change Android to use this
>> approach as well instead of having two different systems competing to do
>> the same job.
>
> There is a fundamental obstacle to that, though. Namely, the Android
> developers say that the idle-based approach doesn't lead to sufficient energy
> savings due to periodic timers and "polling applications".

polling applications can be solved by deciding that they aren't going to
be allowed to affect the power management decision (don't consider their
CPU useage when deciding to go to sleep, don't consider their timers when
deciding when to wake back up)

> Technically that
> boils down to the interrupt sources that remain active in the idle-based case
> and that are shut down during suspend. If you found a way to deactivate all of
> them from the idle context in a non-racy fashion, that would probably satisfy
> the Android's needs too.

well, we already have similar capibility for other peripherals (I keep
pointing to drive spin down as an example), the key to avoiding the
races seems to be in the drivers supporting this.

the fact that Android is making it possible for suspend to selectivly
avoid disabling them makes me think that a lot of the work needed to make
this happen has probably been done. look at what would happen in a suspend
if it decided to leave everything else on and just disable the one thing,
that should e the same thing that happens if you are just disabling that
one thing for idle sleep.

David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Arve Hjønnevåg on
2010/8/7 Rafael J. Wysocki <rjw(a)sisk.pl>:
> On Saturday, August 07, 2010, Arve Hj�nnev�g wrote:
>> 2010/8/6 Alan Stern <stern(a)rowland.harvard.edu>:
>> > On Thu, 5 Aug 2010, Arve Hj�nnev�g wrote:
>> >
>> >> count, tells you how many times the wakelock was activated. If a
>> >> wakelock prevented suspend for a long time a large count tells you it
>> >> handled a lot of events while a small count tells you it took a long
>> >> time to process the events, or the wakelock was not released properly.
>> >
>> > As noted, we already have this.
>> >
>>
>> Almost. We have it when a device is passed in.
>
> Sure. �And what are the other cases (details, please)?
>

The suspend blockers I added I my suspend blocker patchset were not
directly associated with a device. The evdev changes could be modified
to share a device, but it would give less detail since a separate
queue is created for each client that opens the device. The suspend
blocking work api would have to change so the caller to passes a
device in, which I think would make that api less flexible. Mostly the
problem is that we need separate stats for wakelocks created by a
single driver. For instance we will still need a user-space interface
to block suspend on android devices (lower level services than the
power manager need to block suspend), with the stats in the device
struct we have to create a new device for every wakelock user space
creates in the kernel.

There is also the issue of reading the stats. It is a lot easier to
read a single stats file, than looping though every device on the
system (when most of the devices never block suspend).

>> >> expire_count, tells you how many times the timeout expired. For the
>> >> input event wakelock in the android kernel (which has a timeout) an
>> >> expire count that matches the count tells you that someone opened an
>> >> input device but is not reading from it (this has happened several
>> >> times).
>> >
>> > This is a little tricky. �Rafael's model currently does not allow
>> > wakeup events started by pm_wakeup_event() to be cancelled any way
>> > other than by having their timer expire. �This essentially means that
>> > for some devices, expire_count will always be the same as count and for
>> > others it will always be 0. �To change this would require adding an
>> > extra timer struct, which could be done (in fact, an earlier version of
>> > the code included it). �It would be nice if we could avoid the need.
>> >
>> > Does Android use any kernel-internal wakelocks both with a timer and
>> > with active cancellation?
>> >
>>
>> I don't know if they are all kernel-internal but these drivers appear
>> to use timeouts and active cancellation on the same wakelock:
>> wifi driver, mmc core, alarm driver, evdev (suspend blocker version
>> removes the timeout).
>
> You previously said you didn't need timeouted wakelocks in the kernel, so
> I guess that was incorrect.
>

I don't know what you are reffering to. We have always stated that we
need timeouts in the kernel to pass events through other kernel layers
that do not use wakelocks (that list is much longer than the list
above which mixes timeouts and unlock on the same wakelock). The only
feature we do not use is the timeout feature in the user space
interface to kernel wakelocks.

>> >> wake_count, tells you that this is the first wakelock that was
>> >> acquired in the resume path. This is currently less useful than I
>> >> would like on the Nexus One since it is usually "SMD_RPCCALL" which
>> >> does not tell me a lot.
>> >
>> > This could be done easily enough, but if it's not very useful then
>> > there's no point.
>> >
>> It is useful there is no other way to tell what triggered a wakeup,
>> but it would probably be better to just track wakeup interrupts/events
>> elsewhere.
>>
>> >> active_since, tells you how long a a still active wakelock has been
>> >> active. If someone activated a wakelock and never released it, it will
>> >> be obvious here.
>> >
>> > Easily added. �But you didn't mention any field saying whether the
>> > wakelock is currently active. �That could be added too (although it
>> > would be racy -- but for detecting unreleased wakelocks you wouldn't
>> > care).
>> >
>>
>> These are the reported stats, not the fields in the stats structure.
>> The wakelock code has an active flag. If we want to keep the
>> pm_stay_wake nesting (which I would argue against), we would need an
>> active count. It would also require a handle, which is a change Rafael
>> said would not fly.
>>
>> >> total_time, total time the wake lock has been active. This one should
>> >> be obvious.
>> >
>> > Also easily added.
>> >
>> Only with a handle passed to all the calls.
>
> Well, I'm kind of tired of this "my solution is the only acceptable one"
> mindset. �IMHO, it's totally counter productive.
>

How do you propose to track how long a driver has blocked suspend when
you have an unblock call that takes no arguments.

>> >> sleep_time, total time the wake lock has been active when the screen was off.
>> >
>> > Not applicable to general systems. �Is there anything like it that
>> > _would_ apply in general?
>> >
>>
>> The screen off is how it is used on android, the stats is keyed of
>> what user space wrote to /sys/power/state. If "on" was written the
>> sleep time is not updated.
>>
>> >> max_time, longest time the wakelock was active uninterrupted. This
>> >> used less often, but the battery on a device was draining fast, but
>> >> the problem went away before looking at the stats this will show if a
>> >> wakelock was active for a long time.
>> >
>> > Again, easily added. �The only drawback is that all these additions
>> > will bloat the size of struct device. �Of course, that's why you used
>> > separately-allocated structures for your wakelocks. �Maybe we can
>> > change to do the same; it seems likely that the majority of device
>> > structures won't ever be used for wakeup events.
>> >
>>
>> Since many wakelocks are not associated with s struct device we need a
>> separate object for this anyway.
>>
>> >> >> and I would prefer that the kernel interfaces would
>> >> >> encourage drivers to block suspend until user space has consumed the
>> >> >> event, which works for the android user space, instead of just long
>> >> >> enough to work with a hypothetical user space power manager.
>> >
>> > Rafael doesn't _discourage_ drivers from doing this. �However you have
>> > to keep in mind that many kernel developers are accustomed to working
>> > on systems (mostly PCs) with a different range of hardware devices from
>> > embedded systems like your phones. �With PCI devices(*), for example,
>> > there's no clear point where a wakeup event gets handed off to
>> > userspace.
>> >
>> > On the other hand, there's no reason the input layer shouldn't use
>> > pm_stay_awake and pm_relax. �It simply hasn't been implemented yet.
>> ...
>>
>> The merged user space interface makes this unclear to me. When I first
>> used suspend on android I had a power manager process that opened all
>> the input devices and reset a screen off timeout every time there was
>> an input event. If the input layer uses pm_stay_awake to block suspend
>> when the queue is not empty, this will deadlock with the current
>> interface since reading the wake count will block forever if an input
>> event occurred right after the power manager decides to suspend.
>
> No, in that case suspend will be aborted, IIUC.
>

How? Your pm_get_wakeup_count function loops until events_in_progress becomes 0.

--
Arve Hj�nnev�g
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/