Attempted summary of suspend-blockers LKML thread [Kernel]

From: Arve Hjønnevåg on 5 Aug 2010 21:30

2010/8/5 Rafael J. Wysocki <rjw(a)sisk.pl>:
> On Friday, August 06, 2010, Arve Hjønnevåg wrote:
>> 2010/8/5 Rafael J. Wysocki <rjw(a)sisk.pl>:
>> > On Thursday, August 05, 2010, Arve Hjønnevåg wrote:
>> >> 2010/8/4 Rafael J. Wysocki <rjw(a)sisk.pl>:
>> >> > On Thursday, August 05, 2010, Arve Hjønnevåg wrote:
>> >> >> On Wed, Aug 4, 2010 at 1:56 PM, Matthew Garrett <mjg59(a)srcf.ucam.org> wrote:
>> >> >> > On Wed, Aug 04, 2010 at 10:51:07PM +0200, Rafael J. Wysocki wrote:
>> >> >> >> On Wednesday, August 04, 2010, Matthew Garrett wrote:
>> >> >> >> > No! And that's precisely the issue. Android's existing behaviour could
>> >> >> >> > be entirely implemented in the form of a binary that manually triggers
>> >> >> >> > suspend when (a) the screen is off and (b) no userspace applications
>> >> >> >> > have indicated that the system shouldn't sleep, except for the wakeup
>> >> >> >> > event race. Imagine the following:
>> >> >> >> >
>> >> >> >> > 1) The policy timeout is about to expire. No applications are holding
>> >> >> >> > wakelocks. The system will suspend providing nothing takes a wakelock.
>> >> >> >> > 2) A network packet arrives indicating an incoming SIP call
>> >> >> >> > 3) The VOIP application takes a wakelock and prevents the phone from
>> >> >> >> > suspending while the call is in progress
>> >> >> >> >
>> >> >> >> > What stops the system going to sleep between (2) and (3)? cgroups don't,
>> >> >> >> > because the voip app is an otherwise untrusted application that you've
>> >> >> >> > just told the scheduler to ignore.
>> >> >> >>
>> >> >> >> I _think_ you can use the just-merged /sys/power/wakeup_count mechanism to
>> >> >> >> avoid the race (if pm_wakeup_event() is called at 2)).
>> >> >> >
>> >> >> > Yes, I think that solves the problem. The only question then is whether
>> >> >>
>> >> >> How? By passing a timeout to pm_wakeup_event when the network driver
>> >> >> gets the packet or by passing 0. If you pass a timeout it is the same
>> >> >> as using a wakelock with a timeout and should work (assuming the
>> >> >> timeout you picked is long enough). If you don't pass a timeout it
>> >> >> does not work, since the packet may not be visible to user-space yet.
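The /sys/power/wakeup_count handshake discussed above can be sketched from user space roughly as follows. This is a minimal sketch, not the Android implementation: the function name and the path parameters are illustrative assumptions (the parameters exist so the logic can be exercised against ordinary files), and the check for user-space suspend blocks is only indicated by a comment.

```python
def try_suspend(count_path="/sys/power/wakeup_count",
                state_path="/sys/power/state"):
    """Attempt one suspend cycle; return True if the request was written."""
    # Step 1: read the current wakeup-event count. On real sysfs this read
    # blocks while a wakeup event is still being processed.
    with open(count_path) as f:
        seen = f.read().strip()

    # (A policy daemon would confirm here that no application has asked
    # for the system to stay awake.)

    try:
        # Step 2: write the same value back. The kernel rejects the write
        # if further wakeup events were registered after step 1.
        with open(count_path, "w") as f:
            f.write(seen)
        # Step 3: request suspend-to-RAM.
        with open(state_path, "w") as f:
            f.write("mem")
    except OSError:
        # An event raced with the attempt (or the write failed for another
        # reason); the caller should re-evaluate and retry later.
        return False
    return True
```

Note that this only closes the race if the driver keeps the event "in progress" long enough for user space to see it, which is the point being argued in the surrounding messages.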
>> >> >
>> >> > Alternatively, pm_stay_awake() / pm_relax() can be used.
>> >> >
>> >>
>> >> Which makes the driver and/or network stack changes identical to using
>> >> wakelocks, right?
>> >
>> > Please refer to Matthew's response.
>> >
>> >> >> > it's preferable to use cgroups or suspend fully, which is pretty much up
>> >> >> > to the implementation. In other words, is there a reason we're still
>> >> >>
>> >> >> I have seen no proposed way to use cgroups that will work. If you
>> >> >> leave some processes running while other processes are frozen you run
>> >> >> into problems when a frozen process holds a resource that a running
>> >> >> process needs.
>> >> >>
>> >> >>
>> >> >> > having this conversation? :) It'd be good to have some feedback from
>> >> >> > Google as to whether this satisfies their functional requirements.
>> >> >> >
>> >> >>
>> >> >> What is "this"? The merged code? If so, no it does not satisfy our
>> >> >> requirements. The in kernel api, while offering similar functionality
>> >> >> to the wakelock interface, does not use any handles which makes it
>> >> >> impossible to get reasonable stats (You don't know which pm_stay_awake
>> >> >> request pm_relax is reverting).
>> >> >
>> >> > Why is that a problem (out of curiosity)?
>> >> >
>> >>
>> >> Not having stats or not knowing what pm_relax is undoing? We need
>> >> stats to be able to debug the system.
>> >
>> > You have the stats in struct device and they are available via sysfs.
>> > I suppose they are insufficient, but I'd like to know why exactly.
>> >
>>
>> Our wakelock stats currently have
>> (name,)count,expire_count,wake_count,active_since,total_time,sleep_time,max_time
>> and last_change. Not all of these are equally important (total_time is
>> most important followed by active_since), but you only have count.
>> Also as discussed before, many wakelocks/suspendblockers are not
>> associated with a struct device.
>
> OK
>
> How much of that is used in practice and what for exactly?

count, tells you how many times the wakelock was activated. If a
wakelock prevented suspend for a long time a large count tells you it
handled a lot of events while a small count tells you it took a long
time to process the events, or the wakelock was not released properly.

expire_count, tells you how many times the timeout expired. For the
input event wakelock in the android kernel (which has a timeout) an
expire count that matches the count tells you that someone opened an
input device but is not reading from it (this has happened several
times).

wake_count, tells you that this is the first wakelock that was
acquired in the resume path. This is currently less useful than I
would like on the Nexus One since it is usually "SMD_RPCCALL" which
does not tell me a lot.

active_since, tells you how long a still active wakelock has been
active. If someone activated a wakelock and never released it, it will
be obvious here.

total_time, total time the wake lock has been active. This one should
be obvious.

sleep_time, total time the wake lock has been active when the screen was off.

max_time, longest time the wakelock was active uninterrupted. This is
used less often, but if the battery on a device was draining fast and
the problem went away before anyone looked at the stats, this will show
whether a wakelock was active for a long time.
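To illustrate why a handle makes these statistics possible, here is a small sketch of per-handle accounting. It is not the Android wakelock code: the class name, the injected clock, and the choice to increment count on each activation are assumptions made for the example, and wake_count, sleep_time and last_change are left out for brevity.

```python
class TrackedBlocker:
    """Hypothetical suspend-blocker handle that keeps its own statistics."""

    def __init__(self, name, clock):
        self.name = name
        self._clock = clock      # callable returning the current time in seconds
        self.count = 0           # number of activations
        self.expire_count = 0    # releases caused by a timeout expiring
        self.total_time = 0.0    # accumulated active time
        self.max_time = 0.0      # longest single uninterrupted active period
        self._since = None       # start of the current active period, if any

    def acquire(self):
        # Only a transition from idle to active counts as an activation.
        if self._since is None:
            self._since = self._clock()
            self.count += 1

    def release(self, expired=False):
        if self._since is None:
            return               # nothing to undo
        held = self._clock() - self._since
        self._since = None
        self.total_time += held
        self.max_time = max(self.max_time, held)
        if expired:
            self.expire_count += 1

    def active_since(self):
        """Seconds the blocker has been continuously active, 0.0 if idle."""
        return 0.0 if self._since is None else self._clock() - self._since
```

Because acquire and release operate on the same object, every release can be attributed to the activation it undoes, which a bare global counter cannot do.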

>
> Do you _really_ have to debug the wakelocks in drivers that much?
>

Wake locks in drivers sometimes need to be debugged. If the api has no
accountability, then these problems would take forever to fix.

>> >> If the system does not suspend
>> >> at all or is awake for too long, the wakelock stats tell us which
>> >> component is at fault. Since pm_stay_awake and pm_relax do not
>> >> operate on a handle, you cannot determine how long it prevented
>> >> suspend for.
>> >
>> > Well, if you need that, you can add a counter of "completed events" into
>>
>> We need more than that (see above).
>>
>> > struct dev_pm_info and a function similar to pm_relax() that
>> > will update that counter. I don't think anyone will object to that change.
>> >
>>
>> What about adding a handle that is passed to all three functions?
>
> I don't think that will fly at this point.
>

Why not? I think allowing drivers to modify a global reference count
with no accountability is a terrible idea.

>> >> >> The proposed in user-space interface
>> >> >> of calling into every process that receives wakeup events before every
>> >> >> suspend call
>> >> >
>> >> > Well, you don't really need to do that.
>> >> >
>> >>
>> >> Only if the driver blocks suspend until user-space has read the event.
>> >> This means that for android to work we need to block suspend when
>> >> input events are not processed, but a system using your scheme needs a
>> >> pm_wakeup_event call when the input event is queued. How do you switch
>> >> between them? Do we add separate ioctls in the input device to enable
>> >> each scheme? If someone has a single threaded user space power manager
>> >> that also reads input events it will deadlock if you block suspend
>> >> until it reads the input events since you block when reading the wake
>> >> count.
>> >
>> > Well, until someone actually tries to implement a power manager in user space
>> > it's a bit vague.
>> >
>>
>> Not having clear rules for what the drivers should do is a problem.
>> The comments in your code seem to advocate using timeouts instead of
>> overlapping pm_stay_awake/pm_relax sections. I find this
>> recommendation strange given all the opposition to
>> wakelock/suspendblocker timeouts.
>
> There's no recommendation either way.

I'm referring to this paragraph:

* Second, a wakeup event may be detected by one functional unit and processed
* by another one. In that case the unit that has detected it cannot really
* "close" the "no suspend" period associated with it, unless it knows in
* advance what's going to happen to the event during processing. This
* knowledge, however, may not be available to it, so it can simply specify time
* to wait before the system can be suspended and pass it as the second
* argument of pm_wakeup_event().

>
>> But more importantly, calling
>> pm_wakeup_event with a timeout of 0 is incompatible with the android
>> user space code,
>
> Which I don't find really relevant, sorry.
>
>> and I would prefer that the kernel interfaces would
>> encourage drivers to block suspend until user space has consumed the
>> event, which works for the android user space, instead of just long
>> enough to work with a hypothetical user space power manager.
>
> Well, those are your personal preferences, which I respect. I also have some
> personal preferences that are not necessarily followed by the kernel code.
>
> Thanks,
> Rafael

--
Arve Hjønnevåg
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: david on 6 Aug 2010 04:10

On Thu, 5 Aug 2010, Brian Swetland wrote:

> On Thu, Aug 5, 2010 at 6:01 PM, <david(a)lang.hm> wrote:
>> On Thu, 5 Aug 2010, Brian Swetland wrote:
>>> On Thu, Aug 5, 2010 at 5:16 PM, <david(a)lang.hm> wrote:
>>>>>>
>>>>>> So for an mp3 playback, does an Android suspend between data fetches?
>>>>>
>>>>> It can if the latency is long enough (which is why I point out low
>>>>> power audio which is usually high latency). For low latency (system
>>>>> sounds, etc) 10-25ms between buffers it's not practical to fully
>>>>> suspend but we will go to the lowest power state in idle if possible.
>>>>
>>>> the playback is able to continue even with all the clocks stopped? that
>>>> surprises me. I would have expected it to be able to sleep while playing
>>>> audio, but not do a full suspend.
>>>
>>> Obviously not all clocks are stopped (the DSP and codec are powered
>>> and clocked, for example), but yeah we can clock gate and power gate
>>> the cpu and most other peripherals while audio is playing on a number
>>> of ARM SoC designs available today (and the past few years).
>>
>> does this then mean that you have multiple variations of suspend?
>>
>> for example, one where the audio stuff is left powered, and one where it
>> isn't?
>
> While the cpu (and the bulk of the system) is suspended, it's not
> uncommon for some peripherals to continue to operate -- for example a
> cellular radio, gps, low power audio playback, etc. Details will vary
> depending on the SoC and board design. It's not so much a different
> suspend mode (the system is still suspended), just a matter of whether
> a peripheral can operate independently (and if it is lower power for
> it to do so).

this helps, but isn't quite what I was trying to ask.

on a given piece of hardware, does suspend always leave the same
peripherals on, or do you sometimes power more things down than other
times when suspending?

David Lang

From: david on 6 Aug 2010 04:40

On Thu, 5 Aug 2010, Paul E. McKenney wrote:

> On Thu, Aug 05, 2010 at 01:26:18PM -0700, david(a)lang.hm wrote:
>> On Thu, 5 Aug 2010, kevin granade wrote:
>>
>>> On Thu, Aug 5, 2010 at 10:46 AM, <david(a)lang.hm> wrote:
>>>> On Thu, 5 Aug 2010, Paul E. McKenney wrote:
>>>>
>>>>> On Wed, Aug 04, 2010 at 10:18:40PM -0700, david(a)lang.hm wrote:
>>>>>>
>>>>>> On Wed, 4 Aug 2010, Paul E. McKenney wrote:
>>>>>>>
>>>>>>> On Wed, Aug 04, 2010 at 05:25:53PM -0700, david(a)lang.hm wrote:
>>>>>>>>
>>>>>>>> On Wed, 4 Aug 2010, Paul E. McKenney wrote:
>>>>>
>>>>> [ . . . ]
>>>>>
>>>> however, in the case of Android I think the timeouts have to end up being
>>>> _much_ longer. Otherwise you have the problem of loading an untrusted book
>>>> reader app on the device and the device suspends while you are reading the
>>>> page.
>>>>
>>>> currently Android works around this by having a wakelock held whenever the
>>>> display is on. This seems backwards to me, the display should be on because
>>>> the system is not suspended, not the system is prevented from suspending
>>>> because the display is on.
>>>>
>>>> Rather than having the display be on causing a wakelock to be held (with the
>>>> code that controls the display having a timeout for how long it leaves
>>>> the display on), I would invert this and have the timeout be based on system
>>>> activity, and when it decides the system is not active, turn off the display
>>>> (along with other things as it suspends)
>>>
>>> IIRC, this was a major point of their (Android's) power management
>>> policy. User input of any kind would reset the "display active"
>>> timeout, which is the primary thing keeping random untrusted
>>> user-facing programs from being suspended while in use. They seemed
>>> to consider this to be a special case in their policy, but from the
>>> kernel's point of view it is just another suspend blocker being held.
>>>
>>> I'm not sure this is the best use case to look at though, because
>>> since it is user-facing, the timeout durations are on a different
>>> scale than the ones they are really worried about. I think another
>>> category of use case that they are worried about is:
>>>
>>> (in suspend) -> wakeup due to network -> process network activity -> suspend
>>>
>>> or an example that has been mentioned previously:
>>>
>>> (in suspend) -> wakeup due to alarm for audio processing -> process
>>> batch of audio -> suspend
>>
>> when you suspend the audio will shut off, so it's sleep ->wake ->
>> sleep, not suspend
>>
>>> In both of these cases, the display may never power on (phone might
>>> beep to indicate txt message or email, audio just keeps playing), so
>>> the magnitude of the "timeout" for suspending again should be very
>>> small. Specifically, they don't want there to be a timeout at all, so
>>> as little time as possible is spent out of suspend in addition to
>>> the time required to handle the event that caused wakeup.
>>
>> it really depends on the frequency of the wakeups.
>>
>> if you get a network packet once every 5 min and need to wake to
>> process it, staying awake for 20 seconds after finishing processing
>> is FAR more significant than if you get a network packet once every
>> hour. It's not just the factor of 20 that simple math would indicate
>> because the time in suspend eats power as well.
>>
>> I don't know real numbers, so these are made up for this example
>>
>> if suspend (with the cell live to receive packets) is 10 mA average
>> current and full power is 500 mA average current
>>
>> packets every 5 min with .1 sec wake time will eat ~13 mAh per hour
>>
>> packets every 5 min with 10 second wake time will eat ~37 mAh per hour
>>
>> packets every hour with .1 sec wake time will eat ~10 mAh per hour
>>
>> packets every hour with 10 sec wake time will eat ~11 mAh per hour
>>
>> so if you have frequent wakeups, staying awake 100 times as long
>> will cut your battery life to 1/3 what it was before.
>>
>> if your wakeups are rare, it's about a 10% hit to stay awake 100
>> times as long.
>>
>> there is a lot of room for tuning the timeouts here.
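The arithmetic in the example above can be checked with a simple duty-cycle model. This is a sketch that takes the made-up 10 mA and 500 mA figures as given and ignores any cost of the suspend and resume transitions themselves; the function name is illustrative. Under these assumptions the hourly cases match the estimates, while the 5-minute cases come out nearer 10 and 26 mAh per hour, so the quoted ~13 and ~37 presumably include overhead that this model leaves out.

```python
def average_current_ma(period_s, awake_s, suspend_ma=10.0, awake_ma=500.0):
    """Time-weighted average current for one wakeup every period_s seconds.

    The average current in mA equals the charge drawn per hour in mAh.
    """
    asleep_s = period_s - awake_s
    return (awake_s * awake_ma + asleep_s * suspend_ma) / period_s


# The four scenarios from the message: (wakeup period, time awake per wakeup).
for period_s, awake_s in [(300, 0.1), (300, 10), (3600, 0.1), (3600, 10)]:
    avg = average_current_ma(period_s, awake_s)
    print(f"every {period_s:>4} s, awake {awake_s:>4} s: {avg:.1f} mA average")
```

The qualitative conclusion is unchanged: lengthening the awake time hurts far more when wakeups are frequent than when they are rare.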
>
> Especially given different scenarios, for example, audio playback
> when the device is in airplane mode. ;-)

hmm, I've been thinking and talking in terms of two classes of cgroups,
trusted and untrusted. I wonder if it would be possible to set timeouts
for each cgroup instead.

the system would go to sleep IFF all cgroups have been idle longer than
the idle time (with -1 idle time being 'ignore this cgroup')

if this could be done you could set longer times for things designed for
user-interaction than you do for other purposes.

you could set media to 0 idle time (so that as soon as it finishes
processing the system can sleep until the next timer)

to do this, the code making the decision would have to be able to find out
the following fairly cheaply.

1. for this cgroup, what was the last time something ran

2. for this cgroup, what is the next timer set
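A sketch of the decision this would feed, assuming the two per-cgroup values above are available. The type and function names are hypothetical, and treating a group as idle once it has been quiet for at least its timeout (so that a timeout of 0 allows sleep immediately) is an interpretation of the rule described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CgroupActivity:
    name: str
    idle_timeout: float                 # seconds; -1 means "ignore this cgroup"
    last_ran: float                     # time a task in the group last ran
    next_timer: Optional[float] = None  # earliest pending timer, if any


def may_sleep(groups, now):
    """True iff every non-ignored cgroup has been idle for its timeout."""
    for g in groups:
        if g.idle_timeout < 0:
            continue                    # ignored group never blocks sleep
        if now - g.last_ran < g.idle_timeout:
            return False
    return True


def next_wakeup(groups):
    """Earliest pending timer among non-ignored cgroups, or None."""
    timers = [g.next_timer for g in groups
              if g.idle_timeout >= 0 and g.next_timer is not None]
    return min(timers) if timers else None
```

With a 30 second timeout on an interactive group and 0 on a media group, the system may sleep as soon as the interactive group has been quiet for 30 seconds, and would wake for the media group's next timer.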

it would be nice to get network traffic/connection stats.

so two questions.

first, what else would you need to get accumulated for the cgroup

second, is there a fairly easy way to have these stats available?

for the 'last time it ran' stat, this seems like you could have a per-cpu
variable per cgroup that's fairly cheap to update, but you would need to
take a global lock to read accurately (the lock may be expensive enough
that it's worth trying to read the variables from the other cpu without a
lock, just to see if it's remotely possible to sleep/suspend)

with timers, is it possible to have multiple timer wheels (one per
cgroup)?

David Lang

From: Mark Brown on 6 Aug 2010 08:40

On Thu, Aug 05, 2010 at 06:01:24PM -0700, david(a)lang.hm wrote:
> On Thu, 5 Aug 2010, Brian Swetland wrote:

>> Obviously not all clocks are stopped (the DSP and codec are powered
>> and clocked, for example), but yeah we can clock gate and power gate
>> the cpu and most other peripherals while audio is playing on a number
>> of ARM SoC designs available today (and the past few years).

> does this then mean that you have multiple variations of suspend?

> for example, one where the audio stuff is left powered, and one where it
> isn't?

This was the core of the issue I was raising in the last thread about
this (the one following the rename to suspend blockers). Essentially
what happens in a mainline context is that some subsystems can with
varying degrees of optionality ignore some or all of the instruction to
suspend and keep bits of the system alive during suspend.

Those that stay alive will either have per subsystem handling or will be
outside the direct control of the kernel entirely (the modem is a good
example of the latter case in many systems - in terms of the software
it's essentially a parallel computer that's sitting in the system rather
than a peripheral of the AP).

From: Mark Brown on 6 Aug 2010 08:40

On Fri, Aug 06, 2010 at 01:07:47AM -0700, david(a)lang.hm wrote:

> on a given piece of hardware, does suspend always leave the same
> peripherals on, or do you sometimes power more things down than other
> times when suspending?

Different bits of hardware get powered down depending on current system
state. In the audio case (which is so far as I know the only case for
this sort of stuff that currently does anything in mainline) we'll keep
alive any active paths (that is, paths carrying live audio) between
endpoints in the audio subsystem which have been explicitly marked as
staying alive during suspend. Other audio paths will be powered down
when the system suspends. During normal run time only paths that are
active will be powered up.
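The rule described above can be written down as a small predicate. This is only an illustration of the stated policy, not the mainline audio code; the type, field and function names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class AudioPath:
    name: str
    active: bool                # currently carrying live audio
    stays_on_in_suspend: bool   # endpoints explicitly marked to survive suspend


def powered_paths(paths, suspended):
    """Names of the paths that remain powered in the given system state."""
    if suspended:
        # During suspend only active paths that were explicitly marked stay up.
        return [p.name for p in paths if p.active and p.stays_on_in_suspend]
    # During normal run time every active path is powered.
    return [p.name for p in paths if p.active]
```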
