Attempted summary of suspend-blockers LKML thread [Kernel]


From: Arve Hjønnevåg on 2 Aug 2010 23:30

On Mon, Aug 2, 2010 at 5:08 PM, <david(a)lang.hm> wrote:
> On Mon, 2 Aug 2010, Paul E. McKenney wrote:
>
>> On Sun, Aug 01, 2010 at 10:06:34PM -0700, david(a)lang.hm wrote:
>>>
>>> On Sun, 1 Aug 2010, Arjan van de Ven wrote:
>>>
>>>> I'm a little worried that this whole "I need to block suspend" is
>>>> temporary. Yes today there is silicon from ARM and Intel where suspend
>>>> is a heavy operation, yet at the same time it's not all THAT heavy
>>>> anymore.... at least on the Intel side it's good enough to use pretty
>>>> much all the time (when the screen is off for now, but that's a memory
>>>> controller issue more than anything else). I'm pretty sure the ARM guys
>>>> will not be far behind.
>>>
>>> remember that this 'block suspend' is really 'block overriding the
>>> fact that there are still runnable processes and suspending anyway'
>>>
>>> having it labeled as 'suspend blocker' or even 'wakelock' makes it
>>> sound as if it blocks any attempt to suspend, and I'm not sure
>>> that's what's really intended. It sounds like the normal suspend
>>> process would continue to work, just this 'ignore if these other
>>> apps are busy' mode of operation would not work.
>>>
>>> which makes me wonder, would it be possible to tell the normal idle
>>> detection mechanism to ignore specific processes when deciding if it
>>> should suspend or not? how about only considering processes in one
>>> cgroup when deciding to suspend and ignoring all others?
>>
>> Why not flesh this out and compare it to the draft requirements?
>> (I expect to be sending another version by end of day Pacific Time.)
>>
>> The biggest issue I see right off-hand is that a straightforward
>> implementation of your idea would require moving processes from one
>> cgroup to another when acquiring or releasing a suspend blocker, which
>> from what I understand would be way too heavyweight. On the other hand,
>> if acquiring and releasing a suspend blocker does not move the process
>> from one cgroup to another, then you need something very like the
>> suspend-blocker mechanism to handle those processes that are permitted
>> to acquire suspend blockers, and which are thus not a member of the
>> cgroup in question.
>>
>> That said, I did see some hint from the Android guys that it -might-
>> be possible to leverage cgroups in the way that you suggest to help
>> save power during times when suspend was blocked but (for example) the
>> screen was turned off. The idea would be to freeze the cgroup whenever
>> the screen blanked, even if suspend was blocked. The biggest issue
>> here is that any process that can hold a suspend blocker must never do
>> an unconditional wait on any process in this cgroup. Seems to me that
>> this should be possible in theory, but the devil would be in the details.
>>
>> If I am misunderstanding your proposal, please enlighten me!
>
> you are close, but I think what I'm proposing is actually simpler (assuming
> that the scheduler can be configured to generate the appropriate stats)
>
> my thought was not to move applications between cgroups as they
> acquire/release the suspend-block lock, but rather to say that any
> application that you would trust to get the suspend-block lock should be in
> cgroup A while all other applications are in cgroup B
>
> when you are deciding if the system should go to sleep because it is idle,
> ignore the activity of all applications in cgroup B
>
> if cgroup A applications are busy, the system is not idle and should not
> suspend.
>

Triggering suspend from idle has been suggested before. However, idle
is not a signal that it is safe to suspend since timers stop in
suspend (or the code could temporarily be waiting on a non-wakeup
interrupt). If you add suspend blockers or wakelocks to prevent
suspend while events you care about are pending, then it does not make
a lot of sense to prevent suspend just because the cpu is not idle.

> this requires that the applications in cgroup A actually go idle as opposed
> to simply releasing the suspend-block lock, but it would mean that there
> are no application changes required to move a system from the status
> "even if it's busy, go ahead and suspend" to "this application is important,
> don't suspend if it's got work to do", it would just be classifying the
> application in one cgroup or the other.
>
> This assumes that an application that you would trust to hold the
> suspend-block lock is going to be well behaved (if it isn't, how can you
> trust it to not grab the lock inappropriately?)
>
> David Lang
>

--
Arve Hjønnevåg
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Arve Hjønnevåg on 3 Aug 2010 00:20

On Sat, Jul 31, 2010 at 10:58 AM, Paul E. McKenney
<paulmck(a)linux.vnet.ibm.com> wrote:
....
> REQUIREMENTS
>
> o       Reduce the system's power consumption in order to (1) extend
>         battery life and (2) preserve state until AC power can be obtained.
>
> o       It is necessary to be able to use power-naive applications.
>         Many of these applications were designed for use in PC platforms
>         where power consumption has historically not been of great
>         concern, due to either (1) the availability of AC power or (2)
>         relatively undemanding laptop battery-lifetime expectations. The
>         system must be capable of running these power-naive applications
>         without requiring that these applications be modified, and must
>         be capable of reasonable power efficiency even when power-naive
>         applications are available.
>
> o       If the display is powered off, there is no need to run any
>         application whose only effect is to update the display.
>
>         Although one could simply block such an application when it next
>         tries to access the display, it appears that it is highly
>         desirable that the application also be prevented from
>         consuming power computing anything that will not be displayed.
>         Furthermore, whatever mechanism is used must operate on
>         power-naive applications that do not use blocking system calls.
>
> o       In order to avoid overrunning hardware and/or kernel buffers,
>         input events must be delivered to the corresponding application
>         in a timely fashion. The application might or might not be
>         required to actually process the events in a timely fashion,
>         depending on the specific application.
>
>         In particular, if user input that would prevent the system
>         from entering a low-power state is received while the system is
>         transitioning into a low-power state, the system must transition
>         back out of the low-power state so that it can hand the user
>         input off to the corresponding application.
>
> o       If a power-aware application receives user input, then that
>         application must be given the opportunity to process that
>         input.
>
> o       A power-aware application must be able to efficiently communicate
>         its needs to the system, so that such communication can be
>         performed on hot code paths. Communication via open() and
>         close() is considered too slow, but communication via ioctl()
>         is acceptable.
>

The problem with using open and close to prevent and allow suspend is
not that it is too slow but that it interferes with collecting stats.
The wakelock code has a sysfs interface that allows you to use an
open/write/close sequence to block or unblock suspend. There is no
limit to the amount of kernel memory that a process can consume with
this interface, so the suspend blocker patchset uses a /dev interface
with ioctls to block or unblock suspend and it destroys the kernel
object when the file descriptor is closed.
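
The memory argument above can be illustrated with a small userspace model. This is only a sketch in Python, not the wakelock or suspend-blocker code: the class and method names (NamedBlockerTable, FdBlockerTable, write_block, ioctl_block and so on) are hypothetical, and it assumes that the name-keyed interface keeps one object per distinct name while the descriptor-based interface frees its object on close.

```python
class NamedBlockerTable:
    """Hypothetical model of a name-keyed interface (assumption: one
    object per distinct name, never freed)."""

    def __init__(self):
        self.objects = {}  # name -> active flag

    def write_block(self, name):
        self.objects[name] = True

    def write_unblock(self, name):
        if name in self.objects:
            self.objects[name] = False

    def suspend_allowed(self):
        return not any(self.objects.values())


class FdBlockerTable:
    """Hypothetical model of a descriptor-based interface: the object
    lives exactly as long as its handle stays open."""

    def __init__(self):
        self.objects = {}  # handle -> {"name", "active"}
        self._next_fd = 3

    def open(self, name):
        fd = self._next_fd
        self._next_fd += 1
        self.objects[fd] = {"name": name, "active": False}
        return fd

    def ioctl_block(self, fd):
        self.objects[fd]["active"] = True

    def ioctl_unblock(self, fd):
        self.objects[fd]["active"] = False

    def close(self, fd):
        # Destroying the object also drops any block it still held.
        del self.objects[fd]

    def suspend_allowed(self):
        return not any(o["active"] for o in self.objects.values())


def churn_named(table, n):
    """Block and unblock n distinct names; return objects left behind."""
    for i in range(n):
        name = f"event-{i}"
        table.write_block(name)
        table.write_unblock(name)
    return len(table.objects)


def churn_fd(table, n):
    """Open, block, unblock and close n handles; return objects left."""
    for i in range(n):
        fd = table.open(f"event-{i}")
        table.ioctl_block(fd)
        table.ioctl_unblock(fd)
        table.close(fd)
    return len(table.objects)
```

In this model, a process that cycles through many distinct names leaves one object behind per name, whereas the descriptor-based table is empty again once every handle has been closed, so its footprint is bounded by the number of descriptors held open at once.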

> o       Power-naive applications must be prohibited from controlling
>         the system power state. One acceptable approach is through
>         use of group permissions on a special power-control device.
>
> o       Statistics of the power-control actions taken by power-aware
>         applications must be provided, and must be keyed off of program
>         name.
>

We don't key the stats off the program name, but having useful
statistics is critical to us. The current code in linux-next does not
appear to allow this (I'm referring to pm_stay_awake etc. here, not
pm-qos.)

> o       Power-aware applications can make use of power-naive infrastructure.
>         This means that a power-aware application must have some way,
>         whether explicit or implicit, to ensure that any power-naive
>         infrastructure is permitted to run when a power-aware application
>         needs it to run.
>
> o       When a power-aware application is preventing the system from
>         shutting down, and is also waiting on a power-naive application,
>         the power-aware application must set a timeout to handle
>         the possibility that the power-naive application might halt
>         or otherwise fail. (Such timeouts are also used to limit the
>         number of kernel modifications required.)

wake-lock/suspend-blocker timeouts have nothing to do with the timeout
used by applications when waiting for a response from a less trusted
application.

>
> o       If no power-aware or power-optimized applications are indicating
>         a need for the system to remain operating, the system is permitted
>         (even encouraged!) to suspend all execution, even if power-naive
>         applications are runnable. (This requirement did appear to be
>         somewhat controversial.)

I would say it should suspend even if power aware applications are
runnable. Most applications do not exclusively perform critical work.

>
> o       Transition to low-power state must be efficient. In particular,
>         methods based on repeated attempts to suspend are considered to
>         be too inefficient to be useful.
>

It must be power-efficient. Repeated attempts to suspend will kill the
idle battery life.

> o       Individual peripherals and CPUs must still use standard
>         power-conservation measures, for example, transitioning CPUs into
>         low-power states on idle and powering down peripheral devices
>         and hardware accelerators that have not been recently used.
>
> o       The API that controls the system power state must be
>         accessible both from Android's Java replacement, from
>         userland C code, and from kernel C code (both process
>         level and irq code, but not NMI handlers).
>
> o       Any initialization of the API that controls the system power
>         state must be unconditional, so as to be free from failure.
>         (I don't currently understand how this relates, probably due to
>         my current insufficient understanding of the proposed patch set.)
>

Unconditional initialization makes it easier to add suspend blockers
to existing kernel code since you don't have to add new failure exit
paths. It is not a strong requirement.

> o       The API that controls the system power state must operate
>         correctly on SMP systems of modest size. (My guess is that
>         "modest" means up to four CPUs, maybe up to eight CPUs.)
>
> o       Any QoS-based solution must take display and user-input
>         state into account. In other words, the QoS must be
>         expressed as a function of the display and the user-input
>         states.
>
> o       Transitioning to extremely low power states requires saving
>         and restoring DRAM and/or cache SRAM state, which in itself
>         consumes significant energy. The power savings must therefore
>         be balanced against the energy consumed in the state
>         transitions.
>
> o       The current Android userspace API must be supported in order
>         to support existing device software.
>
>

--
Arve Hjønnevåg

From: Paul Menage on 3 Aug 2010 00:40

On Sun, Aug 1, 2010 at 11:53 PM, Florian Mickler <florian(a)mickler.org> wrote:
>
> Thinking about it.. I don't know much about cgroups, but I think a
> process can only be in one cgroup at a time.

A thread can only be in one cgroup in each hierarchy at one time. You
can mount multiple cgroup hierarchies, with different resource
controllers on different hierarchies.

>
> b) you can't use cgroup for other purposes anymore. I.e. if you want to
> have 2 groups that each only have half of the memory available, how
> would you then integrate the cgroup-ignore-for-idle-approach with this?

You could mount the subsystem that provides the "ignore-for-idle"
support on one hierarchy, and partition the trusted/untrusted
processes that way, and the memory controller subsystem on a different
hierarchy, with whatever split you wanted for memory controls.

Paul

From: david on 3 Aug 2010 00:50

On Mon, 2 Aug 2010, Arve Hjønnevåg wrote:

> On Mon, Aug 2, 2010 at 5:08 PM, <david(a)lang.hm> wrote:
>> On Mon, 2 Aug 2010, Paul E. McKenney wrote:
>>
>>
>> you are close, but I think what I'm proposing is actually simpler (assuming
>> that the scheduler can be configured to generate the appropriate stats)
>>
>> my thought was not to move applications between cgroups as they
>> acquire/release the suspend-block lock, but rather to say that any
>> application that you would trust to get the suspend-block lock should be in
>> cgroup A while all other applications are in cgroup B
>>
>> when you are deciding if the system should go to sleep because it is idle,
>> ignore the activity of all applications in cgroup B
>>
>> if cgroup A applications are busy, the system is not idle and should not
>> suspend.
>>
>
> Triggering suspend from idle has been suggested before. However, idle
> is not a signal that it is safe to suspend since timers stop in
> suspend (or the code could temporarily be waiting on a non-wakeup
> interrupt). If you add suspend blockers or wakelocks to prevent
> suspend while events you care about are pending, then it does not make
> a lot of sense to prevent suspend just because the cpu is not idle.

isn't this a matter of making the suspend decision look at what timers
have been set to expire in the near future and/or tweaking how long the
system needs to be idle before going to sleep?

to do a good job of suspending hyper-aggressively you need to look
at when you need to wake back up (after all, if you are only going to
sleep for 1 second, it makes no sense to try to enter a sleep state that
takes 0.5 seconds to get into and 0.6 seconds to get out of; you need to
pick a lighter power saving mode)
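
The 1-second example above can be worked through in a few lines of Python. This is a sketch under stated assumptions, not kernel code: the SleepState type, the pick_state helper, and every number except the 0.5 s / 0.6 s pair are made up for illustration, and the selection rule (take the lowest-power state whose entry plus exit time fits before the next wakeup) is just one simple reading of the argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SleepState:
    name: str
    enter_s: float   # time needed to get into the state
    exit_s: float    # time needed to get back out
    power_w: float   # power drawn while in the state (illustrative)


def pick_state(states, time_until_wake_s):
    """Return the lowest-power state whose round trip fits before the
    next wakeup, or None if even the lightest state does not fit."""
    fitting = [s for s in states
               if s.enter_s + s.exit_s < time_until_wake_s]
    if not fitting:
        return None
    return min(fitting, key=lambda s: s.power_w)


# Hypothetical states; only the 0.5 s / 0.6 s pair comes from the text.
STATES = [
    SleepState("cpu-idle", enter_s=0.001, exit_s=0.001, power_w=0.5),
    SleepState("light-sleep", enter_s=0.05, exit_s=0.05, power_w=0.1),
    SleepState("suspend", enter_s=0.5, exit_s=0.6, power_w=0.01),
]

one_second = pick_state(STATES, 1.0)    # suspend needs 1.1 s, too slow
ten_seconds = pick_state(STATES, 10.0)  # suspend now fits
```

With a wakeup due in 1 second the full suspend state is rejected because its 1.1-second round trip does not fit, and the lighter state is chosen instead. A real policy would also weigh the energy spent on the transition itself, which this latency-only rule ignores.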

if you really want to have the application be able to say 'I am ready for
you to go to sleep NOW and I don't want any chance of waking back up until
the system is ready for me to do so' it may be possible to have a special
dev file that when a program attempts to read from it the program is
considered to have been idle forever (expiring any 'delay since last
activity' timers that are running for that process) and when the device
wakes back up (or after a little bit of time if the device decides not to
go back to sleep), allow the return from the blocking call to allow the
app to continue running.

but I am not sure that something like that is needed. I think just
checking for timers that are due to expire 'soon' and tweaking how long
the system must be 'idle' before it decides to go to sleep should be good
enough.
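
As a rough illustration of that policy, here is a sketch in Python. The function name, parameters and thresholds are all hypothetical, and it assumes that per-group "last activity" timestamps and a list of pending timer expiries are available from somewhere. Activity in the untrusted group (cgroup B) is simply ignored.

```python
def should_suspend(now_s, last_activity_s, trusted_groups,
                   pending_timers_s, idle_threshold_s, timer_horizon_s):
    """Decide whether to suspend, looking only at trusted groups.

    last_activity_s maps a group name to the time it last ran.
    pending_timers_s lists absolute expiry times of timers that matter.
    """
    for group in trusted_groups:
        last = last_activity_s.get(group)
        # A trusted group that ran too recently keeps the system awake.
        if last is not None and now_s - last < idle_threshold_s:
            return False
    for expiry in pending_timers_s:
        # A timer due 'soon' means suspending now is not worthwhile.
        if 0 <= expiry - now_s < timer_horizon_s:
            return False
    return True


# cgroup B was active 0.1 s ago, but only cgroup A is consulted.
activity = {"A": 90.0, "B": 99.9}
quiet = should_suspend(100.0, activity, {"A"}, [], 5.0, 2.0)
timer_soon = should_suspend(100.0, activity, {"A"}, [101.0], 5.0, 2.0)
```

Note that this does not answer the objection quoted above: code that is temporarily waiting on a non-wakeup interrupt looks idle to such a check even though suspending would be wrong.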

David Lang

From: Paul Menage on 3 Aug 2010 00:50

On Mon, Aug 2, 2010 at 2:06 AM, <david(a)lang.hm> wrote:
>
> yes, it could mean a doubling in the number of cgroups that you need on a
> system. and if there are other features like this you can end up in a
> geometric explosion in the number of cgroups.

No, it would be additive - you can mount different subsystems on
separate hierarchies. So if you had X divisions for memory, Y
divisions for CPU and Z divisions for suspend-blocking (where Z=2,
probably?) you could mount three separate hierarchies and have X+Y+Z
complexity, not X*Y*Z.
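
The difference is easy to see with concrete numbers. In the Python sketch below, Z=2 comes from the text, while X=4 and Y=3 are made-up values and both helper names are hypothetical.

```python
import math


def groups_single_hierarchy(divisions):
    """All subsystems on one hierarchy: every combination of
    divisions needs its own cgroup, so the counts multiply."""
    return math.prod(divisions)


def groups_separate_hierarchies(divisions):
    """One hierarchy per subsystem: each hierarchy only needs its
    own divisions, so the counts add."""
    return sum(divisions)


# X memory divisions, Y CPU divisions, Z suspend-blocking divisions.
X, Y, Z = 4, 3, 2
combined = groups_single_hierarchy([X, Y, Z])      # 4 * 3 * 2
separate = groups_separate_hierarchies([X, Y, Z])  # 4 + 3 + 2
```

With these values a single combined hierarchy needs 24 cgroups, while three separate hierarchies need 9 in total.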

(Not that I have a strong opinion on whether cgroups is an appropriate
mechanism for solving this problem - just that the problem you foresee
shouldn't occur in practice).

Paul
