From: Paul E. McKenney on
On Mon, Aug 02, 2010 at 09:56:10PM -0700, Arve Hj�nnev�g wrote:
> On Sun, Aug 1, 2010 at 6:10 PM, Paul E. McKenney
> <paulmck(a)linux.vnet.ibm.com> wrote:
> > On Sun, Aug 01, 2010 at 03:47:08PM -0700, Arjan van de Ven wrote:
> ...
> >> Another one: freezing whole cgroups..... we have that today. it
> >> actually works quite well.... of course the hard part is the decision
> >> what to put in which cgroup, and at what frequency and duration you let
> >> cgroups run.
> >
> > Indeed, the Android guys seemed to be quite excited by cgroup freezing
> > until they thought about the application-classification problem.
> > Seems like it should be easy for some types of applications, but I do
> > admit that apps can have non-trivial and non-obvious dependencies.
>
> The dependencies is what made this solution uninteresting to us. For
> instance, we currently use cgroup scheduling to reduce the impact of
> some background tasks, but we occasionally saw a watchdog restart of
> the system process were critical services were waiting on a kernel
> mutex owned by a background task for more than 20 seconds. If we froze
> a cgroup instead, we would not hit this particular problem since tasks
> cannot be frozen while executing kernel code the same way they can be
> preempted, but nothing prevents a task from being frozen while holding
> a user-space resource.

Excellent point -- I had completely missed this failure mode!!!

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Paul E. McKenney on
On Mon, Aug 02, 2010 at 09:18:27PM -0700, Arve Hj�nnev�g wrote:
> On Sat, Jul 31, 2010 at 10:58 AM, Paul E. McKenney
> <paulmck(a)linux.vnet.ibm.com> wrote:
> ...

First, thank you very much for your review and feedback!

> > REQUIREMENTS
> >
> > o � � � Reduce the system's power consumption in order to (1) extend
> > � � � �battery life and (2) preserve state until AC power can be obtained.
> >
> > o � � � It is necessary to be able to use power-naive applications.
> > � � � �Many of these applications were designed for use in PC platforms
> > � � � �where power consumption has historically not been of great
> > � � � �concern, due to either (1) the availability of AC power or (2)
> > � � � �relatively undemanding laptop battery-lifetime expectations. �The
> > � � � �system must be capable of running these power-naive applications
> > � � � �without requiring that these applications be modified, and must
> > � � � �be capable of reasonable power efficiency even when power-naive
> > � � � �applications are available.
> >
> > o � � � If the display is powered off, there is no need to run any
> > � � � �application whose only effect is to update the display.
> >
> > � � � �Although one could simply block such an application when it next
> > � � � �tries to access the display, it appears that it is highly
> > � � � �desirable that the application also be prevented from
> > � � � �consuming power computing anything that will not be displayed.
> > � � � �Furthermore, whatever mechanism is used must operate on
> > � � � �power-naive applications that do not use blocking system calls.
> >
> > o � � � In order to avoid overrunning hardware and/or kernel buffers,
> > � � � �input events must be delivered to the corresponding application
> > � � � �in a timely fashion. �The application might or might not be
> > � � � �required to actually process the events in a timely fashion,
> > � � � �depending on the specific application.
> >
> > � � � �In particular, if user input that would prevent the system
> > � � � �from entering a low-power state is received while the system is
> > � � � �transitioning into a low-power state, the system must transition
> > � � � �back out of the low-power state so that it can hand the user
> > � � � �input off to the corresponding application.
> >
> > o � � � If a power-aware application receives user input, then that
> > � � � �application must be given the opportunity to process that
> > � � � �input.
> >
> > o � � � A power-aware application must be able to efficiently communicate
> > � � � �its needs to the system, so that such communication can be
> > � � � �performed on hot code paths. �Communication via open() and
> > � � � �close() is considered too slow, but communication via ioctl()
> > � � � �is acceptable.
>
> The problem with using open and close to prevent an allow suspend is
> not that it is too slow but that it interferes with collecting stats.
> The wakelock code has a sysfs interface that allow you to use a
> open/write/close sequence to block or unblock suspend. There is no
> limit to the amount of kernel memory that a process can consume with
> this interface, so the suspend blocker patchset uses a /dev interface
> with ioctls to block or unblock suspend and it destroys the kernel
> object when the file descriptor is closed.

Ah, I missed this point. What I am doing to adjust is to strike the
above requirement, and to add verbiage to the "statistics" requirement
about using ioctl() to implement suspend-blocker operations, so that the
statistics can be tracked based on the device being open throughout the
application's lifetime.

> > o � � � Power-naive applications must be prohibited from controlling
> > � � � �the system power state. �One acceptable approach is through
> > � � � �use of group permissions on a special power-control device.
> >
> > o � � � Statistics of the power-control actions taken by power-aware
> > � � � �applications must be provided, and must be keyed off of program
> > � � � �name.
>
> We don't key the stats off the program name, but having useful
> statistics is critical too us. The current code in linux-next does not
> appear to allow this (I'm referring to pm_stay_awake here, etc not
> pm-qos.)

OK, maybe I was confused earlier. So you do not track statistics via
the device being open throughout the application's lifetime?

I am not familiar with pm_stay_awake(), but will take a look at it.

> > o � � � Power-aware applications can make use of power-naive infrastructure.
> > � � � �This means that a power-aware application must have some way,
> > � � � �whether explicit or implicit, to ensure that any power-naive
> > � � � �infrastructure is permitted to run when a power-aware application
> > � � � �needs it to run.
> >
> > o � � � When a power-aware application is preventing the system from
> > � � � �shutting down, and is also waiting on a power-naive application,
> > � � � �the power-aware application must set a timeout to handle
> > � � � �the possibility that the power-naive application might halt
> > � � � �or otherwise fail. �(Such timeouts are also used to limit the
> > � � � �number of kernel modifications required.)
>
> wake-lock/suspend-blocker timeouts have nothing to do with the timeout
> used by applications when waiting for a response from a less trusted
> application.

OK, I moved this to a new "SUGGESTED USAGE" section and removed the
last (parenthesized) sentence.

> > o � � � If no power-aware or power-optimized application are indicating
> > � � � �a need for the system to remain operating, the system is permitted
> > � � � �(even encouraged!) to suspend all execution, even if power-naive
> > � � � �applications are runnable. �(This requirement did appear to be
> > � � � �somewhat controversial.)
>
> I would say it should suspend even if power aware applications are
> runnable. Most applications do not exclusively perform critical work.

The point being that a power-aware application does not block suspend
-unless- it holds a suspend blocker, correct?

Or am I missing some other subtlety?

> > o � � � Transition to low-power state must be efficient. �In particular,
> > � � � �methods based on repeated attempts to suspend are considered to
> > � � � �be too inefficient to be useful.
>
> It must be power-efficient. Repeated attempts to suspend will kill the
> idle battery life.

Good point! I changed "Transition to low-power state must be efficient"
to instead read "Transition to low-power state must be power-efficient."

> > o � � � Individual peripherals and CPUs must still use standard
> > � � � �power-conservation measures, for example, transitioning CPUs into
> > � � � �low-power states on idle and powering down peripheral devices
> > � � � �and hardware accelerators that have not been recently used.
> >
> > o � � � The API that controls the system power state must be
> > � � � �accessible both from Android's Java replacement, from
> > � � � �userland C code, and from kernel C code (both process
> > � � � �level and irq code, but not NMI handlers).
> >
> > o � � � Any initialization of the API that controls the system power
> > � � � �state must be unconditional, so as to be free from failure.
> > � � � �(I don't currently understand how this relates, probably due to
> > � � � �my current insufficient understanding of the proposed patch set.)
>
> Unconditional initialization makes it easier to add suspend blockers
> to existing kernel code since you don't have to add new failure exit
> paths. It is not a strong requirement.

Ah, that makes more sense! I moved this to a new "NICE-TO-HAVES"
section. I also changed the last parenthesized sentence to read
"Such unconditional initialization reduces the intrusiveness of the
the Android patchset." Does that work?

> > o � � � The API that controls the system power state must operate
> > � � � �correctly on SMP systems of modest size. �(My guess is that
> > � � � �"modest" means up to four CPUs, maybe up to eight CPUs.)
> >
> > o � � � Any QoS-based solution must take display and user-input
> > � � � �state into account. �In other words, the QoS must be
> > � � � �expressed as a function of the display and the user-input
> > � � � �states.
> >
> > o � � � Transitioning to extremely low power states requires saving
> > � � � �and restoring DRAM and/or cache SRAM state, which in itself
> > � � � �consumes significant energy. �The power savings must therefore
> > � � � �be balanced against the energy consumed in the state
> > � � � �transitions.
> >
> > o � � � The current Android userspace API must be supported in order
> > � � � �to support existing device software.

Thank you again for looking this over and for your comments!!!

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: James Bottomley on
On Mon, 2010-08-02 at 21:18 -0700, Arve Hjønnevåg wrote:
> > o A power-aware application must be able to efficiently communicate
> > its needs to the system, so that such communication can be
> > performed on hot code paths. Communication via open() and
> > close() is considered too slow, but communication via ioctl()
> > is acceptable.
> >
>
> The problem with using open and close to prevent an allow suspend is
> not that it is too slow but that it interferes with collecting stats.

Please elaborate on this. I expect the pm-qos stats interface will
collect stats across user open/close because that's how it currently
works. What's the problem?

> The wakelock code has a sysfs interface that allow you to use a
> open/write/close sequence to block or unblock suspend. There is no
> limit to the amount of kernel memory that a process can consume with
> this interface, so the suspend blocker patchset uses a /dev interface
> with ioctls to block or unblock suspend and it destroys the kernel
> object when the file descriptor is closed.

This is an implementation detail only. The pm-qos objects are long
lived, so their stats would be too. I would guess that explicit stat
clearing might be a useful option.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Arve Hjønnevåg on
2010/8/3 James Bottomley <James.Bottomley(a)suse.de>:
> On Mon, 2010-08-02 at 21:18 -0700, Arve Hj�nnev�g wrote:
>> > o � � � A power-aware application must be able to efficiently communicate
>> > � � � �its needs to the system, so that such communication can be
>> > � � � �performed on hot code paths. �Communication via open() and
>> > � � � �close() is considered too slow, but communication via ioctl()
>> > � � � �is acceptable.
>> >
>>
>> The problem with using open and close to prevent an allow suspend is
>> not that it is too slow but that it interferes with collecting stats.
>
> Please elaborate on this. �I expect the pm-qos stats interface will
> collect stats across user open/close because that's how it currently
> works. �What's the problem?
>

The pm-qos interface creates the request object in open and destroys
it in release just like the suspend blocker interface. We need stats
for each client which is lost if you free the object every time you
unblock suspend.

Or are you talking about user space opening and closing the stats
interface (which does not cause any problems)?

>> The wakelock code has a sysfs interface that allow you to use a
>> open/write/close sequence to block or unblock suspend. There is no
>> limit to the amount of kernel memory that a process can consume with
>> this interface, so the suspend blocker patchset uses a /dev interface
>> with ioctls to block or unblock suspend and it destroys the kernel
>> object when the file descriptor is closed.
>
> This is an implementation detail only.

There is no way fix it without changing the user space visible
behavior of the API. The kernel does not know when it is safe to free
the objects.

>�The pm-qos objects are long
> lived, so their stats would be too. �I would guess that explicit stat
> clearing might be a useful option.
>

Which pm-qos objects are you referring to? The struct pm_qos_object
that backs each pm-qos class is long lived (I don't know why this is
named pm_qos_object), but we need stats in struct pm_qos_request_list.

--
Arve Hj�nnev�g
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Arve Hjønnevåg on
2010/8/3 Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>:
> On Mon, Aug 02, 2010 at 09:18:27PM -0700, Arve Hj�nnev�g wrote:
>> On Sat, Jul 31, 2010 at 10:58 AM, Paul E. McKenney
>> <paulmck(a)linux.vnet.ibm.com> wrote:
....
>> > o � � � Statistics of the power-control actions taken by power-aware
>> > � � � �applications must be provided, and must be keyed off of program
>> > � � � �name.
>>
>> We don't key the stats off the program name, but having useful
>> statistics is critical too us. The current code in linux-next does not
>> appear to allow this (I'm referring to pm_stay_awake here, etc not
>> pm-qos.)
>
> OK, maybe I was confused earlier. �So you do not track statistics via
> the device being open throughout the application's lifetime?
>

The suspend blocker patchset does track statistics while the device is
open, but it it not keyed of the program name. The name is passed from
user-space and a single process can have the device open several
times. The wakelock interface that we currently use just creates a new
object every time it sees a new name and never frees it.

....
>> > o � � � If no power-aware or power-optimized application are indicating
>> > � � � �a need for the system to remain operating, the system is permitted
>> > � � � �(even encouraged!) to suspend all execution, even if power-naive
>> > � � � �applications are runnable. �(This requirement did appear to be
>> > � � � �somewhat controversial.)
>>
>> I would say it should suspend even if power aware applications are
>> runnable. Most applications do not exclusively perform critical work.
>
> The point being that a power-aware application does not block suspend
> -unless- it holds a suspend blocker, correct?

Yes.

>
> Or am I missing some other subtlety?

No.

....
>> > o � � � Any initialization of the API that controls the system power
>> > � � � �state must be unconditional, so as to be free from failure.
>> > � � � �(I don't currently understand how this relates, probably due to
>> > � � � �my current insufficient understanding of the proposed patch set.)
>>
>> Unconditional initialization makes it easier to add suspend blockers
>> to existing kernel code since you don't have to add new failure exit
>> paths. It is not a strong requirement.
>
> Ah, that makes more sense! �I moved this to a new "NICE-TO-HAVES"
> section. �I also changed the last parenthesized sentence to read
> "Such unconditional initialization reduces the intrusiveness of the
> the Android patchset." �Does that work?
>

Sure.



--
Arve Hj�nnev�g
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/