From: Arve Hjønnevåg on
This patch series adds a suspend-block api that provides the same
functionality as the android wakelock api. This version fixes a race
in suspend blocking work, has some documentation changes and
opportunistic suspend now uses the same workqueue as runtime pm.

--
Arve Hjønnevåg <arve(a)android.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Paul Walmsley on
Hello,

Some general comments on the suspend blockers/wakelock/opportunistic
suspend v6 patch series, posted here:

https://lists.linux-foundation.org/pipermail/linux-pm/2010-April/025146.html

The comments below are somewhat telegraphic in the interests of
readability - more specific comments to follow in later E-mails. I am
indebted to those of us who discussed these issues at LPC last year and
ELC this year for several stimulating discussions.

There are several general problems with the design of opportunistic
suspend and suspend-blocks.

1. The opportunistic suspend code bypasses existing Linux kernel code,
such as timers and the scheduler, that indicates when code
needs to run, and when the system is idle. This causes two problems:

a. When opportunistic suspend is enabled, the default mode is to
break all timers and scheduling on the system. This isn't
right: the default mode should be to preserve standard Linux
behavior. Exceptions can then be added for process groups that
should run with the non-standard timer and scheduler behavior.

b. The series introduces a de novo kernel API and userspace API
that are unrelated to timers and the scheduler, but if the point
is to modify the behavior of timers or the scheduler, the
existing timer or scheduler APIs should be extended. Any new
APIs will need to be widely spread throughout the kernel and
userspace.

2. The suspend-block kernel API tells the kernel _how_ to accomplish a
goal, rather than telling the kernel _what_ the goal is. This
results in layering violations, unstated assumptions, and is too
coarse-grained. These problems in turn will cause fragile kernel
code, kernel code with userspace dependencies, and power management
problems on modern hardware. Code should ask for what it wants.
For example, if a driver needs to place an upper bound on its
device wakeup latency, or if it needs to place an upper bound on
interrupt response latency, that is what it should request. Driver
and subsystem code should not care how the kernel implements those
requests, since the implementation can differ on different hardware
and even on different use-cases with the same hardware.

3. Similarly, the suspend-block userspace API tells the kernel how to
accomplish a goal, rather than telling the kernel what the goal is.
Userspace processes should ask the kernel for what they really
want. If a process' timers should be disabled upon entering
suspend, or the timer durations should have a lower bound, that's
what the API should request.

Merging this series as currently designed and implemented will cause
problems. Suspend-blocks introduce a second, separate idle management
approach in the Linux kernel. The existing approach is the familiar timer
and scheduler based approach. The new approach is one where timers and
runqueues no longer matter: the system is always at risk of entering
suspend at any moment, with only suspend-blocks to stop it. Driver authors
will effectively have to implement both approaches in their code.

Once merged, it will be nearly impossible to remove this code in favor
of a cleaner approach. Suspend-block calls are likely to spread
throughout the kernel and drivers. Patches 6, 7, and 8 are the leading
edge of this - a quick grep through the Android common kernel at

git://android.git.kernel.org/kernel/common.git

shows wakelocks in the following drivers:

drivers/input/evdev.c
drivers/input/misc/gpio_input.c
drivers/input/misc/gpio_matrix.c
drivers/mmc/core/core.c
drivers/rtc/alarm.c
drivers/usb/gadget/f_mass_storage.c

Suspend-blocks will be difficult to convert to a finer-grained approach
later. The API design problems, mentioned above in points 2 and 3, will
make it very difficult to determine what a driver author's or modifier's
intention was when adding the suspend-block. Also, patches 2 and 7
introduce userspace APIs. We will undoubtedly wish to avoid removing a
userspace API once it is merged. It will be quite difficult to implement
such a general directive ("block system suspend") on a future kernel that
may have a much finer-grained notion of low-power system modes, indeed
that may have no useful notion of "system suspend."

....

The opportunistic suspend patches try to solve at least two real problems,
that should be resolved in some way. First, some types of userspace
processes can unintentionally block system power management. Second, the
kernel is missing a system-wide form of CPUIdle. This patch series,
though, isn't the right way to solve either of these problems. Let's
figure out a different approach.

Figuring out a different way to do this should not limit Android at all,
since Google can do what other Linux distributions do and continue to
patch opportunistic suspend/suspend-block calls into their kernels as
needed to ship devices, while contributing towards a different solution to
the problem.


regards,

- Paul

(Linux-OMAP co-maintainer, focusing mostly on power management and
software architecture issues)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Matthew Garrett on
On Wed, May 12, 2010 at 09:35:30PM -0600, Paul Walmsley wrote:
>
> Figuring out a different way to do this should not limit Android at all,
> since Google can do what other Linux distributions do and continue to
> patch opportunistic suspend/suspend-block calls into their kernels as
> needed to ship devices, while contributing towards a different solution to
> the problem.

I basically agree, except that despite having a year to do so none of us
have come up with a different way that would actually work. Google have
done this work. Who's going to prove that there is actually a different
way to do this?

--
Matthew Garrett | mjg59(a)srcf.ucam.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Alan Stern on
On Wed, 12 May 2010, Paul Walmsley wrote:

> Hello,
>
> Some general comments on the suspend blockers/wakelock/opportunistic
> suspend v6 patch series, posted here:
>
> https://lists.linux-foundation.org/pipermail/linux-pm/2010-April/025146.html
>
> The comments below are somewhat telegraphic in the interests of
> readability - more specific comments to follow in later E-mails. I am
> indebted to those of us who discussed these issues at LPC last year and
> ELC this year for several stimulating discussions.
>
> There are several general problems with the design of opportunistic
> suspend and suspend-blocks.
>
> 1. The opportunistic suspend code bypasses existing Linux kernel code,
> such as timers and the scheduler, that indicates when code
> needs to run, and when the system is idle.

Whoa! That's not my understanding at all.

As I see it, opportunistic suspend doesn't bypass any code that isn't
already bypassed by the existing suspend code. Users can do

echo mem >/sys/power/state

whenever they want, without regard to kernel timers and the scheduler
(other than the fact that the user's thread must be running in order to
carry out the write, of course).

> This causes two problems:
>
> a. When opportunistic suspend is enabled, the default mode is to
> break all timers and scheduling on the system. This isn't
> right: the default mode should be to preserve standard Linux
> behavior. Exceptions can then be added for process groups that
> should run with the non-standard timer and scheduler behavior.

I don't understand this at all. What gets broken, and how? In
particular, what gets broken that isn't also broken by "echo mem
>/sys/power/state"?

> b. The series introduces a de novo kernel API and userspace API
> that are unrelated to timers and the scheduler, but if the point
> is to modify the behavior of timers or the scheduler, the
> existing timer or scheduler APIs should be extended. Any new
> APIs will need to be widely spread throughout the kernel and
> userspace.

But the point _isn't_ to modify the behavior of timers and the
scheduler. The point is to provide a way for the system to enter a
very low-power state as soon as possible while safely handling races.

> 2. The suspend-block kernel API tells the kernel _how_ to accomplish a
> goal, rather than telling the kernel _what_ the goal is. This
> results in layering violations, unstated assumptions, and is too
> coarse-grained. These problems in turn will cause fragile kernel
> code, kernel code with userspace dependencies, and power management
> problems on modern hardware. Code should ask for what it wants.
> For example, if a driver needs to place an upper bound on its
> device wakeup latency, or if it needs to place an upper bound on
> interrupt response latency, that is what it should request. Driver
> and subsystem code should not care how the kernel implements those
> requests, since the implementation can differ on different hardware
> and even on different use-cases with the same hardware.

Although the first sentence is true, I don't find it useful. The goal
of suspend blockers is to prevent the system from entering a low-power
state until some important task is finished. It has little to do with
interrupt response latency or device wakeup latency.

As far as I can tell, suspend blockers are more or less a direct
implementation of the desired goal.

> 3. Similarly, the suspend-block userspace API tells the kernel how to
> accomplish a goal, rather than telling the kernel what the goal is.
> Userspace processes should ask the kernel for what they really
> want. If a process' timers should be disabled upon entering
> suspend, or the timer durations should have a lower bound, that's
> what the API should request.

The userspace API has essentially the same goal as the kernel API.

> Merging this series as currently designed and implemented will cause
> problems. Suspend-blocks introduce a second, separate idle management
> approach in the Linux kernel. The existing approach is the familiar timer
> and scheduler based approach. The new approach is one where timers and
> runqueues no longer matter: the system is always at risk of entering
> suspend at any moment, with only suspend-blocks to stop it. Driver authors
> will effectively have to implement both approaches in their code.

That's true. Where's the problem? The system is _already_ at risk of
entering suspend at any moment, as I described above. If the "timer
and scheduler based" approach can be adapted to do what the Android
people want, then all the better -- but I rather suspect it can't.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Daniel Walker on
On Thu, 2010-05-13 at 13:17 +0100, Matthew Garrett wrote:
> On Wed, May 12, 2010 at 09:35:30PM -0600, Paul Walmsley wrote:
> >
> > Figuring out a different way to do this should not limit Android at all,
> > since Google can do what other Linux distributions do and continue to
> > patch opportunistic suspend/suspend-block calls into their kernels as
> > needed to ship devices, while contributing towards a different solution to
> > the problem.
>
> I basically agree, except that despite having a year to do so none of us
> have come up with a different way that would actually work. Google have
> done this work. Who's going to prove that there is actually a different
> way to do this?

We all feel the pain of inelegance right? I think it's clear that this
system will not last (but there's no other immediate option) .. That
doesn't mean we should reject it, but we need to be clear that this
system will get replaced. So we should format the patches appropriately.
To me the userspace aspect is a permanent change .. If we could drop
that (or put it into debugfs) then it would make this a lot easy to
accept as a stepping stone.

Daniel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/