From: Neil Brown on
On Fri, 4 Jun 2010 01:23:02 +0200
Ingo Molnar <mingo(a)elte.hu> wrote:

> Btw., i'd like to summarize the scheduler based suspend scheme proposed by
> Thomas Gleixner, Peter Zijlstra and myself. I found no good summary of it in
> the big thread, and there are also new elements of the proposal:

Hi
I would like to summarise the alternate proposal that I and others have
suggested in a variety of different forms.

It starts from the premise that
1/ Android developers actually like the "big hammer" aspect of suspend.
Initiating suspend powers down some devices, puts others in low power
states, freezes all processes and generally puts the device to sleep
with a well defined and easily controlled (at the whole-of-system level)
set of events that will wake from suspend. This is a big part of the
Android approach to power-saving and I'm guessing they are not keen to
depart from it.

2/ The main problem with using suspend as-is is that it is racy.
The purpose of suspend is to put the device to sleep until a wake-event
occurs. When that wake-event occurs at much the same time that suspend is
requested, races can occur. We want a wake-event not only to wake the
device, but also to keep the device awake while the wake-event is being
handled, and to cancel any suspend that was initiated before the wake-event
completed.
We need to understand "wake event" in a holistic sense. If a key press is
expected to brighten the screen and make a glyph appear, and if that key
press is considered to be a wake-event, then the glyph appearing must also
be a part of the wake event. For such a holistic wake-event to fully
block/cancel a suspend there must be some mechanism for hand-over of
wake-events from kernel-space to user-space.

Given those premises, Google's suspend-blocker approach was to allow a
kernel thread to initiate suspend whenever nothing was stopping it, and to
allow both drivers and user processes to block that suspend while handling
a wake event (or anything else that needed to keep the device awake).
In this case the hand-over is fairly straightforward, as the kernel thread
has full knowledge and can easily wait for all sorts of things.

The alternate proposal is simply to have user-space initiate a suspend (as
is already possible), user-space processes can then trivially block that
suspend through any of a number of IPC approaches, and kernel space drivers
can block/abort suspend by explicitly requesting a block.

The variety of alternate proposals comes from a variety of ways to modify
the semantics of "ask for a suspend" in such a way that userspace can
discover when there are kernel-space blocks, and can wait for them to be
released without spinning.

A sample modification (which I think is different to all the ones
mentioned so far, and hopefully pulls out the best of them all) is
to allow userspace to write e.g. "mem_safe" rather than "mem" to
/sys/power/state. The 'safe' implies it is safe from races.

When this is written, the process sleeps in an interruptible state until
all in-kernel suspend blocks have been dropped. If any such suspend blocks
were found, or if a signal is received, the request aborts. Only if there
were no suspend blocks and no pending signals does the suspend progress.
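
The semantics just described can be sketched as a small user-space toy
model in Python. This is not kernel code and not an existing interface:
"mem_safe" is only the sample name proposed above, and the names
SuspendBlocks, write_mem_safe and signal_pending are hypothetical,
introduced purely for illustration.

```python
import threading


class SuspendBlocks:
    """Toy model of in-kernel suspend blocks (hypothetical, not a kernel API)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._count = 0

    def block(self):
        """A driver takes a suspend block."""
        with self._cond:
            self._count += 1

    def unblock(self):
        """A driver drops its suspend block; wake waiters when none remain."""
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def write_mem_safe(self, signal_pending=lambda: False):
        """Model of writing "mem_safe"; returns "suspended" or "aborted".

        signal_pending is an assumed stand-in for "a signal was received".
        """
        with self._cond:
            saw_block = self._count > 0
            # Sleep until all blocks are dropped; the short timeout lets
            # this model poll for a pending signal while it waits.
            while self._count > 0 and not signal_pending():
                self._cond.wait(timeout=0.01)
            if saw_block or signal_pending():
                return "aborted"
            return "suspended"
```

The point of the model is the return value: a caller that found a block
sleeps until it is released and is then told "aborted", so it can loop and
ask again instead of spinning.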

Wake-events in the kernel then need to be tracked all the way to user-space,
and the in-kernel lock is only dropped when the event is consumed by
user-space. User-space must take some sort of lock to ensure no new
suspend is requested before consuming any wake-events from the kernel.
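
The ordering requirement in the paragraph above can be illustrated with
another toy model in Python. It assumes that each undelivered wake-event
holds one in-kernel block and that user-space uses a simple counter as its
own lock; WakeEventQueue, UserSpace and try_suspend are hypothetical names,
not part of any existing interface.

```python
from collections import deque


class WakeEventQueue:
    """Toy model: every queued wake-event holds one suspend block until read."""

    def __init__(self):
        self._pending = deque()

    def deliver(self, event):
        """Driver side: queue the event, implicitly taking a block."""
        self._pending.append(event)

    def blocks_held(self):
        return len(self._pending)

    def read(self):
        """User-space side: consuming the event drops its block."""
        return self._pending.popleft()


class UserSpace:
    def __init__(self, queue):
        self.queue = queue
        self.busy = 0  # user-space lock: non-zero means "do not suspend"

    def handle_one(self, work):
        self.busy += 1                 # take the user-space lock first
        try:
            event = self.queue.read()  # only now is the kernel block dropped
            work(event)
        finally:
            self.busy -= 1             # release once the event is fully handled

    def try_suspend(self):
        """Suspend proceeds only with no user-space lock and no kernel block."""
        return self.busy == 0 and self.queue.blocks_held() == 0
```

Because the user-space lock is taken before the read, there is no window
between the kernel dropping its block and user-space holding its own in
which a suspend request could succeed.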

I believe this is very close to what Android has today, only with a much
smaller change to the user-space interface, which I believe to be the thing
that has been found most objectionable.
It does still require a degree of event-tracking within the kernel which
might still be objectionable - I'm not so sure about different people's
positions on that.

Thanks,
NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Arve Hjønnevåg on
On Thu, Jun 3, 2010 at 4:23 PM, Ingo Molnar <mingo(a)elte.hu> wrote:
....
>  - Controlled auto-suspend: drivers (such as input) could on wakeup
>    automatically set the 'minimum wakeup latency' value of wakee tasks to a
>    lower value. This automatically prevents another auto-suspend in the near
>    future: up to the point the wakee task increases its latency (via the
>    scheduler syscall) again and allows suspend again.
>

How do you clear the latency value in a safe way? If another wakeup
event happens right after your wakee task is done processing the last
event and decides to increase its latency, auto suspend will be
allowed even though you have an unprocessed wakeup event. Also how do
you know which task will read the event if it is not already waiting
for it?
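
The interleaving in question can be written out as a deterministic toy
model in Python. It assumes a single latency value per task that the driver
lowers and the task raises; Task, driver_wakeup, suspend_allowed and the
constant LOW are hypothetical names and values used only to show the
ordering problem.

```python
INFINITY = float("inf")
LOW = 1  # hypothetical "low latency" value, units do not matter here


class Task:
    def __init__(self):
        self.latency = INFINITY  # default: task does not prevent suspend
        self.inbox = []          # wakeup events not yet processed


def driver_wakeup(task, event):
    """Driver queues an event and lowers the wakee's latency."""
    task.inbox.append(event)
    task.latency = LOW


def suspend_allowed(task):
    return task.latency == INFINITY


def racy_interleaving():
    task = Task()
    driver_wakeup(task, "event-1")
    task.inbox.pop(0)               # task processes event-1, decides it is done
    driver_wakeup(task, "event-2")  # second event lands right after that
    task.latency = INFINITY         # task acts on its now stale decision
    return suspend_allowed(task), list(task.inbox)
```

In this ordering racy_interleaving() reports that suspend is allowed while
"event-2" is still sitting unprocessed in the inbox.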


>    This means there will be no surprise suspends for a task that may take a
>    bit longer than usual to finish its work. [ Detail: this would only be done
>    for tasks that have a non-default (non-infinity) task->latency value - to
>    prevent the input driver from lowering latency values (and preventing
>    future suspends) just because some unaware apps are running and using input
>    drivers. ]

Don't you need two infinity values for this?

--
Arve Hjønnevåg
From: Linus Torvalds on


On Thu, 3 Jun 2010, Arjan van de Ven wrote:
>
> And because there's then no power saving (but a performance cost), it's
> actually a negative for battery life/total energy.

Including the UP optimizations we do (ie lock prefix removal)? It's
possible that I'm just biased by benchmarks, and it's true that Intel has
been getting lots better, but the locking costs are very noticeable
performance-wise on some benchmarks.

And several CPUs have been held back from going into deepest sleep states
by stupid firmware and/or platform bugs.

But hey, if it's not going to help, and people have tried it, I guess I'll
have to believe it.

Linus

From: Ingo Molnar on

* Linus Torvalds <torvalds(a)linux-foundation.org> wrote:

> [...]
>
> And those two things go together. The /sys/power/state thing is a global
> suspend - which I don't think is appropriate for an opportunistic thing in
> the first place, especially for multi-core.
>
> A well-designed opportunistic suspend should be a two-phase thing: an
> opportunistic CPU hotunplug (shutting down cores one by one as the system is
> idle), and not a "global" event in the first place. And only when you've
> reached single-core state should you then say "do I suspend the system too".

Shutting a core down would be a natural idle level, and when the last one goes
idle we can do the suspend. (it happens as part of suspend anyway)

So on systems that don't want to auto-suspend this would indeed behave like you
suggest: the final core left would run as UP in essence.
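
As a rough sketch of that two-phase behaviour, here is a toy step function
in Python. It assumes the only inputs are the number of online cores and
whether auto-suspend is enabled; idle_step and its action strings are
hypothetical names for illustration, not scheduler or cpuidle interfaces.

```python
def idle_step(online_cores, auto_suspend):
    """Return (cores left online, action) for one idle decision.

    Phase one shuts idle cores down one by one; phase two applies only
    once a single core is left.
    """
    if online_cores > 1:
        return online_cores - 1, "unplug-core"
    if auto_suspend:
        return 0, "suspend"
    return 1, "run-as-UP"


def idle_until_stable(online_cores, auto_suspend):
    """Apply idle_step until the system suspends or settles on one core."""
    actions = []
    while True:
        online_cores, action = idle_step(online_cores, auto_suspend)
        actions.append(action)
        if action != "unplug-core":
            return actions
```

With auto-suspend disabled the sequence ends in "run-as-UP", which is the
case described above where the final core behaves like a UP system.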

Ingo
From: Ingo Molnar on

* Arve Hjønnevåg <arve(a)android.com> wrote:

> On Thu, Jun 3, 2010 at 4:23 PM, Ingo Molnar <mingo(a)elte.hu> wrote:
> ...
> >  - Controlled auto-suspend: drivers (such as input) could on wakeup
> >    automatically set the 'minimum wakeup latency' value of wakee tasks to a
> >    lower value. This automatically prevents another auto-suspend in the near
> >    future: up to the point the wakee task increases its latency (via the
> >    scheduler syscall) again and allows suspend again.
> >
>
> How do you clear the latency value in a safe way? If another wakeup event
> happens right after your wakee task is done processing the last event and
> decides to increase its latency, auto suspend will be allowed even though
> you have an unprocessed wakeup event. Also how do you know which task will
> read the event if it is not already waiting for it?

The easiest solution would be to not do any of that initially. (If it's ever a
concern we could subtract/add without destroying the nesting property)

Why do you need to track input wakeups? It's rather fragile and rather
unnecessary - the idle drivers already know very well how not to go into the
deepest idle mode. We won't hit C8 on laptops when you are using
the desktop.

> >    This means there will be no surprise suspends for a task that may take a
> >    bit longer than usual to finish its work. [ Detail: this would only be done
> >    for tasks that have a non-default (non-infinity) task->latency value - to
> >    prevent the input driver from lowering latency values (and preventing
> >    future suspends) just because some unaware apps are running and using input
> >    drivers. ]
>
> Don't you need two infinity values for this?

Yes - any value above the max idle latency in the system will do.
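
One possible reading of this, as a toy model in Python: the default value
marks an unaware task that drivers leave alone, and any finite value above
the maximum idle latency acts as the second "infinity" for an aware task
that currently allows suspend. All constants and function names below are
hypothetical and chosen only for illustration.

```python
MAX_IDLE_LATENCY = 1000               # hypothetical latency of the deepest idle state
DEFAULT_LATENCY = float("inf")        # unaware task: drivers never touch it
ALLOW_SUSPEND = MAX_IDLE_LATENCY + 1  # aware task that currently allows suspend
LOW_LATENCY = 10                      # hypothetical value set by a driver on wakeup


def driver_wakeup(latency):
    """Lower the latency only for tasks that opted in (non-default value)."""
    if latency == DEFAULT_LATENCY:
        return latency
    return min(latency, LOW_LATENCY)


def suspend_allowed(latencies):
    """Suspend is allowed only if every task tolerates more than the max idle latency."""
    return all(value > MAX_IDLE_LATENCY for value in latencies)
```

Both DEFAULT_LATENCY and ALLOW_SUSPEND permit suspend, but only the latter
is lowered by driver_wakeup, which is what distinguishes aware tasks from
unaware ones in this sketch.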

Thanks,

Ingo