From: Alan Stern on
On Sat, 29 May 2010, Arve Hjønnevåg wrote:

> > In place of in-kernel suspend blockers, there will be a new type of QoS
> > constraint -- call it QOS_EVENTUALLY. It's a very weak constraint,
> > compatible with all cpuidle modes in which runnable threads are allowed
> > to run (which is all of them), but not compatible with suspend.
> >
> This sounds just like another API rename. It will work, but given that
> "suspend blockers" was the least objectionable name last time around,
> I'm not sure what this would solve.

It's not just a rename. By changing this into a QoS constraint, we
make it more generally useful. Instead of standing on its own, it
becomes part of the PM-QOS framework.

> > There is no /sys/power/policy file. In place of opportunistic suspend,
> > we have "QoS-based suspend". This is initiated by userspace writing
> > "qos" to /sys/power/state, and it is very much like suspend-to-RAM.
>
> Why do you want to tie it to a specific state?

I don't. I suggested making it a variant of suspend-to-RAM merely
because that's what you were using. But Nigel's suggestion of having
"qos" variants of all the different suspend states makes sense.

> > However a QoS-based suspend fails immediately if there are any active
>
> Fail or block? Your next paragraph said that it blocks for
> QOS_EVENTUALLY, but if normal constraints fail, you are still stuck in
> a retry loop.

Normal (i.e., non-QOS_EVENTUALLY) constraints aren't part of the
Android use case, so it wasn't clear how they should be treated. On
further thought, it probably makes more sense to block for them too
instead of failing immediately.

> > normal QoS constraints incompatible with system suspend, in other
> > words, any constraints requiring a throughput > 0 or an interrupt
> > latency shorter than the time required for a suspend-to-RAM/resume
> > cycle.
> >
> > If no such constraints are active, the QoS-based suspend blocks in an
> > interruptible wait until the number of active QOS_EVENTUALLY
>
> How do you implement this?

I'm not sure what you mean. The same way you implement any
interruptible wait.

> >        for (;;) {
> >                while (any IPC requests remain)
> >                        handle them;
> >                if (any processes need to prevent suspend)
> >                        sleep;
> >                else
> >                        write "qos" to /sys/power/state;
> >        }
> >
> > The idea is that receipt of a new IPC request will cause a signal to be
> > sent, interrupting the sleep or the "qos" write.
>
> What happens if the signal arrives right before (or even right after)
> the call to write "qos"? How does the signal handler stop the write?

You're right, this is a serious problem. The process would have to
give the kernel a signal mask to be used during the wait, as in ppoll
or pselect. There ought to be a way to do this or something
equivalent.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Alan Stern on
On Sat, 29 May 2010, Alan Stern wrote:

> On Sat, 29 May 2010, Arve Hjønnevåg wrote:

> > >        for (;;) {
> > >                while (any IPC requests remain)
> > >                        handle them;
> > >                if (any processes need to prevent suspend)
> > >                        sleep;
> > >                else
> > >                        write "qos" to /sys/power/state;
> > >        }
> > >
> > > The idea is that receipt of a new IPC request will cause a signal to be
> > > sent, interrupting the sleep or the "qos" write.
> >
> > What happens if the signal arrives right before (or even right after)
> > the call to write "qos"? How does the signal handler stop the write?
>
> You're right, this is a serious problem. The process would have to
> give the kernel a signal mask to be used during the wait, as in ppoll
> or pselect. There ought to be a way to do this or something
> equivalent.

Okay, here's a possible solution:

        char arg[20];

        signal_handler()
        {
                arg[0] = 0;
        }


In the main loop:

        ...
        mask signals;
        if (we decide to start a QoS-based suspend) {
                strcpy(arg, "qos");
                unmask signals;
                write arg to /sys/power/state;
        }

It's hacky but I think it will work.
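To make the mechanism concrete, here is a runnable sketch of the same trick, in Python rather than C for brevity. SIGUSR1 stands in for the IPC wakeup signal, the write to /sys/power/state is simulated, and the handler name is made up; this is an illustration of the masking idea, not the actual power-manager code.

```python
# Sketch of the masked-signal trick above.  SIGUSR1 plays the role of
# the IPC wakeup signal; the write to /sys/power/state is simulated so
# the example can run anywhere.
import os
import signal

arg = bytearray()

def on_wakeup(signum, frame):
    arg[:] = b""          # a new IPC request cancels the pending "qos" write

signal.signal(signal.SIGUSR1, on_wakeup)

# mask signals;
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})

# we decide to start a QoS-based suspend:
arg[:] = b"qos"

# an IPC request arrives now: the signal is held pending while masked
os.kill(os.getpid(), signal.SIGUSR1)

# unmask signals; the pending signal is delivered and the handler runs
signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGUSR1})
for _ in range(1000):
    pass                  # let the interpreter process the pending handler

if arg:
    print("would write 'qos' to /sys/power/state")
else:
    print("suspend cancelled by incoming request")
```

The point being demonstrated: because the signal was masked while arg was being set, the handler still gets its chance to clear arg before the write would use it, so the suspend request is cancelled rather than lost.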

A more straightforward approach is to give processes the ability to
register their own QoS constraints. This could be done via a custom
driver as Brian suggested or by adding a new system call as Alan Cox
suggested. Then the power manager could be split into two threads, one
of which handles IPC requests and manages QoS constraints, while the
other repeatedly attempts to initiate QoS-based suspends.
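As a rough illustration of that two-thread split (a toy only: threading primitives stand in for real IPC, the constraint bookkeeping is a bare counter, and the "qos" write is simulated, so every name here is made up):

```python
# Toy two-thread power manager: one thread manages (simulated) QoS
# constraints on behalf of clients, the other attempts a QoS-based
# suspend whenever no constraints remain.
import threading

lock = threading.Condition()
constraints = 1            # active QoS constraints held by clients
suspend_attempts = []      # stands in for writes of "qos" to /sys/power/state

def suspender(stop):
    with lock:
        while not stop.is_set():
            if constraints == 0:
                suspend_attempts.append("qos")   # real code: a blocking write
                stop.set()
            else:
                lock.wait()                      # a constraint is active

stop = threading.Event()
t = threading.Thread(target=suspender, args=(stop,))
t.start()

# The IPC-handling thread: a client drops its last constraint.
with lock:
    constraints -= 1
    lock.notify()

t.join()
print("suspend attempts:", suspend_attempts)
```

Because the suspender checks the constraint count under the lock before waiting, it cannot miss a notification, which is exactly the race the single-threaded version has to play signal games to avoid.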

Alan Stern

From: Rafael J. Wysocki on
On Saturday 29 May 2010, Arve Hjønnevåg wrote:
> 2010/5/29 Rafael J. Wysocki <rjw(a)sisk.pl>:
> > On Saturday 29 May 2010, Arve Hjønnevåg wrote:
> >> 2010/5/28 Rafael J. Wysocki <rjw(a)sisk.pl>:
> >> > On Friday 28 May 2010, Arve Hjønnevåg wrote:
> >> >> On Fri, May 28, 2010 at 1:44 AM, Florian Mickler <florian(a)mickler.org> wrote:
> >> >> > On Thu, 27 May 2010 20:05:39 +0200 (CEST)
> >> >> > Thomas Gleixner <tglx(a)linutronix.de> wrote:
> >> > ...
> >> >> > To integrate this with the current way of doing things, i gathered it
> >> >> > needs to be implemented as an idle-state that does the suspend()-call?
> >> >> >
> >> >>
> >> >> I think it is better not to confuse this with idle. Since initiating
> >> >> suspend will cause the system to become not-idle, I don't think it is
> >> >> beneficial to initiate suspend from idle.
> >> >
> >> > It is, if the following two conditions hold simultaneously:
> >> >
> >> > (a) Doing full system suspend is ultimately going to bring you more energy
> >> > savings than the (presumably lowest) idle state you're currently in.
> >> >
> >> > (b) You anticipate that the system will stay idle for a considerably long time
> >> > such that it's worth suspending.
> >> >
> >>
> >> I still don't think this matters. If you are waiting for an interrupt
> >> that cannot wake you up from suspend, then idle is not an indicator
> >> that it is safe to enter suspend. I also don't think you can avoid any
> >> user-space suspend blockers by delaying suspend until the system goes
> >> idle since any page fault could cause it to go idle. Therefore I don't
> >> see a benefit in delaying suspend until idle when the last suspend
> >> blocker is released (it would only mask possible race conditions).
> >
> > I wasn't referring to suspend blockers, but to the idea of initiating full
> > system suspend from idle, which I still think makes sense. If you are
> > waiting for an interrupt that cannot wake you from suspend, then
> > _obviously_ suspend should not be started. However, if you're not waiting for
> > such an interrupt and the (a) and (b) above hold, it makes sense to start
> > suspend from idle.
> >
>
> What about timers? When you suspend, timers stop (otherwise it is just
> a deep idle mode), and this could cause problems. Some drivers rely on
> timers if the hardware does not have a completion interrupt. It is not
> uncommon to see "send command x, then wait 200ms" in some hardware
> specs.

QoS should be used in such cases.

Rafael
From: Zygo Blaxell on
On Fri, May 28, 2010 at 10:17:55AM +0100, Alan Cox wrote:
> > Android does not only run on phones. It is possible that no android
> > devices have ACPI, but I don't know that for a fact. What I do know is
> > that people want to run Android on x86 hardware and supporting suspend
> > could be very beneficial.
>
> Sufficiently beneficial to justify putting all this stuff all over the
> kernel and apps ? That is a *very* high hurdle, doubly so when those
> vendors who have chosen to be part of the community are shipping phones
> and PDAs just fine without them.

I'm not sure "other people are shipping without them" is such a good
metric, especially for scheduler features. For some reason (I have some
ideas what it might be, but I won't speculate here) people don't like
messing with the scheduler in mainline, even though there are a lot of
special cases where a bit of messing with the scheduler (or replacing
it outright) goes a long way toward qualitatively improving performance
on some workloads.

I'd love to have several more ways to have large classes of processes stop
executing, and stay stopped, even though traditional Unix and mainline
Linux would try to run them. I don't want to put knowledge of this into
every application I run since there are literally thousands of them,
and IMNSHO it's not even an application's responsibility to know this
kind of thing. The "sort" program can't know what QoS to ask for in any
sane system design. The best it can do is try to execute as hard as it
can whenever the kernel lets it, and have some other application advise
the kernel about how much or how little service (including cases like
"no service at all") the sort program should get from the system.

To choose a random example, I'd like a "duty cycle" constraint on
process execution (i.e. a runnable task must execute between L and M ns
per N ns interval--stealing slices from lower priority processes if it
doesn't get enough and isn't blocked on I/O, and leaving the CPU idle even
though the process is runnable if it gets too much). I usually want to
apply this kind of limit to programs like Firefox, because Firefox is a)
big enough that controlling it actually matters for power consumption,
b) sensitive enough to user interaction latency that I want it to have
fairly high CPU priority when it has something to do, and c) big and
complex enough that I wouldn't want to try to adjust its behavior by
modifying its source. Also, Firefox's behavior tends to be driven by
the data it pulls from random web sites, over which I have no control
whatsoever, and many of them are intentionally wasteful.

I'm not willing to run a non-mainline kernel (or Firefox, for that
matter) just to get that feature, and I'm not willing to submit patches
to mainline if I've seen nearly identical ideas rejected recently, so I
live without the feature for now. This implies that the statistic for
"people running desirable scheduler features" is at least one lower than
the statistic for "people who would use desirable scheduler features if
they didn't have to hack up non-mainline kernels to get them."

I can hack up something that does something similar to duty cycle in
user-space, but it's got a lot of problems:

- when you send SIGSTOP/SIGCONT to a process, it wakes up its
parent through waitpid() (well, you can partially get around
this with ptrace(), but that raises other issues),

- it's racy wrt fork(),

- it can't opportunistically schedule process execution,
e.g. during times when the CPU is idle at high clock rates,

- sufficiently badly behaved processes are able to escape
the CPU usage regulation mechanism, and

- estimating how much global CPU has been used as a percentage of
real time is easy, but how much CPU relative to other processes
running on the system is not. I keep doing math like "subtract
aggregate process CPU usage from global CPU usage" and getting
numbers outside the range of 0..100% of global CPU usage.

Also, for non-trivial cases, the user-space CPU management process
consumes more CPU than any other process on the system, and keeps waking
up the CPU every N and M ns, even if the process being scheduled isn't
runnable.
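For concreteness, a toy version of the SIGSTOP/SIGCONT approach, which also demonstrates the first problem on the list: stopping the child wakes its parent up through waitpid(). Python is used for brevity and everything here is illustrative.

```python
# Toy duty-cycle throttle using SIGSTOP/SIGCONT.  The busy loop stands
# in for a CPU-hungry task; the waitpid() calls show the parent-wakeup
# side effect described above.
import os
import signal

pid = os.fork()
if pid == 0:
    while True:            # stand-in for a CPU-hungry task like Firefox
        pass

os.kill(pid, signal.SIGSTOP)                  # "off" part of the duty cycle
_, status = os.waitpid(pid, os.WUNTRACED)     # the parent is woken up here
stopped = os.WIFSTOPPED(status)

os.kill(pid, signal.SIGCONT)                  # "on" part of the duty cycle
_, status = os.waitpid(pid, os.WCONTINUED)    # ...and again here
continued = os.WIFCONTINUED(status)

os.kill(pid, signal.SIGKILL)                  # clean up the child
os.waitpid(pid, 0)

print("stopped:", stopped, "continued:", continued)
```

A real throttler would repeat the stop/continue pair on a timer with the L, M, and N parameters from above, which is precisely why it keeps waking the CPU even when the throttled process is not runnable.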

Simply providing better information to userspace to help a regulator
application of this kind would be a huge leap in the right direction.

Arguably I could run the applications I want to throttle under KVM,
and hack up the KVM to manage the CPU usage; however, that's hardly
transparent to the application, which is now running on the wrong machine
for a lot of what it wants to do.

So instead of fixing the software, I have an extra-large third-party
battery on my laptop. It's a cheaper solution on small (one user) scales.
I can't ship a competitive product with that kind of problem, though.

Having said all that, I'm fairly sure suspend blockers aren't the way to
get it. I'd much rather have interesting QoS constraint features,
including new conditions under which to not run otherwise runnable tasks.
Maybe ionice and SCHED_IDLEPRIO on steroids?

From: Neil Brown on
On Thu, 27 May 2010 23:40:29 +0200 (CEST)
Thomas Gleixner <tglx(a)linutronix.de> wrote:

> On Thu, 27 May 2010, Rafael J. Wysocki wrote:
>
> > On Thursday 27 May 2010, Thomas Gleixner wrote:
> > > On Thu, 27 May 2010, Alan Stern wrote:
> > >
> > > > On Thu, 27 May 2010, Felipe Balbi wrote:
> > > >
> > > > > On Thu, May 27, 2010 at 05:06:23PM +0200, ext Alan Stern wrote:
> > > > > >If people don't mind, here is a greatly simplified summary of the
> > > > > >comments and objections I have seen so far on this thread:
> > > > > >
> > > > > > The in-kernel suspend blocker implementation is okay, even
> > > > > > beneficial.
> > > > >
> > > > > I disagree here. I believe expressing that as QoS is much better. Let
> > > > > the kernel decide which power state is better as long as I can say I
> > > > > need 100us IRQ latency or 100ms wakeup latency.
> > > >
> > > > Does this mean you believe "echo mem >/sys/power/state" is bad and
> > > > should be removed? Or "echo disk >/sys/power/state"? They pay no
> > >
> > > mem should be replaced by an idle suspend to ram mechanism
> >
> > Well, what about when I want the machine to suspend _regardless_ of whether
> > or not it's idle at the moment? That actually happens quite often to me. :-)
>
> Fair enough. Let's agree on unambiguous terminology then:
>
> forced:
>
> suspend which you enforce via user interaction, which
> also implies that you risk losing wakeups depending on
> the hardware properties

Reasonable definition, I think. However, the current implementation doesn't
exactly match it.
With the current implementation you risk losing wakeups *independently* of
the hardware properties.
Even with ideal hardware, events can be lost - by which I mean that they
will not be seen until some other event effects a wake-up.
e.g. the interrupt which signals the event arrives immediately before the
suspend is requested (or even at the same time as it), but the process which
needs to handle the event doesn't get a chance to see it before the suspend
procedure freezes that process, and even if it did, it would have no way to
abort the suspend.

So I submit that the current implementation doesn't match your description of
"forced", is therefore buggy, and that if it were fixed, that would be
sufficient to meet the immediate needs of android.

NeilBrown

>
> opportunistic:
>
> suspend driven from the idle context, which guarantees to
> not lose wakeups. Provided only when the hardware does
> provide the necessary capabilities.
>
> Thanks,
>
> tglx