From: Thomas Gleixner on
On Tue, 1 Jun 2010, Rafael J. Wysocki wrote:
> On Monday 31 May 2010, Thomas Gleixner wrote:
> > So that allows to use the same mechanism for more than the android
> > sledge hammer approach and confines the controversial use cases into
> > android specific files without adding a hard to maintain user space
> > interface which would prevent or at least make it hard to do some of
> > the above mentioned things which we want to see implemented.
>
> I generally agree.
>
> I think the Alan Stern's recent proposal goes along these lines, but it has
> the advantage of being a bit more specific. ;-)

Yes, Alan Stern's proposal is going into that direction and I'm not
opposed. Just wanted to get the overall picture straight for James :)

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rafael J. Wysocki on
On Monday 31 May 2010, Thomas Gleixner wrote:
> On Sat, 29 May 2010, James Bottomley wrote:
> > On Sat, 2010-05-29 at 10:10 +0200, Peter Zijlstra wrote:
> > >
> > > Not using suspend is exactly the point. As Alan has argued, propagating
> > > suspend blockers up into all regions of userspace will take much longer
> > > than fixing the hardware.
> >
> > Strange, that's not what I heard as the possible solution. I thought he
> > was advocating expressing the kernel side of suspend blockers as QoS
> > constraints. Once we have QoS constraints correctly done in the kernel,
> > userspace still has to express its requirements. If the requirements
> > are static, then they can be done from policy files or in some other
> > static way but if they're dynamic, they'll still have to be in the
> > applications ... in about the same places the android wakelocks are.
>
> That's wrong. You only need the explicit dynamic QoS constraints for
> applications which follow the scheme:
>
> while (1) {
> if (event_available())
> process_event();
> else
> do_useless_crap_which_consumes_power();
> }
>
> which need the following annotation:
>
> while (1) {
> block_suspend();
> if (event_available()) {
> process_event();
> unblock_suspend();
> } else {
> unblock_suspend();
> do_useless_crap_which_consumes_power();
> }
> }
>
> Plus the kernel counterpart of drivers which take the suspend blocker
> in the interrupt handler and release it when the event queue is empty.
>
> So that's done for making polling event handling power "efficient".
>
> Even worse, you need the same "annotation" for non polling mode and it
> enforces the use of select() because you cannot take a suspend blocker
> across a blocking read() without adding more invasive interactions to
> the kernel..
>
> So the "sane" app looks like:
>
> while (1) {
> select();
> block_suspend();
> process_events();
> unblock_suspend();
> }
>
> I'm really tired of arguing that this promotion of "programming style"
> is the worst idea ever, so let's look how you can do the same thing
> QoS based.
>
> s/block_suspend()/qos(INTERACTIVE)/ and
> s/unblock_suspend()/qos(NONE)/ and
> s/block_magic()/qos_magic()/ in the drivers.
>
> Yes, it's mostly the same, with a subtle difference:
>
> While android can use it in the big hammer approach to disable the
> existing user initiated suspend via /sys/power/state, the rest of the
> world can benefit as well in various ways.
>
> - Sane applications which use a blocking event wait can be handled
> with a static QoS setting simply because a blocking read relies on
> the QoS state of the underlying I/O system.
>
> - Idle based suspend as the logical consequence of idle states is
> just a matter of QoS constraint based decisions.
>
> - Untrusted apps can be confined in cgroups. The groups are set to
> QoS(None) when user land decides that it's time to safe power
> (e.g. screen lock kicks in)
>
> - QoS states can block applications from I/O when the I/O system is
> set to a state which is exclusive.
>
> - ...
>
> So that allows to use the same mechanism for more than the android
> sledge hammer approach and confines the controversial use cases into
> android specific files without adding a hard to maintain user space
> interface which would prevent or at least make it hard to do some of
> the above mentioned things which we want to see implemented.

I generally agree.

I think the Alan Stern's recent proposal goes along these lines, but it has
the advantage of being a bit more specific. ;-)

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Neil Brown on
On Tue, 1 Jun 2010 00:05:19 +0200
"Rafael J. Wysocki" <rjw(a)sisk.pl> wrote:

> On Monday 31 May 2010, Neil Brown wrote:
> > On Thu, 27 May 2010 23:40:29 +0200 (CEST)
> > Thomas Gleixner <tglx(a)linutronix.de> wrote:
> >
> > > On Thu, 27 May 2010, Rafael J. Wysocki wrote:
> > >
> > > > On Thursday 27 May 2010, Thomas Gleixner wrote:
> > > > > On Thu, 27 May 2010, Alan Stern wrote:
> > > > >
> > > > > > On Thu, 27 May 2010, Felipe Balbi wrote:
> > > > > >
> > > > > > > On Thu, May 27, 2010 at 05:06:23PM +0200, ext Alan Stern wrote:
> > > > > > > >If people don't mind, here is a greatly simplified summary of the
> > > > > > > >comments and objections I have seen so far on this thread:
> > > > > > > >
> > > > > > > > The in-kernel suspend blocker implementation is okay, even
> > > > > > > > beneficial.
> > > > > > >
> > > > > > > I disagree here. I believe expressing that as QoS is much better. Let
> > > > > > > the kernel decide which power state is better as long as I can say I
> > > > > > > need 100us IRQ latency or 100ms wakeup latency.
> > > > > >
> > > > > > Does this mean you believe "echo mem >/sys/power/state" is bad and
> > > > > > should be removed? Or "echo disk >/sys/power/state"? They pay no
> > > > >
> > > > > mem should be replaced by an idle suspend to ram mechanism
> > > >
> > > > Well, what about when I want the machine to suspend _regardless_ of whether
> > > > or not it's idle at the moment? That actually happens quite often to me. :-)
> > >
> > > Fair enough. Let's agree on a non ambigous terminology then:
> > >
> > > forced:
> > >
> > > suspend which you enforce via user interaction, which
> > > also implies that you risk losing wakeups depending on
> > > the hardware properties
> >
> > Reasonable definition I think. However the current implementation doesn't
> > exactly match it.
> > With the current implementation you risk losing wakeups *independent* of the
> > hardware properties.
>
> Define "losing", please.

I did. See next line in my original.
"... by which I mean that they will not be seen until some other event
effects a wake-up". By "seen" I mean "a user-space process has had a chance
to react to the event, including having the opportunity to abort the suspend
(or ensure an immediate wake-up)".
Another way of saying it might be that the event - as an abstract concept -
does not reach it's final destination promptly. This "final destination" may
be well outside the kernel.

>
> Currently, we simply don't regard hardware signals occuring _during_ the
> suspend operation itself as wakeups (unless they are wakeup interrupts to be
> precise, because these _are_ taken into account by our current code).
>
> The reason is that the meaning of given event may be _different_ at run time
> and after the system has been suspended. For example, consider a power button
> on a PC box. If it's pressed at run time, it usually means "power off the
> system" to the kernel. After the system has been suspended, however, it means
> "wake up". So, you have to switch from one interpretation of the event to the
> other and that's not an atomic operaition (to put it lightly).

Yes, a suspend-toggle switch is inherently racy. It is only wake-up sources
that are not inherently racy that are interesting. e.g. a serial line from a
GSM device which reports "You have an SMS message".
I want to be able to turn my freerunner upside-down by which I tell it (via
the accelerometers) that I am done and want it to turn off. If a TXT message
comes in just then, I don't want it to suspend, I want it to make an alert
noise.
I can put code in to ignore the accelerometer if a txt has just recently come
in, but if the TXT arrives just as the write to /sys/power/state starts, the
UART interrupt handler could have completed before it has the PRE_SUSPEND
method called. So the suspend will complete and the wakeup from the UART
will have been "lost" in that the event didn't get all the way to its
destination: my ear.
My freerunner has a single core so without CONFIG_PREEMPT it may be that
there is no actual race-window - maybe the PRE_SUSPENDs will all run before a
soft_irq thread has a chance to finish handling of the interrupt (my
knowledge of these details is limits). But on a muilti-core device I think
there would definitely be a race-window.

Thanks,
NeilBrown

>
> > Even with ideal hardware events can be lost - by which I mean that they will
> > not be seen until some other event effects a wake-up.
> > e.g. the interrupt which signals the event happens immediately before the
> > suspend is requested (or maybe at the same time as), but the process which
> > needs to handle the event doesn't get a chance to see it before the suspend
> > procedure freezes that process, and even if it did it would have no way to
> > abort the suspend.
> >
> > So I submit that the current implementation doesn't match your description of
> > "forced", is therefore buggy, and that if it were fixed, that would be
> > sufficient to meet the immediate needs of android.
>
> I don't really think it may be fixed with respect to every possible kind of
> hardware. On platforms where I/O interrupts are wakeup events it should
> work right now. On other platforms it may be impossible to overcome hardware
> limitations.
>
> Thanks,
> Rafael
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo(a)vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: mark gross on
On Mon, May 31, 2010 at 11:38:55PM +0200, Rafael J. Wysocki wrote:
> On Monday 31 May 2010, Arve Hj�nnev�g wrote:
> > 2010/5/29 Alan Stern <stern(a)rowland.harvard.edu>:
> > > On Sat, 29 May 2010, Arve Hj�nnev�g wrote:
> > >
> > >> > In place of in-kernel suspend blockers, there will be a new type of QoS
> > >> > constraint -- call it QOS_EVENTUALLY. It's a very weak constraint,
> > >> > compatible with all cpuidle modes in which runnable threads are allowed
> > >> > to run (which is all of them), but not compatible with suspend.
> > >> >
> > >> This sound just like another API rename. It will work, but given that
> > >> suspend blockers was the name least objectionable last time around,
> > >> I'm not sure what this would solve.
> > >
> > > It's not just a rename. By changing this into a QoS constraint, we
> > > make it more generally useful. Instead of standing on its own, it
> > > becomes part of the PM-QOS framework.
> > >
> >
> > We cannot use the existing pm-qos framework. It is not safe to call
> > from atomic context.
>
> We've just merged a patch that fixed that if I'm not mistaken. Mark, did your
> PM QoS update fix that?
>

I'm pretty sure it can be called in atomic context, and if its not I'm
sure we can fix that. It can be called in atomic context. I don't
think it was ever a problem to call it in atomic context. The problem it
had was that crappy list of string compares. Thats been fixed.

--mgross

> > Also, it does not have any state constraints, so it iterates over every
> > registered constraint each time one of them changes.
>
> That's fixable IMO.
>
> > Nor does is currently provide any stats for debugging.
>
> That's why Alan is proposing to add that.
>
> > The original wakelock patchset supported a wakelock type so it could
> > be used to block more then suspend, but I had to remove this because
> > it "overlapped" with pm-qos. So, yes I do consider this just another
> > rename.
>
> It's an extension of an existing framework rather than an addition of a new
> one, with entirely new API and so on. Extending existing APIs is much
> preferred to adding new ones, in general.
>
> Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: James Bottomley on
On Mon, 2010-05-31 at 23:41 +0200, Thomas Gleixner wrote:
> On Sat, 29 May 2010, James Bottomley wrote:
> > On Sat, 2010-05-29 at 10:10 +0200, Peter Zijlstra wrote:
> > >
> > > Not using suspend is exactly the point. As Alan has argued, propagating
> > > suspend blockers up into all regions of userspace will take much longer
> > > than fixing the hardware.
> >
> > Strange, that's not what I heard as the possible solution. I thought he
> > was advocating expressing the kernel side of suspend blockers as QoS
> > constraints. Once we have QoS constraints correctly done in the kernel,
> > userspace still has to express its requirements. If the requirements
> > are static, then they can be done from policy files or in some other
> > static way but if they're dynamic, they'll still have to be in the
> > applications ... in about the same places the android wakelocks are.
>
> That's wrong. You only need the explicit dynamic QoS constraints for
> applications which follow the scheme:
>
> while (1) {
> if (event_available())
> process_event();
> else
> do_useless_crap_which_consumes_power();
> }
>
> which need the following annotation:
>
> while (1) {
> block_suspend();
> if (event_available()) {
> process_event();
> unblock_suspend();
> } else {
> unblock_suspend();
> do_useless_crap_which_consumes_power();
> }
> }
>
> Plus the kernel counterpart of drivers which take the suspend blocker
> in the interrupt handler and release it when the event queue is empty.
>
> So that's done for making polling event handling power "efficient".
>
> Even worse, you need the same "annotation" for non polling mode and it
> enforces the use of select() because you cannot take a suspend blocker
> across a blocking read() without adding more invasive interactions to
> the kernel..
>
> So the "sane" app looks like:
>
> while (1) {
> select();
> block_suspend();
> process_events();
> unblock_suspend();
> }
>
> I'm really tired of arguing that this promotion of "programming style"
> is the worst idea ever, so let's look how you can do the same thing
> QoS based.
>
> s/block_suspend()/qos(INTERACTIVE)/ and
> s/unblock_suspend()/qos(NONE)/ and
> s/block_magic()/qos_magic()/ in the drivers.

So this is the re-expression in terms of a QoS API that I mentioned ...
as I said, I think it's the way forwards. (from the android point of
view, it keeps the user space expression in exactly the same place as
the original wake locks, or suspend blocks, which is why it looks like a
rename to them).

> Yes, it's mostly the same, with a subtle difference:
>
> While android can use it in the big hammer approach to disable the
> existing user initiated suspend via /sys/power/state, the rest of the
> world can benefit as well in various ways.
>
> - Sane applications which use a blocking event wait can be handled
> with a static QoS setting simply because a blocking read relies on
> the QoS state of the underlying I/O system.
>
> - Idle based suspend as the logical consequence of idle states is
> just a matter of QoS constraint based decisions.
>
> - Untrusted apps can be confined in cgroups. The groups are set to
> QoS(None) when user land decides that it's time to safe power
> (e.g. screen lock kicks in)
>
> - QoS states can block applications from I/O when the I/O system is
> set to a state which is exclusive.
>
> - ...

I understand this ... it's effectively the alan stern approach. I've
already said that looks like the way forwards.

> So that allows to use the same mechanism for more than the android
> sledge hammer approach and confines the controversial use cases into
> android specific files without adding a hard to maintain user space
> interface which would prevent or at least make it hard to do some of
> the above mentioned things which we want to see implemented.

Yes, which is why I think something like this can be made to work ... I
don't really see that we differ on the broad brush picture. As long as
the acceptable implementation accomplishes what everyone wants, I think
we're home.

James

> While I personally disagree with the android approach, I REALLY want
> to provide them a mechanism to make it work, but not for the price
> that it actively prevents solutions which are considered to be
> technically superior by a relevant group of developers/maintainers.
>
> > > You got to realize this is about Linux as a whole, I really don't care
> > > one whit about the specific Android case. We want a solution that is
> > > generic enough to solve the power consumption problem and makes sense on
> > > future hardware.
> >
> > I don't think anyone disagrees with this. As long as we find a long term
> > solution that satisfies the android case, everyone will be happy.
>
> See above.
>
> Thanks,
>
> tglx


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/