[PATCH 0/8] Suspend block api (version 8) [Kernel]

From: Alan Cox on 27 May 2010 12:40

On Thu, 27 May 2010 17:07:14 +0100
Matthew Garrett <mjg59(a)srcf.ucam.org> wrote:

> On Thu, May 27, 2010 at 04:05:43PM +0100, Alan Cox wrote:
> > > Now, if the user is playing this game, you want it to be scheduled. If
> > > the user has put down their phone and the screen lock has kicked in, you
> > > don't want it to be scheduled. So we could imagine some sort of cgroup
> > > that contains untrusted tasks - when the session is active we set a flag
> >
> > I would hope not, because I'd prefer my app that used the screen
> > to get the chance to save important data on what it was doing
> > irrespective of the screen blank: "I have an elegant proof for this
> > problem but my battery has gone flat"
>
> Perhaps set after callbacks are made. But given that the approach
> doesn't work anyway...

Which approach doesn't work, and why?

> > What is the problem here - your device driver for the display can block
> > tasks it doesn't want to use the display.
>
> It's still racy. Going back to my example without any of the suspend
> blocking code, but using a network socket rather than an input device:
>
> int input = socket(AF_INET, SOCK_STREAM|SOCK_NONBLOCK, 0);
> char foo;
> struct sockaddr addr;
> connect(input, &addr, sizeof(addr));
> while (1) {
> 	if (read(input, &foo, 1) > 0) {
> 		/* do something */
> 	} else {
> 		/* draw bouncing cows and clouds and tractor beams briefly */
> 	}
> }
>
> A network packet arrives while we're drawing. Before we finish drawing,
> the policy timeout expires and the screen turns off.

Which is correct for a badly behaved application. You said you wanted to
constrain it. You've done so. Now I am not sure why such a "timeout"
would expire in the example as the task is clearly busy when drawing, or
is talking to someone else who is in turn busy. Someone somewhere is
actually drawing be it a driver or app code.

For a well-behaved application that is drawing, you are running
drawing code, so why would you suspend? The app has said it has a
latency constraint that suspend cannot meet, or has a device open that
cannot meet the constraints in suspend.

You also have the socket open so you can meaningfully extract resource
constraint information from that fact.

See, it's not the read() that matters, it's the connect and the close.

If your policy for a well-behaved application is 'thou shalt not
suspend in a way that breaks its networking', then for a well-behaved app
once I connect the socket we cannot suspend that app until such point as
the app closes the socket. At any other point we will break the
connection. Whether that is desirable is a policy question and you get to
pick how much you choose to trust an app and how you interpret the
information in your cpufreq and suspend drivers.

If you have wake-on-lan then the network stack might be smarter and
choose to express itself as

'the constraint is C6 unless the input queue is empty in which
case suspend is ok as I have WoL and my network routing is such
that I can prove that interface will be used'

In truth I doubt much hardware can make such an inference but some phones
probably can. On the other hand for /dev/input/foo you can make the
inference very nicely thank you.

Again, wake-on-LAN information does not belong in the application!
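A minimal sketch of the constraint described above, written as the kind of per-interface decision a network driver might make. The struct, its field names and the two-level enum are hypothetical, chosen only for illustration; this is not an existing kernel interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constraint levels, shallowest to deepest. */
enum pm_limit {
	PM_LIMIT_C6,      /* CPU may idle, but no deeper than C6 */
	PM_LIMIT_SUSPEND  /* full suspend is acceptable */
};

/* Hypothetical per-interface state a network driver might track. */
struct net_if_state {
	size_t rx_queue_len;      /* packets not yet read by userspace */
	bool wol_capable;         /* hardware can wake the system on a packet */
	bool routes_via_this_if;  /* traffic that matters will arrive here */
};

/* "The constraint is C6 unless the input queue is empty, in which case
 * suspend is OK because WoL is available and routing guarantees that
 * this interface will be used." */
static enum pm_limit net_constraint(const struct net_if_state *s)
{
	if (s->rx_queue_len == 0 && s->wol_capable && s->routes_via_this_if)
		return PM_LIMIT_SUSPEND;
	return PM_LIMIT_C6;
}
```

The point of the shape is that the application never appears in it: the decision depends only on state the network stack already owns.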

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Matthew Garrett on 27 May 2010 13:00

On Thu, May 27, 2010 at 05:41:31PM +0100, Alan Cox wrote:
> On Thu, 27 May 2010 17:07:14 +0100
> Matthew Garrett <mjg59(a)srcf.ucam.org> wrote:
> > Perhaps set after callbacks are made. But given that the approach
> > doesn't work anyway...
>
> Which approach doesn't work, and why?

Sorry, I meant using cgroups and scheduler tricks as a race-free
replacement for opportunistic suspend.

> > It's still racy. Going back to my example without any of the suspend
> > blocking code, but using a network socket rather than an input device:
> >
> > int input = socket(AF_INET, SOCK_STREAM|SOCK_NONBLOCK, 0);
> > char foo;
> > struct sockaddr addr;
> > connect(input, &addr, sizeof(addr));
> > while (1) {
> > 	if (read(input, &foo, 1) > 0) {
> > 		/* do something */
> > 	} else {
> > 		/* draw bouncing cows and clouds and tractor beams briefly */
> > 	}
> > }
> >
> > A network packet arrives while we're drawing. Before we finish drawing,
> > the policy timeout expires and the screen turns off.
>
> Which is correct for a badly behaved application. You said you wanted to
> constrain it. You've done so. Now I am not sure why such a "timeout"
> would expire in the example as the task is clearly busy when drawing, or
> is talking to someone else who is in turn busy. Someone somewhere is
> actually drawing be it a driver or app code.

The timeout would be at the userspace platform level. If I haven't
touched the app for 30 seconds (and if the app hasn't taken any form of
suspend block), the screen should turn off. In the current Android
implementation that will then (in the absence of any kernel-level
suspend blockers) result in the system transitioning into a fully
suspended state.
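A simplified model of the platform policy described above, following the description literally (a block held by the app keeps the screen on; once the screen is off, only kernel-level blockers keep the system awake). The function name, states and parameters are hypothetical; the 30-second timeout is just the example value from the text.

```c
#include <stdbool.h>

/* Hypothetical system states for this toy model. */
enum sys_state { SCREEN_ON, SCREEN_OFF_AWAKE, SUSPENDED };

/* One evaluation of the userspace platform policy:
 * - the screen stays on while the user is active or an app holds a block;
 * - with the screen off, the system suspends unless the kernel holds a
 *   suspend blocker (for example, an unread wakeup event). */
static enum sys_state policy_eval(unsigned idle_seconds,
				  unsigned timeout_seconds,
				  int user_blocks, int kernel_blocks)
{
	if (idle_seconds < timeout_seconds || user_blocks > 0)
		return SCREEN_ON;
	if (kernel_blocks > 0)
		return SCREEN_OFF_AWAKE;
	return SUSPENDED;
}
```

For example, `policy_eval(45, 30, 0, 0)` models an untouched phone with no blockers at all, which goes straight to full suspend.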

> For a well-behaved application that is drawing, you are running
> drawing code, so why would you suspend? The app has said it has a
> latency constraint that suspend cannot meet, or has a device open that
> cannot meet the constraints in suspend.

Not at all. The fact that the application hasn't taken any sort of
suspend block means that the application has indicated that it's happy
with no longer being scheduled when the screen is shut off, *providing
there's no wakeup event to be processed*.

> You also have the socket open so you can meaningfully extract resource
> constraint information from that fact.
>
> See, it's not the read() that matters, it's the connect and the close.
>
> If your policy for a well-behaved application is 'thou shalt not
> suspend in a way that breaks its networking', then for a well-behaved app
> once I connect the socket we cannot suspend that app until such point as
> the app closes the socket. At any other point we will break the
> connection. Whether that is desirable is a policy question and you get to
> pick how much you choose to trust an app and how you interpret the
> information in your cpufreq and suspend drivers.

Again, that's not the desired outcome. The desired outcome is that when
the screen shuts off, the application no longer gets scheduled until a
network packet arrives. The difference between these scenarios is large.

> If you have wake-on-lan then the network stack might be smarter and
> choose to express itself as
>
> 'the constraint is C6 unless the input queue is empty in which
> case suspend is ok as I have WoL and my network routing is such
> that I can prove that interface will be used'

This is still racy. Going back to this:

int input = socket(AF_INET, SOCK_STREAM|SOCK_NONBLOCK, 0);
char foo;
struct sockaddr addr;
connect(input, &addr, sizeof(addr));
while (1) {
	if (read(input, &foo, 1) > 0) {
		/* do something */
	} else {
		/* SUSPEND OCCURS HERE */
		/* draw bouncing cows and clouds and tractor beams briefly */
	}
}

A wakeup event now arrives. We use kernel-level suspend blockers to
prevent the system from going back to sleep until userspace has read the
packet. The application finishes drawing its cows, reads the packet
(thus releasing the kernel-level suspend block) and then immediately
reaches the end of its timeslice. At this point the application has not
had any opportunity to indicate whether the packet has altered its
constraints. What stops us from immediately suspending again?
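The sequence above can be replayed in a toy model to show where the window is. Everything here (`struct pm_model` and the helper functions) is hypothetical and only counts blockers; it is not kernel code. The second case shows one way to close the window: the application takes a block of its own before reading, which is roughly what userspace suspend blockers let it do.

```c
#include <stdbool.h>

/* Toy model of the blockers involved; not kernel code. */
struct pm_model {
	int queued_packets;  /* wakeup events not yet read by userspace */
	int kernel_blocks;   /* held by the driver while packets are queued */
	int user_blocks;     /* held by applications */
};

static bool may_suspend(const struct pm_model *m)
{
	return m->kernel_blocks == 0 && m->user_blocks == 0;
}

static void packet_arrives(struct pm_model *m)
{
	m->queued_packets++;
	m->kernel_blocks = 1;  /* block until userspace drains the queue */
}

static void app_reads_packet(struct pm_model *m)
{
	if (m->queued_packets > 0 && --m->queued_packets == 0)
		m->kernel_blocks = 0;  /* queue empty: kernel block dropped */
}

/* Replays the sequence from the mail and reports whether the policy
 * could suspend in the gap after read() and before the app reacts.
 * If take_user_block is true, the app takes its own block first. */
static bool suspend_possible_after_read(bool take_user_block)
{
	struct pm_model m = {0, 0, 0};
	bool possible;

	packet_arrives(&m);
	if (take_user_block)
		m.user_blocks++;
	app_reads_packet(&m);
	/* Timeslice ends here; the policy evaluates before the app runs again. */
	possible = may_suspend(&m);
	if (take_user_block)
		m.user_blocks--;  /* app has finished handling the packet */
	return possible;
}
```

The first case is the question posed above: with kernel-level blockers alone, nothing covers the gap between read() and the application reacting to what it read.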

--
Matthew Garrett | mjg59(a)srcf.ucam.org

From: Alan Stern on 27 May 2010 13:10

On Thu, 27 May 2010, Felipe Balbi wrote:

> On Thu, May 27, 2010 at 05:06:23PM +0200, ext Alan Stern wrote:
> >If people don't mind, here is a greatly simplified summary of the
> >comments and objections I have seen so far on this thread:
> >
> > The in-kernel suspend blocker implementation is okay, even
> > beneficial.
>
> I disagree here. I believe expressing that as QoS is much better. Let
> the kernel decide which power state is better as long as I can say I
> need 100us IRQ latency or 100ms wakeup latency.

Does this mean you believe "echo mem >/sys/power/state" is bad and
should be removed? Or "echo disk >/sys/power/state"? They pay no
attention to latencies or other requirements.
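Felipe's request ("I need 100us IRQ latency") can already be expressed from userspace through the PM QoS misc device `/dev/cpu_dma_latency`: a process writes a binary 32-bit latency value in microseconds, and the request stays in force until the file descriptor is closed. That request steers the choice of CPU idle state; consistent with the point above, an explicit "echo mem" does not consult it. A minimal sketch assuming that interface; the helper name is hypothetical, and the path is a parameter so the code can be exercised against an ordinary file.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: register a CPU wakeup-latency request of `usec`
 * microseconds by writing a raw 32-bit value to `path`. On a real system
 * path would be "/dev/cpu_dma_latency" (normally root-only). The request
 * lasts while the returned descriptor stays open; close() withdraws it.
 * Returns the open fd, or -1 on error. */
static int request_cpu_latency(const char *path, int32_t usec)
{
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;
	/* The PM QoS device takes a binary int32, not a text string. */
	if (write(fd, &usec, sizeof(usec)) != (ssize_t)sizeof(usec)) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A daemon would call this once with the real device path and simply keep the descriptor open for as long as it needs the constraint.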

Alan Stern

From: Thomas Gleixner on 27 May 2010 13:10

On Thu, 27 May 2010, Alan Cox wrote:
> That's all you need to do it right.
>
> In kernel yes your device driver probably does need to say things like
> 'Don't go below C6 for a moment' just as a high speed serial port might
> want to say 'Nothing over 10ms please'
>
> I can't speak for Thomas, but I'm certainly not arguing that you don't
> need something that looks more like the blocker side of the logic *in
> kernel*, because there is stuff that you want to express which isn't tied
> to the task.

I'm not opposed, but yes it needs to be expressed in quantifiable
terms, i.e. wakeup latency. That's just contributing to the global QoS
state of affairs even if it is not tied to a particular task.

And that allows the driver to be intelligent about it. A serial port
at 9600 baud has very different requirements than one at 115200.
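A worked version of the serial port example: the wakeup latency a UART driver can tolerate is roughly the time its receive FIFO takes to fill. The sketch assumes 8N1 framing (10 bits on the wire per character) and a 16-byte FIFO as on a 16550A; the function name is hypothetical, and a real driver would more likely key off its FIFO trigger level than the full FIFO size.

```c
#include <stdint.h>

/* Approximate time, in microseconds, before a receive FIFO of
 * `fifo_bytes` overflows at `baud`, assuming 10 bits per character
 * (start + 8 data + stop). */
static uint32_t uart_latency_budget_us(uint32_t baud, uint32_t fifo_bytes)
{
	uint64_t bits = (uint64_t)fifo_bytes * 10u;

	if (baud == 0)
		return 0;  /* no meaningful budget for a stopped line */
	return (uint32_t)(bits * 1000000u / baud);
}
```

Under these assumptions 9600 baud leaves about 16.7ms of headroom and 115200 baud about 1.4ms, a factor of twelve, which is exactly the kind of difference a single on/off blocker cannot express.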

But that's quite a different concept than the big hammer approach of
the blockers.
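A toy illustration of the QoS model argued for here: the global constraint is the tightest outstanding latency request, and the kernel picks the deepest state whose exit latency fits. The state table and its latency numbers are invented for the example; listing "suspend" as just another, very deep state reflects the framing in this thread rather than how the kernel of the time was structured.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical states with made-up exit latencies (microseconds),
 * ordered from shallowest to deepest. */
struct idle_state {
	const char *name;
	uint32_t exit_latency_us;
};

static const struct idle_state states[] = {
	{ "C1", 1 },
	{ "C3", 100 },
	{ "C6", 1000 },
	{ "suspend", 500000 },
};

/* The global constraint is the smallest of all outstanding requests. */
static uint32_t aggregate_latency(const uint32_t *reqs, size_t n)
{
	uint32_t min = UINT32_MAX;  /* no requests: no constraint */

	for (size_t i = 0; i < n; i++)
		if (reqs[i] < min)
			min = reqs[i];
	return min;
}

/* Pick the deepest state whose exit latency fits within the limit. */
static const char *pick_state(uint32_t limit_us)
{
	const char *best = states[0].name;  /* shallowest is always allowed */

	for (size_t i = 0; i < sizeof(states) / sizeof(states[0]); i++)
		if (states[i].exit_latency_us <= limit_us)
			best = states[i].name;
	return best;
}
```

With requests of 2000us and 150us outstanding, the limit is 150us and the sketch picks C3; with no requests at all it picks "suspend".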

Thanks,

tglx

From: Matthew Garrett on 27 May 2010 13:10

On Thu, May 27, 2010 at 07:04:38PM +0200, Thomas Gleixner wrote:
> On Thu, 27 May 2010, Matthew Garrett wrote:
> > Sure, if you're not using opportunistic suspend then I don't think
> > there's any real need for the userspace side of this. The question is
> > how to implement something with the useful properties of opportunistic
> > suspend without implementing something pretty much equivalent to
> > the userspace suspend blockers. I've sent another mail expressing why I
> > don't think your proposed QoS style behaviour provides that.
>
> Opportunistic suspend is just a deep idle state, nothing else.

No. The useful property of opportunistic suspend is that nothing gets
scheduled. That's fundamentally different to a deep idle state.

> Stop thinking about suspend as a special mechanism. It's not - except
> for s2disk, which is an entirely different beast.

On PCs, suspend has more in common with s2disk than it does with C-states.
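The disagreement can be made concrete with a toy count of how often a periodic, timer-driven task runs over a window with no external wakeup events. In a deep idle state timers still expire, so the task runs on schedule; under suspend as described here, nothing is scheduled. The function is hypothetical and ignores timer slack and wakeup latency.

```c
#include <stdbool.h>

/* Number of times a task with period `period_s` runs during a window of
 * `window_s` seconds that contains no external wakeup event.
 * - Deep idle: timers still fire, so the task runs on schedule.
 * - Suspend: nothing is scheduled, so the task does not run at all. */
static unsigned runs_in_window(unsigned window_s, unsigned period_s,
			       bool suspended)
{
	if (suspended || period_s == 0)  /* period 0 guarded as "never" */
		return 0;
	return window_s / period_s;
}
```

Over a 60-second window a once-per-second task runs 60 times in deep idle and zero times in suspend; that all-or-nothing difference is the property being argued over.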

--
Matthew Garrett | mjg59(a)srcf.ucam.org