From: Alan Stern on
On Mon, 21 Jun 2010, Florian Mickler wrote:

> > > > What happens if an event arrives just before you read
> > > > /sys/power/wakeup_count, but the userspace consumer doesn't realize
> > > > there is a new unprocessed event until after the power manager checks
> > > > it?
> >
> > > I think this is not the kernel's problem. In this approach the kernel makes it
> > > possible for the user space to avoid the race. Whether or not the user space
> > > will use this opportunity is a different matter.
> >
> > It is _not_ possible for userspace to avoid this race. Help from the
> > kernel is needed.
>
> It is possible if every (relevant) userspace program implements a
> callback for the powermanager to check if one of it's wakeup-sources
> got activated.
>
> That way the powermanager would read /sys/power/wakeup_count, then do
> the roundtrip to all it's registered users and only then suspend.

After further thought, there's still a race:

A wakeup event arrives.

The kernel increments /sys/power/wakeup_count and starts
processing the event.

The power-manager process reads /sys/power/wakeup_count.

The power-manager process polls the relevant programs and
they all say no events are pending.

The power-manager process successfully writes
/sys/power/wakeup_count.

The power-manager process initiates a suspend.

... Hours later ...

The system wakes up.

The kernel finishes its internal processing of the event and
sends a notification to a user program.

The problem here is that the power-manager process can't tell when the
kernel has finished processing an event. This is true both for events
that need to propagate to userspace and for events that are handled
entirely by the kernel.

This is a reflection of the two distinct pieces of information that we
need to keep track of:

A wakeup event has arrived, so it's no longer safe to suspend.

Wakeup events are no longer pending, so it's once again
safe to suspend.

The wakeup_count interface tracks the first, but in this scheme nothing
tracks the second.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Florian Mickler on
On Mon, 21 Jun 2010 11:23:33 -0400 (EDT)
Alan Stern <stern(a)rowland.harvard.edu> wrote:

> On Mon, 21 Jun 2010, Florian Mickler wrote:
>
> > On Sun, 20 Jun 2010 22:23:38 -0400 (EDT)
> > Alan Stern <stern(a)rowland.harvard.edu> wrote:
>
> > > This is the race I was talking about:
> > >
> > > > > What happens if an event arrives just before you read
> > > > > /sys/power/wakeup_count, but the userspace consumer doesn't realize
> > > > > there is a new unprocessed event until after the power manager checks
> > > > > it?
> > >
> > > > I think this is not the kernel's problem. In this approach the kernel makes it
> > > > possible for the user space to avoid the race. Whether or not the user space
> > > > will use this opportunity is a different matter.
> > >
> > > It is _not_ possible for userspace to avoid this race. Help from the
> > > kernel is needed.
> >
> > It is possible if every (relevant) userspace program implements a
> > callback for the powermanager to check if one of it's wakeup-sources
> > got activated.
> >
> > That way the powermanager would read /sys/power/wakeup_count, then do
> > the roundtrip to all it's registered users and only then suspend.
> >
> > This turns the suspend_blockers concept around. Instead of actively
> > signaling the suspend_blockers, the userspace programs only answer
> > "yes/no" when asked. (i.e. polling?)
>
> In the end you would want to have communication in both directions:
> suspend blockers _and_ callbacks. Polling is bad if done too often.
> But I think the idea is a good one.

Actually, I'm not so shure.

1. you have to roundtrip whereas in the suspend_blocker scheme you have
active annotations (i.e. no further action needed)

2. it may not be possible for a user to determine if a wake-event is
in-flight. you would have to somehow pass the wake-event-number with
it, so that the userspace process could ack it properly without
confusion. Or... I don't know of anything else...

1. userspace-manager (UM) reads a number (42).

2. it questions userspace program X: is it ok to suspend?

[please fill in how userspace program X determines to block
suspend]

3a. UM's roundtrip ends and it proceeds to write "42" to the
kernel [suspending]
3b. UM's roundtrip ends and it aborts suspend, because a
(userspace-)suspend-blocker got activated

I'm not shure how the userspace program could determine that there is a
wake-event in flight. Perhaps by storing the number of last wake-event.
But then you need per-wake-event-counters... :|


> In fact, you don't need a "yes/no" response. Programs merely need a
> chance to activate a new suspend blocker if a wakeup source was
> recently activated before they acknowledge the poll.

true. (incorporated alreeady above)

>
> > You _can not_ implement userspace suspend blockers with this approach,
> > as it is vital for every userspace program to get scheduled and check
> > it's wakeup-source (if even possible) before you know that the right
> > parties have won the race.
>
> I'm not sure what you mean.

Sorry, that was not understandable. What I meant was that you "_can
not_" implement the suspend-blockers scheme, where you don't need to
roundtrip through all userspace (with all it's glory).
( => you need the roundtrip here)

>
> There is still at least one loophole to be closed: Android's
> timer-based wakelocks. These include cases where the Android
> developers didn't add enough wakelocks to cover the entire path from
> kernel-event to userspace-handler, so they punted and relied on a timer
> to decide when the wakelock should be deactivated. (There may be other
> cases too; I didn't follow the original discussion very closely.)
> It's not clear whether these things can be handled already in Rafael's
> scheme with your addition, or whether something new is needed.
>
> Alan Stern

Do you have some thoughts about the wake-event-in-flight detection?

Cheers,
Flo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Florian Mickler on
On Mon, 21 Jun 2010 12:54:44 -0400 (EDT)
Alan Stern <stern(a)rowland.harvard.edu> wrote:

> On Mon, 21 Jun 2010, Florian Mickler wrote:
>
> > > > > What happens if an event arrives just before you read
> > > > > /sys/power/wakeup_count, but the userspace consumer doesn't realize
> > > > > there is a new unprocessed event until after the power manager checks
> > > > > it?
> > >
> > > > I think this is not the kernel's problem. In this approach the kernel makes it
> > > > possible for the user space to avoid the race. Whether or not the user space
> > > > will use this opportunity is a different matter.
> > >
> > > It is _not_ possible for userspace to avoid this race. Help from the
> > > kernel is needed.
> >
> > It is possible if every (relevant) userspace program implements a
> > callback for the powermanager to check if one of it's wakeup-sources
> > got activated.
> >
> > That way the powermanager would read /sys/power/wakeup_count, then do
> > the roundtrip to all it's registered users and only then suspend.
>
> After further thought, there's still a race:

Ah, yes. That's what I meant in my other mail with 'how to determine
that an wake event is in flight'.

Read this too late.

> Alan Stern

Cheers,
Flo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rafael J. Wysocki on
On Monday, June 21, 2010, Alan Stern wrote:
> On Mon, 21 Jun 2010, Florian Mickler wrote:
>
> > > > > What happens if an event arrives just before you read
> > > > > /sys/power/wakeup_count, but the userspace consumer doesn't realize
> > > > > there is a new unprocessed event until after the power manager checks
> > > > > it?
> > >
> > > > I think this is not the kernel's problem. In this approach the kernel makes it
> > > > possible for the user space to avoid the race. Whether or not the user space
> > > > will use this opportunity is a different matter.
> > >
> > > It is _not_ possible for userspace to avoid this race. Help from the
> > > kernel is needed.
> >
> > It is possible if every (relevant) userspace program implements a
> > callback for the powermanager to check if one of it's wakeup-sources
> > got activated.
> >
> > That way the powermanager would read /sys/power/wakeup_count, then do
> > the roundtrip to all it's registered users and only then suspend.
>
> After further thought, there's still a race:
>
> A wakeup event arrives.
>
> The kernel increments /sys/power/wakeup_count and starts
> processing the event.
>
> The power-manager process reads /sys/power/wakeup_count.

You assume that these two steps will occur instantaneously one after the other,
while there may be (and in fact there should be) a delay between them.

I would make the power manager wait for certain time after reading
/sys/power/wakeup_count and if no wakeup events are reported to it within
that time, to write to /sys/power/wakeup_count.

The length of the time to wait would be system-dependent in general, but I'd
also allow the event consumers to influence it (like when an application knows
it will process things for 10 minutes going forward, so it tells the power
manager to wait for at least 10 minutes before attempting to suspend).

> The power-manager process polls the relevant programs and
> they all say no events are pending.
>
> The power-manager process successfully writes
> /sys/power/wakeup_count.
>
> The power-manager process initiates a suspend.
>
> ... Hours later ...
>
> The system wakes up.
>
> The kernel finishes its internal processing of the event and
> sends a notification to a user program.
>
> The problem here is that the power-manager process can't tell when the
> kernel has finished processing an event. This is true both for events
> that need to propagate to userspace and for events that are handled
> entirely by the kernel.
>
> This is a reflection of the two distinct pieces of information that we
> need to keep track of:
>
> A wakeup event has arrived, so it's no longer safe to suspend.
>
> Wakeup events are no longer pending, so it's once again
> safe to suspend.
>
> The wakeup_count interface tracks the first, but in this scheme nothing
> tracks the second.

Which I don't think is really necessary, because we'll need to use timeouts
anyway, at least for events that have no user space consumers.

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rafael J. Wysocki on
On Monday, June 21, 2010, Alan Stern wrote:
> On Sun, 20 Jun 2010, Rafael J. Wysocki wrote:
>
> > > > Generally, there are two problems in that area. First, if a wakeup event
> > > > occurs exactly at the same time when /sys/power/state is being written to,
> > > > the even may be delivered to user space right before the freezing of it,
> > > > in which case the user space consumer of the event may not be able to process
> > > > it before the system is suspended.
> > >
> > > Indeed, the same problem arises if the event isn't delivered to
> > > userspace until after userspace is frozen.
> >
> > In that case the kernel should abort the suspend so that the event can be
> > delivered to the user space.
>
> Yes.
>
> > > Of course, the underlying issue here is that the kernel has no direct way
> > > to know when userspace has finished processing an event. Userspace would
> > > have to tell it, which generally would mean rewriting some large number of user
> > > programs.
> >
> > I'm not sure of that. If the kernel doesn't initiate suspend, it doesn't
> > really need to know whether or not user space has already consumed the event.
>
> That's true. But it only shifts the onus: When a userspace program has
> finished processing an event, it has to tell the power-manager process.
> Clearly this sort of thing is unavoidable, one way or another.

That's correct, but there already are multiple ways to communicate things
between user space processes, while we would need new interfaces to
pass that information between user space and the kernel. That makes quite
some difference.

> > > > The following patch illustrates my idea of how these two problems may be
> > > > addressed. It introduces a new global sysfs attribute,
> > > > /sys/power/wakeup_count, associated with a running counter of wakeup events
> > > > and a helper function, pm_wakeup_event(), that may be used by kernel subsystems
> > > > to increment the wakeup events counter.
> > >
> > > In what way is this better than suspend blockers?
> >
> > It doesn't add any new framework and it doesn't require the users of
> > pm_wakeup_event() to "unblock" suspend, so it is simpler. It also doesn't add
> > the user space interface that caused so much opposition to appear.
>
> Okay. A quick comparison shows that in your proposal:
>
> There's no need to register and unregister suspend blockers.
> But instead you create the equivalent of a suspend blocker
> inside every struct device.

Not really. That thing is for statistics purposes only and it's not essential
from the functionality POV. But you know that. :-)

> Drivers (or subsystems) don't have to activate suspend
> blockers. But instead they have to call pm_wakeup_event().
>
> Drivers don't have to deactivate suspend blockers. You don't
> have anything equivalent, and as a result your scheme is
> subject to the race described below.

The original one without user space variant of suspend blockers is subject to
it as well, because the kernel doesn't know when the user space consumer is
going to finish processing the event.

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/