From: Mark Brown on
On Sun, Jun 06, 2010 at 12:58:10PM -0700, Brian Swetland wrote:
> On Sun, Jun 6, 2010 at 12:24 PM, Christoph Hellwig <hch(a)infradead.org> wrote:

> > On the other hand I've heard
> > that various hardware vendors or parties closed to them are rather
> > annoyed by their drivers beeing stuck in the android tree - but that
> > can be easily solved by getting removing the suspend blockers (at least
> > temporarily), cleaning up a few bits here and there and getting them in.

> This continues to baffle me. If we (Google) are such a headache, why
> not just route around us. The drivers we've written are GPLv2, the
> source is out there for anyone who wants it, etc. The drivers other
> people have written we have no control over at all. From my point of
> view it'd be an annoyance if somebody took the code we wrote, modified
> it heavily, and pushed it upstream, but fundamentally I can't stop
> that from happening other than by pushing it upstream myself, first.

AFAICT this is purely down to the fact that the vendors producing
Android devices are using the kernel which is shipped with whatever
release they are using so people doing drivers end up getting locked in
to an older kernel with old APIs (independant of Android specifics) and
don't have the resource to redo things for upstream. Suspend blockers
are one more API update in there, but general kernel development creates
far more.

I was looking at this just today and one thing that it occurs to me
might help is if when you guys rebase your work against upstream you
were to tag the results - at the minute the only "release" Android
kernels are those included in full stack releases so providing more
hints that other kernel versions could be substituted in may help.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Alan Stern on
On Tue, 8 Jun 2010, Arve Hj�nnev�g wrote:

> >> Wakeup event occurs, and the driver:
> >> - report wakeup event type A
> >> - queue event for delivery to user-space
> >
> > That's not really two distinct steps. �Queuing the event for delivery
> > to userspace involves waking up any tasks that are waiting to read the
> > device file; that action (calling wake_up_all() or whatever the driver
> > does) is how the event gets reported.
> >
>
> If you want to ensure that more than one process see the event it has
> to be two steps, but it does not affect the race I was trying to
> describe.

Are you sure about that? If two processes call poll() for the same
file descriptor, don't both calls return when data becomes available?
But agreed, it doesn't matter -- especially since I only need one
process (the power manager) to see the event.

> >> User space wakes up:
> >> - Calls api to block task freezing for event type A
> >
> > Again, that's a confusing way of putting it. �The API you're referring
> > to is simply the function that activates a suspend blocker. �It does
> > prevent task freezing, but you shouldn't say it prevents freezing for
> > event type A. �More like the other way around: In addition to
> > preventing freezing, the function tells the power manager that event
> > type A should no longer be considered active. �Thus, in a sense it
> > _stops_ event type A from preventing freezing.
> >
> >> Another wakeup event occurs, and the driver:
> >> - report wakeup event type A

I think this is where you misunderstood. There is no "report wakeup
event" as such. All that happens is that data becomes available to be
read from the input device file. However the power manager process
isn't polling the device file at this point (because a suspend blocker
is active), so it doesn't realize that the source has become active
again.

> >> - queue event for delivery to user-space
> >
> > Same as above.
> >
> >> User space continues:
> >> - Read events
>
> Sorry, I missed the unblock task freezing step here.
>
> >> - Wait for more events
> >>
> >> Result: Task are not frozen again.
> >
> > Because the suspend blocker was never deactivated. �The same thing
> > happens with wakelocks: If a task activates a wakelock and never
> > deactivates it, the system won't go into opportunistic suspend again.
>
> Yes, but with the sequence of events above task will not be frozen
> again even if the wake-lock/suspend-blocker/task-freezing-preventer is
> released.

Yes they will. When the suspend blocker is deactivated, the power
manager process will realize that there are no active suspend blockers
and it will think there are no active sources. Thus it will freeze
processes as usual.

> > Here's how my scheme is meant to work:
> >
> > � � � �Wakeup event for input device A occurs.
> >
> > � � � �A's driver adds an entry to the input device queue and
> > � � � �(if the queue was empty) does wake_up_all() on the device
> > � � � �file's wait_queue.
> >
> > � � � �The PM process returns from poll() and sees that device
> > � � � �file A is now readable, so it adds A to its list of active
> > � � � �sources and unfreezes userspace.
> >
> > � � � �Some other process sees that device file A is now readable,
> > � � � �so it activates a suspend blocker and reads events from A.
> >
> > � � � �When the PM process receives the request to activate the
> > � � � �suspend blocker, it removes A from its list of active
> > � � � �sources. �But it doesn't freeze userspace yet, because now
> > � � � �a suspend blocker is active.
>
> If another event happens at this point don't you put A back on the
> list? If so, it never gets removed.

No, you don't put A back on the list. Sources get put on the list only
when the information returned by poll() indicates they have data
available. The power manager doesn't poll while suspend blockers are
active.

> > � � � �The other process consumes events from A and does other
> > � � � �stuff. �Maybe more input data arrives while this is happening
> > � � � �and the process reads it. �Eventually the process decides to
> > � � � �deactivate the suspend blocker, perhaps when no more data
> > � � � �is available from the device file, perhaps not.
> >
> > � � � �When the PM process receives the request to deactivate the
> > � � � �suspend blocker, it sees that now there are no active
> > � � � �sources and no active suspend blockers. �Therefore it
> > � � � �freezes userspace and does a big poll() on all possible
> > � � � �sources. �(If there are still events on the input device
> > � � � �queue, the poll() returns immediately.)
> >
> > � � � �Rinse and repeat.
> >
> > I don't see any dangerous races there. �The scheme can be made a little
> > more efficient by having the PM process do another poll() (with 0
> > timeout) just before freezing userspace; if the result indicates that a
> > source is active then the freezing and unfreezing can be skipped.

> > There is no race. �The driver reports an event has occurred by making
> > the data available to be read from the device file, and the event is
> > processed by reading it from the device file (or at least, that's the
> > first step in processing the event).
> >
>
> If the driver making data available to be read triggers a wakeup event
> in the power manager process

It doesn't. Only return from a poll() causes the power manager process
to think a wakeup event has occurred.

> that has to be cleared by the process
> reading the events, then you have a race. Since the power manager is
> selecting/polling on the same file descriptor, I don't see what you
> gain from linking the wakeup events to suspend blockers.

What I gain is the ability to know when an in-kernel wakelock _could_
have been released, without actually implementing in-kernel wakelocks.

With real in-kernel wakelocks, the wakelock is released when the input
queue becomes empty. There's no way for the power manager process to
know exactly when that happens without modifying the kernel. However
we can use the activation of the corresponding userspace suspend
blocker as a proxy. It's nearly as good, and it gets the job done.

If you prefer, an interface could be added whereby a user process tells
the power manager explicitly that it's going to read data from an
input device, instead of relying on implicit notification through
suspend blocker activation. I don't know whether this would be simpler
or more complex; it depends on the design of your userspace.

> If you break
> this link it think can work, but it does require us to modify all code
> that reads wakeup events from the kernel to register the file
> descriptors they get events from.

Yes. I don't know how your user code is structured; if there is a
fixed correspondence between file descriptors and suspend blockers (the
same wakeup events are always handled by the same suspend blockers)
then this will be a simple change -- the file descriptor can be
registered when the suspend blocker is created.

If the correspondence is more dynamic (different suspend blockers used
for the same wakeup device at different times, or multiple wakeup
devices handled by one suspend blocker) then the required changes will
be more complicated. Not tremendously more.

> It would also require adding
> poll/select support to android alarm driver,

Yes. Is this a platform-specific driver? (I assume so, since you
called it the "android alarm driver".) Then poll/select support can be
added without provoking a lot of objections from legions of kernel
developers.

> and any driver that
> currently uses a wakelock with a timeout would need to notify the user
> space power manager instead.

Hmm. This is symptomatic of a deficiency in the original wakelock
implementation -- those timeouts always were arbitrary.

The power manager would indeed have to know about wakeup devices that
don't need to _keep_ the system awake. Here's one way to cope: During
those times when no suspend blockers are active but the PM process
thinks a wakeup source is active, the PM process could poll every few
seconds to update its list of active sources. At those points it could
remove wakeup sources that have timed out.


Obviously this proposal would complicate your userspace. Not
enormously, since most of the work is confined to the power manager,
but somewhat. That's the price to be paid for leaving the kernel
essentially untouched. Consider the amount of resistance your
wakelock/suspend-blocker patches have already received; you'll have to
decide which approach will work out better in the end.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Neil Brown on
On Wed, 9 Jun 2010 11:40:27 +0200
"Rafael J. Wysocki" <rjw(a)sisk.pl> wrote:

> On Wednesday 09 June 2010, Felipe Contreras wrote:
> > On Wed, Jun 9, 2010 at 6:46 AM, Linus Torvalds
> > <torvalds(a)linux-foundation.org> wrote:
> > > On Tue, 8 Jun 2010, david(a)lang.hm wrote:
> > >>
> > >> having suspend blockers inside the kernel adds significant complexity, it's
> > >> worth it only if the complexity buys you enough. In this case the question is
> > >> if the suspend blockers would extend the sleep time enough more to matter. As
> > >> per my other e-mail, this is an area with rapidly diminishing returns as the
> > >> sleep times get longer.
> > >
> > > Well, the counter-argument that nobody seems to have brought up is that
> > > suspend blockers exist, are real code, and end up being shipped in a lot
> > > of machines.
> > >
> > > That's a _big_ argument in favour of them. Certainly much bigger than
> > > arguing against them based on some complexity-arguments for an alternative
> > > that hasn't seen any testing at all.
> > >
> > > IOW, I would seriously hope that this discussion was more about real code
> > > that _exists_ and does what people need. It seems to have degenerated into
> > > something else.
> > >
> > > Because in the end, "code talks, bullshit walks". People can complain and
> > > suggest alternatives all they want, but you can't just argue. At some
> > > point you need to show the code that actually solves the problem.
> >
> > That's assuming there is an actual problem, which according to all the
> > embedded people except android, there is not.
>
> Yes, there is, but they've decided to ignore it.
>
> > And if there is indeed such a problem (probably not big), it might be
> > solved properly by the time suspend blockers are merged, or few
> > releases after.
>
> Not quite. Have you followed all of the discussion, actually?
>
> > Whatever the solution (or workaround) is, it would be nice if it could
> > be used by more than just android people, and it would also be nice to
> > do it without introducing user-space API that *nobody* likes and might
> > be quickly deprecated.
>
> I agree with Linus and I don't have that much of a problem with the API that
> people seem to have. In fact the much-hated user space API is just a char
> device driver with 3 ioctls (that can be extended in future if need be) and
> the kernel API is acceptable to me.

I think there is a little bit more to it than that. It seems there is a new
ioctl for input/event devices to say "Any events queued here should be
treated as wake-up events". There may be similar additions to other devices,
but I know of no details.

I wonder if we can get a complete statement of changes to the user-space
API...

> Yes, there is some overlap between it
> and PM QoS, but IMhO that overlap may be reduced over time (eg. by
> using PM QoS requirements to implement suspend blockers).
>
> To me, the question boils down to whether or not we're able to persuade the
> Android people to use any other approach (eg. by demonstrating that something
> else is actually better), because even if we invent a brilliant new approach,
> but Android ends up using its old one anyway, the net result will be as though
> we haven't done anything useful.

Yes. There is no point unless we can meet somewhere in the middle. I think
that would have to include a full suspend that freezes all processes.
Solutions which reject that - while quite clever - would require too much
change to Android user-space to be acceptable.

NeilBrown

>
> Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Arve Hjønnevåg on
2010/6/9 Alan Stern <stern(a)rowland.harvard.edu>:
> On Tue, 8 Jun 2010, Arve Hj�nnev�g wrote:
>
>> >> Wakeup event occurs, and the driver:
>> >> - report wakeup event type A
>> >> - queue event for delivery to user-space
>> >
>> > That's not really two distinct steps. �Queuing the event for delivery
>> > to userspace involves waking up any tasks that are waiting to read the
>> > device file; that action (calling wake_up_all() or whatever the driver
>> > does) is how the event gets reported.
>> >
>>
>> If you want to ensure that more than one process see the event it has
>> to be two steps, but it does not affect the race I was trying to
>> describe.
>
> Are you sure about that? �If two processes call poll() for the same
> file descriptor, don't both calls return when data becomes available?

Yes if they are both already in the poll call they both return, but if
one process reads the data while the second process is not in the poll
call the second process will not see anything.

> But agreed, it doesn't matter -- especially since I only need one
> process (the power manager) to see the event.

The power may not see the event, the process that reads the event will
always see it. If the power manager is not in the poll call when the
event happens, the process that reads the event can read the event
before the power manager calls poll.

>
>> >> User space wakes up:
>> >> - Calls api to block task freezing for event type A
>> >
>> > Again, that's a confusing way of putting it. �The API you're referring
>> > to is simply the function that activates a suspend blocker. �It does
>> > prevent task freezing, but you shouldn't say it prevents freezing for
>> > event type A. �More like the other way around: In addition to
>> > preventing freezing, the function tells the power manager that event
>> > type A should no longer be considered active. �Thus, in a sense it
>> > _stops_ event type A from preventing freezing.
>> >
>> >> Another wakeup event occurs, and the driver:
>> >> - report wakeup event type A
>
> I think this is where you misunderstood. �There is no "report wakeup
> event" as such. �All that happens is that data becomes available to be
> read from the input device file. �However the power manager process
> isn't polling the device file at this point (because a suspend blocker
> is active), so it doesn't realize that the source has become active
> again.
>

Yes this is not what I though you were suggesting. I thought you were
trying to make sure the power manager sees all wakeup events. If you
are only listening for wakeup events while no suspend blockers are
active, why latch them?

>> >> - queue event for delivery to user-space
>> >
>> > Same as above.
>> >
>> >> User space continues:
>> >> - Read events
>>
>> Sorry, I missed the unblock task freezing step here.
>>
>> >> - Wait for more events
>> >>
>> >> Result: Task are not frozen again.
>> >
>> > Because the suspend blocker was never deactivated. �The same thing
>> > happens with wakelocks: If a task activates a wakelock and never
>> > deactivates it, the system won't go into opportunistic suspend again.
>>
>> Yes, but with the sequence of events above task will not be frozen
>> again even if the wake-lock/suspend-blocker/task-freezing-preventer is
>> released.
>
> Yes they will. �When the suspend blocker is deactivated, the power
> manager process will realize that there are no active suspend blockers
> and it will think there are no active sources. �Thus it will freeze
> processes as usual.
>
>> > Here's how my scheme is meant to work:
>> >
>> > � � � �Wakeup event for input device A occurs.
>> >
>> > � � � �A's driver adds an entry to the input device queue and
>> > � � � �(if the queue was empty) does wake_up_all() on the device
>> > � � � �file's wait_queue.
>> >
>> > � � � �The PM process returns from poll() and sees that device
>> > � � � �file A is now readable, so it adds A to its list of active
>> > � � � �sources and unfreezes userspace.
>> >
>> > � � � �Some other process sees that device file A is now readable,
>> > � � � �so it activates a suspend blocker and reads events from A.
>> >
>> > � � � �When the PM process receives the request to activate the
>> > � � � �suspend blocker, it removes A from its list of active
>> > � � � �sources. �But it doesn't freeze userspace yet, because now
>> > � � � �a suspend blocker is active.
>>
>> If another event happens at this point don't you put A back on the
>> list? If so, it never gets removed.
>
> No, you don't put A back on the list. �Sources get put on the list only
> when the information returned by poll() indicates they have data
> available. �The power manager doesn't poll while suspend blockers are
> active.
>
>> > � � � �The other process consumes events from A and does other
>> > � � � �stuff. �Maybe more input data arrives while this is happening
>> > � � � �and the process reads it. �Eventually the process decides to
>> > � � � �deactivate the suspend blocker, perhaps when no more data
>> > � � � �is available from the device file, perhaps not.
>> >
>> > � � � �When the PM process receives the request to deactivate the
>> > � � � �suspend blocker, it sees that now there are no active
>> > � � � �sources and no active suspend blockers. �Therefore it
>> > � � � �freezes userspace and does a big poll() on all possible
>> > � � � �sources. �(If there are still events on the input device
>> > � � � �queue, the poll() returns immediately.)
>> >
>> > � � � �Rinse and repeat.
>> >
>> > I don't see any dangerous races there. �The scheme can be made a little
>> > more efficient by having the PM process do another poll() (with 0
>> > timeout) just before freezing userspace; if the result indicates that a
>> > source is active then the freezing and unfreezing can be skipped.
>
>> > There is no race. �The driver reports an event has occurred by making
>> > the data available to be read from the device file, and the event is
>> > processed by reading it from the device file (or at least, that's the
>> > first step in processing the event).
>> >
>>
>> If the driver making data available to be read triggers a wakeup event
>> in the power manager process
>
> It doesn't. �Only return from a poll() causes the power manager process
> to think a wakeup event has occurred.
>
>> that has to be cleared by the process
>> reading the events, then you have a race. Since the power manager is
>> selecting/polling on the same file descriptor, I don't see what you
>> gain from linking the wakeup events to suspend blockers.
>
> What I gain is the ability to know when an in-kernel wakelock _could_
> have been released, without actually implementing in-kernel wakelocks.
>
> With real in-kernel wakelocks, the wakelock is released when the input
> queue becomes empty. �There's no way for the power manager process to
> know exactly when that happens without modifying the kernel. �However
> we can use the activation of the corresponding userspace suspend
> blocker as a proxy. �It's nearly as good, and it gets the job done.
>

If you only poll the fd after the last user-space suspend blocker is
released, why do you care when the kernel wakelock could have been
released? It seem the only thing it saves you is an extra poll call
when two wakeup events happen at the same time and one of them is
fully processed and unblocks suspend before the other event handler
blocks suspend. It seems strange to remove your wakeup event from the
list when a specific suspend blocker is acquired when any suspend
blocker will prevent that wakeup event from being added to the list in
the first place.

> If you prefer, an interface could be added whereby a user process tells
> the power manager explicitly that it's going to read data from an
> input device, instead of relying on implicit notification through
> suspend blocker activation. �I don't know whether this would be simpler
> or more complex; it depends on the design of your userspace.
>

I don't think there is a need to tie the fds to anything else. If you
poll the fds on the last suspend unblock call, you should get the same
behaviour.

>> If you break
>> this link it think can work, but it does require us to modify all code
>> that reads wakeup events from the kernel to register the file
>> descriptors they get events from.
>
> Yes. �I don't know how your user code is structured; if there is a
> fixed correspondence between file descriptors and suspend blockers (the
> same wakeup events are always handled by the same suspend blockers)
> then this will be a simple change -- the file descriptor can be
> registered when the suspend blocker is created.
>
> If the correspondence is more dynamic (different suspend blockers used
> for the same wakeup device at different times, or multiple wakeup
> devices handled by one suspend blocker) then the required changes will
> be more complicated. �Not tremendously more.

All input events that can wake the system are handled by one
user-space suspend blocker. Input devices come and go so we would need
to add and remove the fds dynamically.

>
>> It would also require adding
>> poll/select support to android alarm driver,
>
> Yes. �Is this a platform-specific driver? �(I assume so, since you
> called it the "android alarm driver".) �Then poll/select support can be
> added without provoking a lot of objections from legions of kernel
> developers.
>

It is not platform specific, but it is not currently in the mainline
kernel so I would not expect any objections to a change that adding
poll/select support.

>> and any driver that
>> currently uses a wakelock with a timeout would need to notify the user
>> space power manager instead.
>
> Hmm. �This is symptomatic of a deficiency in the original wakelock
> implementation -- those timeouts always were arbitrary.
>
> The power manager would indeed have to know about wakeup devices that
> don't need to _keep_ the system awake. �Here's one way to cope: During
> those times when no suspend blockers are active but the PM process
> thinks a wakeup source is active, the PM process could poll every few
> seconds to update its list of active sources. �At those points it could
> remove wakeup sources that have timed out.
>

For that to work the wakeup events would have to be reported to the
power manager in a reliable way in the first place. Passing the file
descriptor that the app uses to the power manager does not work for
this, since the app could read the event while the power manager was
not in the poll call and the power manager would never see it. Also,
existing apps don't pass their file descriptors to the power manager,
so it has the get the event from somewhere else.

>
> Obviously this proposal would complicate your userspace. �Not
> enormously, since most of the work is confined to the power manager,

No, the main problem it that it is not confined to the power manager.
The power manager does not have a list of file descriptors to monitor,
so we have to modify all code that handles wakeup events. This
includes vendor supplied code that we don't have the source for, but,
on some platforms at least, this code relies on kernel wakelocks with
a timeout and could at first be handled the same way we would have to
handle existing apps reading from a socket.

> but somewhat. �That's the price to be paid for leaving the kernel
> essentially untouched. �Consider the amount of resistance your
> wakelock/suspend-blocker patches have already received; you'll have to
> decide which approach will work out better in the end.
>

The suspend blocker approach is more generally useful since it
supports hardware where suspend is needed. Why this argument is being
ignored is very puzzling.

Your solution is not immediately useful since depends on removing all
periodic kernel timers and adding support to stopping the monotonic
clock for set of processes that are frozen.

Your solution forces the user space interface to be more complicated,
and it creates a new user space power manager that has to start before
any process that handle wakeup events.

--
Arve Hj�nnev�g
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: david on
On Wed, 9 Jun 2010, Arve Hj?nnev?g wrote:

>
> The power may not see the event, the process that reads the event will
> always see it. If the power manager is not in the poll call when the
> event happens, the process that reads the event can read the event
> before the power manager calls poll.
>

>
> All input events that can wake the system are handled by one
> user-space suspend blocker. Input devices come and go so we would need
> to add and remove the fds dynamically.


>
> For that to work the wakeup events would have to be reported to the
> power manager in a reliable way in the first place. Passing the file
> descriptor that the app uses to the power manager does not work for
> this, since the app could read the event while the power manager was
> not in the poll call and the power manager would never see it. Also,
> existing apps don't pass their file descriptors to the power manager,
> so it has the get the event from somewhere else.
>

why could the suspend blocker process see all events, but the power
manager process not see the events?

have the userspace talk to the power manager the way it does to the
suspend blocker now and what's the difference?

effectivly think s/suspend blocker/power manager/ (with the power manager
doing all the other things that are proposed instead of grabbing the
wakelock), the difference should be hidden to the rest of userspace.

what am I missing here?

David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/