From: Alan Stern on
On Mon, 7 Jun 2010, Arve Hj�nnev�g wrote:

> > It's true that under some exceptional circumstances the system would
> > never remove a wakeup source from the "active" list and then would
> > never go idle. �But exactly the same problem exists with wakelocks, if
> > the kernel activates a wakelock and there's no user process reading the
> > corresponding event queue.
> >
>
> No, you have a different problem. If you open an input device and
> issue the ioctl to enable the suspend blocker that blocks while the

Um, the suspend blocker that is active while the queue is nonempty is
an in-kernel suspend blocker, right? Not a userspace suspend blocker.
Hence it doesn't have to be enabled by an ioctl. Or is this some part
of the whole wakelock design that hasn't yet been posted? As far as I
know, you intended the in-kernel suspend blocker to be enabled whenever
the input device file is open.

> queue is not empty then don't read the event, that is a bug that is
> easy to fix.

I assume you mean "you open an input device but then fail to read from
it". When that happens the device's driver will activate its in-kernel
suspend blocker, and since the input queue will never become empty, the
suspend blocker will never be deactivated. Yes, that's a bug.

> What you have is a race condition. If you read an event
> that occurred after you blocked the task freezing tasks will never get
> frozen again (until more events occur).

Sorry, I can't parse that sentence. Could you rephrase it more
grammatically?

It seems to say: "If you read an event that occurred after [something],
then tasks won't get frozen again until more events occur". Which
doesn't make sense, firstly because in my scheme reading events has no
direct connection with freezing or unfreezing tasks, and secondly
because the occurrence of events doesn't cause tasks to be frozen --
just the opposite: occurrence of events _prevents_ tasks from being
frozen.

> We have multiple input devices and one thread reading from them. Do
> all input devices that can generate wakeup events share a wakeup
> source?

Basically, a wakeup source is a file descriptor that in your scheme,
some user process would read from in order to clear an in-kernel
wakelock. Thus, if each of your input devices activates an in-kernel
wakelock that is cleared when a user process reads the device, then the
file descriptors for these input devices would each be considered a
wakeup source.

> It seems you would need a way to pass the wakeup source id to use from
> user space to the driver and for this to work

No, nothing needs to be passed from userspace to the kernel. However
the source ID (or a set of source IDs) does need to be passed to the
power manager process, probably when the suspend blocker is created.

[On rereading this, I realized it might not have been clear that in my
scheme, suspend blockers have no in-kernel component. They are
implemented entirely by IPC between the process owning the suspend
blocker and the power manager process. Would it be less confusing if I
called them something else?]

> (ignoring the race if
> you allow multiple alarms per file) which seems like more work than
> using a suspend blocker.

It's not very much more: just one additional argument to the routine
that creates a suspend blocker.

I get the impression that you don't fully understand how my scheme is
meant to work. Would some additional explanation or examples help?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Arve Hjønnevåg on
2010/6/7 Alan Stern <stern(a)rowland.harvard.edu>:
> On Mon, 7 Jun 2010, Arve Hj�nnev�g wrote:
>
>> > It's true that under some exceptional circumstances the system would
>> > never remove a wakeup source from the "active" list and then would
>> > never go idle. �But exactly the same problem exists with wakelocks, if
>> > the kernel activates a wakelock and there's no user process reading the
>> > corresponding event queue.
>> >
>>
>> No, you have a different problem. If you open an input device and
>> issue the ioctl to enable the suspend blocker that blocks while the
>
> Um, the suspend blocker that is active while the queue is nonempty is
> an in-kernel suspend blocker, right? �Not a userspace suspend blocker.
> Hence it doesn't have to be enabled by an ioctl. �Or is this some part
> of the whole wakelock design that hasn't yet been posted? �As far as I
> know, you intended the in-kernel suspend blocker to be enabled whenever
> the input device file is open.
>

The patch that modifies evdev (posted in this patchset) uses an ioctl
to enable the suspend blocker. Not all input devices are used for
wakeup events and those don't need to block suspend.

>> queue is not empty then don't read the event, that is a bug that is
>> easy to fix.
>
> I assume you mean "you open an input device but then fail to read from
> it". �When that happens the device's driver will activate its in-kernel
> suspend blocker, and since the input queue will never become empty, the
> suspend blocker will never be deactivated. �Yes, that's a bug.
>
>> What you have is a race condition. If you read an event
>> that occurred after you blocked the task freezing tasks will never get
>> frozen again (until more events occur).
>
> Sorry, I can't parse that sentence. �Could you rephrase it more
> grammatically?
>

If you read an event that occurred after you blocked the task
freezing, then tasks will never get frozen again (until more events
occur). I think my original description was less confusing, but it
seems you got completely distracted by my use of block and unblock
suspend when referring to the user space api.


> It seems to say: "If you read an event that occurred after [something],

Block suspend, block task freezing or whatever you want to call it.

> then tasks won't get frozen again until more events occur". �Which
> doesn't make sense, firstly because in my scheme reading events has no
> direct connection with freezing or unfreezing tasks, and secondly

It has an indirect connection. You report a wakeup event when it
occurs, but clear it when user space calls an api before reading the
event. So:
Wakeup event occurs, and the driver:
- report wakeup event type A
- queue event for delivery to user-space
User space wakes up:
- Calls api to block task freezing for event type A
Another wakeup event occurs, and the driver:
- report wakeup event type A
- queue event for delivery to user-space
User space continues:
- Read events
- Wait for more events

Result: Task are not frozen again.

> because the occurrence of events doesn't cause tasks to be frozen --
> just the opposite: occurrence of events _prevents_ tasks from being
> frozen.
>
>> We have multiple input devices and one thread reading from them. Do
>> all input devices that can generate wakeup events share a wakeup
>> source?
>
> Basically, a wakeup source is a file descriptor that in your scheme,
> some user process would read from in order to clear an in-kernel
> wakelock. �Thus, if each of your input devices activates an in-kernel
> wakelock that is cleared when a user process reads the device, then the
> file descriptors for these input devices would each be considered a
> wakeup source.
>
>> It seems you would need a way to pass the wakeup source id to use from
>> user space to the driver and for this to work
>
> No, nothing needs to be passed from userspace to the kernel. �However
> the source ID (or a set of source IDs) does need to be passed to the
> power manager process, probably when the suspend blocker is created.
>

Then the source id need to be passed from the kernel to user-space.

> [On rereading this, I realized it might not have been clear that in my
> scheme, suspend blockers have no in-kernel component. �They are
> implemented entirely by IPC between the process owning the suspend
> blocker and the power manager process. �Would it be less confusing if I
> called them something else?]

No, that is not the unclear part. What is unclear to me is where the
source IDs come from. Are they static and hardcoded in the driver and
user-space, or are they passed between the driver and user-space
client?

>
>> (ignoring the race if
>> you allow multiple alarms per file) which seems like more work than
>> using a suspend blocker.
>
> It's not very much more: just one additional argument to the routine
> that creates a suspend blocker.
>
> I get the impression that you don't fully understand how my scheme is
> meant to work. �Would some additional explanation or examples help?
>

I don't understand how you are planning to ensure that the driver and
user-space code that consumes the real event use the same source id.
The biggest problem I have with it though is that you have created a
new race condition between reporting that a wakeup event has occurred
and processing of the real event.

--
Arve Hj�nnev�g
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rafael J. Wysocki on
On Tuesday 08 June 2010, Arve Hj�nnev�g wrote:
> 2010/6/6 Rafael J. Wysocki <rjw(a)sisk.pl>:
> > On Sunday 06 June 2010, Arve Hj�nnev�g wrote:
....
> If individual processes are frozen, we run into problems that we
> cannot run into if we freeze and thaw all processes.

Not individual processes, but the processes that don't use wakelocks in the
Android world all together. And of course the approach has to be different,
because it's a different design, but I don't think there are any fundamental
issues you can't solve within this approach.

> >> The app that reads this event blocks suspend before reading it. If it was
> >> busy talking to a less trusted app when the event happened it still works
> >> since all apps are running at this point.
> >
> > And how is this different from an approach with cgroup freezing? Apps that
> > use wakelock within the current framework would use "freeze locks" to prevent
> > the "untrusted" part of user space from being frozen or to thaw it. Where's
> > the problem, then?
> >
>
> They will not be able to get wakeup events directly from the kernel.
> If the kernel does not thaw processes when a wakeup event happens, the
> app may never get to the point where it grabs its wakelock.

The apps that use "freeze locks" (or wakelocks) are never frozen, so I don't
think this would be a problem.

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Florian Mickler on
On Mon, 7 Jun 2010 20:05:56 -0700
Arve Hj�nnev�g <arve(a)android.com> wrote:
Hi,

>
> If you read an event that occurred after you blocked the task
> freezing, then tasks will never get frozen again (until more events
> occur). I think my original description was less confusing, but it
> seems you got completely distracted by my use of block and unblock
> suspend when referring to the user space api.

Here is how I understood Alan's approach:

Userspace manager (UM) does:

<...continuation of function A>
5) unblock from reading a wakeup from wakeupevents-fd
6) thaw userspace
7) return
</function A>

[userspace sees there is an event; blocks suspend at UM; processes
event; consume wakeupevent at UM; unblock suspend at UM;]

Unblocking the last suspend-blocker at the UM starts function A:

<function A>
1) non-blocking read of wakeup-events-fd (refills wakeupevents)
2) if all wakeupevents are consumed:
3a) freeze userspace
else
3b) /* wait for userspace to unblock
suspend again... this should take care of the races? */ return;
4) blocking read of wakeupevents-fd
<...for continuation see above>

You mitigate the race by freezing and unfreezing userspace. If there
occur wakeups between 3a) and 4) you will have frozen userspace in
vain.

So I think the feasibility of this solution depends on the performance
of freezing/thawing userspace. I can't judge that.

Also I _think_ this is racefree as long as you have the UM properly
serialized. Or did I overlook something?


Cheers,
Flo


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Alan Stern on
On Mon, 7 Jun 2010, Arve Hj�nnev�g wrote:

> The patch that modifies evdev (posted in this patchset) uses an ioctl
> to enable the suspend blocker. Not all input devices are used for
> wakeup events and those don't need to block suspend.

But you do have a 1-1 correspondence, right? That is, the input
devices that are used for wakeup events are exactly the ones that block
suspend?


> If you read an event that occurred after you blocked the task
> freezing, then tasks will never get frozen again (until more events
> occur). I think my original description was less confusing, but it
> seems you got completely distracted by my use of block and unblock
> suspend when referring to the user space api.

I still find your wording a little confusing. Task freezing can be
prevented (a more accurate term than "blocked") by two kinds of things:
a suspend blocker or an "active" wakeup source. I'm not sure which
kind you mean here.

> It has an indirect connection. You report a wakeup event when it
> occurs, but clear it when user space calls an api before reading the
> event. So:

Yes, that's right.

> Wakeup event occurs, and the driver:
> - report wakeup event type A
> - queue event for delivery to user-space

That's not really two distinct steps. Queuing the event for delivery
to userspace involves waking up any tasks that are waiting to read the
device file; that action (calling wake_up_all() or whatever the driver
does) is how the event gets reported.

> User space wakes up:
> - Calls api to block task freezing for event type A

Again, that's a confusing way of putting it. The API you're referring
to is simply the function that activates a suspend blocker. It does
prevent task freezing, but you shouldn't say it prevents freezing for
event type A. More like the other way around: In addition to
preventing freezing, the function tells the power manager that event
type A should no longer be considered active. Thus, in a sense it
_stops_ event type A from preventing freezing.

> Another wakeup event occurs, and the driver:
> - report wakeup event type A
> - queue event for delivery to user-space

Same as above.

> User space continues:
> - Read events
> - Wait for more events
>
> Result: Task are not frozen again.

Because the suspend blocker was never deactivated. The same thing
happens with wakelocks: If a task activates a wakelock and never
deactivates it, the system won't go into opportunistic suspend again.

Here's how my scheme is meant to work:

Wakeup event for input device A occurs.

A's driver adds an entry to the input device queue and
(if the queue was empty) does wake_up_all() on the device
file's wait_queue.

The PM process returns from poll() and sees that device
file A is now readable, so it adds A to its list of active
sources and unfreezes userspace.

Some other process sees that device file A is now readable,
so it activates a suspend blocker and reads events from A.

When the PM process receives the request to activate the
suspend blocker, it removes A from its list of active
sources. But it doesn't freeze userspace yet, because now
a suspend blocker is active.

The other process consumes events from A and does other
stuff. Maybe more input data arrives while this is happening
and the process reads it. Eventually the process decides to
deactivate the suspend blocker, perhaps when no more data
is available from the device file, perhaps not.

When the PM process receives the request to deactivate the
suspend blocker, it sees that now there are no active
sources and no active suspend blockers. Therefore it
freezes userspace and does a big poll() on all possible
sources. (If there are still events on the input device
queue, the poll() returns immediately.)

Rinse and repeat.

I don't see any dangerous races there. The scheme can be made a little
more efficient by having the PM process do another poll() (with 0
timeout) just before freezing userspace; if the result indicates that a
source is active then the freezing and unfreezing can be skipped.

The big assumption here is that a user process never consumes wakeup
events without first activating a suspend blocker. This seems like a
reasonable assumption, but we can work around it if necessary.

> >> It seems you would need a way to pass the wakeup source id to use from
> >> user space to the driver and for this to work
> >
> > No, nothing needs to be passed from userspace to the kernel. �However
> > the source ID (or a set of source IDs) does need to be passed to the
> > power manager process, probably when the suspend blocker is created.
> >
>
> Then the source id need to be passed from the kernel to user-space.

A source ID is a file descriptor. File descriptors are passed from the
kernel to userspace whenever a file is opened; I can't deny it. And
they are passed back to the kernel as part of the read() and poll()
system calls. Is that what you mean?

> No, that is not the unclear part. What is unclear to me is where the
> source IDs come from. Are they static and hardcoded in the driver and
> user-space, or are they passed between the driver and user-space
> client?

They are not static; they are file descriptors. I guess this should
have been made more clear originally, but this is still pretty new to
me too.

> I don't understand how you are planning to ensure that the driver and
> user-space code that consumes the real event use the same source id.

How can it be otherwise? The userspace code consumes the event by
reading from the device file. In order to do so, it has to use the
same file descriptor it received when it opened the device file
originally.

> The biggest problem I have with it though is that you have created a
> new race condition between reporting that a wakeup event has occurred
> and processing of the real event.

There is no race. The driver reports an event has occurred by making
the data available to be read from the device file, and the event is
processed by reading it from the device file (or at least, that's the
first step in processing the event).


There's one other thing worth mentioning. All along I've been talking
about a power manager process that coordinates all these activities.
In theory there's no reason that process couldn't be implemented as a
kernel thread. This would improve efficiency by reducing the number of
context switches, and it would change IPC calls into plain system
calls.

If you did implement it that way, it could be done as a standalone
kernel module, totally noninvasive. It would not need to be part of
the vanilla kernel and nobody would object to it.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/