From: Alan Stern on
On Fri, 21 May 2010, Arve Hjønnevåg wrote:

> The first goal can be achieved either by using device runtime PM and
> cpuidle to put all hardware into low-power states, transparently from
> the user space point of view, or by suspending the whole system.
> However, system suspend, in its current form, does not guarantee that
> the events of interest will always be responded to, since wakeup
> events (events that wake the CPU from idle and the system from
> suspend) that occur right after initiating suspend will not be
> processed until another possibly unrelated event wakes the system up
> again.

Minor point of clarification here. I'm not requesting that the patch
description be rewritten. But this issue of lost wakeup events is more
subtle than it appears.

Wakeup events can be lost in at least three different ways:

1. A hardware signal (such as an IRQ) gets ignored.

2. The hardware event occurs, but without effect since the
kernel thread that would handle the event has been frozen.
The event just ends up sitting in a queue somewhere until
something else wakes up the system.

3. The hardware event occurs and the kernel handles it fully,
but the event propagates to userspace for further handling
and the user program is already frozen.

1 is a hardware configuration failure (for example, it might happen as
a result of using edge-triggered IRQs instead of level-triggered) and
is outside the scope of this discussion.

2 generally represents a failure of the core PM subsystem, or a failure
of some other part of the kernel to use the PM core correctly. In
theory we should be able to fix such mistakes. Right now I'm aware of
at least one possible failure scenario that could be fixed fairly
easily.

3 is the type of failure that suspend blockers were really meant to
handle, particularly the userspace suspend-blocker API.

IMO, we should strive to fix the existing type-2 failure modes.
However it is worth pointing out that they are basically separate from
the suspend-blocker mechanism.

And it might be a good idea to point out somewhere in the patch
descriptions that suspend blockers are really meant to handle type-3
wakeup losses.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Arve Hjønnevåg on
2010/5/21 Alan Stern <stern(a)rowland.harvard.edu>:
> On Fri, 21 May 2010, Arve Hjønnevåg wrote:
>
>> The first goal can be achieved either by using device runtime PM and
>> cpuidle to put all hardware into low-power states, transparently from
>> the user space point of view, or by suspending the whole system.
>> However, system suspend, in its current form, does not guarantee that
>> the events of interest will always be responded to, since wakeup
>> events (events that wake the CPU from idle and the system from
>> suspend) that occur right after initiating suspend will not be
>> processed until another possibly unrelated event wakes the system up
>> again.
>
> Minor point of clarification here. I'm not requesting that the patch
> description be rewritten. But this issue of lost wakeup events is more
> subtle than it appears.
>
> Wakeup events can be lost in at least three different ways:
>
>     1. A hardware signal (such as an IRQ) gets ignored.
>
>     2. The hardware event occurs, but without effect since the
>        kernel thread that would handle the event has been frozen.
>        The event just ends up sitting in a queue somewhere until
>        something else wakes up the system.
>
>     3. The hardware event occurs and the kernel handles it fully,
>        but the event propagates to userspace for further handling
>        and the user program is already frozen.
>
> 1 is a hardware configuration failure (for example, it might happen as
> a result of using edge-triggered IRQs instead of level-triggered) and
> is outside the scope of this discussion.
>
> 2 generally represents a failure of the core PM subsystem, or a failure
> of some other part of the kernel to use the PM core correctly. In
> theory we should be able to fix such mistakes. Right now I'm aware of
> at least one possible failure scenario that could be fixed fairly
> easily.
>
> 3 is the type of failure that suspend blockers were really meant to
> handle, particularly the userspace suspend-blocker API.
>
> IMO, we should strive to fix the existing type-2 failure modes.
> However it is worth pointing out that they are basically separate from
> the suspend-blocker mechanism.
>
> And it might be a good idea to point out somewhere in the patch
> descriptions that suspend blockers are really meant to handle type-3
> wakeup losses.
>

I don't see a big difference between 2 and 3. You can use suspend
blockers to handle either.

--
Arve Hjønnevåg
From: Alan Stern on
On Mon, 24 May 2010, Arve Hjønnevåg wrote:

> > Wakeup events can be lost in at least three different ways:
> >
> >     1. A hardware signal (such as an IRQ) gets ignored.
> >
> >     2. The hardware event occurs, but without effect since the
> >        kernel thread that would handle the event has been frozen.
> >        The event just ends up sitting in a queue somewhere until
> >        something else wakes up the system.
> >
> >     3. The hardware event occurs and the kernel handles it fully,
> >        but the event propagates to userspace for further handling
> >        and the user program is already frozen.
> >
> > 1 is a hardware configuration failure (for example, it might happen as
> > a result of using edge-triggered IRQs instead of level-triggered) and
> > is outside the scope of this discussion.
> >
> > 2 generally represents a failure of the core PM subsystem, or a failure
> > of some other part of the kernel to use the PM core correctly. In
> > theory we should be able to fix such mistakes. Right now I'm aware of
> > at least one possible failure scenario that could be fixed fairly
> > easily.
> >
> > 3 is the type of failure that suspend blockers were really meant to
> > handle, particularly the userspace suspend-blocker API.

> I don't see a big difference between 2 and 3. You can use suspend
> blockers to handle either.

You can, but they aren't necessary. If 2 were the only reason for
suspend blockers, I would say they shouldn't be merged.

Whereas 3, on the other hand, can _not_ be handled by any existing
mechanism. 3 is perhaps the most important reason for using suspend
blockers.

Alan Stern

From: Dmitry Torokhov on
On Mon, May 24, 2010 at 09:34:54PM -0400, Alan Stern wrote:
> On Mon, 24 May 2010, Arve Hjønnevåg wrote:
>
> > > Wakeup events can be lost in at least three different ways:
> > >
> > >     1. A hardware signal (such as an IRQ) gets ignored.
> > >
> > >     2. The hardware event occurs, but without effect since the
> > >        kernel thread that would handle the event has been frozen.
> > >        The event just ends up sitting in a queue somewhere until
> > >        something else wakes up the system.
> > >
> > >     3. The hardware event occurs and the kernel handles it fully,
> > >        but the event propagates to userspace for further handling
> > >        and the user program is already frozen.
> > >
> > > 1 is a hardware configuration failure (for example, it might happen as
> > > a result of using edge-triggered IRQs instead of level-triggered) and
> > > is outside the scope of this discussion.
> > >
> > > 2 generally represents a failure of the core PM subsystem, or a failure
> > > of some other part of the kernel to use the PM core correctly. In
> > > theory we should be able to fix such mistakes. Right now I'm aware of
> > > at least one possible failure scenario that could be fixed fairly
> > > easily.
> > >
> > > 3 is the type of failure that suspend blockers were really meant to
> > > handle, particularly the userspace suspend-blocker API.
>
> > I don't see a big difference between 2 and 3. You can use suspend
> > blockers to handle either.
>
> You can, but they aren't necessary. If 2 were the only reason for
> suspend blockers, I would say they shouldn't be merged.
>
> Whereas 3, on the other hand, can _not_ be handled by any existing
> mechanism. 3 is perhaps the most important reason for using suspend
> blockers.
>

I do not see why 3 has to be implemented using suspend blockers either.
If you are concerned that an event gets stuck somewhere in the stack, make
sure that devices in the stack do not suspend while their queues are not
empty. This way, if you try opportunistic suspend, it will keep failing
until you have drained all important queues.
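
The idea of vetoing suspend while a queue is non-empty can be sketched
as a toy model. This is only an illustration, not kernel code: the
Device class, its suspend() method, and try_opportunistic_suspend() are
hypothetical names invented for this sketch.

```python
from collections import deque


class Device:
    """Hypothetical device holding a queue of undelivered events."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def suspend(self):
        # Veto the suspend while events are still waiting to be consumed.
        return not self.queue


def try_opportunistic_suspend(devices):
    """Return True only if every device in the stack agrees to suspend."""
    return all(dev.suspend() for dev in devices)


keyboard = Device("keyboard")
keyboard.queue.append("KEY_A")

first_attempt = try_opportunistic_suspend([keyboard])   # queue not empty
keyboard.queue.popleft()                                # a reader drains it
second_attempt = try_opportunistic_suspend([keyboard])  # nothing left to veto
```

In this model the first attempt is refused and the second succeeds once
the queue has been drained.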

--
Dmitry
From: Alan Stern on
On Tue, 25 May 2010, Dmitry Torokhov wrote:

> > > I don't see a big difference between 2 and 3. You can use suspend
> > > blockers to handle either.
> >
> > You can, but they aren't necessary. If 2 were the only reason for
> > suspend blockers, I would say they shouldn't be merged.
> >
> > Whereas 3, on the other hand, can _not_ be handled by any existing
> > mechanism. 3 is perhaps the most important reason for using suspend
> > blockers.
> >
>
> I do not see why 3 has to be implemented using suspend blockers either.
> If you are concerned that an event gets stuck somewhere in the stack, make
> sure that devices in the stack do not suspend while their queues are not
> empty. This way, if you try opportunistic suspend, it will keep failing
> until you have drained all important queues.

Here's the scenario:

The system is awake, and the user presses a key. The keyboard driver
processes the keystroke and puts it in an input queue. A user process
reads it from the input queue, thereby emptying the queue.

At that moment, the system decides to go into opportunistic suspend.
Since the input queue is empty, there's nothing to stop it. As the
first step, userspace is frozen -- before the process has a chance to
do anything with the keystroke it just read. As a result, the system
stays asleep until something else wakes it up, even though the
keystroke was important and should have prevented it from sleeping.
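
The ordering problem in this scenario can be made concrete with a small
model. It is a sketch under simplifying assumptions: the only suspend
check is "is the input queue empty?", and the names input_queue,
in_flight, and can_suspend() are invented for the illustration.

```python
from collections import deque

input_queue = deque(["KEY_A"])   # the keyboard driver queued one keystroke
handled = []                     # keystrokes userspace has fully acted on


def can_suspend():
    # The only check in this model: is the input queue empty?
    return not input_queue


# Step 1: the user process reads the keystroke, emptying the queue.
in_flight = input_queue.popleft()

# Step 2: opportunistic suspend runs before the process acts on the key.
suspended = can_suspend()
userspace_frozen = suspended     # freezing userspace is the first suspend step

# Step 3: a frozen process cannot finish handling the keystroke.
if not userspace_frozen:
    handled.append(in_flight)
```

The keystroke has left the queue, so the queue check no longer sees it,
yet it has not been handled when the suspend goes ahead.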

Suspend blockers protect against this scenario. Here's how:

The user process doesn't read the input queue directly; instead it
does a select or poll. When it sees there is data in the queue, it
first acquires a suspend blocker and then reads the data.

Now the system _can't_ go into opportunistic suspend, because a suspend
blocker is active. The user process can do whatever it wants with the
keystroke. When it is finished, it releases the suspend blocker and
loops back to the select/poll call.
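
One pass through that loop might look like the sketch below. It is
illustrative only: SuspendBlocker is an in-process stand-in and not the
proposed userspace API, a pipe plays the role of the input queue, and
can_suspend() and handle_one_event() are hypothetical helpers. It also
assumes that something else keeps the system awake while the queue is
still non-empty, as in the scenario above.

```python
import os
import select


class SuspendBlocker:
    """In-process stand-in for a suspend blocker (not the real API)."""

    def __init__(self):
        self.active = False

    def __enter__(self):
        self.active = True       # acquire: opportunistic suspend must fail
        return self

    def __exit__(self, *exc):
        self.active = False      # release: suspend is allowed again
        return False


def can_suspend(blocker):
    return not blocker.active


def handle_one_event(fd, blocker, process):
    # Wait for readiness without consuming the data.
    select.select([fd], [], [])
    # Acquire the blocker first and only then read, so the event is never
    # out of the queue while nothing is holding off suspend.
    with blocker:
        data = os.read(fd, 1)
        process(data)


read_fd, write_fd = os.pipe()    # the pipe plays the role of the input queue
os.write(write_fd, b"a")         # the "driver" queues one keystroke

blocker = SuspendBlocker()
seen = []                        # (keystroke, could the system suspend?)
handle_one_event(read_fd, blocker,
                 lambda key: seen.append((key, can_suspend(blocker))))
os.close(read_fd)
os.close(write_fd)
```

While the keystroke is being processed the blocker is active, so a
suspend attempt in this model would be refused; once the handler
returns, the blocker is released and suspend becomes possible again.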

Alan Stern
