From: Rafael J. Wysocki on
On Tuesday 01 June 2010, Neil Brown wrote:
> On Tue, 1 Jun 2010 00:05:19 +0200
> "Rafael J. Wysocki" <rjw(a)sisk.pl> wrote:
>
> > On Monday 31 May 2010, Neil Brown wrote:
> > > On Thu, 27 May 2010 23:40:29 +0200 (CEST)
> > > Thomas Gleixner <tglx(a)linutronix.de> wrote:
> > >
> > > > On Thu, 27 May 2010, Rafael J. Wysocki wrote:
> > > >
> > > > > On Thursday 27 May 2010, Thomas Gleixner wrote:
> > > > > > On Thu, 27 May 2010, Alan Stern wrote:
> > > > > >
> > > > > > > On Thu, 27 May 2010, Felipe Balbi wrote:
> > > > > > >
> > > > > > > > On Thu, May 27, 2010 at 05:06:23PM +0200, ext Alan Stern wrote:
> > > > > > > > >If people don't mind, here is a greatly simplified summary of the
> > > > > > > > >comments and objections I have seen so far on this thread:
> > > > > > > > >
> > > > > > > > > The in-kernel suspend blocker implementation is okay, even
> > > > > > > > > beneficial.
> > > > > > > >
> > > > > > > > I disagree here. I believe expressing that as QoS is much better. Let
> > > > > > > > the kernel decide which power state is better as long as I can say I
> > > > > > > > need 100us IRQ latency or 100ms wakeup latency.
> > > > > > >
> > > > > > > Does this mean you believe "echo mem >/sys/power/state" is bad and
> > > > > > > should be removed? Or "echo disk >/sys/power/state"? They pay no
> > > > > >
> > > > > > mem should be replaced by an idle suspend to ram mechanism
> > > > >
> > > > > Well, what about when I want the machine to suspend _regardless_ of whether
> > > > > or not it's idle at the moment? That actually happens quite often to me. :-)
> > > >
> > > > Fair enough. Let's agree on a non-ambiguous terminology then:
> > > >
> > > > forced:
> > > >
> > > > suspend which you enforce via user interaction, which
> > > > also implies that you risk losing wakeups depending on
> > > > the hardware properties
> > >
> > > Reasonable definition I think. However the current implementation doesn't
> > > exactly match it.
> > > With the current implementation you risk losing wakeups *independent* of the
> > > hardware properties.
> >
> > Define "losing", please.
>
> I did. See next line in my original.
> "... by which I mean that they will not be seen until some other event
> effects a wake-up".

OK, sorry.

> By "seen" I mean "a user-space process has had a chance
> to react to the event, including having the opportunity to abort the suspend
> (or ensure an immediate wake-up)".
> Another way of saying it might be that the event - as an abstract concept -
> does not reach its final destination promptly. This "final destination" may
> be well outside the kernel.
>
> > Currently, we simply don't regard hardware signals occurring _during_ the
> > suspend operation itself as wakeups (unless they are wakeup interrupts to be
> > precise, because these _are_ taken into account by our current code).
> >
> > The reason is that the meaning of given event may be _different_ at run time
> > and after the system has been suspended. For example, consider a power button
> > on a PC box. If it's pressed at run time, it usually means "power off the
> > system" to the kernel. After the system has been suspended, however, it means
> > "wake up". So, you have to switch from one interpretation of the event to the
> > other and that's not an atomic operation (to put it lightly).
>
> Yes, a suspend-toggle switch is inherently racy.

For this reason we generally have to assume that some events occurring during
suspend will only be seen by user space after resume. Now, since we make
such an assumption anyway, there's little point in working around some races
related to it while leaving the others as they are (that wouldn't improve
things all that much).
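The power-button example can be modeled in a few lines of Python. This is a toy model, not kernel code; `SystemState` and `interpret_power_button` are hypothetical names. It shows that the same signal maps to different actions depending on the state the system is in, and that while the system is switching between the two interpretations neither one is reliably in force, so the press can only be acted on after resume.

```python
from enum import Enum


class SystemState(Enum):
    RUNNING = "running"
    SUSPENDING = "suspending"  # transition in progress; not atomic
    SUSPENDED = "suspended"


def interpret_power_button(state: SystemState) -> str:
    """Hypothetical mapping of a power-button press to an action.

    The meaning of the event depends on the state the system is in
    at the moment the press arrives.
    """
    if state is SystemState.RUNNING:
        return "power off"
    if state is SystemState.SUSPENDED:
        return "wake up"
    # While switching from one interpretation to the other, neither is
    # reliably in force, so the press is only acted on after resume.
    return "deferred until resume"


if __name__ == "__main__":
    for s in SystemState:
        print(f"{s.value:>10}: {interpret_power_button(s)}")
```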

> It is only wake-up sources that are not inherently racy that are interesting.
> e.g. a serial line from a GSM device which reports "You have an SMS message".
> I want to be able to turn my freerunner upside-down by which I tell it (via
> the accelerometers) that I am done and want it to turn off. If a TXT message
> comes in just then, I don't want it to suspend, I want it to make an alert
> noise.
> I can put code in to ignore the accelerometer if a txt has just recently come
> in, but if the TXT arrives just as the write to /sys/power/state starts, the
> UART interrupt handler could have completed before it has the PRE_SUSPEND
> method called. So the suspend will complete and the wakeup from the UART
> will have been "lost" in that the event didn't get all the way to its
> destination: my ear.

As I said before, we generally can't prevent such things from happening,
because even if we handle the particular race described above, it still is
possible that the event will be "lost" if it arrives just a bit later (eg.
during a suspend-toggle switch). So the PRE_SUSPEND thing won't really
solve the entire problem while increasing complexity.
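The UART ordering in the FreeRunner example can be replayed as a toy Python model. The step names and the `track_unconsumed` flag are hypothetical, and "pre_suspend" here is just a veto check, not a real kernel callback. The model shows that a check which only looks for interrupts still being handled lets the suspend complete once the handler has finished. A check that also counts events user space has not yet consumed aborts the suspend. As noted above, this covers only this one ordering; events arriving later in the suspend sequence are outside the model.

```python
def replay(steps, track_unconsumed=False):
    """Replay one hypothetical ordering of steps and report the result.

    Steps:
      "irq"         - the UART raises its interrupt
      "handler"     - the kernel's interrupt handler finishes
      "user_read"   - user space reads the message and sounds the alert
      "pre_suspend" - a hypothetical PRE_SUSPEND check that may veto suspend

    Returns (suspend_result, alert_sounded).
    """
    irq_in_progress = False  # interrupt raised, handler not finished yet
    unconsumed = False       # handler done, user space has not reacted yet
    alerted = False
    for step in steps:
        if step == "irq":
            irq_in_progress = True
        elif step == "handler":
            irq_in_progress = False
            unconsumed = True
        elif step == "user_read":
            unconsumed = False
            alerted = True
        elif step == "pre_suspend":
            busy = irq_in_progress or (track_unconsumed and unconsumed)
            if busy:
                return "aborted", alerted
        else:
            raise ValueError(f"unknown step: {step}")
    return "suspended", alerted


# The handler completes just before PRE_SUSPEND runs, and user space
# has not read the message yet.
RACE = ["irq", "handler", "pre_suspend"]

if __name__ == "__main__":
    print(replay(RACE))
    print(replay(RACE, track_unconsumed=True))
```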

> My freerunner has a single core so without CONFIG_PREEMPT it may be that
> there is no actual race-window - maybe the PRE_SUSPENDs will all run before a
> soft_irq thread has a chance to finish handling of the interrupt (my
> knowledge of these details is limited). But on a multi-core device I think
> there would definitely be a race-window.

Yes, there always will be a race window. The only thing we can do is to
narrow it, but we cannot really close it (at least not on a PC, but I'm not
really sure it can be closed at all).

If you really want _all_ events to be delivered in a timely manner, the only way to go is
to avoid using suspend (and use the idle framework for power management).

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Thomas Gleixner on
On Tue, 1 Jun 2010, Rafael J. Wysocki wrote:
> On Tuesday 01 June 2010, Neil Brown wrote:
> > My freerunner has a single core so without CONFIG_PREEMPT it may be that
> > there is no actual race-window - maybe the PRE_SUSPENDs will all run before a
> > soft_irq thread has a chance to finish handling of the interrupt (my
> > knowledge of these details is limited). But on a multi-core device I think
> > there would definitely be a race-window.
>
> Yes, there always will be a race window. The only thing we can do is to
> narrow it, but we cannot really close it (at least not on a PC, but I'm not
> really sure it can be closed at all).

It can be closed when the state transition from normal event delivery
to wakeup mode is state safe, which it is on most platforms which are
designed for the mobile space.

Not so the current PC style x86 platforms, which are not relevant for
the problem at hand at all. Really, that stuff is going either to gain
sane properties or it's just going into the irrelevant realm.

Any attempt to solve the current x86/ACPI/BIOS/mess is a waste of time
and is inevitably going to prevent progress.
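One plausible reading of a "state safe" transition is that the wakeup path is armed before normal delivery is switched off, so there is no instant at which an event has nowhere to go. The following minimal Python model works under that assumption. The step names are hypothetical, and the model ignores de-duplication when both paths are live at once. It replays a single edge-triggered pulse at every possible point in the transition and reports which path observed it.

```python
def observe_pulse(pulse_at, steps):
    """Replay a transition from normal IRQ delivery to wakeup delivery.

    A single edge-triggered pulse arrives just before steps[pulse_at];
    pulse_at == len(steps) means it arrives after the transition is done.
    Returns "normal", "wakeup" or "lost".
    """
    if not 0 <= pulse_at <= len(steps):
        raise ValueError("pulse_at out of range")
    normal_enabled, wake_armed = True, False
    for i, step in enumerate(steps + ["sleep"]):
        if i == pulse_at:
            if normal_enabled:
                return "normal"   # handled by the regular interrupt path
            if wake_armed:
                return "wakeup"   # latched by the wakeup logic
            return "lost"         # neither path was listening
        if step == "arm_wake":
            wake_armed = True
        elif step == "disable_normal":
            normal_enabled = False
    raise AssertionError("unreachable")


SAFE = ["arm_wake", "disable_normal"]    # overlap: something always listens
UNSAFE = ["disable_normal", "arm_wake"]  # gap between the two steps

if __name__ == "__main__":
    for name, steps in (("safe", SAFE), ("unsafe", UNSAFE)):
        print(name, [observe_pulse(k, steps) for k in range(len(steps) + 1)])
```

Only the ordering matters here: with the overlapping order no pulse position yields "lost", while the reversed order loses a pulse that lands between the two steps.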

> If you really want _all_ events to be delivered in a timely manner, the only way to go is
> to avoid using suspend (and use the idle framework for power management).

Amen.

tglx
From: Neil Brown on
On Tue, 1 Jun 2010 02:32:20 +0200
"Rafael J. Wysocki" <rjw(a)sisk.pl> wrote:

> On Tuesday 01 June 2010, Neil Brown wrote:
> > I want to be able to turn my freerunner upside-down by which I tell it (via
> > the accelerometers) that I am done and want it to turn off. If a TXT message
> > comes in just then, I don't want it to suspend, I want it to make an alert
> > noise.
> > I can put code in to ignore the accelerometer if a txt has just recently come
> > in, but if the TXT arrives just as the write to /sys/power/state starts, the
> > UART interrupt handler could have completed before it has the PRE_SUSPEND
> > method called. So the suspend will complete and the wakeup from the UART
> > will have been "lost" in that the event didn't get all the way to its
> > destination: my ear.
>
> As I said before, we generally can't prevent such things from happening,
> because even if we handle the particular race described above, it still is
> possible that the event will be "lost" if it arrives just a bit later (eg.
> during a suspend-toggle switch). So the PRE_SUSPEND thing won't really
> solve the entire problem while increasing complexity.
>
> > My freerunner has a single core so without CONFIG_PREEMPT it may be that
> > there is no actual race-window - maybe the PRE_SUSPENDs will all run before a
> > soft_irq thread has a chance to finish handling of the interrupt (my
> > knowledge of these details is limited). But on a multi-core device I think
> > there would definitely be a race-window.
>
> Yes, there always will be a race window. The only thing we can do is to
> narrow it, but we cannot really close it (at least not on a PC, but I'm not
> really sure it can be closed at all).

Well, you are the expert so I assume you are right, but I would really like to
understand why this is.

I guess that if the event was delivered as an edge-triggered interrupt which
the interrupt controller couldn't latch, then it might get lost. Is that
what happens?
But if the event comes in as a level-triggered (or latched) interrupt, then
the driver simply chooses not to acknowledge the interrupt after PRE_SUSPEND
until RESUME. Then any suspend would immediately be woken. Maybe the window
for ignoring interrupts would have to be a bit smaller than that, but it
should be a well-defined window that can be locked.
Why cannot we carry this sort of guarantee all the way up to user-space and
beyond? Am I completely misunderstanding the hardware?
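The level-versus-edge reasoning above can be modeled in a few lines of Python. This is a sketch of the hypothesis in this message, not a statement about any particular interrupt controller; `IrqLine` and its methods are hypothetical. An event that lands while nothing is watching survives on a level-triggered line whose driver has stopped acknowledging it, but vanishes if it was a bare, unlatched edge.

```python
class IrqLine:
    """Toy model of one interrupt line (hypothetical, not a kernel API)."""

    def __init__(self, level_triggered: bool):
        self.level_triggered = level_triggered
        self.asserted = False  # level: device holds the line until acked
        self.latched = False   # edge: controller latch, set only if watching

    def device_event(self, controller_watching: bool) -> None:
        if self.level_triggered:
            # The device keeps the line asserted until the driver acks it.
            self.asserted = True
        elif controller_watching:
            # A bare edge is only remembered if something latches it.
            self.latched = True

    def ack(self) -> None:
        self.asserted = False
        self.latched = False

    def wake_pending(self) -> bool:
        return self.asserted or self.latched


def suspend_with_event_in_gap(level_triggered: bool) -> str:
    line = IrqLine(level_triggered)
    # After PRE_SUSPEND the driver stops acknowledging this interrupt,
    # so ack() is deliberately never called here.
    line.device_event(controller_watching=False)  # arrives in the gap
    # Last step before sleeping: the wakeup logic samples the line.
    return "woken immediately" if line.wake_pending() else "slept through event"


if __name__ == "__main__":
    print("level:", suspend_with_event_in_gap(True))
    print("edge: ", suspend_with_event_in_gap(False))
```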

And if you are right that the race window cannot be closed, then the whole
suspend-blocker infrastructure is pointless as the purpose of it is simply to
close that window. If it really does not and cannot work, then it would be
best to reject it for that reason rather than for less concrete aesthetic
arguments.
But presumably it does work in practice on Android hardware so ..... confused.

Having just seen the email from Thomas, maybe you mean that it cannot be
closed on devices using ACPI, but can on other devices. I can sort-of
imagine how that would be the case (I tried reading an ACPI spec once - my
hat is off to those of you who understand it).
That shouldn't prevent us from closing the race window on "sane" hardware
that allows it. This would, I think, be sufficient for Android's needs.

I'm hoping we can get agreement on:
- there is a race with suspend
- it can be closed on the sort of hardware that is typically used in the
mobile space
- closing it would address the need which led to the suspend-blocker
proposal.

If we have agreement on that, we can move on to
- should we close the race? (hopefully "yes" because bugs should be fixed).
- how should we close the race? (lots of room for exploration there).


Thanks,
NeilBrown
From: Thomas Gleixner on
On Tue, 1 Jun 2010, Neil Brown wrote:
> And if you are right that the race window cannot be closed, then the whole
> suspend-blocker infrastructure is pointless as the purpose of it is simply to
> close that window. If it really does not and cannot work, then it would be
> best to reject it for that reason rather than for less concrete aesthetic
> arguments.
> But presumably it does work in practice on Android hardware so ..... confused.
>
> Having just seen the email from Thomas, maybe you mean that it cannot be
> closed on devices using ACPI, but can on other devices. I can sort-of
> imagine how that would be the case (I tried reading an ACPI spec once - my
> hat is off to those of you who understand it).
> That shouldn't prevent us from closing the race window on "sane" hardware
> that allows it. This would, I think, be sufficient for Android's needs.
>
> I'm hoping we can get agreement on:
> - there is a race with suspend

That's a matter of how you define "suspend".

If "suspend" is another deep idle state and the hardware is sane,
there is no race at all - assuming that the driver/platform developer
got it right. It's not rocket science to transition from "normal" irq
delivery to wakeup based delivery without races (except for PC style x86
hardware of today).

If "suspend" is the thing we are used to via /sys/power/state then the
race will persist forever except for the suspend blocker workaround,
which we can express in QoS terms as well w/o adding another suspend
related user space API.
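The remark that a suspend blocker can be expressed in QoS terms can be sketched as follows. The sketch assumes a governor that picks the deepest state whose exit latency satisfies the tightest outstanding latency request. `PowerState`, `STATES`, `pick_state` and `blocker_as_request` are hypothetical names, the latencies are invented for illustration, and this is not a kernel interface. Felipe's "100us" request from earlier in the thread corresponds to `pick_state([100])`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerState:
    name: str
    exit_latency_us: int  # time needed to get back to running


# Hypothetical states, ordered from shallowest to deepest; the numbers
# are made up for illustration only.
STATES = [
    PowerState("C1", 1),
    PowerState("C3", 100),
    PowerState("suspend-to-RAM", 500_000),
]


def pick_state(latency_requests_us):
    """Return the deepest state whose exit latency meets every request."""
    limit = min(latency_requests_us, default=float("inf"))
    allowed = [s for s in STATES if s.exit_latency_us <= limit]
    # If nothing meets the limit, fall back to the shallowest state.
    return allowed[-1] if allowed else STATES[0]


def blocker_as_request():
    """A 'suspend blocker' expressed as a latency request just tighter
    than the suspend state's exit latency."""
    return STATES[-1].exit_latency_us - 1


if __name__ == "__main__":
    print(pick_state([]).name)                      # no constraints
    print(pick_state([blocker_as_request()]).name)  # blocker held
    print(pick_state([100]).name)                   # 100us request
```

In this model "holding a blocker" is just one more latency request, so no separate suspend-specific interface is needed to forbid the deepest state.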

Thanks,

tglx
From: Alan Stern on
On Tue, 1 Jun 2010, Neil Brown wrote:

> > As I said before, we generally can't prevent such things from happening,
> > because even if we handle the particular race described above, it still is
> > possible that the event will be "lost" if it arrives just a bit later (eg.
> > during a suspend-toggle switch). So the PRE_SUSPEND thing won't really
> > solve the entire problem while increasing complexity.
> >
> > > My freerunner has a single core so without CONFIG_PREEMPT it may be that
> > > there is no actual race-window - maybe the PRE_SUSPENDs will all run before a
> > > soft_irq thread has a chance to finish handling of the interrupt (my
> > > knowledge of these details is limited). But on a multi-core device I think
> > > there would definitely be a race-window.
> >
> > Yes, there always will be a race window. The only thing we can do is to
> > narrow it, but we cannot really close it (at least not on a PC, but I'm not
> > really sure it can be closed at all).
>
> Well you are the expert so I assume you are right, but I would really like to
> understand why this is.
>
> I guess that if the event was delivered as an edge-triggered interrupt which
> the interrupt controller couldn't latch, then it might get lost. Is that
> what happens?

You're barking up the wrong tree. If I understand correctly, Rafael is
saying that there's a race involving events which are not _wakeup_
events. If a non-wakeup event arrives shortly before a suspend, it can
have its normal effect. If it arrives while a suspend is in progress,
its delivery may be delayed until the system resumes. And if it
arrives after the system is fully suspended, it may never be noticed at
all.

With wakeup events the problem isn't so bad. Wakeup events are always
noticed, and if the system is designed properly they will either abort
a suspend-in-progress or else cause the system to resume as soon as the
suspend is complete. (Linux is not yet properly designed in this
sense.)
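The two paragraphs above amount to a small table of outcomes, encoded here as a Python function for reference. The phase labels are hypothetical. The wakeup-event rows assume the properly designed system described above, which, as noted, Linux is not yet.

```python
def outcome(is_wakeup_event: bool, phase: str) -> str:
    """Expected fate of an event, by when it arrives relative to suspend.

    phase is one of "before", "during" or "after" the suspend.
    """
    if phase not in ("before", "during", "after"):
        raise ValueError(f"unknown phase: {phase}")
    if phase == "before":
        return "normal effect"
    if is_wakeup_event:
        # In a properly designed system a wakeup event is never missed.
        return "aborts suspend" if phase == "during" else "resumes system"
    # Non-wakeup events are at best delayed and at worst never noticed.
    return "delayed until resume" if phase == "during" else "may never be noticed"


if __name__ == "__main__":
    for wakeup in (False, True):
        label = "wakeup" if wakeup else "non-wakeup"
        for phase in ("before", "during", "after"):
            print(f"{label:>10} {phase:>6}: {outcome(wakeup, phase)}")
```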

Or maybe I'm misunderstanding also, and Rafael is saying that there's
a race involving events whose meaning changes depending on whether or
not the system is asleep. This is obviously true and clearly
unavoidable.

> And if you are right that the race window cannot be closed, then the whole
> suspend-blocker infrastructure is pointless as the purpose of it is simply to
> close that window. If it really does not and cannot work, then it would be
> best to reject it for that reason rather than for less concrete aesthetic
> arguments.
> But presumably it does work in practice on Android hardware so ..... confused.

The point you're missing is that Android works with regard to wakeup
events. It doesn't necessarily always receive non-wakeup events
(although I don't know how Android classifies events -- maybe
everything is a wakeup event for them).

Alan Stern
