From: James Bottomley on
On Fri, 2010-08-13 at 19:19 +0300, Felipe Contreras wrote:
> On Fri, Aug 13, 2010 at 6:57 PM, Dominik Brodowski
> <linux(a)dominikbrodowski.net> wrote:
> >> >> Not Ubuntu, not Fedora, not MeeGo, not anyone with a typical
> >> >> user-space seems to be having this problem. I can argue to you that
> >> >> this problem can be solved in easier ways, but instead I will argue
> >> >> that perhaps we should wait for somebody besides Android to complain
> >> >> about it before providing a "solution". Because after all, what good
> >> >> is a "solution" provided by the kernel, if the user-space is not going
> >> >> to use it, ever.
> >> >
> >> > At this point in the discussion, I am quite prepared to believe that you
> >> > will avoid using suspend blockers, and that you will further do everything
> >> > in your power to prevent anyone else from using suspend blockers. ;-)
> >>
> >> I'm not tying anybody's hands.
> >>
> >> How are people using real-time linux if it's not on mainline? Well,
> >> duuh, you apply the patches. If say Fedora was interested in it, they
> >> could apply the patches, and see for themselves. People do that all
> >> the time, with the mm tree, with Con Kolivas's patches, etc. Once
> >> people are happy with the results, things get merged. Why should this
> >> be any different?
> >
> > Because millions of users are happy -- with Android, including suspend
> > blockers.
>
> I explicitly said somebody besides Android, specifically, somebody
> with a typical linux ecosystem. You are not addressing the argument at
> hand, that nobody else wants to tackle the issue this way, thus only
> making the discussion more difficult.

Can we stop arguing about the pointless?

The facts are that suspend blockers identifies a race within our suspend
to ram system that permeates from top to bottom (that's from server to
mobile). The problem is that resume events are racy with respect to
suspend and vice versa. This manifests itself most annoyingly on my
laptop in the "double suspend" case: where I suspend with a pending
suspend event, my laptop will resume and then immediately re-suspend
(leading me to kick myself and remind myself to check it stayed up
before pushing unsuspend and walking away). The other annoying case is
that if I accidentally close the lid before presenting, I have to wait
until the system is fully down before pressing resume.

In a data centre controlling power, if you send a suspend then a wake on
LAN, there's a window where the machine will still be down (because the
WOL got ignored).

There are easy fixes to all the above ... I should wait to verify
suspend and resume on my laptop and I have to accept the wait time
between the two. In the data centre, you just repeat your power control
commands a few times with about 5s between them and so on.
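
As a rough illustration of that data centre workaround, here is a minimal sketch of repeating a power control command until the machine is seen in the wanted state. Everything named here (send_until_confirmed, the send and confirmed callbacks, the default of three attempts) is hypothetical and only for illustration; the 5s spacing is the figure mentioned above.

```python
import time
from typing import Callable


def send_until_confirmed(send: Callable[[], None],
                         confirmed: Callable[[], bool],
                         attempts: int = 3,
                         delay_s: float = 5.0,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Repeat a power-control command until the target state is observed.

    send      -- hypothetical callback that issues the command (e.g. a WOL packet)
    confirmed -- hypothetical callback that reports whether the machine reacted
    Returns True once confirmed, False if every attempt was ignored.
    """
    for attempt in range(attempts):
        send()
        if confirmed():
            return True
        # Only wait if another attempt is still to come.
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

This papers over the race rather than fixing it: a command that lands in the window is still lost, it just gets sent again.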

The simple hacky workarounds mean that a user-space-invasive solution
like suspend blockers is a bit of a non-starter as a solution to the
general case. However, it has shown that we do have a problem and
furthermore it's a problem encountered by more than Android.

The technical problem with suspend blockers is that they're a solution
to a general problem that only works for a specific case. What we're
searching for is a general solution that can also be used in the
Android-specific case.

So far, we have three possibilities:

1. Stubs with deprecation - this has been rejected by Android, so
   looks like a non-starter.
2. Update pm_qos so that the suspend blocks become qos constraints.
   This may or may not be coupled with a user space suspend
   manager, but in the latter case it's essentially full suspend
   blockers (with the additional opportunistic suspend kernel code)
   but with information that systems outside of Android can use.
3. Rafael's patch that makes it possible to avoid the races between
   wakeup and suspend. This requires a user space suspend manager.

(There's a whole other load of implementation details like stats and the
like, but the above is the concept view).
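
To make the third option concrete, here is a toy model of one way a user space suspend manager could avoid losing a wakeup: read a count of wakeup events, do the pre-suspend work, and suspend only if the count is unchanged. This is a sketch of the general idea under stated assumptions, not Rafael's actual interface; WakeupCounter, suspend_if_unchanged and suspend_manager are hypothetical names.

```python
from typing import Callable


class WakeupCounter:
    """Toy stand-in for a kernel-side count of wakeup events (hypothetical)."""

    def __init__(self) -> None:
        self.count = 0
        self.suspended = False

    def wakeup_event(self) -> None:
        # A wakeup bumps the count and leaves the system running.
        self.count += 1
        self.suspended = False

    def read(self) -> int:
        return self.count

    def suspend_if_unchanged(self, seen: int) -> bool:
        """Suspend only if no wakeup arrived since `seen` was read."""
        if self.count != seen:
            return False  # a wakeup raced with us: abort the suspend
        self.suspended = True
        return True


def suspend_manager(counter: WakeupCounter,
                    before_suspend: Callable[[], None] = lambda: None) -> bool:
    """One pass of a hypothetical user space suspend manager."""
    seen = counter.read()
    before_suspend()  # pre-suspend work; wakeups may arrive during this window
    return counter.suspend_if_unchanged(seen)
```

With no event in the window, suspend_manager(counter) suspends; if before_suspend triggers counter.wakeup_event, the suspend is refused and the manager can go round again instead of sleeping through the event.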

Unless anyone has something substantive to add to either the problem
space or the solution space, the Android discussion piece of this thread
has degenerated to pure noise.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ted Ts'o on
On Fri, Aug 13, 2010 at 01:11:29PM -0400, James Bottomley wrote:
>
> The facts are that suspend blockers identifies a race within our suspend
> to ram system that permeates from top to bottom (that's from server to
> mobile). The problem is that resume events are racy with respect to
> suspend and vice versa. This manifests itself most annoyingly on my
> laptop in the "double suspend" case: where I suspend with a pending
> suspend event, my laptop will resume and then immediately re-suspend
> (leading me to kick myself and remind myself to check it stayed up
> before pushing unsuspend and walking away). The other annoying case is
> that if I accidentally close the lid before presenting, I have to wait
> until the system is fully down before pressing resume.

This is all true, but it's also only one aspect of the problem. I
agree with you that this is the part of the problem which affects
Linux at all scales, from Cloud servers in a data center that want to
suspend themselves when there's no work to do (and then fail to
respond to the WOL packet) to mobile platforms that are suspending
much more frequently.

However, it doesn't follow that this is the _only_ problem that the
Android folks might be interested in solving. Opportunistic suspend
is a different part of the problem space, which is generally believed
by the Android developers to be far more efficient than a
user-space suspend manager. Rafael has stated his complete
unwillingness to deal with this part of the problem. OK, so that
probably means that for Android, it will have to be an out-of-tree
kernel patch.

The question, then, is whether a solution which addresses the only
part of the problem which Rafael is interested in dealing with at this
point, is sufficient such that (a) the kernel-level opportunistic
suspend can be done as an out-of-tree patch, while simultaneously (b)
allowing device drivers for Android devices to utilize Rafael's
interfaces to solve the race design bug currently found in our suspend
subsystem, while (c) requiring minimal changes to the Android
userspace, and (d) providing all of the statistics and debugging
functionality required by the Android userspace.

If we can engineer a solution which meets (a), (b), (c), and (d)
above, then everyone will be happy.

- Ted
From: Brian Swetland on
On Fri, Aug 13, 2010 at 12:08 PM, Ted Ts'o <tytso(a)mit.edu> wrote:
> On Fri, Aug 13, 2010 at 01:11:29PM -0400, James Bottomley wrote:
>>
>> The facts are that suspend blockers identifies a race within our suspend
>> to ram system that permeates from top to bottom (that's from server to
>> mobile).  The problem is that resume events are racy with respect to
>> suspend and vice versa.  This manifests itself most annoyingly on my
>> laptop in the "double suspend" case: where I suspend with a pending
>> suspend event, my laptop will resume and then immediately re-suspend
>> (leading me to kick myself and remind myself to check it stayed up
>> before pushing unsuspend and walking away).  The other annoying case is
>> that if I accidentally close the lid before presenting, I have to wait
>> until the system is fully down before pressing resume.
>
> This is all true, but it's also only one aspect of the problem.  I
> agree with you that this is the part of the problem which affects
> Linux at all scales, from Cloud servers in a data center that want to
> suspend themselves when there's no work to do (and then fail to
> respond to the WOL packet) to mobile platforms that are suspending
> much more frequently.
>
> However, it doesn't follow that this is the _only_ problem that the
> Android folks might be interested in solving.  Opportunistic suspend
> is a different part of the problem space, which is generally believed
> by the Android developers to be far more efficient than a
> user-space suspend manager.  Rafael has stated his complete
> unwillingness to deal with this part of the problem.  OK, so that
> probably means that for Android, it will have to be an out-of-tree
> kernel patch.
>
> The question, then, is whether a solution which addresses the only
> part of the problem which Rafael is interested in dealing with at this
> point, is sufficient such that (a) the kernel-level opportunistic
> suspend can be done as an out-of-tree patch, while simultaneously (b)
> allowing device drivers for Android devices to utilize Rafael's
> interfaces to solve the race design bug currently found in our suspend
> subsystem, while (c) requiring minimal changes to the Android
> userspace, and (d) providing all of the statistics and debugging
> functionality required by the Android userspace.
>
> If we can engineer a solution which meets (a), (b), (c), and (d)
> above, then everyone will be happy.

Arve's suspend blockers patch stack actually separates the core
functionality (ability for drivers to delay suspend while doing work
suspend would interfere with), from the ability to hold suspend
blockers from userspace (a separate, smaller patch building on the
core functionality).
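
A minimal sketch of what that core functionality amounts to, assuming a simple counted registry: a driver holds a named blocker while it does work, and suspend is only allowed when nothing is held. The class and method names (SuspendBlockers, block, can_suspend, active) are hypothetical and are not taken from Arve's patches.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List


class SuspendBlockers:
    """Toy registry of named suspend blockers (hypothetical API)."""

    def __init__(self) -> None:
        self._active: Dict[str, int] = {}

    @contextmanager
    def block(self, name: str) -> Iterator[None]:
        # Take the blocker for the duration of the with-block.
        self._active[name] = self._active.get(name, 0) + 1
        try:
            yield
        finally:
            # Release even if the work raised, so suspend is not blocked forever.
            self._active[name] -= 1
            if self._active[name] == 0:
                del self._active[name]

    def can_suspend(self) -> bool:
        return not self._active

    def active(self) -> List[str]:
        """Names currently blocking suspend, useful for stats and debugging."""
        return sorted(self._active)
```

A driver would wrap its work in `with blockers.block("wifi-rx"):`, and an opportunistic suspend policy would check can_suspend() before going down. Userspace-held blockers would then be a separate layer on top of the same registry.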

Brian
From: James Bottomley on
On Fri, 2010-08-13 at 15:08 -0400, Ted Ts'o wrote:
> On Fri, Aug 13, 2010 at 01:11:29PM -0400, James Bottomley wrote:
> >
> > The facts are that suspend blockers identifies a race within our suspend
> > to ram system that permeates from top to bottom (that's from server to
> > mobile). The problem is that resume events are racy with respect to
> > suspend and vice versa. This manifests itself most annoyingly on my
> > laptop in the "double suspend" case: where I suspend with a pending
> > suspend event, my laptop will resume and then immediately re-suspend
> > (leading me to kick myself and remind myself to check it stayed up
> > before pushing unsuspend and walking away). The other annoying case is
> > that if I accidentally close the lid before presenting, I have to wait
> > until the system is fully down before pressing resume.
>
> This is all true, but it's also only one aspect of the problem. I
> agree with you that this is the part of the problem which affects
> Linux at all scales, from Cloud servers in a data center that want to
> suspend themselves when there's no work to do (and then fail to
> respond to the WOL packet) to mobile platforms that are suspending
> much more frequently.
>
> However, it doesn't follow that this is the _only_ problem that the
> Android folks might be interested in solving. Opportunistic suspend
> is a different part of the problem space, which is generally believed
> by the Android developers to be far more efficient than a
> user-space suspend manager. Rafael has stated his complete
> unwillingness to deal with this part of the problem. OK, so that
> probably means that for Android, it will have to be an out-of-tree
> kernel patch.

OK, so I tried desperately to avoid the question of whether
opportunistic suspend is a good way of managing power. However, it
seems to me that it is in use by several systems (Android, OLPC, etc).
I'll defer the question of whether it's better in user space or kernel
space to Rafael's investigations ... but I will point out that the
kernel space patch, once the suspend blockers issue is taken care of,
looks like a single patch to one file, so should be locally containable
and should allow upstream to be useful as the driver base again.

> The question, then, is whether a solution which addresses the only
> part of the problem which Rafael is interested in dealing with at this
> point, is sufficient such that (a) the kernel-level opportunistic
> suspend can be done as an out-of-tree patch, while simultaneously (b)
> allowing device drivers for Android devices to utilize Rafael's
> interfaces to solve the race design bug currently found in our suspend
> subsystem, while (c) requiring minimal changes to the Android
> userspace, and (d) providing all of the statistics and debugging
> functionality required by the Android userspace.
>
> If we can engineer a solution which meets (a), (b), (c), and (d)
> above, then everyone will be happy.

That's my goal.

James

