From: Mark Brown on
On Thu, Aug 05, 2010 at 06:29:27PM -0700, Arve Hj?nnev?g wrote:

> sleep_time, total time the wake lock has been active when the screen was off.

This one seems a little odd, and won't make sense on some devices like
those with E Ink displays or those which don't have screens at all. I
guess it's really focused on metering when the user is not actively
interacting with the device?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Paul E. McKenney on
On Thu, Aug 05, 2010 at 06:29:27PM -0700, Arve Hj�nnev�g wrote:
> 2010/8/5 Rafael J. Wysocki <rjw(a)sisk.pl>:
> > On Friday, August 06, 2010, Arve Hj�nnev�g wrote:
> >> 2010/8/5 Rafael J. Wysocki <rjw(a)sisk.pl>:
> >> > On Thursday, August 05, 2010, Arve Hj�nnev�g wrote:
> >> >> 2010/8/4 Rafael J. Wysocki <rjw(a)sisk.pl>:
> >> >> > On Thursday, August 05, 2010, Arve Hj�nnev�g wrote:
> >> >> >> On Wed, Aug 4, 2010 at 1:56 PM, Matthew Garrett <mjg59(a)srcf.ucam.org> wrote:
> >> >> >> > On Wed, Aug 04, 2010 at 10:51:07PM +0200, Rafael J. Wysocki wrote:
> >> >> >> >> On Wednesday, August 04, 2010, Matthew Garrett wrote:
> >> >> >> >> > No! And that's precisely the issue. Android's existing behaviour could
> >> >> >> >> > be entirely implemented in the form of binary that manually triggers
> >> >> >> >> > suspend when (a) the screen is off and (b) no userspace applications
> >> >> >> >> > have indicated that the system shouldn't sleep, except for the wakeup
> >> >> >> >> > event race. Imagine the following:
> >> >> >> >> >
> >> >> >> >> > 1) The policy timeout is about to expire. No applications are holding
> >> >> >> >> > wakelocks. The system will suspend providing nothing takes a wakelock.
> >> >> >> >> > 2) A network packet arrives indicating an incoming SIP call
> >> >> >> >> > 3) The VOIP application takes a wakelock and prevents the phone from
> >> >> >> >> > suspending while the call is in progress
> >> >> >> >> >
> >> >> >> >> > What stops the system going to sleep between (2) and (3)? cgroups don't,
> >> >> >> >> > because the voip app is an otherwise untrusted application that you've
> >> >> >> >> > just told the scheduler to ignore.
> >> >> >> >>
> >> >> >> >> I _think_ you can use the just-merged /sys/power/wakeup_count mechanism to
> >> >> >> >> avoid the race (if pm_wakeup_event() is called at 2)).
> >> >> >> >
> >> >> >> > Yes, I think that solves the problem. The only question then is whether
> >> >> >>
> >> >> >> How? By passing a timeout to pm_wakeup_event when the network driver
> >> >> >> gets the packet or by passing 0. If you pass a timeout it is the same
> >> >> >> as using a wakelock with a timeout and should work (assuming the
> >> >> >> timeout you picked is long enough). If you don't pass a timeout it
> >> >> >> does not work, since the packet may not be visible to user-space yet.
> >> >> >
> >> >> > Alternatively, pm_stay_awake() / pm_relax() can be used.
> >> >> >
> >> >>
> >> >> Which makes the driver and/or network stack changes identical to using
> >> >> wakelocks, right?
> >> >
> >> > Please refer to the Matthew's response.
> >> >
> >> >> >> > it's preferable to use cgroups or suspend fully, which is pretty much up
> >> >> >> > to the implementation. In other words, is there a reason we're still
> >> >> >>
> >> >> >> I have seen no proposed way to use cgroups that will work. If you
> >> >> >> leave some processes running while other processes are frozen you run
> >> >> >> into problems when a frozen process holds a resource that a running
> >> >> >> process needs.
> >> >> >>
> >> >> >>
> >> >> >> > having this conversation? :) It'd be good to have some feedback from
> >> >> >> > Google as to whether this satisfies their functional requirements.
> >> >> >> >
> >> >> >>
> >> >> >> That is "this"? The merged code? If so, no it does not satisfy our
> >> >> >> requirements. The in kernel api, while offering similar functionality
> >> >> >> to the wakelock interface, does not use any handles which makes it
> >> >> >> impossible to get reasonable stats (You don't know which pm_stay_awake
> >> >> >> request pm_relax is reverting).
> >> >> >
> >> >> > Why is that a problem (out of curiosity)?
> >> >> >
> >> >>
> >> >> Not having stats or not knowing what pm_relax is undoing? We need
> >> >> stats to be able to debug the system.
> >> >
> >> > You have the stats in struct device and they are available via sysfs.
> >> > I suppose they are insufficient, but I'd like to know why exactly.
> >> >
> >>
> >> Our wakelock stats currently have
> >> (name,)count,expire_count,wake_count,active_since,total_time,sleep_time,max_time
> >> and last_change. Not all of these are equally important (total_time is
> >> most important followed by active_since), but you only have count.
> >> Also as discussed before, many wakelocks/suspendblockers are not
> >> associated with a struct device.
> >
> > OK
> >
> > How much of that is used in practice and what for exactly?
>
> count, tells you how many times the wakelock was activated. If a
> wakelock prevented suspend for a long time a large count tells you it
> handled a lot of events while a small count tells you it took a long
> time to process the events, or the wakelock was not released properly.

Got it!

> expire_count, tells you how many times the timeout expired. For the
> input event wakelock in the android kernel (which has a timeout) an
> expire count that matches the count tells you that someone opened an
> input device but is not reading from it (this has happened several
> times).

Ah, I hadn't considered that an power-oblivious application could take
this approach to waste power. I now better understand the purpose of
the timeouts, and have added the corresponding requirement. ;-)

> wake_count, tells you that this is the first wakelock that was
> acquired in the resume path. This is currently less useful than I
> would like on the Nexus One since it is usually "SMD_RPCCALL" which
> does not tell me a lot.

Got it!

> active_since, tells you how long a a still active wakelock has been
> active. If someone activated a wakelock and never released it, it will
> be obvious here.

This one is zero if the wakelock is not currently held, correct?

> total_time, total time the wake lock has been active. This one should
> be obvious.

Got it!

> sleep_time, total time the wake lock has been active when the screen was off.

Hmmm... The statistics for apps are collected in userspace, correct?
If so, this would be collecting the total time that suspend blockers
that were acquired by some driver have been held while the screen was
powered off. So if things were working properly, sleep_time should be
quite small -- with the exception of the suspend blocker that the
userspace daemon was using as a proxy for all of the userspace suspend
blockers.

Or am I confused about this?

> max_time, longest time the wakelock was active uninterrupted. This
> used less often, but the battery on a device was draining fast, but
> the problem went away before looking at the stats this will show if a
> wakelock was active for a long time.

Got it!

Thank you for this additional info, it really shines some light on
what you are trying to do.

One question: are these statistics enabled in production devices, are
is this a debug-only facility?

> > Do you _really_ have to debug the wakelocks in drivers that much?
> >
>
> Wake locks in drivers sometimes need to be debugged. If the api has no
> accountability, then these problems would take forever to fix.

So there is a separate set of suspend-blocker statistics collected in
userspace, correct? (The kernel probably doesn't need to care about this,
and I don't believe that I need anything more than a "yes" or "no" answer,
but just want to make sure I understand how these statistics are used.)

> >> >> If the system does not suspend
> >> >> at all or is awake for too long, the wakelock stats tells us which
> >> >> component is at fault. Since pm_stay_awake and pm_relax does not
> >> >> operate on a handle, you cannot determine how long it prevented
> >> >> suspend for.
> >> >
> >> > Well, if you need that, you can add a counter of "completed events" into
> >>
> >> We need more than that (see above).
> >>
> >> > struct dev_pm_info and a function similar to pm_relax() that
> >> > will update that counter. �I don't think anyone will object to that change.
> >> >
> >>
> >> What about adding a handle that is passed to all three functions?

The reason you want the handle is to allow the calling code to indicate
which calling points are dealing with the same suspend-blocker entity?

Thaxn, Paul

> > I don't think that will fly at this point.
> >
>
> Why not? I think allowing drivers to modify a global reference count
> with no accountability is a terrible idea.
>
> >> >> >> The proposed in user-space interface
> >> >> >> of calling into every process that receives wakeup events before every
> >> >> >> suspend call
> >> >> >
> >> >> > Well, �you don't really need to do that.
> >> >> >
> >> >>
> >> >> Only if the driver blocks suspend until user-space has read the event.
> >> >> This means that for android to work we need to block suspend when
> >> >> input events are not processed, but a system using your scheme needs a
> >> >> pm_wakeup_event call when the input event is queued. How to you switch
> >> >> between them? Do we add separate ioctls in the input device to enable
> >> >> each scheme? If someone has a single threaded user space power manager
> >> >> that also reads input event it will deadlock if you block suspend
> >> >> until it reads the input events since you block when reading the wake
> >> >> count.
> >> >
> >> > Well, until someone actually tries to implement a power manager in user space
> >> > it's a bit vague.
> >> >
> >>
> >> Not having clear rules for what the drivers should do is a problem.
> >> The comments in your code seem to advocate using timeouts instead of
> >> overlapping pm_stay_awake/pm_relax sections. I find this
> >> recommendation strange given all the opposition to
> >> wakelock/suspendblocker timeouts.
> >
> > There's no recommendation either way.
>
> I'm referring to this paragraph:
>
> * Second, a wakeup event may be detected by one functional unit and processed
> * by another one. In that case the unit that has detected it cannot really
> * "close" the "no suspend" period associated with it, unless it knows in
> * advance what's going to happen to the event during processing. This
> * knowledge, however, may not be available to it, so it can simply
> specify time
> * to wait before the system can be suspended and pass it as the second
> * argument of pm_wakeup_event().
>
> >
> >> But more importantly, calling
> >> pm_wakeup_event with a timeout of 0 is incompatible with the android
> >> user space code,
> >
> > Which I don't find really relevant, sorry.
> >
> >> and I would prefer that the kernel interfaces would
> >> encourage drivers to block suspend until user space has consumed the
> >> event, which works for the android user space, instead of just long
> >> enough to work with a hypothetical user space power manager.
> >
> > Well, that are your personal preferences, which I respect. �I also have some
> > personal preferences that are not necessarily followed by the kernel code.
> >
> > Thanks,
> > Rafael
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo(a)vger.kernel.org
> > More majordomo info at �http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at �http://www.tux.org/lkml/
> >
>
>
>
> --
> Arve Hj�nnev�g
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Paul E. McKenney on
On Thu, Aug 05, 2010 at 05:29:24PM -0700, Brian Swetland wrote:
> 2010/8/5 Rafael J. Wysocki <rjw(a)sisk.pl>:
> >>
> >> Our wakelock stats currently have
> >> (name,)count,expire_count,wake_count,active_since,total_time,sleep_time,max_time
> >> and last_change. Not all of these are equally important (total_time is
> >> most important followed by active_since), but you only have count.
> >> Also as discussed before, many wakelocks/suspendblockers are not
> >> associated with a struct device.
> >
> > OK
> >
> > How much of that is used in practice and what for exactly?
> >
> > Do you _really_ have to debug the wakelocks in drivers that much?
>
> Debugging power related issues is pretty critical to building
> competitive mobile devices.
>
> Throughout the several months of this discussion I have been
> continually scratching my head at this "debugability considered
> harmful" attitude that I see in reaction to our interest in having the
> ability (gated behind a config option even -- really, that'd be fine,
> not everyone need enable it) to gather useful stats and examine the
> state of the system.

In my case, it has not been "debuggability considered harmful", but
rather my lack of understanding of the kinds of bugs that can arise and
what information is most helpful in tracking them down. Arve's post
listing the meanings of the stats helped me quite a bit, although I am
sure that my understanding is still quite rudimentary.

> At this point it sounds like there's no interest in the solution we
> have, which works and has worked for a few years, and has been revised
> 10+ times based on feedback here, and no interest in providing a
> solution that accomplishes similar functionality, so perhaps it's time
> for us to cut our losses and just go back to maintaining our patches
> instead of having the same arguments over and over again.

And this brings up another aspect of Android requirements. Mainlining
Android features is not necessarily an all-or-nothing proposition. Here
is a prototype list of degrees of mainlining, along with at least a few
of the benefits and difficulties of each level:

o Mainline that provides exactly what Android needs.

This is of course the best case, but, as you may have noticed,
might not be easy to achieve. The plain fact is that Linux
must support a wide variety of workloads, so some give-and-take
is usually required.

o Mainline that exactly matches the ideal API, but which fails
to implement some feature or another.

This is still not bad. You have to carry a specific patch to
provide the feature(s), but all the calling sites are carried
upstream in mainline.

o Mainline provides something that is close to the ideal API, but
some additional constant arguments are required.

This is not so good, but at least the drivers can be upstreamed
and the changes required to get to an Android-ready kernel
are fairly simple.

o Mainline provides something, but Android requires changes to the
supplied API which require additional data to be passed from
call site to call site.

Again, at least drivers can be upstreamed, but the changes required
to get to an Android-ready kernel are a bit more involved.

o Mainline provides a stubbed-out API.

Once again, the drivers can at least be upstreamed, but Android
is the only project validating the placement of calls to the
API. This means that changes to the drivers are likely to
mess up the placement of the call sites. Therefore, getting
to an Android-ready kernel requires careful inspection of the
drivers to adjust for bugs introduced by others.

o Mainline provides nothing.

Status quo. This means that some other group will likely
introduce the required functionality in an incremental fashion.
This process will likely fail to take Android's needs into
account, probably leading to...

o Mainline provides functionality that is similar to what
Android needs, but in a completely incompatible manner
that cannot be used easily (or perhaps at all) by Android.

My guess is that there is value to Android in a number of the non-perfect
cases.

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Paul E. McKenney on
On Fri, Aug 06, 2010 at 01:30:48PM +0100, Mark Brown wrote:
> On Thu, Aug 05, 2010 at 06:01:24PM -0700, david(a)lang.hm wrote:
> > On Thu, 5 Aug 2010, Brian Swetland wrote:
>
> >> Obviously not all clocks are stopped (the DSP and codec are powered
> >> and clocked, for example), but yeah we can clock gate and power gate
> >> the cpu and most other peripherals while audio is playing on a number
> >> of ARM SoC designs available today (and the past few years).
>
> > does this then mean that you have multiple variations of suspend?
>
> > for example, one where the audio stuff is left powered, and one where it
> > isn't?
>
> This was the core of the issue I was raising in the last thread about
> this (the one following the rename to suspend blockers). Essentially
> what happens in a mainline context is that some subsystems can with
> varying degress of optionality ignore some or all of the instruction to
> suspend and keep bits of the system alive during suspend.
>
> Those that stay alive will either have per subsystem handling or will be
> outside the direct control of the kernel entirely (the modem is a good
> example of the latter case in many systems - in terms of the software
> it's essentially a parallel computer that's sitting in the system rather
> than a perhiperal of the AP).

This underscores a basic difference between servers and these embedded
devices. When you suspend a server, it is doing nothing, because servers
rely very heavily on the CPUs. In contrast, many embedded devices can
perform useful work even when the CPUs are completely powered down.

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Mark Brown on
On Fri, Aug 06, 2010 at 10:22:26AM -0700, Paul E. McKenney wrote:
> On Fri, Aug 06, 2010 at 01:30:48PM +0100, Mark Brown wrote:

> > this (the one following the rename to suspend blockers). Essentially
> > what happens in a mainline context is that some subsystems can with
> > varying degress of optionality ignore some or all of the instruction to
> > suspend and keep bits of the system alive during suspend.

> This underscores a basic difference between servers and these embedded
> devices. When you suspend a server, it is doing nothing, because servers
> rely very heavily on the CPUs. In contrast, many embedded devices can
> perform useful work even when the CPUs are completely powered down.

Well, not really from the Linux point of view. It's not massively
different to something like keeping an ethernet controller sufficiently
alive to allow it to provide wake on LAN functionality while the system
is suspended in terms of what Linux has to do, and quite a few servers
have lights out management systems which aren't a million miles away
from the modem on a phone in terms of their relationship with the host
computer.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/