From: Gross, Mark on

>-----Original Message-----
>From: Arve Hjønnevåg [mailto:arve(a)android.com]
>Sent: Tuesday, June 01, 2010 8:15 PM
>To: Gross, Mark
>Cc: James Bottomley; Rafael J. Wysocki; Matthew Garrett; Thomas Gleixner;
>Peter Zijlstra; tytso(a)mit.edu; LKML; Florian Mickler; Linux PM; Linux OMAP
>Mailing List; felipe.balbi(a)nokia.com; Alan Cox; Alan Stern; Neil Brown
>Subject: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)
>
>2010/6/1 Gross, Mark <mark.gross(a)intel.com>:
>...
>>>4. It would be useful to change pm_qos_add_request to not allocate
>>>anything so can add constraints from init functions that currently
>>>cannot fail.
>> [mtg: ] I'm not sure how to do this but I agree it would be good. I
>guess we could have a block of pm_qos requests pre-allocated statically and
>re-use them. In practice there will not be more than a handful of requests
>ever. Dynamic allocation does seem like a bit of a waste.
>
>The calling code will have to store a pointer to your structure
>anyway, you may as well have them provide the whole structure.
[mtg: ] Duh! You are right. Make the callers hold the structure. It's been a long day. That would be easy to do.
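
As an illustration of the caller-owned structure idea, here is a minimal userspace sketch. The names (struct qos_request, qos_add_request() and so on) are hypothetical stand-ins, not the real pm_qos API; the point is only that linking a caller-provided object allocates nothing, so the add path has no failure case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request object, embedded in the caller's own data. */
struct qos_request {
    int32_t value;
    struct qos_request *next;   /* intrusive link, so no allocation is needed */
    int active;
};

/* Hypothetical constraint class: a list of requests aggregated by minimum. */
struct qos_class {
    struct qos_request *head;
    int32_t default_value;
};

/* Link a caller-provided request into the class. Nothing is allocated,
 * so there is no error to report. */
static void qos_add_request(struct qos_class *c, struct qos_request *req,
                            int32_t value)
{
    assert(!req->active);       /* adding the same request twice is a caller bug */
    req->value = value;
    req->active = 1;
    req->next = c->head;
    c->head = req;
}

/* Unlink a request; the caller still owns (and frees) the storage. */
static void qos_remove_request(struct qos_class *c, struct qos_request *req)
{
    struct qos_request **pp;

    for (pp = &c->head; *pp; pp = &(*pp)->next) {
        if (*pp == req) {
            *pp = req->next;
            req->next = NULL;
            req->active = 0;
            return;
        }
    }
}

/* Smallest outstanding value, or the class default when the list is empty. */
static int32_t qos_target(const struct qos_class *c)
{
    int32_t v = c->default_value;
    const struct qos_request *r;

    for (r = c->head; r; r = r->next)
        if (r->value < v)
            v = r->value;
    return v;
}
```

A driver would typically embed the request in its own device structure, or declare it static, and pass its address from an init function.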

--mgross


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Arve Hjønnevåg on
2010/6/1 Thomas Gleixner <tglx(a)linutronix.de>:
>
> On Mon, 31 May 2010, Arve Hjønnevåg wrote:
>
>> On Mon, May 31, 2010 at 2:46 PM, Thomas Gleixner <tglx(a)linutronix.de> wrote:
>> > On Mon, 31 May 2010, James Bottomley wrote:
>> >>
>> >> For MSM hardware, it looks possible to unify the S and C states by doing
>> >> suspend to ram from idle but I'm not sure how much work that is.
>> >
>> > On ARM, it's not rocket science and we have in tree support for this
>> > already (OMAP). I have done the same thing on a Samsung part as a
>> > proof of concept two years ago and it's really easy as the hardware is
>> > sane. Hint: It's designed for mobile devices :)
>> >
>>
>> We already enter the same power state from idle and suspend on msm. In
>> the absence of misbehaving apps, the difference in power consumption
>> is entirely caused by periodic timers in the user-space framework
>> _and_ kernel. It only takes a few timers triggering per second (I
>> think 3 if they do no work) to double the average power consumption on
>> the G1 if the radio is off. We originally added wakelocks because the
>> hardware we had at the time had much lower power consumption in
>> suspend than idle, but we still use suspend because it saves power.
>
> So how do you differentiate between timers which _should_ fire and
> those you do not care about ?
>

Only alarms are allowed to fire while suspended.

> We have mechanisms in place to defer timers so the wakeups are
> minimized. If that's not enough we need to revisit.
>

Deferring the timers forever without stopping the clock can cause
problems. Our user space code has a lot of timeouts that will trigger
an error if an app does not respond in time. Freezing everything and
stopping the clock while suspended is a lot simpler than trying to
stop individual timers and processes from running.
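
A toy model of that argument, assuming two notions of time: ordinary timeouts are measured on a clock that stands still while suspended, and alarms are measured on wall time. Everything here (struct toy_clock and the helper names) is made up for illustration and is not how any kernel clock is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Toy clock: wall_ms always advances, suspended_ms accumulates the time
 * spent suspended. */
struct toy_clock {
    uint64_t wall_ms;
    uint64_t suspended_ms;
};

/* Time as seen by ordinary timeouts: suspend time is excluded. */
static uint64_t running_time_ms(const struct toy_clock *c)
{
    assert(c->suspended_ms <= c->wall_ms);
    return c->wall_ms - c->suspended_ms;
}

static void run_for(struct toy_clock *c, uint64_t ms)
{
    c->wall_ms += ms;
}

static void suspend_for(struct toy_clock *c, uint64_t ms)
{
    c->wall_ms += ms;
    c->suspended_ms += ms;
}

/* A watchdog-style timeout armed against the running clock. */
static int timeout_expired(const struct toy_clock *c, uint64_t deadline_ms)
{
    return running_time_ms(c) >= deadline_ms;
}

/* An alarm is armed against wall time, so it can become due while the
 * system is suspended. */
static int alarm_due(const struct toy_clock *c, uint64_t alarm_wall_ms)
{
    return c->wall_ms >= alarm_wall_ms;
}
```

With a 5 second timeout armed at time zero, one second of running followed by a minute of suspend leaves the timeout pending, while an alarm set for the 30 second mark is due.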


--
Arve Hjønnevåg
From: Arve Hjønnevåg on
On Tue, Jun 1, 2010 at 7:05 AM, mark gross <640e9920(a)gmail.com> wrote:
> On Tue, Jun 01, 2010 at 09:07:37AM +0200, Florian Mickler wrote:
....
>> +static void update_target_val(int pm_qos_class, s32 val)
>> +{
>> +       s32 extreme_value;
>> +       s32 new_value;
>> +       extreme_value = atomic_read(&pm_qos_array[pm_qos_class]->target_value);
>> +       new_value = pm_qos_array[pm_qos_class]->comparitor(val, extreme_value);
>> +       if (extreme_value != new_value)
>> +               atomic_set(&pm_qos_array[pm_qos_class]->target_value, new_value);
>> +}
>> +
>
> Only works 1/2 the time, but I like the idea!
> It fails to get the right answer when constraints are reduced. But, this
> idea is a good improvement I'll roll into the next pm_qos update!
>
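
To make the "only works 1/2 the time" remark concrete, here is a userspace stand-in for the quoted function, assuming a "min" comparator and plain integers instead of atomics. Folding a new value into the cached extreme is correct when a constraint tightens, but when the request that set the extreme is relaxed the cached value stays stale and only a recomputation over all requests gives the right answer.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed comparator: the class target is the smallest requested value. */
static int32_t min_compare(int32_t a, int32_t b)
{
    return a < b ? a : b;
}

/* Userspace stand-in for the quoted update_target_val(): fold one new
 * value into the cached extreme without looking at the other requests. */
static void update_target_val(int32_t *target, int32_t val)
{
    int32_t new_value = min_compare(val, *target);

    if (*target != new_value)
        *target = new_value;
}

/* Full recomputation over all outstanding requests, for comparison. */
static int32_t recompute_target(const int32_t *vals, int n, int32_t dflt)
{
    int32_t t = dflt;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        t = min_compare(vals[i], t);
    return t;
}
```

With requests of 50 and 10 the cached target becomes 10; relaxing the second request to 80 leaves the cached target at 10, although the correct value is now 50.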

I think it would be a better idea to track your constraints with a
sorted data structure. That way you can do better than O(n) for both
directions. If you have a lot of constraints with the same value, it
may even be worthwhile to have a two stage structure where for
instance you use a rbtree for the unique values and list for identical
constraints.
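
A rough sketch of the two-stage idea, with two simplifications that are assumptions of this example and not part of the proposal: a sorted array with binary search stands in for the rbtree, and a per-value count stands in for the list of identical constraints. Lookup is O(log n) and reading the extreme is O(1), but inserting or deleting a unique value still shifts array entries, which a real rbtree would avoid.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_UNIQUE 64

struct value_slot {
    int32_t value;
    unsigned count;     /* number of identical constraints at this value */
};

struct constraint_set {
    struct value_slot slots[MAX_UNIQUE];    /* sorted ascending by value */
    int used;
};

/* Binary search: index of the first slot whose value is >= v. */
static int lower_bound(const struct constraint_set *s, int32_t v)
{
    int lo = 0, hi = s->used;

    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;

        if (s->slots[mid].value < v)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

static void constraint_add(struct constraint_set *s, int32_t v)
{
    int i = lower_bound(s, v);

    if (i < s->used && s->slots[i].value == v) {
        s->slots[i].count++;    /* identical value: just bump the count */
        return;
    }
    assert(s->used < MAX_UNIQUE);
    memmove(&s->slots[i + 1], &s->slots[i],
            (size_t)(s->used - i) * sizeof(s->slots[0]));
    s->slots[i].value = v;
    s->slots[i].count = 1;
    s->used++;
}

static void constraint_remove(struct constraint_set *s, int32_t v)
{
    int i = lower_bound(s, v);

    assert(i < s->used && s->slots[i].value == v);
    if (--s->slots[i].count > 0)
        return;
    memmove(&s->slots[i], &s->slots[i + 1],
            (size_t)(s->used - i - 1) * sizeof(s->slots[0]));
    s->used--;
}

/* Tightest (smallest) outstanding value, or dflt when the set is empty. */
static int32_t constraint_min(const struct constraint_set *s, int32_t dflt)
{
    return s->used ? s->slots[0].value : dflt;
}
```

Unlike a cached extreme that is only ever tightened, this gives the right answer in both directions: removing the last constraint at the smallest value exposes the next one.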

--
Arve Hjønnevåg
From: James Bottomley on
On Tue, 2010-06-01 at 18:10 -0700, Arve Hjønnevåg wrote:
> On Tue, Jun 1, 2010 at 3:36 PM, James Bottomley <James.Bottomley(a)suse.de> wrote:
> > On Wed, 2010-06-02 at 00:24 +0200, Rafael J. Wysocki wrote:
> >> On Tuesday 01 June 2010, James Bottomley wrote:
> >> > On Tue, 2010-06-01 at 14:51 +0100, Matthew Garrett wrote:
> >> > > On Mon, May 31, 2010 at 04:21:09PM -0500, James Bottomley wrote:
> >> > >
> >> > > > You're the one mentioning x86, not me. I already explained that some
> >> > > > MSM hardware (the G1 for example) has lower power consumption in S3
> >> > > > (which I'm using as an ACPI shorthand for suspend to ram) than any
> >> > > > suspend from idle C state. The fact that current x86 hardware has the
> >> > > > same problem may be true, but it's not entirely relevant.
> >> > >
> >> > > As long as you can set a wakeup timer, an S state is just a C state with
> >> > > side effects. The significant one is that entering an S state stops the
> >> > > process scheduler and any in-kernel timers. I don't think Google care at
> >> > > all about whether suspend is entered through an explicit transition or
> >> > > something hooked into cpuidle - the relevant issue is that they want to
> >> > > be able to express a set of constraints that lets them control whether
> >> > > or not the scheduler keeps on scheduling, and which doesn't let them
> >> > > lose wakeup events in the process.
> >> >
> >> > Exactly, so my understanding of where we currently are is:
> >>
> >> Thanks for the recap.
> >>
> >> > 1. pm_qos will be updated to be able to express the android suspend
> >> > blockers as interactivity constraints (exact name TBD, but
> >> > probably /dev/cpu_interactivity)
> >>
> >> I think that's not been decided yet precisely enough. I saw a few ideas
> >> here and there in the thread, but which of them are we going to follow?
> >
> > Well, android only needs two states (block and don't block), so that
> > gets translated as 2 s32 values (say 0 and INT_MAX). I've seen defines
> > like QOS_INTERACTIVE and QOS_NONE (or QOS_DRECKLY or QOS_MANANA) to
> > describe these, but if all we're arguing over is the define name, that's
> > progress.
>
> I think we need separate state constraints for suspend and idle low
> power modes. On the msm platform only a subset of the interrupts can
> wake up from the low power mode, so we block the use of the low power
> mode from idle while other interrupts are enabled. We do not block
> suspend however if those interrupts are not marked as wakeup
> interrupts. Most constraints that prevent suspend are not hardware
> specific and should not prevent entering low power modes from idle. In
> other words we may need to prevent low power idle modes while allowing
> suspend, and we may need to prevent suspend while allowing low power
> idle modes.

Well, as I said, pm_qos is s32 ... it's easy to make the constraint
ternary instead of binary.

> It would also be good to not have an implementation that gets slower
> and slower the more clients you have. With binary constraints this is
> trivial.

Well, that's an implementation detail ... ordering the list or using a
btree would significantly fix that. However, the largest number of
constraint users I've seen in android is around 60 ... that's not huge
from a kernel linear list perspective, so is this really a concern? ...
particularly when most uses don't necessarily change the constraint, so a
list search isn't done.
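
For reference, the "trivial" binary case from the quoted text can be sketched as a plain counter: each client either blocks or does not, so the aggregate is "holders > 0" and no list search is needed. The names are hypothetical, and a kernel version would presumably use an atomic counter or a lock; this userspace sketch does not.

```c
#include <assert.h>

/* Hypothetical binary constraint: each holder either blocks or does not. */
struct binary_constraint {
    unsigned holders;   /* number of clients currently blocking */
};

static void constraint_block(struct binary_constraint *c)
{
    c->holders++;
}

static void constraint_unblock(struct binary_constraint *c)
{
    assert(c->holders > 0);     /* an unbalanced unblock is a caller bug */
    c->holders--;
}

/* O(1) regardless of how many clients exist. */
static int constraint_is_blocked(const struct binary_constraint *c)
{
    return c->holders > 0;
}
```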

> > The other piece they need is the suspend block name, which comes with
> > the stats API, and finally we need to decide what the actual constraint
> > is called (which is how the dev node gets its name) ...
> >
> >> > 2. pm_qos will be updated to be callable from atomic context
> >> > 3. pm_qos will be updated to export statistics initially closely
> >> > matching what suspend blockers provides (simple update of the rw
> >> > interface?)
>
> 4. It would be useful to change pm_qos_add_request to not allocate
> anything so can add constraints from init functions that currently
> cannot fail.

Sure .. we do that for the delayed work queues, it's just an API which
takes the structure as an argument leaving it the responsibility of the
caller to free.

> >> > After this is done, the current android suspend block patch becomes a
> >> > re-expression in kernel space in terms of pm_qos, with the current
> >> > userspace wakelocks being adapted by the android framework into pm_qos
> >> > requirements expressed to /dev/cpu_interactivity (or whatever name is
> >> > chosen). Then opportunistic suspend is either a small add-on kernel
> >> > patch they have in their tree to suspend when the interactivity
> >> > constraint goes to NONE, or it could be done entirely by a userspace
> >> > process. Long term this could migrate to the freezer and suspend from
> >> > idle approach as the various problem timers get fixed.
> >> >
> >> > I think the big unresolved issue is the stats extension. For android,
> >> > we need just a name written along with the value, so we have something
> >> > to hang the stats off ... current pm_qos userspace users just write a
> >> > value, so the name would be optional. From the kernel, we probably just
> >> > need an additional API that takes a stats name or NULL if none
> >> > (pm_qos_add_request_named()?). Then reading the stats could be done by
> >> > implementing a fops read routine on the misc device.
> >>
> >> Is the original idea of having that information in debugfs objectionable?
> >
> > Well ... debugfs is usually used to get around the sysfs rules. In this
> > case, pm_qos has a dev interface ... I don't specifically object to
> > using debugfs, but I don't see any reason to forbid it from being a
> > simple dev read interface either.
> >
>
> We don't currently have a dev interface for stats so this is not an
> immediate requirement. The suspend blocker debugfs interface is just
> as good as the proc interface we have for wakelocks.

OK, great ... what actually exports the statistics is just an
implementation detail.
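
A sketch of what per-name statistics could look like, assuming the optional name discussed above (NULL meaning "no stats"). The structure, the field names and the one-line output format are all invented for illustration; the thread does not specify them, and whether the line is served from debugfs or a dev read routine does not matter here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-request statistics, keyed by an optional name. */
struct qos_stats {
    const char *name;           /* NULL: caller did not ask for statistics */
    unsigned long activations;  /* how many times the constraint was taken */
    uint64_t total_held_ms;     /* accumulated time the constraint was held */
    uint64_t held_since_ms;
    int held;
};

static void stats_init(struct qos_stats *s, const char *name)
{
    s->name = name;
    s->activations = 0;
    s->total_held_ms = 0;
    s->held_since_ms = 0;
    s->held = 0;
}

static void stats_acquire(struct qos_stats *s, uint64_t now_ms)
{
    if (!s->name || s->held)
        return;
    s->held = 1;
    s->held_since_ms = now_ms;
    s->activations++;
}

static void stats_release(struct qos_stats *s, uint64_t now_ms)
{
    if (!s->name || !s->held)
        return;
    assert(now_ms >= s->held_since_ms);
    s->total_held_ms += now_ms - s->held_since_ms;
    s->held = 0;
}

/* Format one "name activations total_ms" line, as a read routine might. */
static int stats_format(const struct qos_stats *s, char *buf, size_t len)
{
    return snprintf(buf, len, "%s %lu %llu\n",
                    s->name ? s->name : "(unnamed)",
                    s->activations,
                    (unsigned long long)s->total_held_ms);
}
```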

James



From: James Bottomley on
On Tue, 2010-06-01 at 19:45 -0700, mark gross wrote:
> On Tue, Jun 01, 2010 at 04:01:25PM -0500, James Bottomley wrote:
> > On Tue, 2010-06-01 at 14:51 +0100, Matthew Garrett wrote:
> > > On Mon, May 31, 2010 at 04:21:09PM -0500, James Bottomley wrote:
> > >
> > > > You're the one mentioning x86, not me. I already explained that some
> > > > MSM hardware (the G1 for example) has lower power consumption in S3
> > > > (which I'm using as an ACPI shorthand for suspend to ram) than any
> > > > suspend from idle C state. The fact that current x86 hardware has the
> > > > same problem may be true, but it's not entirely relevant.
> > >
> > > As long as you can set a wakeup timer, an S state is just a C state with
> > > side effects. The significant one is that entering an S state stops the
> > > process scheduler and any in-kernel timers. I don't think Google care at
> > > all about whether suspend is entered through an explicit transition or
> > > something hooked into cpuidle - the relevant issue is that they want to
> > > be able to express a set of constraints that lets them control whether
> > > or not the scheduler keeps on scheduling, and which doesn't let them
> > > lose wakeup events in the process.
> >
> > Exactly, so my understanding of where we currently are is:
> >
> > 1. pm_qos will be updated to be able to express the android suspend
> > blockers as interactivity constraints (exact name TBD, but
> > probably /dev/cpu_interactivity)
> > 2. pm_qos will be updated to be callable from atomic context
> > 3. pm_qos will be updated to export statistics initially closely
> > matching what suspend blockers provides (simple update of the rw
> > interface?)
> >
> > After this is done, the current android suspend block patch becomes a
> > re-expression in kernel space in terms of pm_qos, with the current
> > userspace wakelocks being adapted by the android framework into pm_qos
> > requirements expressed to /dev/cpu_interactivity (or whatever name is
> > chosen). Then opportunistic suspend is either a small add-on kernel
> > patch they have in their tree to suspend when the interactivity
> > constraint goes to NONE, or it could be done entirely by a userspace
> > process. Long term this could migrate to the freezer and suspend from
> > idle approach as the various problem timers get fixed.
>
> This is all nice but, all this does is implement the exact same thing as
> the wake lock / suspend blocker API as a pm_qos request-class.

funny that ...

> It
> leaves the overlapping constraint issue from ISR to user mode in place
> depending on exactly how the opportunistic suspend is implemented.

if the vanilla kernel is simply consuming the pm_qos infrastructure and
using suspend from idle, this is irrelevant. As I said, S3 suspend
*can* be implemented via a suspend manager process from userspace (the
Alan Stern proposal). However, if I were coding the android kernel, I'd
do it as a tiny add-on kernel patch. The main goal of making the
android kernel close enough to the vanilla kernel for there not to be
two separate upstreams for the device driver writers has been achieved
regardless of which path is taken.
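
A userspace model of such an add-on, assuming a binary interactivity constraint with a single change callback. All names are hypothetical, and the "suspend" is only a counter. The sketch shows the control flow only and does nothing about wakeup events that arrive between the last release and the suspend itself.

```c
#include <assert.h>
#include <stddef.h>

/* Called when the aggregate constraint changes between blocked and idle. */
typedef void (*qos_notifier_fn)(int blocked, void *data);

/* Hypothetical interactivity class with one registered notifier. */
struct interactivity_class {
    unsigned holders;
    qos_notifier_fn notifier;
    void *notifier_data;
};

/* Take the constraint; notify on the 0 -> 1 transition. */
static void class_get(struct interactivity_class *c)
{
    if (c->holders++ == 0 && c->notifier)
        c->notifier(1, c->notifier_data);
}

/* Drop the constraint; notify on the 1 -> 0 transition. */
static void class_put(struct interactivity_class *c)
{
    assert(c->holders > 0);
    if (--c->holders == 0 && c->notifier)
        c->notifier(0, c->notifier_data);
}

/* Stand-in for the add-on: count how often a suspend would be requested. */
struct suspend_addon {
    unsigned suspend_requests;
};

static void suspend_on_idle_constraint(int blocked, void *data)
{
    struct suspend_addon *a = data;

    if (!blocked)
        a->suspend_requests++;  /* a real add-on would start a suspend here */
}
```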

> I expect it will be via a notifier on the pm_qos request-class update
> that would do exactly what the wake lock code does today. Just load up
> a "suspend_on_non_interactivity" driver that registers for the
> callback, have it enabled by the user mode PM, and you have the equivalent
> architecture to what was proposed by the wake lock patches.
>
> It gives the Android guys what they want, without adding a new
> subsystem, minimizing the changes and makes most of the architecture
> much more politically acceptable.
>
> But doesn't it have the same issues with getting the overlapping
> constraints right from wake up source to user mode and dealing with the
> wake up events in a sane way? Instead of sprinkling suspend-blockers
> about the kernel we'll sprinkle pm_qos_requests about. I like getting
> more users of pm_qos, but isn't this the same thing?

Suspend from idle doesn't have the wakeup problem. It only manifests if
you want to take the system down via the S states. I think long term,
making suspend from idle work for all hardware is the agreed goal, even
if android can't implement it today and has to use an S state
workaround.

> > I think the big unresolved issue is the stats extension. For android,
> > we need just a name written along with the value, so we have something
> > to hang the stats off ... current pm_qos userspace users just write a
> > value, so the name would be optional. From the kernel, we probably just
> > need an additional API that takes a stats name or NULL if none
> > (pm_qos_add_request_named()?). Then reading the stats could be done by
> > implementing a fops read routine on the misc device.
>
> I don't think the status would be a big deal to add.
>
>
> However; I am really burned out by this discussion. I am willing to
> stub this out ASAP if it puts this behind us if the principles in the
> discussion are in more or less agreement.
>
> --mgross
>
> For the record, I still like my low power event idea, which could
> coexist with the above.

The proposal is isomorphic to what I said above ... just
s/pm_qos/whatever the lp API is/

James

