From: Thomas Gleixner on
On Fri, 4 Jun 2010, Peter Zijlstra wrote:

> On Fri, 2010-06-04 at 11:43 +0200, Peter Zijlstra wrote:
> > I still believe containment is a cgroup problem. The freeze/snapshot/resume
> > container folks seem to face many of the same problems. Including the
> > pending timer one I suspect. Let's solve it there.
>
> While talking to Thomas about this, we'd probably need a CLOCK_MONOTONIC
> namespace to pull this off, so that resumed apps don't see the jump in
> absolute time.
>
> This would also help with locating the relevant timers, since they'd be
> on the related timer base.
>
> The only 'interesting' issue I can see here is that if you create 1000
> CLOCK_MONOTONIC namespaces, we'd need to have a tree of trees in order to
> efficiently find the leftmost timer.

We can be more clever than that. All CLOCK_MONOTONIC timers can live
in the CLOCK_MONOTONIC rbtree, we just need proper annotation, i.e.:

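struct hrtimer {
	ktime_t			expires;	/* CLOCK_MONOTONIC as seen from the kernel */
	......
	struct list_head	namespace;	/* finds the timers of one namespace on freeze */
	ktime_t			base_offset;	/* applied only by the user space interfaces */
};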

So expires would be on CLOCK_MONOTONIC as seen from the kernel, just
the user space interfaces would take the base_offset into account.

On freeze we remove the timers from the rbtree (they are easy to
find via the namespace list) and on thaw we set the base_offset
accordingly and insert them again. So no surprise for user space and
no tree of trees to walk through.
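
The freeze/thaw bookkeeping described above can be modelled in plain user
space C. This is only a rough sketch under assumptions, not kernel code:
model_timer, model_ns and every function name are hypothetical, a "queued"
flag stands in for membership in the CLOCK_MONOTONIC rbtree, and a singly
linked list stands in for the namespace list_head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t ktime_ns;	/* stand-in for ktime_t, in nanoseconds */

struct model_timer {
	ktime_ns expires;		/* expiry on the kernel's CLOCK_MONOTONIC */
	ktime_ns base_offset;		/* kernel time minus namespace time */
	int queued;			/* 1 while in the (modelled) rbtree */
	struct model_timer *ns_next;	/* per-namespace list linkage */
};

struct model_ns {
	ktime_ns base_offset;		/* offset handed to newly armed timers */
	ktime_ns frozen_at;		/* kernel time of the last freeze */
	struct model_timer *timers;	/* head of the namespace list */
};

/* CLOCK_MONOTONIC as user space inside the namespace would read it. */
static ktime_ns ns_now(const struct model_ns *ns, ktime_ns kernel_now)
{
	return kernel_now - ns->base_offset;
}

/* Arm a timer from an expiry given in namespace time. */
static void timer_arm(struct model_ns *ns, struct model_timer *t,
		      ktime_ns user_expires)
{
	t->base_offset = ns->base_offset;
	t->expires = user_expires + t->base_offset;
	t->queued = 1;
	t->ns_next = ns->timers;
	ns->timers = t;
}

/* Expiry as it would be reported through the user space interfaces. */
static ktime_ns timer_user_expires(const struct model_timer *t)
{
	return t->expires - t->base_offset;
}

/* Freeze: dequeue every timer of the namespace via its list. */
static void ns_freeze(struct model_ns *ns, ktime_ns kernel_now)
{
	ns->frozen_at = kernel_now;
	for (struct model_timer *t = ns->timers; t; t = t->ns_next)
		t->queued = 0;
}

/* Thaw: shift the offsets by the time spent frozen, then requeue. */
static void ns_thaw(struct model_ns *ns, ktime_ns kernel_now)
{
	ktime_ns delta = kernel_now - ns->frozen_at;

	ns->base_offset += delta;
	for (struct model_timer *t = ns->timers; t; t = t->ns_next) {
		t->base_offset += delta;
		t->expires += delta;
		t->queued = 1;
	}
}
```

In this model a timer armed for namespace time 100, frozen at kernel time 40
and thawed at kernel time 1040, ends up queued at kernel time 1100 while user
space still sees an expiry of 100 and a current time of 40.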

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Fri, 2010-06-04 at 12:11 +0200, Thomas Gleixner wrote:
> On Fri, 4 Jun 2010, Peter Zijlstra wrote:
>
> > On Fri, 2010-06-04 at 11:43 +0200, Peter Zijlstra wrote:
> > > I still believe containment is a cgroup problem. The freeze/snapshot/resume
> > > container folks seem to face many of the same problems. Including the
> > > pending timer one I suspect. Let's solve it there.
> >
> > While talking to Thomas about this, we'd probably need a CLOCK_MONOTONIC
> > namespace to pull this off, so that resumed apps don't see the jump in
> > absolute time.
> >
> > This would also help with locating the relevant timers, since they'd be
> > on the related timer base.
> >
> > The only 'interesting' issue I can see here is that if you create 1000
> > CLOCK_MONOTONIC namespaces, we'd need to have a tree of trees in order to
> > efficiently find the leftmost timer.
>
> We can be more clever than that. All CLOCK_MONOTONIC timers can live
> in the CLOCK_MONOTONIC rbtree, we just need proper annotation, i.e.:
>
> struct hrtimer {
> ktime_t expires;
> ......
> struct list_head namespace;
> ktime_t base_offset;
> };
>
> So expires would be on CLOCK_MONOTONIC as seen from the kernel, just
> the user space interfaces would take the base_offset into account.
>
> On freeze we remove the timers from the rbtree (they are easy to
> find via the namespace list) and on thaw we set the base_offset
> accordingly and insert them again. So no surprise for user space and
> no tree of trees to walk through.

Ah indeed, much nicer.
From: Brian Swetland on
On Fri, Jun 4, 2010 at 2:59 AM, Ingo Molnar <mingo(a)elte.hu> wrote:
>
> You can certainly put in a suspend_blockers.h thing into some Android
> directory, and populate it with empty wrappers - as long as you only use it
> within Android drivers and not core kernel code or other subsystems you don't
> maintain.
>
> It's being done all the time and helpful cleanup patches eliminating the stubs
> are frowned upon (unless the stubs are there like for years with no progress
> and no maintenance in sight).
>
> Putting empty stubs into include/linux/ would be pushing things i think.
>
> In fact sometimes architectures even jump the gun with major kernel features:
> we had a dynticks implementation in ARM for years, we had RTLinux stubs in x86
> code for quite some time, and we still have perfmon in IA64 - despite the core
> kernel having gone for a different design.
>
> It's certainly not ideal, but it's certainly a solution that is used every now
> and then. The less difference there is between trees the easier it becomes to
> merge - for both sides, both technically and socially.

Totally -- our goal would be that as drivers find their way from our
tree to mainline we'd keep them 1:1 between the trees. If we can put a
local suspend_blockers.h somewhere while the long term solution gets
hashed out that'd remove the biggest painpoint on a driver level. I'm
not quite sure where the best place to drop such a thing would be --
we'd likely be including it from mach-msm, mach-tegra2, and drivers
for both those architectures in the normal driver places for the tree.
I guess we could just drop it in
arch/arm/mach-{msm,tegra2}/include/mach/ and both the subarch code and
subarch-specific-drivers we've been writing could pick it up via
#include <mach/suspend_blockers.h>
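
A stub header along those lines might look roughly like the sketch below.
Everything in it is an assumption made for illustration: the struct, the
function names and their signatures are hypothetical and not a settled API,
and example_dev is a made-up driver. The point is only that driver code
calling the wrappers can stay identical between the trees while the calls
compile down to nothing.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for <mach/suspend_blockers.h>.
 * All names and signatures here are assumptions for illustration.
 */
#ifndef MACH_SUSPEND_BLOCKERS_H
#define MACH_SUSPEND_BLOCKERS_H

struct suspend_blocker {
	const char *name;	/* kept only for debugging */
};

/* Near-empty wrappers: placeholders until a real mechanism exists. */
static inline void suspend_blocker_init(struct suspend_blocker *sb,
					const char *name)
{
	sb->name = name;
}

static inline void suspend_block(struct suspend_blocker *sb) { (void)sb; }
static inline void suspend_unblock(struct suspend_blocker *sb) { (void)sb; }

#endif /* MACH_SUSPEND_BLOCKERS_H */

/* Hypothetical driver code that would stay the same in both trees. */
struct example_dev {
	struct suspend_blocker blocker;
	int events;
};

static void example_dev_init(struct example_dev *dev)
{
	suspend_blocker_init(&dev->blocker, "example");
	dev->events = 0;
}

static void example_handle_wakeup(struct example_dev *dev)
{
	suspend_block(&dev->blocker);	/* no-op with the stub header */
	dev->events++;			/* the actual driver work */
	suspend_unblock(&dev->blocker);	/* no-op with the stub header */
}
```

Swapping the stub for a real implementation later would then only touch the
header, not the drivers that include it.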

>> Yeah, I do understand that we're not making it easy for ourselves here.  I
>> think we hit the point where Rafael and Matthew signed off on things and
>> thought "aha, linux-pm maintainers are happy, now we're getting somewhere"
>> only to realize the light at the end of the tunnel was a bit further out
>> than we anticipated ^^
>
> That's a well-known problem on lkml: the light at the end of the tunnel was
> the other train ;-)
>
> Anyway, i'm not pessimistic at all: _some_ sort of scheme appears to be
> crystalising out today. Everyone seems to agree now that the main usecases are
> indeed useful and need handling one way or another - the rest is really just
> technological discussions how to achieve the mostly-agreed-upon end goal.
>
> The worst situations are features where one side says 'we don't need this kind
> of functionality at all' - IMO auto/opportunistic-suspend isn't in that
> situation, fortunately.

It is encouraging that there's at least some general consensus that
the feature is useful, and as Arve and I have both mentioned, we're
really not religious about names, etc, provided we can solve the
problem we're trying to solve, so if it ends up being qos constraints
or something else entirely but still gets us where we're trying to go,
it's good news.

I think one point of contention remaining may be "just blocking
suspend" vs "halting specific untrusted processes". The latter is
difficult for us to work with because of the overall complexity of
(our) userspace environment. A big hammer where we stop it all and
suspend ends up being less deadlock/inversion-prone. Of course if the
general solution ends up being able to do either, then perhaps
everyone's happy.

Brian
From: Andi Kleen on
Linus Torvalds <torvalds(a)linux-foundation.org> writes:

> On Thu, 3 Jun 2010, Arjan van de Ven wrote:
>>
>> And because there's then no power saving (but a performance cost), it's
>> actually a negative for battery life/total energy.
>
> Including the UP optimizations we do (ie lock prefix removal)? It's

Those only help the kernel and most workloads do not do enough kernel
execution for it to really matter, but spend most of their
time in user space.

Even if as kernel programmers we often have a different view, in most
cases most cycles are in user space :)

-Andi
--
ak(a)linux.intel.com -- Speaking for myself only.
From: Peter Zijlstra on
On Fri, 2010-06-04 at 01:56 -0700, Arve Hjønnevåg wrote:
> On Fri, Jun 4, 2010 at 1:34 AM, Ingo Molnar <mingo(a)elte.hu> wrote:
> >
> > * Arve Hjønnevåg <arve(a)android.com> wrote:
> >
> >> > [...]
> >> >
> >> > Why do you need to track input wakeups? It's rather fragile and rather
> >> > unnecessary [...]
> >>
> >> Because we have keys that should always turn the screen on, but the problem
> >> is not specific to input events. If we enabled a wakeup event it usually
> >> means we need this event to always work, not just when the system is fully
> >> awake or fully suspended.
> >
> > Hm, i cannot follow that generic claim. Could you please point out the problem
> > to me via a specific example? Which task does what, what undesirable thing
> > happens where, etc.
> >
>
> We have many wakeup events, and some of them are invisible to the
> user. For instance on the Nexus One we wake up every 10 minutes to monitor
> the battery health.

> If the user presses a key right after this work
> has finished and we did not block suspend until userspace could
> process this key event, we risk suspending before we could turn the
> screen on, which to the user looks like the key did not work.

> Another
> example, the user pressed the power key which turns the screen off and
> allows suspend. We initiate suspend and a phone call comes in. If we
> don't block suspend until we processed the incoming phone call
> notification, the phone may never ring (some devices will send a new
> message every few seconds for this, so on those devices it would just
> delay the ringing).

Right, so in the proposed scheme all these tasks would be executed by
trusted processes, and trusted processes will never get frozen and so
will never be delayed in processing these events.

Only untrusted code will be frozen. And trusted processes are responsible
for thawing the untrusted processes and delivering events to them.

Trusted processes are assumed to be sane and idle when there is nothing
for them to do, allowing the machine to go into deep idle states.
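
As a toy illustration of that split (a user space model only, with
hypothetical names such as model_task, freeze_untrusted and deliver_event;
it says nothing about how a real freezer would be implemented), assuming
each task simply carries a trusted flag:

```c
#include <assert.h>
#include <stddef.h>

struct model_task {
	int trusted;	/* 1: never frozen, handles wakeup events */
	int frozen;	/* 1: not allowed to run */
	int pending;	/* events delivered but not yet handled */
	int handled;	/* events processed so far */
};

/* Freeze only the untrusted tasks; trusted ones keep running. */
static void freeze_untrusted(struct model_task **tasks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!tasks[i]->trusted)
			tasks[i]->frozen = 1;
}

/*
 * A trusted task thaws the receiver and hands it an event.
 * Returns 0 on success, -1 if the sender may not do this.
 */
static int deliver_event(const struct model_task *from, struct model_task *to)
{
	if (!from->trusted || from->frozen)
		return -1;
	to->frozen = 0;
	to->pending++;
	return 0;
}

/* Let a task run once: it handles pending events unless frozen. */
static void run_task(struct model_task *t)
{
	if (t->frozen)
		return;
	t->handled += t->pending;
	t->pending = 0;
}
```

In this model the trusted task is never delayed by the freeze, and an
untrusted task only runs again once a trusted one has thawed it and handed
it something to do.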
