From: Peter Zijlstra on
On Fri, 2010-06-04 at 01:23 +0200, Ingo Molnar wrote:
> Btw., i'd like to summarize the scheduler based suspend scheme proposed by
> Thomas Gleixner, Peter Zijlstra and myself. I found no good summary of it in
> the big thread, and there are also new elements of the proposal:

Just to clarify, my proposition doesn't go much further than treating
'suspend' as a genuine idle state (on suitable hardware, which x86 isn't).

> - Create a 'deep idle' mode that suspends. This, if all constraints
> are met, is triggered by the scheduler automatically: just like the other
> idle modes are triggered currently. This approach fixes the wakeup
> races because an incoming wakeup event will set need_resched() and
> abort the suspend.
>

Right, so 'suspend' as idle seems (at least on UP/arm) a very sensible
idea. On SMP, the current suspend code hot-unplugs all but the boot CPU;
I'm not sure we need to do that, since if the system is genuinely idle,
what races are there?

And if it's not idle...
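
The wakeup-race argument can be illustrated with a small userspace sketch.
It is a simulation under stated assumptions, not kernel code: struct
cpu_state, wakeup_event() and try_deep_idle() are hypothetical names, and
only the need_resched flag comes from the discussion above. It shows the
idle path re-checking the flag before committing, so that a wakeup which
arrived earlier aborts the suspend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-CPU scheduler state (not a kernel type). */
struct cpu_state {
    bool need_resched;  /* set by any incoming wakeup */
    bool suspended;     /* true while in the simulated 'deep idle' state */
};

/* Simulated wakeup path, e.g. what an interrupt handler would trigger. */
static void wakeup_event(struct cpu_state *cpu)
{
    assert(cpu != NULL);
    cpu->need_resched = true;
    cpu->suspended = false;  /* a wakeup also brings the CPU out of idle */
}

/*
 * Try to enter the deepest idle state. Returns true if the suspend was
 * committed, false if it was refused or aborted. A real implementation
 * would have to make the final check and the commit atomic with respect
 * to interrupts; this sketch only models the ordering.
 */
static bool try_deep_idle(struct cpu_state *cpu, bool constraints_met)
{
    assert(cpu != NULL);
    if (!constraints_met)
        return false;       /* some constraint forbids suspending */
    if (cpu->need_resched)
        return false;       /* a wakeup is pending: abort the suspend */
    cpu->suspended = true;
    return true;
}
```

A pending wakeup therefore never gets lost: either it arrives before the
check and the suspend is refused, or it arrives afterwards and ends the
idle state like any other wakeup.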

> ( This mode can even use the existing suspend code to bring stuff down,
> therefore it also solves the pending timer problem and works even on
> PC style x86. )

You cannot solve the pending timer issue from idle, unless you allow idle
to stop clock_monotonic, which would change idle semantics, and that is not
something I can say is a good idea.

You want all idle states to have the same semantics, otherwise things just
get way too confusing.

> - Solve crappy app confinement via the scheduler:
>
> A first proposal was to use the existing cgroup mechanism,

I still believe containment is a cgroup problem. The freeze/snapshot/resume
container folks seem to face many of the same problems. Including the
pending timer one I suspect. Let's solve it there.

> - Controlled auto-suspend: drivers (such as input) could on wakeup
> automatically set the 'minimum wakeup latency' value of wakee tasks to a
> lower value. This automatically prevents another auto-suspend in the near
> future: up to the point the wakee task increases its latency (via the
> scheduler syscall) again and allows suspend again.

I think treating wakeups special like that is a mistake. I also think the
kernel should never adjust a task's QoS attributes, the user set them in
the expectation of them being respected.

I'm not really sure about the interaction between wakeups and untrusted
apps. It seems to me that an untrusted app needs a trusted intermediate
anyway; that intermediate can be responsible for freezing/unfreezing the
untrusted app.

So either the app asks for suspend blockers through the intermediate, or its
cgroup is managed by the intermediate -- should work out to the same end
result, right?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Fri, 2010-06-04 at 11:43 +0200, Peter Zijlstra wrote:
> I still believe containment is a cgroup problem. The freeze/snapshot/resume
> container folks seem to face many of the same problems. Including the
> pending timer one I suspect. Let's solve it there.

While talking to Thomas about this, we'd probably need a CLOCK_MONOTONIC
namespace to pull this off, so that resumed apps don't see the jump in
absolute time.

This would also help with locating the relevant timers, since they'd be
on the related timer base.
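
One way to picture such a namespace is as a per-namespace offset against
the global monotonic clock. The sketch below is only an illustration under
that assumption; struct mono_ns and the mono_ns_*() helpers are
hypothetical names, not an existing kernel interface. The time spent
frozen is accumulated and subtracted, so the namespace clock neither jumps
nor runs backwards across a freeze/thaw cycle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-namespace view of CLOCK_MONOTONIC (all values in ns). */
struct mono_ns {
    uint64_t frozen_total_ns;  /* total time spent frozen so far */
    uint64_t frozen_at_ns;     /* global time of the most recent freeze */
    int frozen;                /* non-zero while the namespace is frozen */
};

/* Stop the namespace clock at the given global time. */
static void mono_ns_freeze(struct mono_ns *ns, uint64_t global_now)
{
    assert(!ns->frozen);
    ns->frozen_at_ns = global_now;
    ns->frozen = 1;
}

/* Restart the namespace clock and account for the time spent frozen. */
static void mono_ns_thaw(struct mono_ns *ns, uint64_t global_now)
{
    assert(ns->frozen && global_now >= ns->frozen_at_ns);
    ns->frozen_total_ns += global_now - ns->frozen_at_ns;
    ns->frozen = 0;
}

/* Time as seen from inside the namespace. */
static uint64_t mono_ns_now(const struct mono_ns *ns, uint64_t global_now)
{
    /* While frozen, the clock stands still at the freeze point. */
    uint64_t t = ns->frozen ? ns->frozen_at_ns : global_now;
    return t - ns->frozen_total_ns;
}
```

With this model a timer armed inside the namespace keeps its relative
distance to "now" across a freeze, which is the behaviour the pending
timer problem asks for.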

The only 'interesting' issue I can see here is that if you create 1000
CLOCK_MONOTONIC namespaces, we'd need to have a tree of trees in order to
efficiently find the leftmost timer.
From: Ingo Molnar on

* Brian Swetland <swetland(a)google.com> wrote:

> On Fri, Jun 4, 2010 at 1:55 AM, Ingo Molnar <mingo(a)elte.hu> wrote:
> > * Brian Swetland <swetland(a)google.com> wrote:
> >> > After basically two years of growing your fork (and some attempts to get
> >> > your drivers into drivers/staging/ - from where they have meanwhile
> >> > dropped out again) you re-started with the worst possible thing to merge:
> >> > a big and difficult kernel feature affecting many subsystems. Why?
> >>
> >> Because a large number of our drivers depend on it.
> >
> > So why not put in some stub or so? Auto-suspend/suspend-blockers is a
> > feature, and drivers ought to be able to work without a feature as well.
> > Keep the suspend-blocker changes in the android tree initially, and get
> > the main body of changes out first, and establish a flow of timely
> > changes. That reduces your maintenance burden and increases trust for
> > future changes - a win-win situation.
>
> The impression I got from previous discussions was that upstream did not
> want things that were built conditionally around APIs that did not exist in
> mainline nor stub implementations for things that were not agreed upon.

Well, if it's some ugly #ifdef solution i could imagine light objections on
pure aesthetic micro-grounds.

> We could easily either #if defined(CONFIG_SUSPEND_BLOCKERS) or submit a
> suspend_blockers.h that just makes everything a no-op, if that's an
> acceptable transition vehicle. I didn't think either were an option open to
> us.

You can certainly put in a suspend_blockers.h thing into some Android
directory, and populate it with empty wrappers - as long as you only use it
within Android drivers and not core kernel code or other subsystems you don't
maintain.
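
A minimal sketch of what such a header of empty wrappers could look like
follows. The function names and the struct layout are assumptions made for
illustration, not the actual Android API; only CONFIG_SUSPEND_BLOCKERS and
the header name are taken from the discussion. It is written so that it
also compiles in userspace, which is why it uses assert().

```c
#ifndef ANDROID_SUSPEND_BLOCKERS_H
#define ANDROID_SUSPEND_BLOCKERS_H

#include <assert.h>
#include <stddef.h>

/* Hypothetical handle type; the real structure may look different. */
struct suspend_blocker {
    const char *name;
};

#ifdef CONFIG_SUSPEND_BLOCKERS
/* Feature built in: the real implementations live elsewhere. */
void suspend_blocker_init(struct suspend_blocker *sb, const char *name);
void suspend_block(struct suspend_blocker *sb);
void suspend_unblock(struct suspend_blocker *sb);
#else
/* Feature absent: empty wrappers, so drivers build unchanged. */
static inline void suspend_blocker_init(struct suspend_blocker *sb,
                                        const char *name)
{
    sb->name = name;  /* remember the name, do nothing else */
}
static inline void suspend_block(struct suspend_blocker *sb) { (void)sb; }
static inline void suspend_unblock(struct suspend_blocker *sb) { (void)sb; }
#endif

/* Driver-side usage: identical whether or not the feature is built in. */
static inline int example_handle_event(struct suspend_blocker *sb)
{
    assert(sb != NULL);
    suspend_block(sb);    /* keep the system awake during processing */
    /* ... process the event ... */
    suspend_unblock(sb);  /* allow suspend again */
    return 0;
}

#endif /* ANDROID_SUSPEND_BLOCKERS_H */
```

The driver code carries no #ifdef of its own; the conditional lives in one
place and can be dropped once a real implementation is agreed upon.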

It's being done all the time and helpful cleanup patches eliminating the stubs
are frowned upon (unless the stubs have been there for years with no progress
and no maintenance in sight).

Putting empty stubs into include/linux/ would be pushing things i think.

In fact sometimes architectures even jump the gun with major kernel features:
we had a dynticks implementation in ARM for years, we had RTLinux stubs in x86
code for quite some time, and we still have perfmon in IA64 - despite the core
kernel having gone for a different design.

It's certainly not ideal, but it's certainly a solution that is used every now
and then. The less difference there is between trees the easier it becomes to
merge - for both sides, both technically and socially.

> > In any case, this is not to suggest that the suspend-blocker bits are
> > 'impossible' to merge. I just say that if you start with your most
> > difficult feature you should not be surprised to be on the receiving end
> > of a 1000+ mails flamewar on lkml ;-)
>
> Yeah, I do understand that we're not making it easy for ourselves here. I
> think we hit the point where Rafael and Matthew signed off on things and
> thought "aha, linux-pm maintainers are happy, now we're getting somewhere"
> only to realize the light at the end of the tunnel was a bit further out
> than we anticipated ^^

That's a well-known problem on lkml: the light at the end of the tunnel was
the other train ;-)

Anyway, i'm not pessimistic at all: _some_ sort of scheme appears to be
crystalising out today. Everyone seems to agree now that the main usecases are
indeed useful and need handling one way or another - the rest is really just
technological discussions how to achieve the mostly-agreed-upon end goal.

The worst situations are features where one side says 'we don't need this kind
of functionality at all' - IMO auto/opportunistic-suspend isn't in that
situation, fortunately.

Thanks,

Ingo
From: Ingo Molnar on

* Peter Zijlstra <peterz(a)infradead.org> wrote:

> On Fri, 2010-06-04 at 11:43 +0200, Peter Zijlstra wrote:
> > I still believe containment is a cgroup problem. The freeze/snapshot/resume
> > container folks seem to face many of the same problems. Including the
> > pending timer one I suspect. Let's solve it there.
>
> While talking to Thomas about this, we'd probably need a CLOCK_MONOTONIC
> namespace to pull this off, so that resumed apps don't see the jump in
> absolute time.
>
> This would also help with locating the relevant timers, since they'd be on
> the related timer base.

Ok - this looks workable, and looks technically isolated enough that it can
be pursued as a separate module of this whole topic.

> The only 'interesting' issue I can see here is that if you create 1000
> CLOCK_MONOTONIC namespaces, we'd need to have a tree of trees in order to
> efficiently find the leftmost timer.

Realistically Android userspace would create just a single such namespace for
all the untrusted/unknown/uncontrolled apps, right?

Ingo
From: Peter Zijlstra on
On Fri, 2010-06-04 at 12:03 +0200, Ingo Molnar wrote:

> > The only 'interesting' issue I can see here is that if you create 1000
> > CLOCK_MONOTONIC namespaces, we'd need to have a tree of trees in order to
> > efficiently find the leftmost timer.
>
> Realistically Android userspace would create just a single such namespace for
> all the untrusted/unknown/uncontrolled apps, right?

Possibly, yeah.

But it might not stop someone else from creating an insane number of them.
So we do need to deal with that, and a linear loop over all timer bases,
which then will be a user controlled quantity, just doesn't sound
right :-)
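
To make the cost argument concrete, here is a rough sketch of a two-level
structure that avoids the linear loop. It is an illustration only: the
outer level is a binary min-heap of timer bases rather than a tree (a
kernel implementation would more likely use an rbtree), and struct
timer_base, struct base_heap and the heap_*() helpers are hypothetical
names. Each base is assumed to track the earliest expiry of its own timers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BASES 1024

/* One timer base per (hypothetical) CLOCK_MONOTONIC namespace. */
struct timer_base {
    uint64_t next_expiry;  /* earliest timer queued on this base */
    size_t heap_pos;       /* current index in the outer heap */
};

/* Outer level: min-heap of bases ordered by next_expiry. */
struct base_heap {
    struct timer_base *slot[MAX_BASES];
    size_t n;
};

static void heap_swap(struct base_heap *h, size_t a, size_t b)
{
    struct timer_base *t = h->slot[a];
    h->slot[a] = h->slot[b];
    h->slot[b] = t;
    h->slot[a]->heap_pos = a;
    h->slot[b]->heap_pos = b;
}

static void sift_up(struct base_heap *h, size_t i)
{
    while (i > 0) {
        size_t p = (i - 1) / 2;
        if (h->slot[p]->next_expiry <= h->slot[i]->next_expiry)
            break;
        heap_swap(h, p, i);
        i = p;
    }
}

static void sift_down(struct base_heap *h, size_t i)
{
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < h->n && h->slot[l]->next_expiry < h->slot[m]->next_expiry)
            m = l;
        if (r < h->n && h->slot[r]->next_expiry < h->slot[m]->next_expiry)
            m = r;
        if (m == i)
            break;
        heap_swap(h, i, m);
        i = m;
    }
}

/* Register a new base. O(log n). */
static void heap_add(struct base_heap *h, struct timer_base *b)
{
    assert(h->n < MAX_BASES);
    b->heap_pos = h->n;
    h->slot[h->n++] = b;
    sift_up(h, b->heap_pos);
}

/* Called when a base's earliest timer changes. O(log n). */
static void heap_update(struct base_heap *h, struct timer_base *b,
                        uint64_t expiry)
{
    b->next_expiry = expiry;
    sift_up(h, b->heap_pos);
    sift_down(h, b->heap_pos);
}

/* Base holding the globally earliest timer, or NULL if empty. O(1). */
static const struct timer_base *heap_first(const struct base_heap *h)
{
    return h->n ? h->slot[0] : NULL;
}
```

The lookup cost then no longer depends on how many namespaces a user
creates; only updates pay a logarithmic price.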