suspend blockers & Android integration [Kernel]

From: Brian Swetland on 4 Jun 2010 04:40

On Fri, Jun 4, 2010 at 12:57 AM, Ingo Molnar <mingo(a)elte.hu> wrote:
> * Brian Swetland <swetland(a)google.com> wrote:
>>
>> We started here because it's possibly the only api level change we have --
>> almost everything else is driver or subarch type work or controversial but
>> entirely self-contained (like the binder, which I would be shocked to see
>> ever hit mainline). [...]
>
> So why aren't those bits mainline? It's a 1000 times easier to get drivers and
> small improvements and non-ABI changes upstream.
>
> After basically two years of growing your fork (and some attempts to get your
> drivers into drivers/staging/ - from where they have meanwhile dropped out
> again) you re-started with the worst possible thing to merge: a big and
> difficult kernel feature affecting many subsystems. Why?

Because a large number of our drivers depend on it.

> This is one of the fundamental problems here. People simply don't know you,
> because you have not worked with us much - and hence they don't trust you
> positively out of the box - they are neutral at best.
>
> And believe me, it's hard enough to get difficult features upstream if people
> _do_ know you and when they positively _do_ trust you ... Aren't you talking to
> Andrew Morton about how to do these things properly? This is kernel
> contribution 101 really.
>
>> [...] Assertions have been made that because the "android kernel" (not a
>> term I like -- linux is linux, we have some assorted patches on top) [...]
>
> I've been tracking android-common and android-msm for a while and i have to
> say that it shows a very lackluster attitude towards upstream:
>
> - The latest branches i can see are v2.6.32 based today. We are in the
> v2.6.35 stabilization cycle and are developing v2.6.36. I.e. your upstream
> base is about a year too old.

We have some branch naming confusion and work going on in
experimental, but our active work right now is against 2.6.34 and
2.6.35-rc. The tegra2 work has been very aggressively following
mainline (rebasing against 2.6.34rc as they were getting underway),
and we've been sending those patches out for review, in hopes of
getting that tree off on a better foot.

>
> - The last commit is a couple of weeks old AFAICS.
>
> - The diffstat of android-common/android-2.6.32 is:
>
> 890 files changed, 39962 insertions(+), 6286 deletions(-)
>
> Those assorted patches have spread over nearly a thousand files. FYI, by
> the looks of it you are facing an exponentially worsening maintenance
> overhead curve here.
>
> Is there perhaps some other tree i should be following? I'm looking at:
>
>   [remote "android-msm"]
>           url = git://android.git.kernel.org/kernel/msm.git
>           fetch = +refs/heads/*:refs/remotes/android-msm/*
>   [remote "android-common"]
>           url = git://android.git.kernel.org/kernel/common.git
>           fetch = +refs/heads/*:refs/remotes/android-common/*
>
> Btw., the commits i've glanced at looked mostly clean and well structured, so
> i see no fundamental reason why this couldn't be done better.

I think the fundamental issue we keep bumping into is the turnaround
time on patch review / inclusion (again we're trying to get things
going much earlier on tegra2 to hopefully have less pain there). We
aim for kernel style compliance (though we're not perfect and we make
our share of mistakes), but previously when I tried sending mach-msm
stuff out, it seemed infeasible to send 30-60+ patches, so we'd start
with 5-10, feedback would trickle in over the course of a week, I'd
respin, etc. After a couple weeks some stuff would get picked up
toward a merge window but the rest would have to wait. And then we
hit crunch to ship, etc, and get behind.

Totally our fault that we're not just constantly pushing patches (and
we're trying to get a fulltime engineer or two just to work on
upstream related stuff), but we rapidly hit the point where what we're
sending up is a drop in the bucket compared to the work we're doing
and things keep diverging, etc.

I'm told this happens to everyone, is common, etc. We're (seriously)
a small team, trying to ship multiple products a year and keep our
head above water here, and unfortunately that means we keep tabling
these projects until we can find some cycles to give it another go and
the delta grows.

>> So, we figure, let's sort out the hard problem first and then move on with
>> our lives.
>
> Well, my suggestion would be to first build up a path towards upstream, build
> up trust, reduce your very high cross section to mainline - and do the most
> difficult bits last.

Having to maintain two versions of about half our driver code because
we depend on an ABI not in mainline is a significant factor for us --
it's difficult to have what's going upstream lag behind our active
work (basically we have to maintain two different trees -- one for
mainline, one for ship) already, but having these codelines also be
different makes it worse for us.

> Especially 'move on with our lives' suggests that you just want to get rid of
> this ABI divergence and continue-as-usual with the pattern of non-cooperation,
> hm?

I'd like to make some forward progress either to get something
wakelock-ish in and shift to whatever that api is, or to get a clear
"no not going to happen" and deal with the fallout there.

....

Sadly, for mach-msm, we're now further out due to maintainership
shifts (Daniel stepped up to do msm stuff, is pushing up some hybrid
of our work and Qualcomm's work that doesn't seem to really fit with
either, and I have no idea how to sanely get our stuff to sit on top
of that). I'd love to find some time to sit down, clean up the whole
msm tree for 8x50/7x30 which is (largely) pretty clean, and is
extremely stable and shippable, and try to get it into a patch series
and headed upstream, but we're now colliding with the upstream
mach-msm which has gone off in a different direction, etc.

Anyway, we continue to try to figure out how to make stuff work better
(again, trying some different approaches with tegra2), but so far the
process of getting code upstream has been extremely time intensive and
rather frustrating and it remains unclear who can sign off on what and
how many hoops different people will keep asking us to jump through.

Brian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Arve Hjønnevåg on 4 Jun 2010 05:00

On Fri, Jun 4, 2010 at 1:34 AM, Ingo Molnar <mingo(a)elte.hu> wrote:
>
> * Arve Hjønnevåg <arve(a)android.com> wrote:
>
>> > [...]
>> >
>> > Why do you need to track input wakeups? It's rather fragile and rather
>> > unnecessary [...]
>>
>> Because we have keys that should always turn the screen on, but the problem
>> is not specific to input events. If we enabled a wakeup event it usually
>> means we need this event to always work, not just when the system is fully
>> awake or fully suspended.
>
> Hm, i cannot follow that generic claim. Could you please point out the problem
> to me via a specific example? Which task does what, what undesirable thing
> happens where, etc.
>

We have many wakeup events, and some of them are invisible to the
user. For instance, on the Nexus One we wake up every 10 minutes to
monitor the battery health. If the user presses a key right after this
work has finished and we did not block suspend until userspace could
process this key event, we risk suspending before we could turn the
screen on, which to the user looks like the key did not work. Another
example: the user presses the power key, which turns the screen off and
allows suspend. We initiate suspend and a phone call comes in. If we
don't block suspend until we have processed the incoming phone call
notification, the phone may never ring (some devices will send a new
message every few seconds for this, so on those devices it would just
delay the ringing).
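
The key-press race described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `struct pm_state`, `key_irq()`, `try_suspend()` and `userspace_handle_key()` are hypothetical names made up for the illustration and are not the interface from the suspend-blocker patches. The model only shows why a pending wakeup event has to hold off suspend until userspace has consumed it.

```c
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not the patch-set API. */
struct pm_state {
    int  active_blockers;  /* number of blockers currently held */
    bool suspended;        /* true once the system has suspended */
    bool key_pending;      /* key event queued but not yet handled */
    bool screen_on;        /* what the user actually sees */
};

/* Key interrupt: queue the event and, optionally, block suspend. */
static void key_irq(struct pm_state *pm, bool use_blocker)
{
    pm->key_pending = true;
    if (use_blocker)
        pm->active_blockers++;
}

/* Suspend attempt, e.g. right after the periodic battery work is done. */
static bool try_suspend(struct pm_state *pm)
{
    if (pm->active_blockers > 0)
        return false;      /* someone still needs the system awake */
    pm->suspended = true;
    return true;
}

/* Userspace handles the key; it cannot run once the system is suspended. */
static void userspace_handle_key(struct pm_state *pm, bool use_blocker)
{
    if (pm->suspended || !pm->key_pending)
        return;
    pm->key_pending = false;
    pm->screen_on = true;
    if (use_blocker)
        pm->active_blockers--;  /* event consumed, allow suspend again */
}

/* Replay the sequence: key press, suspend attempt, userspace handling. */
static bool key_turns_screen_on(bool use_blocker)
{
    struct pm_state pm = {0};

    key_irq(&pm, use_blocker);
    try_suspend(&pm);
    userspace_handle_key(&pm, use_blocker);
    return pm.screen_on;
}
```

In this model, `key_turns_screen_on(false)` loses the key press because the suspend attempt wins the race, while `key_turns_screen_on(true)` refuses the suspend until the event has been handled. Real hardware and the real kernel paths are considerably more involved.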

--
Arve Hjønnevåg

From: Ingo Molnar on 4 Jun 2010 05:00

* Brian Swetland <swetland(a)google.com> wrote:

> On Fri, Jun 4, 2010 at 12:57 AM, Ingo Molnar <mingo(a)elte.hu> wrote:
> > * Brian Swetland <swetland(a)google.com> wrote:
> >>
> >> We started here because it's possibly the only api level change we have
> >> -- almost everything else is driver or subarch type work or controversial
> >> but entirely self-contained (like the binder, which I would be shocked to
> >> see ever hit mainline). [...]
> >
> > So why aren't those bits mainline? It's a 1000 times easier to get drivers
> > and small improvements and non-ABI changes upstream.
> >
> > After basically two years of growing your fork (and some attempts to get
> > your drivers into drivers/staging/ - from where they have meanwhile
> > dropped out again) you re-started with the worst possible thing to merge:
> > a big and difficult kernel feature affecting many subsystems. Why?
>
> Because a large number of our drivers depend on it.

So why not put in some stub or so? Auto-suspend/suspend-blockers is a feature,
and drivers ought to be able to work without a feature as well. Keep the
suspend-blocker changes in the android tree initially, and get the main body
of changes out first, and establish a flow of timely changes. That reduces
your maintenance burden and increases trust for future changes - a win-win
situation.

In any case, this is not to suggest that the suspend-blocker bits are
'impossible' to merge. I just say that if you start with your most difficult
feature you should not be surprised to be on the receiving end of a 1000+
mails flamewar on lkml ;-)

> > I've been tracking android-common and android-msm for a while and i have
> > to say that it shows a very lackluster attitude towards upstream:
> >
> > - The latest branches i can see are v2.6.32 based today. We are in the
> > v2.6.35 stabilization cycle and are developing v2.6.36. I.e. your
> > upstream base is about a year too old.
>
> We have some branch naming confusion and work going on in
> experimental, but our active work right now is against 2.6.34 and
> 2.6.35-rc. [...]

That's nice!

> [...] The tegra2 work has been very aggressively following mainline
> (rebasing against 2.6.34rc as they were getting underway), and we've been
> sending those patches out for review, in hopes of getting that tree off on a
> better foot.

Ah, googling for 'tegra2' gave me the magic URI:

git remote add android-tegra2 git://android.git.kernel.org/kernel/tegra.git

I generally roam various trees for scheduler patches when i can, seeing what
problems people are facing and trying to prevent more painful forks from
developing. You have these changes there currently:

d82647e: sched: make task dump print all 15 chars of proc comm
5e3e0f1: sched: Enable might_sleep before initializing drivers.

Please submit 5e3e0f1. We can probably do that one even simpler, by turning
__might_sleep_init_called into the only flag that __might_sleep() checks -
i.e. not checking system_state at all.

Also, please submit d82647e, it makes sense too.

Thanks,

Ingo

From: Pekka Enberg on 4 Jun 2010 05:10

On Fri, Jun 4, 2010 at 11:55 AM, Ingo Molnar <mingo(a)elte.hu> wrote:
> In any case, this is not to suggest that the suspend-blocker bits are
> 'impossible' to merge. I just say that if you start with your most difficult
> feature you should not be surprised to be on the receiving end of a 1000+
> mails flamewar on lkml ;-)

Indeed. This 'all or nothing' approach hasn't worked well in the past
and I highly doubt it will work now. It's much easier to work with
people when you have a track record of getting things merged and
actually maintaining the code.

Pekka

From: Brian Swetland on 4 Jun 2010 05:10

On Fri, Jun 4, 2010 at 1:55 AM, Ingo Molnar <mingo(a)elte.hu> wrote:
> * Brian Swetland <swetland(a)google.com> wrote:
>> > After basically two years of growing your fork (and some attempts to get
>> > your drivers into drivers/staging/ - from where they have meanwhile
>> > dropped out again) you re-started with the worst possible thing to merge:
>> > a big and difficult kernel feature affecting many subsystems. Why?
>>
>> Because a large number of our drivers depend on it.
>
> So why not put in some stub or so? Auto-suspend/suspend-blockers is a feature,
> and drivers ought to be able to work without a feature as well. Keep the
> suspend-blocker changes in the android tree initially, and get the main body
> of changes out first, and establish a flow of timely changes. That reduces
> your maintenance burden and increases trust for future changes - a win-win
> situation.

The impression I got from previous discussions was that upstream did
not want things that were built conditionally around APIs that did not
exist in mainline nor stub implementations for things that were not
agreed upon.

We could easily either #if defined(CONFIG_SUSPEND_BLOCKERS) or submit
a suspend_blockers.h that just makes everything a no-op, if that's an
acceptable transition vehicle. I didn't think either was an option
open to us.
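
As a rough illustration of the no-op header idea, a stub might look something like the sketch below. The struct layout and the function names (`suspend_blocker_init()`, `suspend_block()`, `suspend_unblock()`) are assumptions made for this sketch rather than the interface from the posted patches, and `demo_driver_handle_event()` is a hypothetical caller.

```c
/*
 * Hypothetical sketch of a stub suspend_blockers.h.
 * Names and struct layout are illustrative assumptions only.
 */

struct suspend_blocker {
    const char *name;
    int active;   /* only meaningful when the feature is configured in */
};

#if defined(CONFIG_SUSPEND_BLOCKERS)
/* With the feature enabled, the real implementation would live elsewhere. */
void suspend_blocker_init(struct suspend_blocker *sb, const char *name);
void suspend_block(struct suspend_blocker *sb);
void suspend_unblock(struct suspend_blocker *sb);
#else
/* Without the feature, every call compiles down to (almost) nothing. */
static inline void suspend_blocker_init(struct suspend_blocker *sb,
                                        const char *name)
{
    sb->name = name;
    sb->active = 0;
}

static inline void suspend_block(struct suspend_blocker *sb)
{
    (void)sb;   /* no-op */
}

static inline void suspend_unblock(struct suspend_blocker *sb)
{
    (void)sb;   /* no-op */
}
#endif

/* Driver-style caller: the same source builds with or without the feature. */
static int demo_driver_handle_event(struct suspend_blocker *sb)
{
    suspend_block(sb);
    /* ... handle the wakeup event here ... */
    suspend_unblock(sb);
    return sb->active;
}
```

The point of the stub variant is that driver code calls the same functions either way, so a driver would not need its own #ifdef blocks. With CONFIG_SUSPEND_BLOCKERS undefined the calls do nothing and the blocker never becomes active.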

> In any case, this is not to suggest that the suspend-blocker bits are
> 'impossible' to merge. I just say that if you start with your most difficult
> feature you should not be surprised to be on the receiving end of a 1000+
> mails flamewar on lkml ;-)

Yeah, I do understand that we're not making it easy for ourselves
here. I think we hit the point where Rafael and Matthew signed off on
things and thought "aha, linux-pm maintainers are happy, now we're
getting somewhere" only to realize the light at the end of the tunnel
was a bit further out than we anticipated ^^

Brian
