From: Arve Hjønnevåg on
2010/6/5 Alan Stern <stern(a)rowland.harvard.edu>:
> On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
>
>> Yes, we can keep all our user space suspend blockers and thaw the
>> frozen cgroup when any suspend blocker is held, but this would
>> eliminate any power advantage that freezing a cgroup has over using
>> suspend to freeze all processes. Without annotating the drivers to
>> block the cgroup freezing in the same places as we now block suspend,
>> it also prevents processes in the cgroup that we freeze from directly
>> consuming wakeup events.
>
> The driver annotations don't need to block the cgroup freezing.  They
> just need to keep the system running long enough to awaken a thread
> that will handle the wakeup event.  (See below.)  A pm-qos constraint
> is good enough for this.
>

I'm not sure what you mean by this. Either you need to annotate the
drivers or you don't.

>> If you are referring to the approach that we don't use suspend but
>> freeze a cgroup instead, this only solves the problem of bad apps. It
>> does not help pause timers in trusted user space code and in the
>> kernel, so it does not lower our average power consumption.
>
> You can solve this problem if you restructure your "trusted" apps in
> the right way.  Require a trusted app to guarantee that whenever it
> doesn't hold any suspend blockers, it will do nothing but wait (in a
> poll() system call for example) for a wakeup event.  When the event
> occurs, it must then activate a suspend blocker.
>

This breaks existing apps. It effectively requires that a process that
uses suspend blockers do no work that does not block suspend.
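
For reference, the pattern being asked of a trusted app can be sketched
roughly as below. This is only an illustration in Python, not Android
code: SuspendBlocker is a hypothetical stand-in for the real userspace
suspend-blocker interface, and handle_one_event and its parameters are
names made up for the example.

```python
import os
import select


class SuspendBlocker:
    """Hypothetical stand-in for a userspace suspend blocker."""

    def __init__(self):
        self.held = False

    def __enter__(self):
        self.held = True   # block suspend while the event is handled
        return self

    def __exit__(self, *exc):
        self.held = False  # allow suspend again


def handle_one_event(fd, blocker, handler, timeout_ms=1000):
    """Wait in poll() with no blocker held, then handle one event under it."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    if not poller.poll(timeout_ms):  # idle: nothing but waiting happens here
        return None
    with blocker:                    # the event arrived, so take the blocker
        return handler(os.read(fd, 4096))
```

Any work the process does outside the "with blocker" section (a timer
firing, a reply to another process) is work that this rule forbids.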

> Better yet, make it more fine-grained.  Instead of trusted apps, have
> trusted threads.  Freeze the untrusted threads along with everything
> else, and require the trusted threads to satisfy this guarantee.
>

This would create a minefield of possible deadlocks. You now have to
make sure that your trusted threads do not share any locks with your
untrusted threads. For instance you cannot safely call into the heap
while any threads in your process are frozen.
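
A toy sketch of that hazard, in Python rather than anything we ship:
heap_lock is a hypothetical stand-in for the allocator's lock, and the
"frozen" thread is simulated by parking it on an event while it still
holds that lock.

```python
import threading


def trusted_thread_gets_lock(timeout=0.2):
    """Return True if the trusted side can take a lock held by a frozen thread."""
    heap_lock = threading.Lock()  # stand-in for a lock shared by all threads
    holding = threading.Event()
    thaw = threading.Event()

    def untrusted():
        with heap_lock:
            holding.set()
            thaw.wait()           # "frozen" while still holding the lock

    worker = threading.Thread(target=untrusted)
    worker.start()
    holding.wait()                # the untrusted thread now owns the lock
    got = heap_lock.acquire(timeout=timeout)  # the trusted thread tries to use it
    if got:
        heap_lock.release()
    thaw.set()                    # thaw the worker so the demo can finish
    worker.join()
    return got
```

The acquire only gives up here because the sketch uses a timeout; a real
allocator lock has none, so the trusted thread would simply hang.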

> In this way, while the system is idle no user timers will get renewed.
> Kernel timers are another matter, but we should be able to handle them.
> There's nothing Android-specific about wanting to reduce kernel timer
> wakeups while in a low-power mode.
>
>> And, it
>> does not solve the problem for systems that enter lower power states
>> from suspend than they can from idle. The last point may not be relevant
>> to android anymore, but desktop systems already have auto suspend and
>> it would be preferable to have a race free kernel api for this.
>
> This is an entirely different matter from the rest of the discussion.
> It would be better to consider this separately after Android's current
> problems have been addressed.
>

Yes, there has not been much discussion about this, but I don't
understand why not. Automatic suspend is used outside Android, and it
has the same race conditions that suspend blockers fix.

--
Arve Hjønnevåg
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Walleij on
2010/6/7 Peter Zijlstra <peterz(a)infradead.org>:
> On Sun, 2010-06-06 at 12:58 -0700, Brian Swetland wrote:
>> Somebody will have to broker a deal with the frameworks/apps folks to
>> get rid of the binder.  They like it a lot.  Of course if somebody
>> built a drop-in replacement for the userspace side that didn't require
>> a kernel driver, had the same performance characteristics, solved the
>> same problems, etc, they could probably make an argument for it (or
>> just provide it as a drop-in replacement for people who want a more
>> "pure" linux underneath Android, even if we didn't pick it up).
>
> So what's up with this Binder stuff, from what I can see it's just
> yet-another-CORBA. Why does it need a kernel part at all, can't you
> simply run with a user-space ORB instead?
>
> I really don't get why people keep re-inventing CORBA, there's some
> really nice (free) ORBs out there, like:
>
>  http://www.cs.wustl.edu/~schmidt/TAO.html

There was a mail thread on LKML a while back where binder was
discussed, in which Dianne Hackborn explained in detail how Android
uses binder. At the time it was contrasted with D-Bus (the IPC
mechanism that has largely replaced DCOP (KDE) and Bonobo
(GNOME); the latter was actually CORBA-based).

I don't think there was any conclusion, but it was pretty clear that
binder is an Android key asset, actually the key component that
the Android people have brought with them from BeOS to
Palmsource to Android to Google, and they really really like to use
that thing.

It's built into the entire Android userspace for all IPC, except the
stuff that's handled by D-Bus instead (yes they have
both for some cases).

What sets binder aside from the others is that it's kernel-based;
things like low-latency and large buffer-passing have been mentioned
as key features of the kernel driver.

One way to settle the binder question is to just include it and say
it's needed to run Android; the other is to define the technical issue
at hand, which is: "can the kernel support high-speed, low-latency,
partly marshalled, large-buffer IPC?"

D-Bus (on a local machine, mind you; it can also use TCP) will use
a simple unix domain socket, created by:

socket(PF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0)
as can be seen here:
http://cgit.freedesktop.org/dbus/dbus/tree/dbus/dbus-sysdeps-unix.c
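
To make the transport concrete, here is a minimal sketch in Python of
bytes going back and forth over such a socket. It is not D-Bus code: the
socket path, the function name and the echo "protocol" are all made up
for the illustration, and SOCK_CLOEXEC is assumed to exist only on Linux.

```python
import os
import socket
import tempfile

# SOCK_CLOEXEC is Linux-specific; fall back to 0 where it is missing.
CLOEXEC = getattr(socket, "SOCK_CLOEXEC", 0)


def unix_stream_roundtrip(payload: bytes) -> bytes:
    """Echo payload across an AF_UNIX stream socket and return the reply."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")  # hypothetical socket path
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM | CLOEXEC) as server, \
             socket.socket(socket.AF_UNIX, socket.SOCK_STREAM | CLOEXEC) as client:
            server.bind(path)
            server.listen(1)
            client.connect(path)      # queued in the backlog until accept()
            conn, _ = server.accept()
            with conn:
                client.sendall(payload)
                received = conn.recv(len(payload))  # "daemon" side reads the bytes
                conn.sendall(received)              # and echoes them back
                return client.recv(len(payload))
```

Everything above the raw byte stream (marshalling, addressing, object
references) is left to userspace, which is the point of the comparison.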

ACE/TAO as referenced seems to use only TCP sockets actually:
https://svn.dre.vanderbilt.edu/viewvc/Middleware/trunk/ACE/ace/Sock_Connect.cpp?view=co
Perhaps it simply uses 127.0.0.1 for local IPC. (The source is
voluminous and hard for me to navigate, perhaps someone
familiar with it can add something here.)

Then either D-Bus or TAO builds a complete marshalling stack on
top of these sockets; it's all fully abstract, fully userspace. Several
processes and dbus daemons push/pull bytes into these sockets.
I think DCOP and Bonobo basically do the same thing, by
the way.

Binder on the other hand is a large kernel module:
http://android.git.kernel.org/?p=kernel/experimental.git;a=blob;f=drivers/staging/android/binder.c;h=e13b4c4834076eb64680457049832af0b92d88b9;hb=android-2.6.34-test2

It will do some serious reference counting, handshaking back-and-forth
and so on. Basically a lot of the stuff that other IPC mechanisms
also do, but in kernelspace. (OK, I'm oversimplifying; binder
is far more lightweight, for one.)

The bigger question behind it all is this:

Does the kernel provide the proper support for local IPC
transport, or is there more it could do in terms of interface, latency,
throughput?

A domain socket bitsink should be enough for everybody?

So I would really like to know from the Android people why the
binder is in the kernel, after all. Could it *theoretically* be in
userspace, on top of some unix domain sockets, running as a
real-time scheduled daemon or whatever, still yielding the same
performance? Or is there some discovered limitation with current
interfaces, that everybody ought to know? Especially authors of
D-Bus and TAO etc would be very interested in this I believe.

It's not like I don't understand that it would be hard to move this
thing to userspace, it's more that I'd like to know how you think it
would be impacted by that.

Yours,
Linus Walleij
From: Brian Swetland on
On Mon, Jun 7, 2010 at 4:17 PM, Linus Walleij
<linus.ml.walleij(a)gmail.com> wrote:
>
> So I would really like to know from the Android people why the
> binder is in the kernel, after all. Could it *theoretically* be in
> userspace, on top of some unix domain sockets, running as a
> real-time scheduled daemon or whatever, still yielding the same
> performance? Or is there some discovered limitation with current
> interfaces, that everybody ought to know? Especially authors of
> D-Bus and TAO etc would be very interested in this I believe.
>
> It's not like I don't understand that it would be hard to move this
> thing to userspace, it's more that I'd like to know how you think it
> would be impacted by that.

Fundamentally, yes, you should be able to replicate the functionality
in userspace. We considered this during 1.0 development, but it ended
up being a lot of risk (at the point when it was discussed) compared
to using the existing driver that we had. You almost certainly would
need a central daemon to do some state and permission management as
well as track some of the refcounting. You could use EPIPE on local
sockets to detect remote process termination. You could even just use
local sockets for high level control and use shared memory for actual
message transport to avoid copy-in-copy-out overhead (another binder
driver feature).
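
As a rough sketch of the EPIPE idea (Python, purely illustrative, with
try_send being a made-up helper name): a write to a local socket whose
peer has gone away fails, and the sender can treat that as a death
notification. On Linux the failure can also show up as ECONNRESET when
the peer exited with unread data pending, so the sketch accepts both.

```python
import socket

# MSG_NOSIGNAL asks for an error instead of SIGPIPE; assumed Linux-only.
NOSIGNAL = getattr(socket, "MSG_NOSIGNAL", 0)


def try_send(sock: socket.socket, payload: bytes) -> bool:
    """Send payload; return False if the peer has closed its end."""
    try:
        sock.send(payload, NOSIGNAL)
    except (BrokenPipeError, ConnectionResetError):
        # EPIPE or ECONNRESET: the remote process is no longer there.
        return False
    return True
```

With a socketpair standing in for two processes, try_send() succeeds
while both ends are open and starts returning False once the other end
has been closed.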

That said, the userspace environment was built up around the binder
and relies on it heavily for all ipc (except for dbus, which we use for
bluez because it just hasn't been worth the headache to maintain
alternate ipc patches for bluez). It is performance sensitive (it's
possible that you could achieve similar performance with a suitably
clever userspace implementation making use of shared memory, of
course), and the frameworks/apps folks are happy with it as is (so
talking them into replacing it may be a nontrivial exercise).

I wouldn't mind not having to maintain the kernel driver (well, not
having Arve have to maintain the kernel driver...) but building a
pure-userspace replacement would be a pretty huge undertaking,
especially given all the other work we have just with general kernel
development, bringup, etc.

Since all binder comms in userspace bottleneck through two small
libraries (one C++, one lighter weight C), in theory you could build a
drop-in replacement and then prove it out, verify correctness and
performance, and make the argument for replacing the existing
implementation.

Debugging binder implementation issues under a full system using many
binder services and patterns like "client A calls service B which
returns an object in service C" is a bit of a nightmare. I try to
stay far away from it, myself.

Brian
From: Arve Hjønnevåg on
2010/6/6 Thomas Gleixner <tglx(a)linutronix.de>:
> On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
>> 2010/6/5 Thomas Gleixner <tglx(a)linutronix.de>:
>> >
>> > Can you please explain in a consistent way how the application stack
>> > and the underlying framework (which exists according to android docs)
>> > is handling events and how the separation of trust level works ?
>> >
>>
>> I don't think I can, since I only know small parts of it. I know some
>
> Sigh, that's the whole reason why this discussion goes nowhere.
>

Please keep in mind that we also have third party applications and
that it is not acceptable to break them. So even if I was able to tell
you everything our framework does, you still need to make sure your
solution does not break existing apps.

> How in heavens sake should we be able to decide whether suspend
> blockers are the right and only thing which solves a problem, when the
> folks advocating suspend blockers are not able to explain the problem
> in the first place ?
>
>> events like input events go through a single thread in our system
>> process, while other events like network packets (which are also
>> wakeup events) go directly to the app.
>
> Yes, we know that already, but that's completely useless information
> as it does not describe the full constraints and dependencies.
>
> Lemme summarize:
>
>  Android needs suspend blockers, because it works, but cannot explain
>  why it works and why it only works that way.
>
> A brilliant argument to merge them - NOT.
>

Your solution changes the programming model in a way that suspend does
not. Linux allows processes to communicate with each other, and if you
freeze individual processes this breaks. For the android framework
code, a lack of a timely response from an application is treated as an
error, and the user is notified that the application is misbehaving.
It may be possible to change the framework to make sure that no
processes are frozen while it is waiting for a response, but this is a
major change and applications that receive wakeup events directly from
the kernel will still be broken.

--
Arve Hjønnevåg
From: Arve Hjønnevåg on
2010/6/6 Alan Stern <stern(a)rowland.harvard.edu>:
> On Sat, 5 Jun 2010, Alan Stern wrote:
>
>> > If you are referring to the approach that we don't use suspend but
>> > freeze a cgroup instead, this only solves the problem of bad apps. It
>> > does not help pause timers in trusted user space code and in the
>> > kernel, so it does not lower our average power consumption.
>>
>> You can solve this problem if you restructure your "trusted" apps in
>> the right way.  Require a trusted app to guarantee that whenever it
>> doesn't hold any suspend blockers, it will do nothing but wait (in a
>> poll() system call for example) for a wakeup event.  When the event
>> occurs, it must then activate a suspend blocker.
>>
>> Better yet, make it more fine-grained.  Instead of trusted apps, have
>> trusted threads.  Freeze the untrusted threads along with everything
>> else, and require the trusted threads to satisfy this guarantee.
>>
>> In this way, while the system is idle no user timers will get renewed.
>> Kernel timers are another matter, but we should be able to handle them.
>> There's nothing Android-specific about wanting to reduce kernel timer
>> wakeups while in a low-power mode.
>
> In fact it's possible to do this with only minimal changes to the
> userspace, providing you can specify all your possible hardware wakeup
> sources.  (On the Android this list probably isn't very large -- I
> imagine it includes the keypad, the radio link(s), the RTC, and maybe
> a few switches, buttons, or other things.)
>
> Here's how you can do it.  Extend the userspace suspend-blocker API, so
> that each suspend blocker can optionally have an associated wakeup
> source.
>
> The power-manager process should keep a list of "active" wakeup
> sources.  A source gets removed from the list when an associated
> suspend blocker is activated.
>

How do you do this safely? If you remove the active wakeup source only
when activating the suspend blocker, you will never unblock suspend if
another wakeup event happens after user-space blocked suspend but
before user-space read the events.
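
The sequence I have in mind can be written down as a toy model. This is
a sketch of one reading of the proposal, in Python, with assumptions
spelled out: PowerManager and its methods are hypothetical names, a
source is taken to become "active" whenever an event arrives, and it
leaves the list only when its blocker is activated.

```python
class PowerManager:
    """Toy model: a wakeup source stays active until its blocker is activated."""

    def __init__(self):
        self.active = set()    # sources with an unacknowledged wakeup event
        self.blockers = set()  # currently held suspend blockers

    def wakeup_event(self, source):
        self.active.add(source)

    def activate_blocker(self, source):
        self.blockers.add(source)
        self.active.discard(source)  # the only place a source is removed

    def release_blocker(self, source):
        self.blockers.discard(source)

    def may_suspend(self):
        return not self.active and not self.blockers


def run_racy_sequence():
    pm = PowerManager()
    pm.wakeup_event("input")      # first event wakes the system
    pm.activate_blocker("input")  # user-space blocks suspend
    pm.wakeup_event("input")      # second event, before user-space reads
    # user-space now reads both events in one go and is done
    pm.release_blocker("input")
    return pm.may_suspend()
```

In this model run_racy_sequence() ends with suspend still blocked: the
source is back on the list, and nothing is left that would activate the
blocker again to remove it.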

Also, I'm not sure we can easily associate a wakeup event with a user
space suspend blocker. For instance when an alarm triggers it is
sometimes because of a user-space alarm and sometimes because of an
in-kernel alarm.

> When the "active" list is empty and no suspend blockers are activated,
> the power manager freezes ALL other processes, trusted and untrusted
> alike.  It then does a big poll() on all the wakeup sources.  When the
> poll() returns, its output is used to repopulate the "active" list and
> processes are unfrozen.
>
> (You can also include some error detection: If a source remains on the
> "active" list for too long then something has gone wrong.)
>
> To do all this you don't even need to use cgroups.  The existing PM
> implementation allows a user process to freeze everything but itself;
> that's how swsusp and related programs work.
>
> This is still a big-hammer sort of approach, but it doesn't require any
> kernel changes.
>
> Alan Stern
>
>



--
Arve Hjønnevåg