From: Greg KH on
On Tue, Apr 20, 2010 at 08:29:08PM -0500, Serge E. Hallyn wrote:
> This is a driver that adds Plan 9 style capability device
> implementation. See Documentation/p9auth.txt for a description
> of how to use this.

Hm, you didn't originally write this driver, so it would be good to get
some original authorship information in here to keep everything correct,
right?

> Documentation/p9auth.txt | 47 ++++
> drivers/char/Kconfig | 2 +
> drivers/char/Makefile | 2 +
> drivers/char/p9auth/Kconfig | 9 +
> drivers/char/p9auth/Makefile | 1 +
> drivers/char/p9auth/p9auth.c | 517 ++++++++++++++++++++++++++++++++++++++++++

Is this code really ready for drivers/char/? What has changed in it
that makes it ok to move out of the staging tree?

And who is going to maintain it? You? Or someone else?

> 6 files changed, 578 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/p9auth.txt
> create mode 100644 drivers/char/p9auth/Kconfig
> create mode 100644 drivers/char/p9auth/Makefile
> create mode 100644 drivers/char/p9auth/p9auth.c
>
> diff --git a/Documentation/p9auth.txt b/Documentation/p9auth.txt
> new file mode 100644
> index 0000000..14a69d8
> --- /dev/null
> +++ b/Documentation/p9auth.txt
> @@ -0,0 +1,47 @@
> +The p9auth device driver implements a plan-9 factotum-like
> +capability API. Tasks which are privileged (authorized by
> +possession of the CAP_GRANT_ID privilege (POSIX capability))
> +can write new capabilities to /dev/caphash. The kernel then
> +stores these until a task uses them by writing to the
> +/dev/capuse device. Each capability represents the ability
> +for a task running as userid X to switch to userid Y and
> +some set of groups. Each capability may be used only once,
> +and unused capabilities are cleared after two minutes.
> +
> +The following examples shows how to use the API. Shell 1
> +contains a privileged root shell. Shell 2 contains an
> +unprivileged shell as user 501 in the same user namespace. If
> +not already done, the privileged shell should create the p9auth
> +devices:
> +
> + majfile=/sys/module/p9auth/parameters/cap_major
> + minfile=/sys/module/p9auth/parameters/cap_minor
> + maj=`cat $majfile`
> + mknod /dev/caphash c $maj 0
> + min=`cat $minfile`
> + mknod /dev/capuse c $maj 1
> + chmod ugo+w /dev/capuse

That is incorrect, you don't need the cap_major/minor files at all, the
device node should be automatically created for you, right?

And do you really want to do all of this control through a device node?
Why?

> +DEFINE_MUTEX(cap_mutex); /* TODO fix up the locking one day */

One might hope that today would be that day...

Also, please run this through sparse. This is a variable that doesn't
need to be global.

> +struct cap_dev {
> + struct cdev cdev;
> +};

Do you really need to do it this way? A cdev for a single device node?
That's major overkill.

> +static int cap_major = CAP_MAJOR;
> +static int cap_minor;

Just always use a dynamic misc device, you never need a whole major for
this.

> +module_param(cap_major, int, S_IRUGO);
> +module_param(cap_minor, int, S_IRUGO);

Can be removed.

> +MODULE_AUTHOR("Ashwin Ganti");

Hm, who is going to maintain this, you, or Ashwin?

> +static void hexdump(unsigned char *buf, unsigned int len)
> +{
> + while (len--)
> + printk(KERN_DEBUG "%02x", *buf++);
> + printk(KERN_DEBUG "\n");
> +}

We have a built-in function for this already.

Oh, this function is also incorrect, which is a good reason to use the
built-in ones.

I'm going to stop now, this doesn't belong in drivers/char/ yet, it
needs work...

thanks,

greg k-h

> +/*
> + * read an entry. For now it is
> + * source_user(a)target_user@rand
> + * Next it will become
> + * source_user(a)target_user@target_group(a)numgroups@grp1..(a)grpn@rand
> + */

Hm, wait, one more.... The kernel/user api is going to change some time
in the future? Please fix this now before it gets merged.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Serge E. Hallyn on
Quoting Greg KH (greg(a)kroah.com):
> On Tue, Apr 20, 2010 at 08:29:08PM -0500, Serge E. Hallyn wrote:
> > This is a driver that adds Plan 9 style capability device
> > implementation. See Documentation/p9auth.txt for a description
> > of how to use this.
>
> Hm, you didn't originally write this driver, so it would be good to get
> some original authorship information in here to keep everything correct,
> right?

That's why I left the MODULE_AUTHOR line in there - not sure what
else to do for that. I'll add a comment in p9auth.txt, especially
pointing back to Ashwin's original paper.

> > Documentation/p9auth.txt | 47 ++++
> > drivers/char/Kconfig | 2 +
> > drivers/char/Makefile | 2 +
> > drivers/char/p9auth/Kconfig | 9 +
> > drivers/char/p9auth/Makefile | 1 +
> > drivers/char/p9auth/p9auth.c | 517 ++++++++++++++++++++++++++++++++++++++++++
>
> Is this code really ready for drivers/char/? What has changed in it
> that makes it ok to move out of the staging tree?

It was dropped from staging :) I don't particularly care to see it
go back into staging, as opposed to working out issues out of tree
(assuming they are solvable). For one thing, as you note below,
there is the question of whether it should be a device driver at
all.

> And who is going to maintain it? You? Or someone else?

If Ashwin doesn't want to maintain it, I'll do it. Either way.

> > 6 files changed, 578 insertions(+), 0 deletions(-)
> > create mode 100644 Documentation/p9auth.txt
> > create mode 100644 drivers/char/p9auth/Kconfig
> > create mode 100644 drivers/char/p9auth/Makefile
> > create mode 100644 drivers/char/p9auth/p9auth.c
> >
> > diff --git a/Documentation/p9auth.txt b/Documentation/p9auth.txt
> > new file mode 100644
> > index 0000000..14a69d8
> > --- /dev/null
> > +++ b/Documentation/p9auth.txt
> > @@ -0,0 +1,47 @@
> > +The p9auth device driver implements a plan-9 factotum-like
> > +capability API. Tasks which are privileged (authorized by
> > +possession of the CAP_GRANT_ID privilege (POSIX capability))
> > +can write new capabilities to /dev/caphash. The kernel then
> > +stores these until a task uses them by writing to the
> > +/dev/capuse device. Each capability represents the ability
> > +for a task running as userid X to switch to userid Y and
> > +some set of groups. Each capability may be used only once,
> > +and unused capabilities are cleared after two minutes.
> > +
> > +The following examples shows how to use the API. Shell 1
> > +contains a privileged root shell. Shell 2 contains an
> > +unprivileged shell as user 501 in the same user namespace. If
> > +not already done, the privileged shell should create the p9auth
> > +devices:
> > +
> > + majfile=/sys/module/p9auth/parameters/cap_major
> > + minfile=/sys/module/p9auth/parameters/cap_minor
> > + maj=`cat $majfile`
> > + mknod /dev/caphash c $maj 0
> > + min=`cat $minfile`
> > + mknod /dev/capuse c $maj 1
> > + chmod ugo+w /dev/capuse
>
> That is incorrect, you don't need the cap_major/minor files at all, the
> device node should be automatically created for you, right?

Hmm, where? Not in /dev on my SLES11 partition...

> And do you really want to do all of this control through a device node?
> Why?

Well...

At first I was thinking same as you were. So I was going to switch
to a pure syscall-based approach. But it just turned out more
complicated. The factotum server would call sys_grantid(), and
the target task would end up doing some huge sys_setresugid() or
else multiple syscalls using the granted id. It just was uglier.
I think there's an experimental patchset sitting somewhere I could
point to (if I weren't embarassed :).

Another possibility would be to use netlink, but that doesn't
appear as amenable to segragation by user namespaces. The pid
(presumably/hopefully global pid, as __u32) is available, so it
shouldn't be impossible, but a simple device with simple synchronous
read/write certainly has its appeal. Firing off a message hoping
that at some point our credentials will be changes, less so.

> > +DEFINE_MUTEX(cap_mutex); /* TODO fix up the locking one day */
>
> One might hope that today would be that day...

Well there's nothing actually wrong with the cap_mutex. I put in
this comment thinking I'd switch to rcu at some point, but it doesn't
seem worthwhile at the moment, by a long shot.

> Also, please run this through sparse. This is a variable that doesn't
> need to be global.
>
> > +struct cap_dev {
> > + struct cdev cdev;
> > +};
>
> Do you really need to do it this way? A cdev for a single device node?
> That's major overkill.
>
> > +static int cap_major = CAP_MAJOR;
> > +static int cap_minor;
>
> Just always use a dynamic misc device, you never need a whole major for
> this.

Right - will switch that over, assuming we don't nix the whole idea
first.

> > +module_param(cap_major, int, S_IRUGO);
> > +module_param(cap_minor, int, S_IRUGO);
>
> Can be removed.
>
> > +MODULE_AUTHOR("Ashwin Ganti");
>
> Hm, who is going to maintain this, you, or Ashwin?

I haven't asked Ashwin, and will be happy either way. Ashwin, did
you want to maintain it?

> > +static void hexdump(unsigned char *buf, unsigned int len)
> > +{
> > + while (len--)
> > + printk(KERN_DEBUG "%02x", *buf++);
> > + printk(KERN_DEBUG "\n");
> > +}
>
> We have a built-in function for this already.
>
> Oh, this function is also incorrect, which is a good reason to use the
> built-in ones.

Really I prefer to get rid of that altogether. Anyone who wants
to see that can use a kprobe hooked to the exit of cap_hash().

> I'm going to stop now, this doesn't belong in drivers/char/ yet, it
> needs work...

/me rolls up his sleeves.

> thanks,
>
> greg k-h
>
> > +/*
> > + * read an entry. For now it is
> > + * source_user(a)target_user@rand
> > + * Next it will become
> > + * source_user(a)target_user@target_group(a)numgroups@grp1..(a)grpn@rand
> > + */
>
> Hm, wait, one more.... The kernel/user api is going to change some time
> in the future? Please fix this now before it gets merged.

Huh, no that change already happened.

thanks,
-serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ashwin Ganti on
On Tue, Apr 20, 2010 at 8:45 PM, Serge E. Hallyn <serue(a)us.ibm.com> wrote:
> Quoting Greg KH (greg(a)kroah.com):
>> Hm, who is going to maintain this, you, or Ashwin?
>
> I haven't asked Ashwin, and will be happy either way. �Ashwin, did
> you want to maintain it?

Well, since you have been driving a lot of implementation changes
recently, I feel it makes sense for you to be a maintainer. Honestly I
don't think I will be able to find time maintaining this in the future
although I can do the code reviews for you. I will be more than happy
if this feature continues to get traction.

- Ashwin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Serge E. Hallyn on
Quoting Kyle Moffett (kyle(a)moffetthome.net):
> On Wed, Apr 21, 2010 at 15:15, Eric W. Biederman <ebiederm(a)xmission.com> wrote:
> > "Serge E. Hallyn" <serue(a)us.ibm.com> writes:
> >
> >> Ignoring namespaces for a moment, I guess we could do something like
> >>
> >> struct credentials_pass {
> >> � � � pid_t global_pid;
> >> � � � unsigned long unique_id;
> >> � � � uid_t new_uid;
> >> � � � gid_t new_gid;
> >> � � � int num_aux_gids;
> >> � � � gid_t aux_gids[];
> >> }
> >
> > This looks surprising like what I am doing in passing uids and pids
> > through unix domain sockets.
> >
> > So if this looks like a direction we want to go it shouldn't be too
> > difficult.
>
> Hmm... for an alternative idea:
>
> We have this nice "kernel keyring" infrastructure that lets us stuff
> arbitrary things into "keys" and grant/revoke them between processes.
> What if we created a relatively generic way for processes to package
> up privileges (of whatever form) into a "key" that could be granted to
> another process (via UNIX-domain socket)? Then the other process
> would use a setuid()-ish syscall which would instead apply a specific
> key as your credentials, possibly including the audit context and/or
> namespaces it came from.
>
> By using the keyring system, such tokens could be kept around across
> multiple processes easily (as opposed to FDs), in the same style as a
> "sudo" ticket file, for example (even with an expiration time).
>
> Types of credentials you could pass around:
> * Capabilities
> * Filesystem UID/GID in a particular UID namespace (for FS operations)
> * Process UID/GID in a particular UID namespace (for kill(), etc)
> * Audit contexts
> * SELinux/etc security labels
>
> All of the above could be optionally limited to effectively require a
> bprm-secure-style exec() with specific args. So for example, instead
> of making "/usr/sbin/passwd" a setuid program, you could make it be an
> unprivileged helper. It would connect to a privileged daemon and ask
> for a password-change cookie for that particular user. The daemon
> would create what is essentially a "delayed exec" key which grants a
> specific UID and capabilities when that process performs an execkey().

Hi Kyle,

it's a neat idea in terms of flexibility - but in this case the
flexibility really scares me :)

In particular I don't like the ability to keep these tokens around
for later use, and your example of passing around capabilities is
something we've specifically rejected in the past.

That doesn't mean that it wouldn't be a good idea to take your
general approach, use it very inflexibly, and then slowly, as we
gain experience, add flexibility.

I think we are best off starting with two options:

1. pass audit credentials only
2. pass full credentials

credentials meaning uid/gid, though - and let capabilities be
sorted out by the setuid rules.

> So as an example, you could rewrite "sudo" as a partially-privileged
> daemon and an unprivileged helper. The unpriv helper would send
> across a request (optionally including the command and environment)
> which would be checked by the daemon. It would then issue a key to
> allow the unpriv helper to perform a limited exec.
>
> Another option would be to rewrite network login programs (eg OpenSSH)
> to use this for privilege separation. The listening process would get
> a non-expiring key to allow it to exec a partially-privileged
> password-checking program. If the password-checking program likes the
> password it generates a single-use key to pass back to the forked
> network process that allows it to exec a program as that user.

Note that the last two examples are possible with with the simple
p9auth driver, and are really their motivation. In fact, we only
need a single privileged back-end (factotum) server to server a bunch
of unprivileged front-end servers.

So over the next few days I will incorporate a bunch of the feedback
on the p9auth driver itself, turn it into a filesystem, and resend
it. That's not to say that I don't like the other suggestions - I
want to pursue those as well and we can see where we end up and which
approach we prefer. For a first step I look forward to Eric pushing
the patches to do uid and pid translation for unix sock credentials.

thanks,
-serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Kyle Moffett on
On Wed, Apr 21, 2010 at 15:15, Eric W. Biederman <ebiederm(a)xmission.com> wrote:
> "Serge E. Hallyn" <serue(a)us.ibm.com> writes:
>
>> Ignoring namespaces for a moment, I guess we could do something like
>>
>> struct credentials_pass {
>>       pid_t global_pid;
>>       unsigned long unique_id;
>>       uid_t new_uid;
>>       gid_t new_gid;
>>       int num_aux_gids;
>>       gid_t aux_gids[];
>> }
>
> This looks surprising like what I am doing in passing uids and pids
> through unix domain sockets.
>
> So if this looks like a direction we want to go it shouldn't be too
> difficult.

Hmm... for an alternative idea:

We have this nice "kernel keyring" infrastructure that lets us stuff
arbitrary things into "keys" and grant/revoke them between processes.
What if we created a relatively generic way for processes to package
up privileges (of whatever form) into a "key" that could be granted to
another process (via UNIX-domain socket)? Then the other process
would use a setuid()-ish syscall which would instead apply a specific
key as your credentials, possibly including the audit context and/or
namespaces it came from.

By using the keyring system, such tokens could be kept around across
multiple processes easily (as opposed to FDs), in the same style as a
"sudo" ticket file, for example (even with an expiration time).

Types of credentials you could pass around:
* Capabilities
* Filesystem UID/GID in a particular UID namespace (for FS operations)
* Process UID/GID in a particular UID namespace (for kill(), etc)
* Audit contexts
* SELinux/etc security labels

All of the above could be optionally limited to effectively require a
bprm-secure-style exec() with specific args. So for example, instead
of making "/usr/sbin/passwd" a setuid program, you could make it be an
unprivileged helper. It would connect to a privileged daemon and ask
for a password-change cookie for that particular user. The daemon
would create what is essentially a "delayed exec" key which grants a
specific UID and capabilities when that process performs an execkey().

So as an example, you could rewrite "sudo" as a partially-privileged
daemon and an unprivileged helper. The unpriv helper would send
across a request (optionally including the command and environment)
which would be checked by the daemon. It would then issue a key to
allow the unpriv helper to perform a limited exec.

Another option would be to rewrite network login programs (eg OpenSSH)
to use this for privilege separation. The listening process would get
a non-expiring key to allow it to exec a partially-privileged
password-checking program. If the password-checking program likes the
password it generates a single-use key to pass back to the forked
network process that allows it to exec a program as that user.

Cheers,
Kyle Moffett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/