From: Serge E. Hallyn on
Quoting Alan Cox (alan(a)lxorguk.ukuu.org.uk):
> > sorry I thought I had cc:d you, bc I was pretty sure you'd have some
> > neat ideas. Like this one.
> >
> > One could try to argue that this makes every linux process susceptible
> > to a trojan making it grant its userid to other tasks, but of course
> > that's silly since the trojan could just fork. Well, what this would
> > buy the attacker is the ability to sit inconspicuously under his old
> > userid, holding on to the fd until the admin goes out to coffee before
> > switching userids.
>
> Possibly, could you put a timestamp in the passed object ? (Given the
> expiry of such a uid offer is kind of policy)

Yup, in fact I do that for the p9auth device driver anyway, to avoid
letting a user take up all kernel memory with unused setuid tokens.

> > The other thing is that offhand I think the server can't easily tell
> > from the socket which user namespace the client is in, as ucred only
> > has .uid. Though (1) we might need to create a 'struct puser' analogous
> > to 'struct pid' for signals anyway, (2) userspace can segragate with
> > fs or net_ns (if abstract sock), and (3) client in a container
> > presumably won't be able to authenticate itself to server on the
> > host anyway.
>
> That I think needs fixing anyway and now is a good time as any to do it.
> If you are going to be able to pass userids then you need to be sure you
> don't get passed a handle with a uid change on it that you didn't want so
> adding another object type and option of some kind to accept them has to
> occur anyway.

Ignoring namespaces for a moment, I guess we could do something like

struct credentials_pass {
pid_t global_pid;
unsigned long unique_id;
uid_t new_uid;
gid_t new_gid;
int num_aux_gids;
gid_t aux_gids[];
}

The unique_id is assigned by the kernel; the client reads the whole
struct credentials_pass - except global_pid - from the unix sock,
verifies the desired credentials, then does sys_acceptid(unique_id)
to accept.

Again, how to pass user namespace information is not obvious to me.

And it's probably debatable whether we want to also pass info about
posix capabilities and LSM security data.

Still, my main concern with this approach is that it makes the server
more complicated. Since I'm passing along my current credentials, that
means I have to construct just the right credentials, which if using
sock identified by pathname may mean I can no longer write to the
socket, right?

And of course one other important consideration is that the overall
p9auth API has been heavily discussed and documented, whereas
implementing our own entirely new approach is breaking new ground
and therefore a lot scarier.

> That also btw needs fixing for other reasons - more than one daemon has
> been written that generically uses recvmsg and so can be attacked with FD
> leaks >-)

Yup.

(By 'needs fixing' you just mean needs to be done right for this
service? Else I think I'm missing something...)

thanks,
-serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Eric W. Biederman on
"Serge E. Hallyn" <serue(a)us.ibm.com> writes:

> Ignoring namespaces for a moment, I guess we could do something like
>
> struct credentials_pass {
> pid_t global_pid;
> unsigned long unique_id;
> uid_t new_uid;
> gid_t new_gid;
> int num_aux_gids;
> gid_t aux_gids[];
> }

This looks surprising like what I am doing in passing uids and pids
through unix domain sockets.

So if this looks like a direction we want to go it shouldn't be too
difficult.

>> That also btw needs fixing for other reasons - more than one daemon has
>> been written that generically uses recvmsg and so can be attacked with FD
>> leaks >-)
>
> Yup.
>
> (By 'needs fixing' you just mean needs to be done right for this
> service? Else I think I'm missing something...)

Remember my unix domain socket and the patch for converting struct cred
into a new context, from a month or so ago. I think that is what we
are talking about.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Serge E. Hallyn on
Quoting Eric W. Biederman (ebiederm(a)xmission.com):
> "Serge E. Hallyn" <serue(a)us.ibm.com> writes:
>
> > Ignoring namespaces for a moment, I guess we could do something like
> >
> > struct credentials_pass {
> > pid_t global_pid;
> > unsigned long unique_id;
> > uid_t new_uid;
> > gid_t new_gid;
> > int num_aux_gids;
> > gid_t aux_gids[];
> > }
>
> This looks surprising like what I am doing in passing uids and pids
> through unix domain sockets.
>
> So if this looks like a direction we want to go it shouldn't be too
> difficult.
>
> >> That also btw needs fixing for other reasons - more than one daemon has
> >> been written that generically uses recvmsg and so can be attacked with FD
> >> leaks >-)
> >
> > Yup.
> >
> > (By 'needs fixing' you just mean needs to be done right for this
> > service? Else I think I'm missing something...)
>
> Remember my unix domain socket and the patch for converting struct cred
> into a new context, from a month or so ago. I think that is what we
> are talking about.

Zoinks! After some digging I found it in my containers.mbox and at
https://lists.linux-foundation.org/pipermail/containers/2010-March/023405.html
and see you even called me out. Sorry! I see your tree at
http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/ebiederm/linux-2.6.33-nsfd-v5.git;a=summary
and commit "af_unix: Allow SO_PEERCRED to work across namespaces", and
it all looks good. Definately useful for a SO_PASSCRED or somesuch
implementation.

thanks,
-serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Serge E. Hallyn on
Quoting Eric W. Biederman (ebiederm(a)xmission.com):
> "Serge E. Hallyn" <serue(a)us.ibm.com> writes:
>
> > Quoting Greg KH (greg(a)kroah.com):
> >> On Tue, Apr 20, 2010 at 08:29:08PM -0500, Serge E. Hallyn wrote:
> >> > This is a driver that adds Plan 9 style capability device
> >> > implementation. See Documentation/p9auth.txt for a description
> >> > of how to use this.
> >>
> >> Hm, you didn't originally write this driver, so it would be good to get
> >> some original authorship information in here to keep everything correct,
> >> right?
> >
> > That's why I left the MODULE_AUTHOR line in there - not sure what
> > else to do for that. I'll add a comment in p9auth.txt, especially
> > pointing back to Ashwin's original paper.
> >
> >> > Documentation/p9auth.txt | 47 ++++
> >> > drivers/char/Kconfig | 2 +
> >> > drivers/char/Makefile | 2 +
> >> > drivers/char/p9auth/Kconfig | 9 +
> >> > drivers/char/p9auth/Makefile | 1 +
> >> > drivers/char/p9auth/p9auth.c | 517 ++++++++++++++++++++++++++++++++++++++++++
> >>
> >> Is this code really ready for drivers/char/? What has changed in it
> >> that makes it ok to move out of the staging tree?
> >
> > It was dropped from staging :) I don't particularly care to see it
> > go back into staging, as opposed to working out issues out of tree
> > (assuming they are solvable). For one thing, as you note below,
> > there is the question of whether it should be a device driver at
> > all.
> >
> >> And who is going to maintain it? You? Or someone else?
> >
> > If Ashwin doesn't want to maintain it, I'll do it. Either way.
> >
> >> > 6 files changed, 578 insertions(+), 0 deletions(-)
> >> > create mode 100644 Documentation/p9auth.txt
> >> > create mode 100644 drivers/char/p9auth/Kconfig
> >> > create mode 100644 drivers/char/p9auth/Makefile
> >> > create mode 100644 drivers/char/p9auth/p9auth.c
> >> >
> >> > diff --git a/Documentation/p9auth.txt b/Documentation/p9auth.txt
> >> > new file mode 100644
> >> > index 0000000..14a69d8
> >> > --- /dev/null
> >> > +++ b/Documentation/p9auth.txt
> >> > @@ -0,0 +1,47 @@
> >> > +The p9auth device driver implements a plan-9 factotum-like
> >> > +capability API. Tasks which are privileged (authorized by
> >> > +possession of the CAP_GRANT_ID privilege (POSIX capability))
> >> > +can write new capabilities to /dev/caphash. The kernel then
> >> > +stores these until a task uses them by writing to the
> >> > +/dev/capuse device. Each capability represents the ability
> >> > +for a task running as userid X to switch to userid Y and
> >> > +some set of groups. Each capability may be used only once,
> >> > +and unused capabilities are cleared after two minutes.
> >> > +
> >> > +The following examples shows how to use the API. Shell 1
> >> > +contains a privileged root shell. Shell 2 contains an
> >> > +unprivileged shell as user 501 in the same user namespace. If
> >> > +not already done, the privileged shell should create the p9auth
> >> > +devices:
> >> > +
> >> > + majfile=/sys/module/p9auth/parameters/cap_major
> >> > + minfile=/sys/module/p9auth/parameters/cap_minor
> >> > + maj=`cat $majfile`
> >> > + mknod /dev/caphash c $maj 0
> >> > + min=`cat $minfile`
> >> > + mknod /dev/capuse c $maj 1
> >> > + chmod ugo+w /dev/capuse
> >>
> >> That is incorrect, you don't need the cap_major/minor files at all, the
> >> device node should be automatically created for you, right?
> >
> > Hmm, where? Not in /dev on my SLES11 partition...
> >
> >> And do you really want to do all of this control through a device node?
> >> Why?
> >
> > Well...
> >
> > At first I was thinking same as you were. So I was going to switch
> > to a pure syscall-based approach. But it just turned out more
> > complicated. The factotum server would call sys_grantid(), and
> > the target task would end up doing some huge sys_setresugid() or
> > else multiple syscalls using the granted id. It just was uglier.
> > I think there's an experimental patchset sitting somewhere I could
> > point to (if I weren't embarassed :).
> >
> > Another possibility would be to use netlink, but that doesn't
> > appear as amenable to segragation by user namespaces. The pid
> > (presumably/hopefully global pid, as __u32) is available, so it
> > shouldn't be impossible, but a simple device with simple synchronous
> > read/write certainly has its appeal. Firing off a message hoping
> > that at some point our credentials will be changes, less so.
>
> pid in the netlink context is the netlink port-id. It is a very
> different concept from struct pid. These days netlink calls to
> the kernel are synchronous, not that I would encourage netlink
> for anything except networking code.
>
> Can we make this a trivial filesystem? I expect that would match
> up better with whatever plan9 userspace apps already exist,
> remove the inode double translation, and would make it much more
> reasonable to do a user namespace aware version if and when

BTW, this current version is user namespace aware.

> that becomes necessary.

An fs actually seems overkill for two write-only files for
process-related information. Would these actually be candidates
for new /proc files?

/proc/grantcred - replaces /dev/caphash, for privileged
tasks to tell the kernel about new setuid
capabilities
/proc/self/usecred - replaces /dev/capuse for unprivileged
tasks to make use of a setuid capability

-serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: ron minnich on
On Fri, Apr 23, 2010 at 8:36 PM, Serge E. Hallyn <serue(a)us.ibm.com> wrote:

> An fs actually seems overkill for two write-only files for
> process-related information. �Would these actually be candidates
> for new /proc files?
>
> � � � �/proc/grantcred - replaces /dev/caphash, for privileged
> � � � � � � � �tasks to tell the kernel about new setuid
> � � � � � � � �capabilities
> � � � �/proc/self/usecred - replaces /dev/capuse for unprivileged
> � � � � � � � �tasks to make use of a setuid capability

An fs is fine.

To relate this to Plan 9, where it all began, might be useful. There's
no equivalent in Plan 9 to Linux/Unix devices of the major/minor
number etc. variety. In-kernel drivers and out-of-kernel servers both
end up providing the services (i.e. file name spaces) that we see in a
Linux file system. So the Plan 9 driver for the capability device
really does match closely in function and interface to a Linux
kernel-based file system.

Hence, making devcap a file system is entirely appropriate, because it
best fits the way it works in Plan 9: a kernel driver that provides
two files.

It's pretty easy to write a Linux VFS anyway, so it makes sense from
that point of view.

Eric, that was a great suggestion.

ron
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/