From: David Howells on
Linus Torvalds <torvalds(a)linux-foundation.org> wrote:

> The whole patch seems to be based on "nobody can ever use
> get_cred/put_cred, because concurrent use will then trigger the
> BUG_ON() in __put_cred()".

That's not the problem.

The problem is that task_state() accesses the target task's credentials whilst
only holding the RCU read lock. That means that the existence of the cred
struct so accessed can only be guaranteed up to the point that the RCU read
lock is released.

What we shouldn't do is increment the usage count on the credentials because
we're not holding a lock that will prevent the target task reducing the
refcount on those credentials to zero between us reading the cred pointer and
incrementing the count.

If we want to increment the usage count on the credentials, we need to prevent
the target task from modifying its own credentials whilst we do it. Currently,
we can't do that as, taking an example, sys_setuid() doesn't hold hold any sort
of lock. We would have to add a spinlock or something like that for
commit_creds() to take.

What we have to do instead is grab any values we want from the cred struct
before releasing the RCU read lock. The moment we drop the lock, the cred
struct may cease to exist, even if we did increment its count.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on
On Tue, Jul 27, 2010 at 9:46 AM, David Howells <dhowells(a)redhat.com> wrote:
>
> That's not the problem.
>
> The problem is that task_state() accesses the target task's credentials whilst
> only holding the RCU read lock. �That means that the existence of the cred
> struct so accessed can only be guaranteed up to the point that the RCU read
> lock is released.

Umm. In that case, get_task_cred() is actively misleading.

What you are saying is that you cannot do

rcu_read_lock()
__cred = (struct cred *) __task_cred((task));
get_cred(__cred);
rcu_read_unlock();

but that is _exactly_ what get_task_cred() does. And that
__task_cred() check checks that we have

rcu_read_lock_held() || lockdep_tasklist_lock_is_held()

and what you are describing would require us to have a '&&' rather
than a '||' in that test. Because it is _not_ sufficient to have just
the rcu_read_lock held.

So it looks like the validation is simply wrong. The __task_cred()
helper is buggy. It's used for two different cases, and they have
totally different locking requirements.

Case #1:
- you can do __task_cred() with just read-lock held, but then you
cannot add refs to it

Case #2:
- you can do __task_cred() with read-lock held _and_ guaranteeing
that the task doesn't go away, and then you can hold a ref to it as
long as you still guarantee the task is around.

And the comments are actively wrong. The comments talk about the "case
#2" thing only. Ignoring case #1, except for the fact that the _check_
allows case #1, so you never get a warning from the RCU "proving" code
even for incorrect code.

So presumably Jiri's patch is correct, but the reason the bug happened
in the first place is that all those accessor functions are totally
confused about how they supposed to be used, with incorrect comments
and incorrect access checks.

That should get fixed. Who knows how many other buggy users there are
due to the confusion?

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jiri Olsa on
On Tue, Jul 27, 2010 at 10:56:20AM -0700, Linus Torvalds wrote:
> On Tue, Jul 27, 2010 at 9:46 AM, David Howells <dhowells(a)redhat.com> wrote:
> >
> > That's not the problem.
> >
> > The problem is that task_state() accesses the target task's credentials whilst
> > only holding the RCU read lock. �That means that the existence of the cred
> > struct so accessed can only be guaranteed up to the point that the RCU read
> > lock is released.
>
> Umm. In that case, get_task_cred() is actively misleading.
>
> What you are saying is that you cannot do
>
> rcu_read_lock()
> __cred = (struct cred *) __task_cred((task));
> get_cred(__cred);
> rcu_read_unlock();
>
> but that is _exactly_ what get_task_cred() does. And that
right, get_task_cred looks like source for similar bugs.. will check

> __task_cred() check checks that we have
>
> rcu_read_lock_held() || lockdep_tasklist_lock_is_held()
>
> and what you are describing would require us to have a '&&' rather
> than a '||' in that test. Because it is _not_ sufficient to have just
> the rcu_read_lock held.
>
> So it looks like the validation is simply wrong. The __task_cred()
> helper is buggy. It's used for two different cases, and they have
> totally different locking requirements.
>
> Case #1:
> - you can do __task_cred() with just read-lock held, but then you
> cannot add refs to it
>
> Case #2:
> - you can do __task_cred() with read-lock held _and_ guaranteeing
> that the task doesn't go away, and then you can hold a ref to it as
> long as you still guarantee the task is around.
>
> And the comments are actively wrong. The comments talk about the "case
> #2" thing only. Ignoring case #1, except for the fact that the _check_
> allows case #1, so you never get a warning from the RCU "proving" code
> even for incorrect code.
>
> So presumably Jiri's patch is correct, but the reason the bug happened
> in the first place is that all those accessor functions are totally
> confused about how they supposed to be used, with incorrect comments
> and incorrect access checks.
>
> That should get fixed. Who knows how many other buggy users there are
> due to the confusion?
I'll see if I can find some other places

thanks,
jirka
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: David Howells on
Linus Torvalds <torvalds(a)linux-foundation.org> wrote:

> So it looks like the validation is simply wrong. The __task_cred()
> helper is buggy. It's used for two different cases, and they have
> totally different locking requirements.

I think part of my comment on __task_cred():

* The caller must make sure task doesn't go away, either by holding a
* ref on task or by holding tasklist_lock to prevent it from being
* unlinked.

may be obvious and perhaps unnecessary - anyone attempting to access a
task_struct should know that they need to prevent it from going away whilst
they do it - and I think this has led to Paul McKenny misunderstanding the
intent. What I was trying to convey is the destruction of the task_struct
involves discarding the credentials it points to.

Either I should insert the word 'also' into that comment after 'must' or just
get rid of it entirely.

I think I should remove lock_dep_tasklist_lock_is_held() from the stated
checks. It doesn't add anything, and someone calling __task_cred() on their
own process (perhaps indirectly) doesn't need the tasklist lock.

> Umm. In that case, get_task_cred() is actively misleading.
>
> What you are saying is that you cannot do
>
> rcu_read_lock()
> __cred = (struct cred *) __task_cred((task));
> get_cred(__cred);
> rcu_read_unlock();

Yeah. I think there are three alternatives:

(1) get_task_cred() could be done by doing { __task_cred(),
atomic_inc_not_zero() } in a loop until we manage to come up with the
goods. It probably wouldn't be all that inefficient as creds don't change
very often as a rule.

(2) Get a spinlock in commit_creds() around the point where we alter the cred
pointers.

(3) Try and get rid of get_task_cred() and use other means. I've gone through
all the users of get_task_cred() (see below) and this may be viable,
though restrictive, in places.

Any thoughts as to which is the best?

> So presumably Jiri's patch is correct, but the reason the bug happened
> in the first place is that all those accessor functions are totally
> confused about how they supposed to be used, with incorrect comments
> and incorrect access checks.

At some point in the past I think I discarded a lock from the code:-/

> That should get fixed. Who knows how many other buggy users there are
> due to the confusion?

Some.

warthog>grep get_task_cred `find . -name "*.[ch]"`
./kernel/auditsc.c: const struct cred *cred = get_task_cred(tsk);

This is audit_filter_rules() which is used by:

- audit_filter_task()
- audit_alloc() as called from copy_process() with the new process
- audit_filter_syscall()
- audit_get_context()
- audit_free() as called from copy_process() with the new, now dead process
- audit_free() as called from do_exit() with the dead process
- audit_syscall_exit() which passes current.
- audit_syscall_entry() which passes current.
- audit_filter_inodes()
- audit_update_watch() which passes current
- audit_get_context()
- see above.

which I think are all safe because when it's a new/dead process being accessed,
that process can't be modifying itself at that point, otherwise it's current
being accessed.

./kernel/cred.c: old = get_task_cred(daemon);

This is prepare_kernel_cred() which is used by:

- cachefiles_get_security_ID() which passes current.

so this is safe.

./include/linux/cred.h: * get_task_cred - Get another task's objective credentials
./include/linux/cred.h:#define get_task_cred(task) \

The header file stuff under discussion.

./security/security.c: cred = get_task_cred(tsk);
./security/security.c: cred = get_task_cred(tsk);

These are security_real_capable{,_noaudit}() if CONFIG_SECURITY=y, which could
be a problem since they're used by has_capability{,_noaudit}() and called by
things like the OOM killer with tasks other than current.

However, I'm not sure it's necessary for get_task_cred() to be used here (in
security/security.c) as it doesn't appear that the capable() LSM method sleeps
in the one LSM that implements it (SELinux) or in the commoncap code.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: David Howells on
David Howells <dhowells(a)redhat.com> wrote:

> Yeah. I think there are three alternatives:

There's a fourth alternative too:

(4) I could try and make it so that if the RCU cleanup routine sees it with a
non-zero usage count, then it just ignores it. This, however, would
require call_rcu() to be able to cope with requeueing.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/