From: Jiri Olsa on
On Wed, Jun 16, 2010 at 02:45:14PM +0200, Eric Dumazet wrote:
> Le mercredi 16 juin 2010 � 14:24 +0200, Jiri Olsa a �crit :
> > hi,
> >
> > BZ 591015 - kernel BUG at kernel/cred.c:168
> > https://bugzilla.redhat.com/show_bug.cgi?id=591015
> >
> > Above bugzilla reported bug during the releasing of
> > old cred structure.
> >
> > There is reproducer attached to the bugzilla.
> >
> > The issue is caused by releasing old cred struct while other
> > kernel path might be still using it. This leads to cred->usage
> > inconsistency inside the __put_cred and triggering the bug.
> >
> > Following kernel paths are affected:
> >
> > The CPU1 path is setting the new groups creds.
> > The CPU2 path is cat /proc/PID/status
> >
> >
> > CPU 1 CPU 2
> >
> > sys_setgroups proc_pid_status
> > set_current_groups task_state
> > commit_creds rcu_read_lock
> > put_cred ...
> > __put_cred get_cred
> > BUG_ON(usage != 0) ...
> > rcu_read_unlock
> >
> >
> >
> > If __put_cred got executed during the CPU2 holding the reference
> > the BUG_ON inside __put_cred is trigered.
> >
> > I put synchronize_rcu before put_cred call, so we are sure
> > all the rcu_read_lock blocks are finished, and if needed, using
> > new creds.
> >
> > Also I moved rcu_read_unlock below the put_cred for consistency.
> >
> >
> > wbr,
> > jirka
> >
> >
> > Signed-off-by: Jiri Olsa <jolsa(a)redhat.com>
> > ---
> > fs/proc/array.c | 3 ++-
> > kernel/cred.c | 6 ++++++
> > 2 files changed, 8 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/proc/array.c b/fs/proc/array.c
> > index 9b58d38..c0e60d1 100644
> > --- a/fs/proc/array.c
> > +++ b/fs/proc/array.c
> > @@ -199,7 +199,6 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
> > "FDSize:\t%d\n"
> > "Groups:\t",
> > fdt ? fdt->max_fds : 0);
> > - rcu_read_unlock();
> >
> > group_info = cred->group_info;
> > task_unlock(p);
> > @@ -208,6 +207,8 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
> > seq_printf(m, "%d ", GROUP_AT(group_info, g));
> > put_cred(cred);
> >
> > + rcu_read_unlock();
> > +
> > seq_printf(m, "\n");
> > }
> >
> > diff --git a/kernel/cred.c b/kernel/cred.c
> > index a2d5504..4872f12 100644
> > --- a/kernel/cred.c
> > +++ b/kernel/cred.c
> > @@ -510,6 +510,12 @@ int commit_creds(struct cred *new)
> > new->fsgid != old->fsgid)
> > proc_id_connector(task, PROC_EVENT_GID);
> >
> > + /*
> > + * New cred is set already, now let the old one
> > + * chance to disappear on other CPUs.
> > + */
> > + synchronize_rcu();
> > +
> > /* release the old obj and subj refs both */
> > put_cred(old);
> > put_cred(old);
> > --
>
> I respectfully suggest that every patch adding a synchronize_rcu() call
> be reviewed by Paul himself. This is really the last option that should
> be considered, since it adds a big delay.

no problem, I wasn't aware of this

>
> Can you explain why other solutions cannot be used ?

I'm not sure which ones do you mean.. I'm not RCU expert, and
this one occured to me as probable fix, but if there're other
ways, no problem ;)

thanks,
jirka
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Eric Dumazet on
Le mercredi 16 juin 2010 à 14:24 +0200, Jiri Olsa a écrit :
> hi,
>
> BZ 591015 - kernel BUG at kernel/cred.c:168
> https://bugzilla.redhat.com/show_bug.cgi?id=591015
>
> Above bugzilla reported bug during the releasing of
> old cred structure.
>
> There is reproducer attached to the bugzilla.
>
> The issue is caused by releasing old cred struct while other
> kernel path might be still using it. This leads to cred->usage
> inconsistency inside the __put_cred and triggering the bug.
>
> Following kernel paths are affected:
>
> The CPU1 path is setting the new groups creds.
> The CPU2 path is cat /proc/PID/status
>
>
> CPU 1 CPU 2
>
> sys_setgroups proc_pid_status
> set_current_groups task_state
> commit_creds rcu_read_lock
> put_cred ...
> __put_cred get_cred
> BUG_ON(usage != 0) ...
> rcu_read_unlock
>
>
>
> If __put_cred got executed during the CPU2 holding the reference
> the BUG_ON inside __put_cred is trigered.
>
> I put synchronize_rcu before put_cred call, so we are sure
> all the rcu_read_lock blocks are finished, and if needed, using
> new creds.
>
> Also I moved rcu_read_unlock below the put_cred for consistency.
>
>
> wbr,
> jirka
>
>
> Signed-off-by: Jiri Olsa <jolsa(a)redhat.com>
> ---
> fs/proc/array.c | 3 ++-
> kernel/cred.c | 6 ++++++
> 2 files changed, 8 insertions(+), 1 deletions(-)
>
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index 9b58d38..c0e60d1 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -199,7 +199,6 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
> "FDSize:\t%d\n"
> "Groups:\t",
> fdt ? fdt->max_fds : 0);
> - rcu_read_unlock();
>
> group_info = cred->group_info;
> task_unlock(p);
> @@ -208,6 +207,8 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
> seq_printf(m, "%d ", GROUP_AT(group_info, g));
> put_cred(cred);
>
> + rcu_read_unlock();
> +
> seq_printf(m, "\n");
> }
>
> diff --git a/kernel/cred.c b/kernel/cred.c
> index a2d5504..4872f12 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -510,6 +510,12 @@ int commit_creds(struct cred *new)
> new->fsgid != old->fsgid)
> proc_id_connector(task, PROC_EVENT_GID);
>
> + /*
> + * New cred is set already, now let the old one
> + * chance to disappear on other CPUs.
> + */
> + synchronize_rcu();
> +
> /* release the old obj and subj refs both */
> put_cred(old);
> put_cred(old);
> --

I respectfully suggest that every patch adding a synchronize_rcu() call
be reviewed by Paul himself. This is really the last option that should
be considered, since it adds a big delay.

Can you explain why other solutions cannot be used ?

Thanks


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Eric Dumazet on
Le mercredi 16 juin 2010 à 14:57 +0200, Jiri Olsa a écrit :
> On Wed, Jun 16, 2010 at 02:45:14PM +0200, Eric Dumazet wrote:

> > I respectfully suggest that every patch adding a synchronize_rcu() call
> > be reviewed by Paul himself. This is really the last option that should
> > be considered, since it adds a big delay.
>
> no problem, I wasn't aware of this
>
> >
> > Can you explain why other solutions cannot be used ?
>
> I'm not sure which ones do you mean.. I'm not RCU expert, and
> this one occured to me as probable fix, but if there're other
> ways, no problem ;)

To be fair, I am not RCU expert either, since I make mistakes on each
RCU related patch I post. Welcome to the club :)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jiri Olsa on
On Wed, Jun 16, 2010 at 03:10:37PM +0200, Eric Dumazet wrote:
> Le mercredi 16 juin 2010 � 14:57 +0200, Jiri Olsa a �crit :
> > On Wed, Jun 16, 2010 at 02:45:14PM +0200, Eric Dumazet wrote:
>
> > > I respectfully suggest that every patch adding a synchronize_rcu() call
> > > be reviewed by Paul himself. This is really the last option that should
> > > be considered, since it adds a big delay.
> >
> > no problem, I wasn't aware of this
> >
> > >
> > > Can you explain why other solutions cannot be used ?
> >
> > I'm not sure which ones do you mean.. I'm not RCU expert, and
> > this one occured to me as probable fix, but if there're other
> > ways, no problem ;)
>
> To be fair, I am not RCU expert either, since I make mistakes on each
> RCU related patch I post. Welcome to the club :)
>
>

hm,

maybe I have better solution...

I think there's no need to get the cred refference as long as
the 'cred' handling stays inside the rcu_read_lock block.

And the condition 'make sure task doesn't go away', is done
by proc_single_show as this is the proc file.

I tested this succesfully with the reproducer from
https://bugzilla.redhat.com/show_bug.cgi?id=591015#c19

wbr,
jirka


Signed-off-by: Jiri Olsa <jolsa(a)redhat.com>
---
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 9b58d38..ac3b3a4 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -176,7 +176,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
if (tracer)
tpid = task_pid_nr_ns(tracer, ns);
}
- cred = get_cred((struct cred *) __task_cred(p));
+ cred = __task_cred(p);
seq_printf(m,
"State:\t%s\n"
"Tgid:\t%d\n"
@@ -199,15 +199,14 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
"FDSize:\t%d\n"
"Groups:\t",
fdt ? fdt->max_fds : 0);
- rcu_read_unlock();

group_info = cred->group_info;
task_unlock(p);

for (g = 0; g < min(group_info->ngroups, NGROUPS_SMALL); g++)
seq_printf(m, "%d ", GROUP_AT(group_info, g));
- put_cred(cred);

+ rcu_read_unlock();
seq_printf(m, "\n");
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: David Howells on
Jiri Olsa <jolsa(a)redhat.com> wrote:

> maybe I have better solution...
>
> I think there's no need to get the cred refference as long as
> the 'cred' handling stays inside the rcu_read_lock block.

I think this is right. There should be no need for a synchronize_rcu() call
to be added in commit_creds() with this as commit_creds() calls put_cred()
which will defer the destruction until the RCU grace period is up anyway.

Whilst I'd prefer to call get_cred() in task_state(), as you point out (and I
hadn't considered), this may see an cred struct that has been detached from
its pointer on another CPU and had its usage count reduced to 0.

In such a case, we can't simply increment the count and then decrement it
again later as it's already on the RCU destruction queue and can't necessarily
be removed so that it can be added back in.

What could be done, though I'm not sure it's worth it, is to use
atomic_inc_not_zero() and loop around if the cred struct has gone out of
service when we try and access it and reread the pointer.

The advantage of this would be that we could manage to hold the RCU read lock
for as little time as possible.

Acked-by: David Howells <dhowells(a)redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/