From: Stanislaw Gruszka on
On Mon, Mar 29, 2010 at 08:13:29PM +0200, Oleg Nesterov wrote:
> - change __exit_signal() to do __unhash_process() before we accumulate
> the counters in ->signal
>
> - add a couple of barriers into thread_group_cputime() and __exit_signal()
> to make sure thread_group_cputime() can never account the same thread
> twice if it races with exit.
>
> If any thread T was already accounted in ->signal, next_thread() or
> pid_alive() must see the result of __unhash_process(T).

Memory barriers prevents to account times twice, but as we write
sig->{u,s}time and sig->sum_sched_runtime on one cpu and read them
on another cpu, without a lock, this patch make theoretically possible
that some accumulated values of struct task_cputime may include exited
task values and some not. In example times->utime will include values
from 10 threads, and times->{stime,sum_exec_runtime} values form 9
threads, because local cpu updates sig->utime but not two other values.
This can make scaling in thread_group_times() not correct. I'm not sure
how big drawback is that.

Stanislaw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Oleg Nesterov on
On 03/30, Stanislaw Gruszka wrote:
>
> On Mon, Mar 29, 2010 at 08:13:29PM +0200, Oleg Nesterov wrote:
> > - change __exit_signal() to do __unhash_process() before we accumulate
> > the counters in ->signal
> >
> > - add a couple of barriers into thread_group_cputime() and __exit_signal()
> > to make sure thread_group_cputime() can never account the same thread
> > twice if it races with exit.
> >
> > If any thread T was already accounted in ->signal, next_thread() or
> > pid_alive() must see the result of __unhash_process(T).
>
> Memory barriers prevents to account times twice, but as we write
> sig->{u,s}time and sig->sum_sched_runtime on one cpu and read them
> on another cpu, without a lock, this patch make theoretically possible
> that some accumulated values of struct task_cputime may include exited
> task values and some not. In example times->utime will include values
> from 10 threads, and times->{stime,sum_exec_runtime} values form 9
> threads, because local cpu updates sig->utime but not two other values.

Yes sure. From 4/4 changelog:

This is a user visible change. Without ->siglock we can't get the "whole"
info atomically, and if we race with exit() we can miss the exiting thread.

this also means that it is possible that we don't miss the exiting thread
"completely", but we miss its, say, ->stime.

> This can make scaling in thread_group_times() not correct.

Again, see above. The changelog explicitely explains that this patch
assumes it is OK to return the info which is not atomic/accurate (as
it seen under ->siglock).

> I'm not sure
> how big drawback is that.

Neither me, that is why I am asking for comments.

To me this looks acceptable, but I can't judge.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/