From: Oleg Nesterov on
On 03/24, Peter Zijlstra wrote:
>
> On Wed, 2010-03-24 at 21:45 +0100, Oleg Nesterov wrote:
> > Nowadays ->siglock is overloaded, it would be really nice to change
> > do_task_stat() to walk through the list of threads lockless. And note
> > that we are doing while_each_thread() twice!
> >
> > while_each_thread() is rcu-safe, but thread_group_times() also needs
> > ->siglock to serialize the modifications of signal_struct->prev_Xtime
> > members.

First of all, let me reply to myself. I see that I wasn't clear at all.

This patch is a first step toward removing one reason for taking
->siglock (the modification of ->prev_Xtime). But this is very minor;
I guess we could change thread_group_times() to take
signal->cputimer.lock instead.

The goal was to call thread_group_cputime() locklessly, under
rcu_read_lock() only (either directly or via thread_group_times(),
it doesn't matter), to avoid doing while_each_thread() under ->siglock.

And in this case /proc/pid/stat can't report utime/stime atomically.
Whatever we do we can race with exit, so it doesn't make sense to
play with ->prev_Xtime.

> Right, so from what I remember the issue is that, yes top et al rely on
> that monotonicity,

Really? So, do you think the change above will break user-space?

How sad :/

> but more importantly I think
> clock_gettime(CLOCK_PROCESS_CPUTIME_ID) should indeed use ->siglock to
> ensure it serializes against do_exit() so that either we iterate the
> thread or get the accumulated runtime from signal_struct but not both
> (or neither).

Oh. I forgot everything I knew about posix-cpu-timers... But, it seems,
posix_cpu_clock_get() calls thread_group_cputime() under tasklist and
thus can't race with exit.
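
For reference, the user-space side of this is just
clock_gettime(CLOCK_PROCESS_CPUTIME_ID). Below is a minimal sketch of the
kind of "never goes backwards" check a tool might make on that clock.
The helper names process_cpu_ns() and burn_cpu() are made up for the
illustration, and this only exercises the single-threaded case, not the
exit race discussed above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: read the calling process's CPU-time clock and
 * return it in nanoseconds, or -1 if clock_gettime() fails.
 */
static int64_t process_cpu_ns(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0)
		return -1;

	/* Sanity check: tv_nsec must be a valid sub-second value. */
	assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);

	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical helper: burn some CPU so the clock can advance. */
static void burn_cpu(unsigned long iterations)
{
	volatile unsigned long sink = 0;
	unsigned long i;

	for (i = 0; i < iterations; i++)
		sink += i;
}
```

A caller would sample process_cpu_ns(), do some work, sample again, and
expect the second value to be no smaller than the first.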

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Stanislaw Gruszka on
On Thu, Mar 25, 2010 at 01:12:50PM +0100, Oleg Nesterov wrote:
> > but more importantly I think
> > clock_gettime(CLOCK_PROCESS_CPUTIME_ID) should indeed use ->siglock to
> > ensure it serializes against do_exit() so that either we iterate the
> > thread or get the accumulated runtime from signal_struct but not both
> > (or neither).
>
> Oh. I forgot everything I knew about posix-cpu-timers... But, it seems,
> posix_cpu_clock_get() calls thread_group_cputime() under tasklist and
> thus can't race with exit.

We ensure thread_group_cputime() is called with either tasklist_lock
or ->siglock held, to avoid races with __exit_signal(). The exceptions
are the oom killer and the ELF core-dump code, which take no lock:
there we either assume exit cannot be running or we don't care about
inaccurate results.
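
To make the "both or neither" problem concrete, here is a small
single-threaded model in user-space C. It is only a sketch: struct
group_model and the model_*() functions are invented for the
illustration and are not kernel code. The order of the two exit steps
(accumulate first, then unlink) is an assumption of the model; with the
opposite order an unlocked reader would miss the thread's time instead
of counting it twice.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 8

/* Hypothetical stand-in for the per-group accounting state. */
struct group_model {
	unsigned long long dead_runtime;              /* time of exited threads */
	unsigned long long live_runtime[MAX_THREADS]; /* per-thread time */
	int on_list[MAX_THREADS];                     /* 1 while still linked */
	size_t nr_threads;
};

/* What a reader sees right now: accumulated time plus each linked thread. */
static unsigned long long model_group_cputime(const struct group_model *g)
{
	unsigned long long sum = g->dead_runtime;
	size_t i;

	for (i = 0; i < g->nr_threads; i++)
		if (g->on_list[i])
			sum += g->live_runtime[i];
	return sum;
}

/* Exit, step 1: fold the thread's time into the group total. */
static void model_exit_accumulate(struct group_model *g, size_t i)
{
	assert(i < g->nr_threads && g->on_list[i]);
	g->dead_runtime += g->live_runtime[i];
}

/* Exit, step 2: unlink the thread from the list. */
static void model_exit_unlink(struct group_model *g, size_t i)
{
	assert(i < g->nr_threads);
	g->on_list[i] = 0;
}
```

A reader that runs between the two steps counts the exiting thread
twice. Holding the same lock across both steps on the exit side and
across the whole sum on the reader side closes that window.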

Stanislaw