From: Paul E. McKenney on
On Thu, Jul 08, 2010 at 04:59:25PM -0700, Roland McGrath wrote:
> > We can't do call_rcu(release_task);
> > we can't take tasklist for writing in softirq context. But even
> > if we could, it wouldn't actually help, or else I have missed something.
>
> Ah, I had missed that constraint of call_rcu. (It's not mentioned in the
> kerneldoc comment directly, and is only buried in a couple of brief
> mentions in Documentation/RCU/.) I was thinking that would suffice because
> it invokes the callback between rcu_read_lock critical sections, but I
> guess it actually only waits for the pre-existing ones to finish and
> doesn't prevent a new one from having started on another CPU. (And we
> can't use it anyway.)

Hmmm... The kerneldoc for rcu_read_lock() looks like it says this
pretty unambiguously. Then again, that is easy for me to say, given
that I was the one who made most of this stuff up. ;-)
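
To make the sequence of events in question concrete, here is the usual
sort of toy example (struct foo and the function names are made up purely
for illustration, and updater-side locking is omitted for brevity):

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {				/* hypothetical example structure */
	int val;
	struct rcu_head rcu;
};

static struct foo *global_foo;		/* assumed always non-NULL here */

static void foo_reclaim(struct rcu_head *head)
{
	/*
	 * Step (5): runs only after all readers that were already
	 * inside rcu_read_lock() at call_rcu() time have finished,
	 * but possibly concurrently with *new* readers.  Those new
	 * readers can no longer obtain a reference to the old
	 * structure, so freeing it is still safe.
	 */
	kfree(container_of(head, struct foo, rcu));
}

void foo_update(struct foo *new_fp)	/* CPU 1 in the scenario */
{
	struct foo *old_fp = global_foo;

	rcu_assign_pointer(global_foo, new_fp);
	call_rcu(&old_fp->rcu, foo_reclaim);	/* step (2) */
}

int foo_read_val(void)			/* CPU 0, or later CPU 2 */
{
	int val;

	rcu_read_lock();			/* steps (1) and (4) */
	val = rcu_dereference(global_foo)->val;
	rcu_read_unlock();			/* step (3) */
	return val;
}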

But it has been a while since I last looked over the kerneldoc. Please see
below for the changes I would make if left to myself.

Any other suggestions for improvement to the kerneldoc?

Thanx, Paul

------------------------------------------------------------------------

commit b12483332f1a7ee2db11f4bdc3c4e61edb608b01
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Thu Jul 8 17:38:59 2010 -0700

rcu: improve kerneldoc for rcu_read_lock(), call_rcu(), and synchronize_rcu()

Make it explicit that RCU read-side critical sections that start after
call_rcu() is invoked or after synchronize_rcu() starts might still be
running after the end of the relevant grace period.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 508ebba..27b44b3 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -444,7 +444,7 @@ extern int rcu_my_thread_group_empty(void);
 * until after all the other CPUs exit their critical sections.
*
* Note, however, that RCU callbacks are permitted to run concurrently
- * with RCU read-side critical sections. One way that this can happen
+ * with new RCU read-side critical sections. One way that this can happen
* is via the following sequence of events: (1) CPU 0 enters an RCU
* read-side critical section, (2) CPU 1 invokes call_rcu() to register
* an RCU callback, (3) CPU 0 exits the RCU read-side critical section,
@@ -602,11 +602,13 @@ extern void wakeme_after_rcu(struct rcu_head *head);
/**
* call_rcu() - Queue an RCU callback for invocation after a grace period.
* @head: structure to be used for queueing the RCU updates.
- * @func: actual update function to be invoked after the grace period
+ * @func: actual callback function to be invoked after the grace period
*
- * The update function will be invoked some time after a full grace
- * period elapses, in other words after all currently executing RCU
- * read-side critical sections have completed. RCU read-side critical
+ * The callback function will be invoked some time after a full grace
+ * period elapses, in other words after all pre-existing RCU read-side
+ * critical sections have completed. However, the callback function
+ * might well execute concurrently with RCU read-side critical sections
+ * that started after call_rcu() was invoked. RCU read-side critical
* sections are delimited by rcu_read_lock() and rcu_read_unlock(),
* and may be nested.
*/
@@ -616,9 +618,9 @@ extern void call_rcu(struct rcu_head *head,
/**
 * call_rcu_bh() - Queue an RCU callback for invocation after a quicker grace period.
* @head: structure to be used for queueing the RCU updates.
- * @func: actual update function to be invoked after the grace period
+ * @func: actual callback function to be invoked after the grace period
*
- * The update function will be invoked some time after a full grace
+ * The callback function will be invoked some time after a full grace
* period elapses, in other words after all currently executing RCU
* read-side critical sections have completed. call_rcu_bh() assumes
* that the read-side critical sections end on completion of a softirq
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 9906f85..63bb771 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -546,9 +546,11 @@ EXPORT_SYMBOL_GPL(call_rcu);
*
* Control will return to the caller some time after a full grace
* period has elapsed, in other words after all currently executing RCU
- * read-side critical sections have completed. RCU read-side critical
- * sections are delimited by rcu_read_lock() and rcu_read_unlock(),
- * and may be nested.
+ * read-side critical sections have completed. Note, however, that
+ * upon return from synchronize_rcu(), the caller might well be executing
+ * concurrently with new RCU read-side critical sections that began while
+ * synchronize_rcu() was waiting. RCU read-side critical sections are
+ * delimited by rcu_read_lock() and rcu_read_unlock(), and may be nested.
*/
void synchronize_rcu(void)
{
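
For completeness, here is a sketch of the blocking counterpart in the
same made-up example as before; as the updated comment notes, new readers
may well still be running when synchronize_rcu() returns, but none of
them can hold a reference to the old structure:

void foo_update_sync(struct foo *new_fp)
{
	struct foo *old_fp = global_foo;

	rcu_assign_pointer(global_foo, new_fp);
	synchronize_rcu();	/* waits only for pre-existing readers */

	/*
	 * New read-side critical sections that began while we were
	 * waiting may still be running here, but they can only see
	 * new_fp, so freeing old_fp directly is safe.
	 */
	kfree(old_fp);
}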

From: Roland McGrath on
> That we don't clean up those zombie leaders is unfortunate really; it
> means we have the entire de_thread special case. But short of fixing
> libpthread to not make bad assumptions, there is little we can do about
> it really.

To be fair (perish the thought!), the semantics for these things were
clearly specified in POSIX.1-1996 years before we made any attempt in
Linux at implementing them in the current fashion or any other, so we have
no one but ourselves to blame for our current implementation choices.

There are no required semantics about "zombie leaders". The semantics are
that wait* calls refer to the whole process (thread group) and those wait
events don't occur until all threads (the initial thread or others) are
dead (or all stopped).
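
For instance, a minimal userspace sketch of those semantics (a
hypothetical example using only standard POSIX interfaces):

#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void *worker(void *arg)
{
	(void)arg;
	sleep(2);		/* outlive the initial thread */
	return NULL;
}

int main(void)
{
	pid_t child = fork();

	if (child == 0) {
		pthread_t t;

		if (pthread_create(&t, NULL, worker, NULL) != 0)
			_exit(1);
		pthread_exit(NULL);	/* the initial thread dies here... */
	}

	/*
	 * ...but waitpid() does not return until the whole thread
	 * group is dead, i.e. until the worker thread also exits
	 * about two seconds later.
	 */
	waitpid(child, NULL, 0);
	return 0;
}

Build with -pthread; the point is the two-second gap between the initial
thread's death and waitpid() returning.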

Nothing really says we have to keep the leader's task_struct around, we
just need to keep the tgid assigned. We could have the signal_struct be
the thing that holds the tgid and is on the children list that wait looks
at, and let dead leaders "self-reap" like other threads do.

Since we've already separated the ptrace list and pseudo-wait from the
proper children/wait list, it's only some of these existing implementation
assumptions and perhaps some /proc vagaries that prevent us from doing that.

> I'm only half following this conversation.

Going for a third here.


Thanks,
Roland

From: Roland McGrath on
> Hmmm... The kerneldoc for rcu_read_lock() looks like it says this
> pretty unambiguously. Then again, that is easy for me to say, given
> that I was the one who made most of this stuff up. ;-)

Yeah, well, I thought I knew what rcu_read_lock did and the only new thing
I had in mind was using call_rcu, so that's the only place I looked. It
wasn't entirely unreasonable to have to read through Documentation/RCU/ a
bit for a subtlety of this kind, and I did find it quickly enough once
Oleg's attentiveness alerted me to be freshly afraid of the subtleties.

> But it has been a while since I last looked over the kerneldoc. Please see
> below for the changes I would make if left to myself.

Those look good to me!

> Any other suggestions for improvement to the kerneldoc?

Oh, I don't know, send over someone to read it to me in a sultry voice
so I get appropriately motivated to cast about for all of it I can find
and listen closely to every word? It already looks like if I'd actually
read all of it twice that day, I would have had the information I needed.


Thanks,
Roland

From: Paul E. McKenney on
On Thu, Jul 08, 2010 at 06:01:37PM -0700, Roland McGrath wrote:
> > Hmmm... The kerneldoc for rcu_read_lock() looks like it says this
> > pretty unambiguously. Then again, that is easy for me to say, given
> > that I was the one who made most of this stuff up. ;-)
>
> Yeah, well, I thought I knew what rcu_read_lock did and the only new thing
> I had in mind was using call_rcu, so that's the only place I looked. It
> wasn't entirely unreasonable to have to read through Documentation/RCU/ a
> bit for a subtlety of this kind, and I did find it quickly enough once
> Oleg's attentiveness alerted me to be freshly afraid of the subtleties.
>
> > But it has been a while since I last looked over the kerneldoc. Please see
> > below for the changes I would make if left to myself.
>
> Those look good to me!

Very good, I have them queued. I have some messy dependencies in my
tree, so it will take some time for them to appear.

> > Any other suggestions for improvement to the kerneldoc?
>
> Oh, I don't know, send over someone to read it to me in a sultry voice
> so I get appropriately motivated to cast about for all of it I can find
> and listen closely to every word? It already looks like if I'd actually
> read all of it twice that day, I would have had the information I needed.

Hmmm... Despite being happily married for well over two decades, I do
feel the need to advise you to be careful what you wish for. ;-)

Thanx, Paul