From: Ingo Molnar on

* Salman <sqazi(a)google.com> wrote:

> A program that repeatedly forks and waits is susceptible to having the
> same pid repeated, especially when it competes with another instance of the
> same program. This is really bad for bash implementation. Furthermore, many shell
> scripts assume that pid numbers will not be used for some length of time.
>
> Thanks to Ted Tso for the key ideas of this implementation.
>
> Signed-off-by: Salman Qazi <sqazi(a)google.com>
> ---
> kernel/pid.c | 11 ++++++++++-
> 1 files changed, 10 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/pid.c b/kernel/pid.c
> index e9fd8c1..8cedeab 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -153,8 +153,17 @@ static int alloc_pidmap(struct pid_namespace *pid_ns)
> if (likely(atomic_read(&map->nr_free))) {
> do {
> if (!test_and_set_bit(offset, map->page)) {
> + int prev;
> atomic_dec(&map->nr_free);
> - pid_ns->last_pid = pid;
> +
> + do {
> + prev = last;
> + last = cmpxchg(&pid_ns->last_pid,
> + prev, pid);
> + if (last >= pid)
> + break;
> + } while (prev != last);
> +
> return pid;
> }
> offset = find_next_offset(map, offset);

Looks rather interesting. (Cleanliness-wise i'd suggest to split out the while
loop into a helper function.)

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: tytso on
On Tue, Jun 08, 2010 at 11:24:38PM -0700, Salman wrote:

> A program that repeatedly forks and waits is susceptible to having
> the same pid repeated, especially when it competes with another
> instance of the same program. This is really bad for bash
> implementation. Furthermore, many shell scripts assume that pid
> numbers will not be used for some length of time.
>
> Thanks to Ted Tso for the key ideas of this implementation.
>
> Signed-off-by: Salman Qazi <sqazi(a)google.com>

Here's a slightly more succint way of expressing it. Others will have
decide if it's easier to understand. (It is for me, but I wrote it. :-P)

- Ted

kernel/pid.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/kernel/pid.c b/kernel/pid.c
index e9fd8c1..c51f413 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -154,8 +154,13 @@ static int alloc_pidmap(struct pid_namespace *pid_ns)
do {
if (!test_and_set_bit(offset, map->page)) {
atomic_dec(&map->nr_free);
- pid_ns->last_pid = pid;
- return pid;
+ while (1) {
+ i = cmpxchg(&pid_ns->last_pid,
+ last, pid);
+ if (i == last || i >= pid)
+ return pid;
+ last = i;
+ }
}
offset = find_next_offset(map, offset);
pid = mk_pid(pid_ns, map, offset);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: tytso on
On Wed, Jun 09, 2010 at 04:49:03AM -0700, Michel Lespinasse wrote:
> You should make sure to handle pid wrap-around for this last/pid comparison.
> I think proper way to do that would be:

Good point! I forgot about checking for pid wrap-around.

> The last_read == pid case is also interesting - it means another
> thread found the same pid, forked a child with that pid, and the
> child exited already (since the bitmap was cleared). However we
> don't need to handle that case - first, that race is much less
> likely to happen, and second, the duplicate pid would be returned in
> two separate tasks - so this would not cause problems in bash as in
> your example.

In order for that to happen, all of this would have to happen between
the time that last_pid was initially sampled at the very beginning of
alloc_pidmap(). Could that happen? I think it could, if kzalloc()
took a very long time to run, and the scheduler was _very_ unfair. We
could try to guard against that by checking to see if last_pid has
changed after allocating map->page (in the unlikely case of !map->page
being NULL in the first place) and if so, restarting alloc_pidmap() by
jumping back to the beginning of the function.

Could that happen with bash? I'm not as confident as you that it
could never happen. The fact that we saw the race in the first place
in bash means that it could still happen in this case, I think. In
any case, if we have two processes getting the same pid in the absence
of a fork, that would be a bad thing and could lead to all sorts of
confusion, so it's probably a good thing to guard against even if it
is a rare case.

BTW, Salman, I haven't seen it, but I vaguely remember in the mail
exchange with the person who reported this that we have a C/C++
reproduction case? If it's small enough, it might be a good idea to
include it in the patch description.

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Wed, 9 Jun 2010, Ingo Molnar wrote:

>
> * Salman <sqazi(a)google.com> wrote:
> >
> > A program that repeatedly forks and waits is susceptible to having the
> > same pid repeated, especially when it competes with another instance of the
> > same program. This is really bad for bash implementation. Furthermore, many shell
> > scripts assume that pid numbers will not be used for some length of time.
> >
> > Thanks to Ted Tso for the key ideas of this implementation.
>
> Looks rather interesting. (Cleanliness-wise i'd suggest to split out the while
> loop into a helper function.)

I have to say that usually I can look at a patch and see what it does.

This time I had absolutely no clue.

There was a whole big context missing: that original load of "last_pid"
into "last" at the top of the function, far outside the patch. And in
this case I don't think splitting out the trivial cmpxchg() loop would
help that problem - that would just make the "load original" part of the
big picture be even further away from the final "replace if same" part,
and I think _that_ is a much more critical part of the subtleties there.

So I had to read the patch _and_ go read the code it patched, in order to
at all understand what it did. I think the patch explanation should have
done it, and I also think this would need a bit comment at the top.

[ In fact, I'd argue that the _old_ code would have needed a big comment
at the top about last_pid usage, but i somebody had done that, they'd
probably also have seen the race while explaning how the code worked ;]

So having looked at the patch and the code, I agree with the patch, but
I'd like some more explanation to go with it.

[ Or Ted's version: as mentioned, I don't think the complexity is actually
in the final cmpxchg loop itself, but in the bigger picture, so I don't
think the differences between Ted's and Salman's versions are that big ]

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: tytso on
On Wed, Jun 09, 2010 at 08:39:00AM -0700, Linus Torvalds wrote:
>
> So I had to read the patch _and_ go read the code it patched, in order to
> at all understand what it did. I think the patch explanation should have
> done it, and I also think this would need a bit comment at the top.
>
> [ In fact, I'd argue that the _old_ code would have needed a big comment
> at the top about last_pid usage, but i somebody had done that, they'd
> probably also have seen the race while explaning how the code worked ;]
>

Salman had created a very nice ASCII art diagram of the race in the
mail thread with the internal bug reporter who noticed the problem.
We could include that, if you don't mind the commit description
growing by 30-40 lines. :-) I agree though that better documentation
n the source code about _how_ alloc_pidmap was supposed to avoid all
possible races would have probably been a good idea.

> [ Or Ted's version: as mentioned, I don't think the complexity is actually
> in the final cmpxchg loop itself, but in the bigger picture, so I don't
> think the differences between Ted's and Salman's versions are that big ]

Yah, I had been staring at the code for a while, so I had the feeling
that my intuition of which patch would be clearer was probably biased.

We do need to deal with pid wrap possibility just to be completely
correct, although the chance of hitting _that_ are pretty remote.

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/