From: Linus Torvalds on


On Wed, 9 Jun 2010, tytso(a)mit.edu wrote:
>
> Salman had created a very nice ASCII art diagram of the race in the
> mail thread with the internal bug reporter who noticed the problem.
> We could include that, if you don't mind the commit description
> growing by 30-40 lines. :-)

I don't horribly mind, but I also don't think it's necessarily all that
useful to go that far. For the commit message, there are two real reasons:

- explaining the patch itself upstream to make people like me understand
_why_ it needs to be applied.

But then a denser explanation than a 30-40 line ASCII art diagram would
probably suffice.

- explaning the code after-the-fact to people who end up seeing it in
log/blame output

And then we don't care so much about the old bug per se, as about how
the code is supposed to work - so a lot of verbiage about what used to
happen is much less important than describing just what's going on with
the cmpxchg (where the "loop" is all the way from the original value
load, especially in this case where we don't have to loop all the way
back).

In fact, conceptually the cmpxchg loop is pretty much the whole function,
it's just that if we know that any racing elements will be idempotent wrt
anything but the final assignment of 'last_pid'. So we can end up just
looping over the final part. I think _that_ is the clever part, and I
think that is the part that needs explanation.

(And that is also why the diff itself doesn't include the "early part of
the loop", and why I couldn't understand it purely as a patch - because it
in this case the patch is "artificially small" because of how we don't
need to loop all the way up)

> We do need to deal with pid wrap possibility just to be completely
> correct, although the chance of hitting _that_ are pretty remote.

That seems to be purely an artifact of the cleverness of avoiding re-doing
the whole loop, no? If we _had_ re-done the whole loop (the "normal"
cmpxchg model), we would have re-started the whole pid search and handled
the overflow in the existing overflow handling code.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: tytso on
On Wed, Jun 09, 2010 at 09:06:37AM -0700, Linus Torvalds wrote:
>
> That seems to be purely an artifact of the cleverness of avoiding re-doing
> the whole loop, no? If we _had_ re-done the whole loop (the "normal"
> cmpxchg model), we would have re-started the whole pid search and handled
> the overflow in the existing overflow handling code.

This brings up a question. If we're going to use a cmpxchg() loop, is
there any point to doing the test-and-set game with the bitmap? It
should be just as fast to just use cmpxchg as it is to do the
test-and-set, and isn't it more likely that various CPU architectures
will have the cmpxchg than test-and-set-bit?

We might be able to radically simplify alloc_pidmap(). Would that
cause a huge problem on some non-Intel architectures, though?

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Wed, 9 Jun 2010, Linus Torvalds wrote:
>
> Otherwise you have three threads, two of which pick the same pid (because
> the test-and-set isn't atomic), and a third of which picks a new one.

In fact, I don't think you need three threads at all. It's perfectly ok to
just have two threads, and they'd both end up picking the same 'pid'
without the atomicity guarantees of that 'test_and_set()' bitmap access.

And they'd both be perfectly fine setting last_pid to that (shared) pid if
I read that cmpxchg loop right. No?

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: tytso on
On Wed, Jun 09, 2010 at 10:25:50AM -0700, Linus Torvalds wrote:
>
>
> On Wed, 9 Jun 2010, Linus Torvalds wrote:
> >
> > Otherwise you have three threads, two of which pick the same pid (because
> > the test-and-set isn't atomic), and a third of which picks a new one.
>
> In fact, I don't think you need three threads at all. It's perfectly ok to
> just have two threads, and they'd both end up picking the same 'pid'
> without the atomicity guarantees of that 'test_and_set()' bitmap access.
>
> And they'd both be perfectly fine setting last_pid to that (shared) pid if
> I read that cmpxchg loop right. No?

Well, I was thinking about something like this:

while (1) {
last = pid_ns->last_pid;
pid = last + 1;
if (pid >= pid_max)
pid = RESERVED_PIDS;
if (cmpxchg(&pid_ns->last_pid, last, pid) == last)
return pid;
}

Which I don't think is racy, unless I'm missing something. Both might
end up picking the same pid, but only one will successfully set
last_pid, and the other will just loop and try again.

There appears to be some interesting uses of the bitmap by
find_ge_pid() and next_pidmap() that I haven't completely grokked yet,
especially as to why they're needed, though. Assuming they are
needed, we might end up needing the bitmap after all, though.

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: tytso on
On Wed, Jun 09, 2010 at 10:43:33AM -0700, Linus Torvalds wrote:
>
> We need that bitmap to handle the overflow max_pid case. We are _not_
> returning just increasing pid numbers.

Doh! I knew I was forgetting something obvious. I was hoping we
could get rid of the bitmap entirely, but I guess not....

(Unless users would stand for 64-bit pid numbers... no? Dang. :-)

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/