From: Pavel Machek on
Hi!

> > > Distros don't want to take a patch that adds a new boot param that is
> > > not accepted upstream, otherwise they will be stuck forward porting it
> > > from now until, well, forever :)
> >
> > So for an obscure IA64 specific problem you want the upstream kernel to
> > port it forward forever instead ?
>
> Ehh. Nobody does ia64 any more. It's dead, Jim.
>
> This is x86. SGI finally long ago gave up on the Intel/HP clusterf*ck.
>
> Which I'm not entirely sure makes the case for the kernel parameter much
> stronger, though. I wonder if it's not more appropriate to just have a
> total hack saying
>
> 	if (max_pids < N * max_cpus) {
> 		printk("We have %d CPUs, increasing max_pids to %d\n",
> 		       max_cpus, N * max_cpus);
> 		max_pids = N * max_cpus;
> 	}
>
> where "N" is just some random fudge-factor. It's reasonable to expect a
> certain minimum number of processes per CPU, after all.

Issue with max_pids is that it can break userspace, right?

At that point it seems saner to require a parameter: just adding
CPUs to the system should not be enough to change it...
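
For reference, the fudge-factor idea quoted above could look roughly like the sketch below. It is only an illustration: PIDS_PER_CPU, scale_pid_max() and the hard_limit argument are hypothetical names, the value 1024 is an arbitrary stand-in for "N", and none of this is taken from an actual kernel patch.

```c
#include <assert.h>

/* Hypothetical fudge factor "N": minimum number of pids to allow per CPU. */
#define PIDS_PER_CPU 1024

/*
 * Return the pid limit to use: the configured value, raised to
 * PIDS_PER_CPU * nr_cpus if it would otherwise be smaller, and never
 * above hard_limit (an assumed upper bound on what can be allocated).
 */
static int scale_pid_max(int pid_max, int nr_cpus, int hard_limit)
{
	long long wanted;

	assert(pid_max > 0 && nr_cpus > 0 && hard_limit >= pid_max);

	wanted = (long long)PIDS_PER_CPU * nr_cpus;
	if (wanted > hard_limit)
		wanted = hard_limit;
	if (pid_max < wanted)
		pid_max = (int)wanted;
	return pid_max;
}
```

With these assumed numbers a 4-CPU machine keeps a default of 32768, while a 64-CPU machine is silently raised to 65536, which is exactly the "more CPUs changes what userspace sees" effect being questioned here.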

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Sun, 25 Apr 2010, Pavel Machek wrote:
>
> Issue with max_pids is that it can break userspace, right?

Iirc, some _really_ old code used 'short' for pid_t, and we wanted to be
really safe when we raised the limits.

I seriously doubt we need to worry about old binaries like that on any 16+
CPU machines, though.
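
To make the 'short' problem concrete, here is a small sketch of what such a binary would see. The helper names are made up for illustration, and the code models the truncation explicitly (low 16 bits, reinterpreted as two's complement) rather than relying on how a particular compiler converts an out-of-range int to short.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model what an old binary sees if it stores a pid in a 16-bit signed
 * field: keep the low 16 bits and reinterpret them as two's complement.
 */
static int pid_as_seen_through_short(int pid)
{
	uint16_t low;

	assert(pid >= 0);

	low = (uint16_t)pid;	/* well-defined: reduces modulo 65536 */
	return low >= 0x8000 ? (int)low - 0x10000 : (int)low;
}

/* Nonzero if the pid comes back unchanged after the 16-bit round trip. */
static int pid_survives_short(int pid)
{
	return pid_as_seen_through_short(pid) == pid;
}
```

Under this model pid 32767 survives, while 32768 comes back as -32768 and 40000 as -25536, so anything above the 15-bit range is corrupted for such a program.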

The other issue is just the size of the pidmap[] array. Instead of walking
all the processes to see "is this pid in use" (like I think the original
Linux kernel did), we have a bitmap of used pids. When you raise pid_max,
that bitmap obviously still needs to be big enough. Right now we allocate
that statically (rather than growing it dynamically), so we end up having
a _hard_ limit of PID_MAX_LIMIT too.

On 32-bit, I think that maximum limit still ends up being basically 32767.
So again, on a _legacy_ system, you end up being limited in the number of
pid_t entries.
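
The bitmap approach described above can be sketched in user space as follows. This is a simplified illustration, not the kernel's pidmap code: DEMO_PID_LIMIT, demo_pidmap and the helper functions are hypothetical names, and the fixed-size static array stands in for the statically allocated map that imposes the hard limit.

```c
#include <assert.h>
#include <limits.h>

/* Assumed hard limit for the demo; the static map below is sized for it. */
#define DEMO_PID_LIMIT 32768
#define BITS_PER_WORD  (sizeof(unsigned long) * CHAR_BIT)
#define MAP_WORDS      ((DEMO_PID_LIMIT + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per possible pid: set means "in use". */
static unsigned long demo_pidmap[MAP_WORDS];

static int demo_pid_in_use(int pid)
{
	return (int)((demo_pidmap[pid / BITS_PER_WORD] >> (pid % BITS_PER_WORD)) & 1UL);
}

static void demo_release_pid(int pid)
{
	demo_pidmap[pid / BITS_PER_WORD] &= ~(1UL << (pid % BITS_PER_WORD));
}

/* Claim the first free pid at or after start; return -1 if none is left. */
static int demo_alloc_pid(int start)
{
	int pid;

	assert(start >= 0 && start < DEMO_PID_LIMIT);

	for (pid = start; pid < DEMO_PID_LIMIT; pid++) {
		if (!demo_pid_in_use(pid)) {
			demo_pidmap[pid / BITS_PER_WORD] |= 1UL << (pid % BITS_PER_WORD);
			return pid;
		}
	}
	return -1;
}
```

The "is this pid in use" question becomes a single bit test instead of a walk over every process, and the cost is one bit per possible pid: 32768 pids need 4096 bytes here. A pid at or above DEMO_PID_LIMIT simply has no bit to set, which is why a statically sized map turns into a hard ceiling.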

Linus
From: Linus Torvalds on

On Sun, 25 Apr 2010, Linus Torvalds wrote:
>
> Iirc, some _really_ old code used 'short' for pid_t, and we wanted to be
> really safe when we raised the limits.

... I dug into the history, and this is from August 2002..

We used to limit it to sixteen bits, but that was too tight even then for
some people, so first we did this:

Author: Linus Torvalds <torvalds(a)home.transmeta.com>
Date: Thu Aug 8 03:57:42 2002 -0700

Make pid allocation use 30 of the 32 bits, instead of 15.

diff --git a/include/linux/threads.h b/include/linux/threads.h
index 880b990..6804ee7 100644
--- a/include/linux/threads.h
+++ b/include/linux/threads.h
@@ -19,6 +19,7 @@
 /*
  * This controls the maximum pid allocated to a process
  */
-#define PID_MAX 0x8000
+#define PID_MASK 0x3fffffff
+#define PID_MAX (PID_MASK+1)

 #endif
diff --git a/kernel/fork.c b/kernel/fork.c
index d40d246..017740d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -142,7 +142,7 @@ static int get_pid(unsigned long flags)
 		return 0;

 	spin_lock(&lastpid_lock);
-	if((++last_pid) & 0xffff8000) {
+	if((++last_pid) & ~PID_MASK) {
 		last_pid = 300; /* Skip daemons etc. */
 		goto inside;
 	}
@@ -157,7 +157,7 @@ inside:
 			   p->tgid == last_pid ||
 			   p->session == last_pid) {
 				if(++last_pid >= next_safe) {
-					if(last_pid & 0xffff8000)
+					if(last_pid & ~PID_MASK)
 						last_pid = 300;
 					next_safe = PID_MAX;
 				}

which just upped the limits. That, in turn, _did_ end up breaking some
silly old binaries, so then a month later Ingo did a "pid-max" patch
that made the maximum dynamic, with a default of the old 15-bit limit,
and a sysctl to raise it.
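
The wrap-around test in the diff above can be tried in isolation with a sketch like the one below. DEMO_PID_MASK and demo_next_pid() are hypothetical names; the mask value and the restart value 300 are copied from the quoted patch, and the search for a pid that is actually free is left out.

```c
#include <assert.h>

#define DEMO_PID_MASK    0x3fffffff	/* 30 usable bits, as in the patch */
#define DEMO_PID_RESTART 300		/* "Skip daemons etc." */

/*
 * Advance to the next candidate pid. If the increment sets any bit
 * outside the mask, start over at DEMO_PID_RESTART instead.
 */
static int demo_next_pid(int last_pid)
{
	int next;

	assert(last_pid >= 0 && last_pid <= DEMO_PID_MASK);

	next = last_pid + 1;
	if (next & ~DEMO_PID_MASK)
		next = DEMO_PID_RESTART;
	return next;
}
```

With the old constant 0xffff8000 the counter restarted as soon as it reached 0x8000. With the 30-bit mask, 0x7fff is followed by 0x8000, and the restart only happens after 0x3fffffff.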

And then a couple of weeks later, Ingo did another patch to fix the
scalability problems we had with lots of pids (avoiding the whole
"for_each_task()" crud to figure out which pids were ok, and using a
'struct pid' instead).

So the whole worry about > 15-bit pids goes back to 2002. I think we're
pretty safe now.

Linus

From: Pavel Machek on
Hi!

> > Iirc, some _really_ old code used 'short' for pid_t, and we wanted to be
> > really safe when we raised the limits.
>
> .. I dug into the history, and this is from August 2002..
>
> We used to limit it to sixteen bits, but that was too tight even then for
> some people, so first we did this:
>
> Author: Linus Torvalds <torvalds(a)home.transmeta.com>
> Date: Thu Aug 8 03:57:42 2002 -0700
>
> Make pid allocation use 30 of the 32 bits, instead of 15.
....
> which just upped the limits. That, in turn, _did_ end up breaking some
> silly old binaries, so then a month later Ingo did a "pid-max" patch
> that made the maximum dynamic, with a default of the old 15-bit limit,
> and a sysctl to raise it.
>
> And then a couple of weeks later, Ingo did another patch to fix the
> scalability problems we had with lots of pids (avoiding the whole
> "for_each_task()" crud to figure out which pids were ok, and using a
> 'struct pid' instead).
>
> So the whole worry about > 15-bit pids goes back to 2002. I think we're
> pretty safe now.

From a principle-of-least-surprise point of view: breaking old userspace
when you pass a special config option is less surprising than breaking
old userspace when you add more CPUs.

Whether the breakage will be common enough for this to matter is
another question.
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html