From: Timo Kunze on
Hi,

we've an app that may create up to 1000 threads. If it does so, it
usually doesn't perform any work anymore. I'm sure that there's a
deadlock in there, but while discussing this problem with my colleagues,
we came up with the question whether Windows can manage this many
threads without becoming significantly slower.
In my opinion, even if the scheduler works with O(1) complexity, the
overall management (thread switching and so on) of this many threads
produces enough overhead to slow down the system significantly. I always
try to keep the number of threads that my apps start lower than the
number of logical processors multiplied by 2.
Is my assumption right and is keeping the number of threads per CPU low
a good practice to get the best performance? Or am I worrying about
problems that don't exist?

Regards
Timo
--
www.TimoSoft-Software.de - Unicode controls for VB6
"Those who sacrifice freedom for safety deserve neither."
"Democracy is by definition insecure. Its protection arises from the
conviction that the democratic forces prevail and assert themselves,
by democratic means."
From: Arny on


On 22.04.2010 20:47, Timo Kunze wrote:
> Hi,
>
> we've an app that may create up to 1000 threads. If it does so, it
> usually doesn't perform any work anymore. I'm sure that there's a
> deadlock in there, but while discussing this problem with my colleagues,
> we came up with the question whether Windows can manage this many
> threads without becoming significantly slower.
> In my opinion, even if the scheduler works with O(1) complexity, the
> overall management (thread switching and so on) of this many threads
> produces enough overhead to slow down the system significantly. I always
> try to keep the number of threads that my apps start lower than the
> number of logical processors multiplied by 2.
> Is my assumption right and is keeping the number of threads per CPU low
> a good practice to get the best performance? Or am I worrying about
> problems that don't exist?
>
> Regards
> Timo

Usually, the limiting factor is the stack size. The default stack size
per thread is 1 MB (which can be changed), and user-mode programs are
limited to 2 GB of address space on 32-bit Windows, so the upper limit
is about 2000 threads. As far as I recall, the rule of thumb is to never
go beyond 12 threads. If you require more, it's better to use a
1-object-per-client model instead of a 1-thread-per-client model, which
shifts the responsibility for scheduling to your own code.
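
The arithmetic behind that estimate can be sketched in a few lines of
Python. This is only a back-of-the-envelope bound: the function name and
the `other_usage` parameter are made up for illustration, and a real
process hits the limit earlier because code, heaps and DLLs also need
address space.

```python
# Rough upper bound on thread count from address space alone.
# Assumptions (from the post): 2 GB user address space, 1 MB stack per thread.
MB = 1024 * 1024
GB = 1024 * MB


def max_threads_by_stack(user_address_space: int = 2 * GB,
                         stack_reserve: int = 1 * MB,
                         other_usage: int = 0) -> int:
    """Return how many stacks of `stack_reserve` bytes fit in the space left."""
    usable = max(user_address_space - other_usage, 0)
    return usable // stack_reserve


if __name__ == "__main__":
    print(max_threads_by_stack())                            # 2048
    print(max_threads_by_stack(stack_reserve=256 * 1024))    # 8192
    print(max_threads_by_stack(other_usage=200 * MB))        # 1848
```

A smaller stack reservation raises the bound, which is why the stack
size is the first thing to look at when a 32-bit process cannot create
more threads.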

- RaZ
From: David Schwartz on
On Apr 22, 11:47 am, Timo Kunze <TKunze71...(a)gmx.de> wrote:

> we've an app that may create up to 1000 threads. If it does so, it
> usually doesn't perform any work anymore. I'm sure that there's a
> deadlock in there, but while discussing this problem with my colleagues,
> we came up with the question whether Windows can manage this many
> threads without becoming significantly slower.

Windows, with sufficient memory, can handle 1,000 threads with no
problem. It is, however, atrocious design to create this many threads.

> In my opinion, even if the scheduler works with O(1) complexity, the
> overall management (thread switching and so on) of this many threads
> produces enough overhead to slow down the system significantly. I always
> try to keep the number of threads that my apps start lower than the
> number of logical processors multiplied by 2.

That may be a bit extreme, but yes, keeping the number of threads down
provides a large number of benefits. The biggest one is that it tends
to lead to fewer context switches.

> Is my assumption right and is keeping the number of threads per CPU low
> a good practice to get the best performance? Or am I worrying about
> problems that don't exist?

Keeping the number of threads down will give you the best performance
and also tends to automatically keep you away from other problems
(such as thundering herds). However, the scheduler on a Windows
machine can handle 1,000 threads without a problem.
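
One common way to keep the thread count down is a bounded worker pool
that processes many clients with few threads. The following Python
sketch illustrates the idea only; `handle_client` and `serve` are
hypothetical names, and the cap of twice the logical processor count is
simply the rule mentioned earlier in the thread, not a recommendation.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def handle_client(client_id: int):
    # Placeholder for real per-client work; reports which worker ran it.
    return client_id, threading.current_thread().name


def serve(num_clients: int = 1000, max_workers: Optional[int] = None):
    """Handle `num_clients` tasks on a pool capped at `max_workers` threads."""
    workers = max_workers or 2 * (os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() queues all tasks; at most `workers` threads ever exist.
        results = list(pool.map(handle_client, range(num_clients)))
    threads_used = {name for _, name in results}
    return results, threads_used, workers


if __name__ == "__main__":
    results, threads_used, workers = serve()
    print(len(results), "clients handled by", len(threads_used),
          "threads, cap", workers)
```

The 1000 units of work still get done, but the scheduler only ever sees
a handful of threads.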

DS
From: Leo Davidson on
On Apr 22, 7:47 pm, Timo Kunze <TKunze71...(a)gmx.de> wrote:
> deadlock in there, but while discussing this problem with my colleagues,
> we came up with the question whether Windows can manage this many
> threads without becoming significantly slower.

Active threads or inactive ones?

If most of them are in a Wait* call then I doubt they'll have much, if
any, effect other than using up some address space and swap file. If
they're all active then that seems very bad as they'll be competing
for disk and cache on top of all the context switches.
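
The difference between waiting and active threads is easy to observe.
The Python sketch below parks a number of threads on an event (a loose
analogue of a Wait* call, not the Win32 API itself) and measures how
much CPU time the process burns while they are blocked. The function
name and the default counts are arbitrary choices for illustration.

```python
import threading
import time


def park_threads(count: int = 100, observe_seconds: float = 0.2):
    """Start `count` threads that block on an event, then release them."""
    release = threading.Event()
    threads = [threading.Thread(target=release.wait) for _ in range(count)]
    for t in threads:
        t.start()

    # While the threads are blocked, measure CPU time used by the process.
    cpu_before = time.process_time()
    time.sleep(observe_seconds)
    cpu_used = time.process_time() - cpu_before
    alive_while_waiting = sum(t.is_alive() for t in threads)

    release.set()              # wake every waiting thread
    for t in threads:
        t.join()
    alive_after = sum(t.is_alive() for t in threads)
    return alive_while_waiting, alive_after, cpu_used


if __name__ == "__main__":
    waiting, remaining, cpu = park_threads()
    print(waiting, "threads were blocked,", remaining, "left after release,",
          "CPU seconds used while waiting:", round(cpu, 4))
```

On a typical machine the CPU time reported while the threads wait
should be close to zero, since blocked threads are not scheduled.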
From: Ulrich Eckhardt on
Timo Kunze wrote:
> we've an app that may create up to 1000 threads. If it does so, it
> usually doesn't perform any work anymore.

Not surprising, as the overhead of switching and cache misses grows.

> I'm sure that there's a deadlock in there,

A deadlock doesn't explain bad performance; it is a hard error, not a
slowdown.

> but while discussing this problem with my colleagues, we came up
> with the question whether Windows can manage this many threads
> without becoming significantly slower.

Yes, the OS can, though of course it presents overhead. Whether your program
can depends on how it uses the computer.

> In my opinion, even if the scheduler works with O(1) complexity, the
> overall management (thread switching and so on) of this many threads
> produces enough overhead to slow down the system significantly.

If you have two threads, and those perpetually run, the system will preempt
just as often as when you have 100 threads.

> I always try to keep the number of threads that my apps start lower
> than the number of logical processors multiplied by 2.

Why? I'd use the number of shared resources as base, plus one or two. The
resources are CPUs and IOs (HD and NIC), provided the tasks to do actually
depend on them. Add one or two to make sure that you keep the CPUs busy,
even if two threads happen to wait for IO.
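
That sizing rule can be written down as a small helper. This is a sketch
of the heuristic as described here, not an established formula; the
function name, the parameters and the default slack of two are
assumptions made for the example.

```python
import os


def suggested_thread_count(cpus: int, io_devices: int, slack: int = 2,
                           tasks_use_cpu: bool = True,
                           tasks_use_io: bool = True) -> int:
    """Count the shared resources the tasks actually depend on, plus slack."""
    base = 0
    if tasks_use_cpu:
        base += cpus
    if tasks_use_io:
        base += io_devices      # e.g. one hard disk and one NIC -> 2
    return max(base + slack, 1)


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    # Assumed example: tasks use the CPUs, one disk and one network card.
    print(suggested_thread_count(cpus, io_devices=2))
```

For a machine with 4 logical processors, one disk and one NIC this gives
4 + 2 + 2 = 8 threads; for purely CPU-bound work with a slack of one it
gives 5.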

> Is my assumption right and is keeping the number of threads per CPU low
> a good practice to get the best performance?

Yes, this allows threads to run without being preempted for a longer time,
assuming the task actually is CPU-bound. If all it does is handle traffic
going through a single NIC, using multiple CPUs won't help.

Uli

--
Sator Laser GmbH
Managing Director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932