From: David Schwartz on
On May 9, 6:12 pm, phil-news-nos...(a)ipal.net wrote:

> How is it that a scheduler has anything to do with this?

The scheduler is specifically designed to allocate timeslices to
ready-to-run threads such that the number of context switches is low
enough that they don't impact performance.

This design will work quite well unless you do something stupid. For
example, if you use a process-per-connection design and need to do a
tiny bit of work for each of 150 connections, you will need about 150
context switches. But that's because you did something stupid.

So long as you don't do something stupid like that, the cost of
context switches is lost in the noise because the scheduler will not
make very many of them.
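[Editor's note: the alternative implied here is a readiness-based design, where one thread services every connection that has data and the scheduler never needs a per-connection switch. A minimal sketch, not from the post, using Python's selectors module; the echo logic and the 150 socket pairs are illustrative assumptions:]

```python
import selectors
import socket

def serve_ready(sel):
    """Echo back data on every connection that is currently readable,
    all from one thread: no context switch is needed per connection."""
    for key, _ in sel.select(timeout=0):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            conn.sendall(data)

# Demo: 150 in-process socket pairs stand in for 150 client connections.
sel = selectors.DefaultSelector()
pairs = [socket.socketpair() for _ in range(150)]
for server_side, _ in pairs:
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ)

for _, client_side in pairs:
    client_side.sendall(b"ping")

serve_ready(sel)  # a single pass services all 150 connections
echoed = [client_side.recv(4096) for _, client_side in pairs]
```

The same shape applies to select/poll/epoll loops in C: one process, many descriptors, work batched per wakeup.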

> If you have
> different tasks under different thread or different processes, then the
> scheduler is forced to make a context switch when something different
> has to be done, and that's in a different thread/process.

There is no such thing as "tasks under different thread". Threads
share all memory, all file descriptors, everything. There is no way
something can be "stuck in the wrong thread" unless you specifically
design things that way.

A sane designer would only allow tasks to get "stuck to a thread"
where that didn't harm performance. And, of course, you can always
shoot yourself in the foot. With processes, by contrast, if the
process that holds a file descriptor is not running, no forward
progress can be made on that descriptor without a context switch.

> The win for
> threads is the context switch to another thread within the same process
> is cheaper than a context switch between processes.

No. That is not why threads are a win. That is, as I've tried to
explain, a common misconception. It's like saying jet planes are a win
over bicycles because you don't have to pedal them.

Threads are a win over processes because it makes no difference which
thread runs. The process makes forward progress so long as any ready-
to-run thread gets the CPU. That is, in a properly designed multi-
threaded application, the amount of work done before a context switch
will be needed will be much higher.
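[Editor's note: the "any ready thread makes forward progress" property can be illustrated with a shared work queue: tasks are not bound to particular threads, so whichever worker the scheduler runs next picks up the next task. A minimal hypothetical sketch, not from the post:]

```python
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker():
    # Any worker can take any task; no task is "stuck" in one thread.
    while True:
        item = tasks.get()
        if item is None:      # sentinel: shut this worker down
            break
        results.put(item * item)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for n in range(100):          # 100 units of work, 4 interchangeable workers
    tasks.put(n)
for _ in threads:
    tasks.put(None)
for t in threads:
    t.join()

total = 0
while not results.empty():
    total += results.get()
```

Whichever of the four workers gets the CPU, the queue drains; that interchangeability, not cheaper switches, is the point being made.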

DS
From: Scott Lurndal on
phil-news-nospam(a)ipal.net writes:
>On 09 May 2010 23:08:21 GMT Scott Lurndal <scott(a)slp53.sl.home> wrote:
>
>| [*] Up to 22 memory references when using nested page tables, depending on
>| processor page directory cache hit rate; this can be reduced to 11 if the
>| nested page table uses 1GB page sizes (vice 4 or fewer without using SVM).
>
>Is the page table also stored in cache, even if also in the TLB?

Depends on:

1) how recently it was used, and
2) the cache eviction behavior.

Generally, I wouldn't count on any of the PTE entries being present
in the processor cache. You might find one or more of the intermediate
entries (PML4, PDP, PD) in a shared L3 cache of sufficient size, but
I wouldn't count on it.

Both AMD and Intel have special PML4/PDP/PD caches in the processor to
help make TLB fills a bit more efficient. The PML4 entry will likely
be cached (since typically only two PML4 entries are in use, one
for the lower 512GB and one for the upper 512GB) and the PML4 is the
first reference on a table walk.

Consider a page table mapping 1TB of memory with 4k pages. This requires
two gigabytes of memory just for the page tables. Consider then, that
each process must have its own page table, and you'll see that the processor
cache has little benefit for TLB fills.
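[Editor's note: the arithmetic behind the 2 GB figure: 1 TB of 4 KB pages is 2^28 pages, and each x86-64 page-table entry is 8 bytes, so the leaf level alone costs 2^31 bytes. A quick check:]

```python
# Leaf page-table size for mapping 1 TB with 4 KB pages (8-byte PTEs).
mapped = 1 << 40          # 1 TB of address space
page_size = 4 << 10       # 4 KB pages
pte_size = 8              # bytes per x86-64 page-table entry

num_pages = mapped // page_size    # 2**28 pages to map
leaf_bytes = num_pages * pte_size  # 2**31 bytes = 2 GB of PTEs
```

And that is per process, before counting the PD/PDP/PML4 levels above the leaves.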

scott
From: Scott Lurndal on
David Schwartz <davids(a)webmaster.com> writes:
>On May 9, 4:08 pm, sc...(a)slp53.sl.home (Scott Lurndal) wrote:
>
>> Threads are a performance win because they don't need to flush the TLB's
>> on context switches between threads in the same process.
>
>Nope. That's like saying that cars are faster than bicycles because
>they don't have pedals. While it's true that threads are a performance
>win and it's true that context switches between threads of the same
>process are faster than context switches between threads from
>different processes, the latter does not cause the former.
>
>> A thread context switch is enormously less
>> expensive than a process context switch. The larger the page size,
>> the better.
>
>It doesn't matter. In any sensible threaded application, there will be
>so few context switches that making them faster will be lost in the
>noise.

I've never seen a thread that doesn't require a context switch, aside
from the user-level M-N threads in the old SVR4.2MP threads library, and
that was also a context switch, just done in the library rather than the
kernel.

If you degenerate your system to a single thread per core, and only have
one process (i.e., a real-time embedded system), then there won't be many
context switches between threads.

However, in real-world threaded applications there _are_ context switches,
and there are _many_ context switches, and a thread context switch is
more efficient than a process context switch.

scott
From: Scott Lurndal on
phil-news-nospam(a)ipal.net writes:
>On 09 May 2010 23:15:08 GMT Scott Lurndal <scott(a)slp53.sl.home> wrote:
>| David Schwartz <davids(a)webmaster.com> writes:
>|>On May 9, 12:37=A0am, Golden California Girls <gldncag...(a)aol.com.mil>
>|>wrote:
>|>
>|>> That depends upon what you call a context switch. Somehow I think to
>|>> switch threads you have to somehow save and restore a few registers, the
>|>> Program Counter for sure, unless you have more cores than threads. The
>|>> more registers that have to be exchanged the longer the switching time.
>|>
>|>Compared to blowing out the code and data caches, the time it takes to
>|>save and restore a few registers is meaningless.
>|>
>|>DS
>|
>| It's not the caches, so much, as it is the TLB's. The caches (at least
>| on physically indexed architectures like Intel/AMD's) are not flushed on a
>| context switch; either a thread context switch or process context switch
>| may or may not result in a subsequent cache miss - that depends on many
>| factors. A thread switch is less likely to see a subsequent cache miss,
>| however.
>
>However, once the context switch to a new VM does take place, the cache that
>pointed to the previous process is useless (except for shared parts since
>this is a physical/real address caching architecture).

Indeed. The shared parts are key. Context switches between VM's are a special
case, and AMD has some help for the TLB's in this case by associating an ASID
with the VM. However, this is orthogonal to the point I made above about
thread switches within a process being more efficient than thread switches
between processes.

scott
From: Rainer Weikusat on
David Schwartz <davids(a)webmaster.com> writes:
> On May 9, 4:08 pm, sc...(a)slp53.sl.home (Scott Lurndal) wrote:

[...]

>> A thread context switch is enormously less
>> expensive than a process context switch. The larger the page size,
>> the better.
>
> It doesn't matter. In any sensible threaded application, there will be
> so few context switches that making them faster will be lost in the
> noise.

Dedicating threads to particular subtasks of the work to be done is
also a sensible way to design 'a threaded application', just one
geared towards simplicity of implementation rather than maximum
performance. Because a thread context switch is cheaper than a
process context switch, such simple designs remain useful for a wider
range of tasks when using threads instead of processes.
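[Editor's note: such a thread-per-subtask design can be as simple as a pipeline of threads joined by queues: easy to write, though every hand-off between stages implies a context switch. A minimal hypothetical sketch:]

```python
import queue
import threading

raw = queue.Queue()
parsed = queue.Queue()
out = []

def parser():
    # Subtask 1: a dedicated thread turns text lines into integers.
    while (line := raw.get()) is not None:
        parsed.put(int(line))
    parsed.put(None)            # propagate end-of-input downstream

def summer():
    # Subtask 2: a dedicated thread accumulates a running total.
    total = 0
    while (n := parsed.get()) is not None:
        total += n
    out.append(total)

stages = [threading.Thread(target=parser), threading.Thread(target=summer)]
for t in stages:
    t.start()
for line in ["1", "2", "3", "4"]:
    raw.put(line)
raw.put(None)
for t in stages:
    t.join()
```

Each stage's state lives in one thread, which keeps the code simple; the price is a scheduler hand-off per queue transfer, which is exactly where cheaper thread switches help.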