From: phil-news-nospam on
On Sat, 8 May 2010 11:52:48 -0700 (PDT) David Schwartz <davids(a)webmaster.com> wrote:
| On May 8, 6:22 am, phil-news-nos...(a)ipal.net wrote:
|
|> excessive emphasis on threads compared to processes
|
| Process-pool designs are not really realistic yet. Nobody's done the
| work needed to make them useful.
|
| I keep hoping somebody will, since I think that's a phenomenal design
| approach. You would need to allocate lots of memory address space
| before you fork off the child processes (64-bit OSes make this easy),
| and have a special "shared allocator" to allocate shared memory. You'd
| need a library that made it easy to register file descriptors as
| shared and hand them from process to process. You'd also need a "work
| pool" implementation that only accepted references to shared resources
| to identify a work item.
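
For concreteness, a minimal sketch in C of the pre-fork shared arena
described above; everything here -- the arena size, the bump allocator,
the names -- is an illustrative assumption, not an existing library:

/* Reserve shared address space BEFORE forking, then hand out pieces of
 * it with a "shared allocator" visible to every worker. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define ARENA_SIZE (1UL << 30)   /* 1 GiB; 64-bit address space makes this cheap */

static struct arena {
    size_t used;                 /* trivial bump pointer; a real allocator
                                    would need a lock in shared memory */
    char data[];
} *arena;

static void *shared_alloc(size_t n)
{
    void *p = arena->data + arena->used;
    arena->used += n;
    return p;
}

int main(void)
{
    /* Children forked after this mmap() all see the same region at the
     * same address, so pointers into it can be handed between them. */
    arena = mmap(NULL, ARENA_SIZE, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (arena == MAP_FAILED)
        return 1;

    char *msg = shared_alloc(64);

    for (int i = 0; i < 4; i++) {
        if (fork() == 0) {                    /* worker process */
            if (i == 0)
                strcpy(msg, "written by worker 0, seen by the parent");
            _exit(0);
        }
    }
    while (wait(NULL) > 0)
        ;
    printf("parent sees: %s\n", msg);
    return 0;
}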

I've seen few applications ... so few I can't even think of one at the
moment, although I know I thought about one many years back ... that would
need to hand file descriptors between tasks (I'm using "task" as a generic
term for a unit of work that could be a thread or a process), other than
the simple case of a master listener for a daemon that hands off each
arriving connection to a worker (probably best done as part of the
creation of that task ... which is usually what we see happening).
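
That handoff, where it is wanted, is normally done by passing the
descriptor as SCM_RIGHTS ancillary data over a Unix-domain socket. A
hedged sketch of the sending side (the worker mirrors it with recvmsg();
error handling trimmed):

/* Pass an accepted connection's descriptor to a worker over a
 * Unix-domain socket. The kernel installs a duplicate of the
 * descriptor in the receiving process. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

static int send_fd(int unix_sock, int fd)
{
    char byte = 'F';                     /* must carry at least one data byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;              /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type  = SCM_RIGHTS;
    cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(unix_sock, &msg, 0) == 1 ? 0 : -1;
}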

The 64-bit VM does give us space to do some memory structuring to deal with
the issues like having memory that is shared, and having private memory that
still needs to be distinguished (e.g. two tasks that cannot see each other's
stack, but we still want the addresses to be different for some reason).

Well, it will until we waste too much of it and eventually exhaust it.


| Ideally, a process could register what it was messing with. So if it
| crashed/failed, the system would know what was potentially corrupt.

I'd prefer that registration be a part of acquisition. That is, when the
task gets the resource, it is already registered. Think of descriptors
and processes: kill the process and the descriptors close (aside from a
few glitches in the design, such as stuck devices ... but that's another
whole rant for another day) and go away. Sharing of resources would have
to be considered, too.


|> What is really needed is a whole NEW threading concept where individual
|> threads can have private-to-that-thread resources, like file descriptors
|> (but done without giving up the ability to choose to share them). Then
|> you can spread the descriptors and other resources out in ways that allow
|> them to be managed better.
|
| I'm not sure how that would be any better. Currently, if you want a
| file descriptor to only be accessed by one thread, just only access it
| from that one thread.

But it's still silly to have to manage a descriptor space that grows to
some huge size just because all the descriptors have to be visible
together in at least one shared descriptor table. Think of 40,000 HTTP
tasks working at once. You need a couple of descriptors each, at least.
Why do they all need to share the same descriptor space, even if there is
some need to share the same virtual memory space (which may or may not be
a good thing)?
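
Linux's clone(2) already gets close to this: pass CLONE_VM without
CLONE_FILES and you get a task that shares the address space but holds
its own descriptor table. A Linux-specific sketch, error handling
trimmed:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

static int worker(void *arg)
{
    /* This descriptor exists only in the worker's table; the parent
     * never sees it, even though all memory is shared. */
    int fd = open("/dev/null", O_WRONLY);
    printf("worker opened private fd %d\n", fd);
    close(fd);
    return 0;
}

int main(void)
{
    char *stack = malloc(STACK_SIZE);

    /* CLONE_VM shares the address space; leaving out CLONE_FILES gives
     * the child a COPY of the descriptor table rather than sharing it. */
    pid_t pid = clone(worker, stack + STACK_SIZE, CLONE_VM | SIGCHLD, NULL);
    if (pid < 0)
        return 1;
    waitpid(pid, NULL, 0);
    free(stack);
    return 0;
}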

Threads often are a performance win ... NOT because they allow faster
sharing between tasks (not always needed) ... BUT just because context
switches between threads of the same virtual memory space are faster.
You won't need to switch the segment structure. You won't need to flush
the VM translation cache (TLB). You won't even need to discard the memory
cache in many cases (depending on the architecture).

But threads are also often a risk ... they are not padded cells, for example.
And then there is the issue of having enough separate stacks, file descriptor
spaces, etc.

A lot of this is an issue of architectural design (and not just architecture
of the CPU) ... architecture of how process and thread contexts are organized
and how information flows where it needs to go. If it's a web server that
just delivers static files (for example all those button images and such),
then it is mostly very simple. But if it needs to keep state for each user,
or share information between distinct users in real time, especially if
that must happen faster than a round trip through a database allows, then
the architecture of the server/service needs to account for it. That
design needs to consider the
effects of threads, processes, shared resources (which ones are needed and
which ones are not), and even distinct hardware. For example, users accessing
a web based mail system might be best redirected to the same machine each time
during a login session, allowing their state cache to be kept in one place,
even while thousands of machines and tens of millions of total processes or
threads are running to service them.

There's really no general purpose solution. There won't be until we get to
a level where the "loose fit" of "one size fits all" won't matter (this will
require machines that would look to us today much as today's machines would
have looked to people from the 1980s).

--
-----------------------------------------------------------------------------
| Phil Howard KA9WGN | http://linuxhomepage.com/ http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/ http://ka9wgn.ham.org/ |
-----------------------------------------------------------------------------
From: David Schwartz on
On May 8, 1:11 pm, phil-news-nos...(a)ipal.net wrote:

> Threads often are a performance win ... NOT because they allow faster
> sharing between tasks (not always needed) ... BUT just because context
> switches between threads of the same virtual memory space are faster.

This is a common misconception. Threads are rarely, if ever, a
performance win because they make context switches faster. Threads are
primarily a performance win because they minimize the need for context
switches. A threaded web server can do a little bit of work for each
of a thousand clients without even a single context switch.
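
As a sketch of the counting argument: a worker thread that drains a queue
of ready clients pays one sleep/wake per batch, not one per client.
Nothing below comes from a real server; it is only an illustration:

#include <pthread.h>

#define MAX_ITEMS 1000

struct workq {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    int  items[MAX_ITEMS];
    int  count;
};

static void serve_one(int client)
{
    (void)client;                          /* read request, write response, etc. */
}

static void *worker(void *arg)
{
    struct workq *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->count == 0)              /* one sleep/wake per batch ... */
            pthread_cond_wait(&q->nonempty, &q->lock);
        while (q->count > 0) {             /* ... then drain the whole batch */
            int client = q->items[--q->count];
            pthread_mutex_unlock(&q->lock);
            if (client < 0)                /* sentinel: shut down */
                return NULL;
            serve_one(client);             /* a little work, no context switch */
            pthread_mutex_lock(&q->lock);
        }
        pthread_mutex_unlock(&q->lock);
    }
}

int main(void)
{
    struct workq q = {
        .lock = PTHREAD_MUTEX_INITIALIZER,
        .nonempty = PTHREAD_COND_INITIALIZER,
    };
    pthread_t t;
    pthread_create(&t, NULL, worker, &q);

    pthread_mutex_lock(&q.lock);
    q.items[q.count++] = -1;               /* sentinel, served last */
    for (int i = 0; i < MAX_ITEMS - 1; i++)
        q.items[q.count++] = i;            /* 999 "ready clients" */
    pthread_cond_signal(&q.nonempty);      /* ONE wakeup for the whole batch */
    pthread_mutex_unlock(&q.lock);

    pthread_join(t, NULL);
    return 0;
}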

If you are having so many context switches that the cost of a context
switch shows up on your performance radar, you are doing something
horribly wrong. Schedulers are specifically designed to ensure that
context switches are infrequent, and you would have to be putting some
disastrous pressures on them for that design to fail to do its job.

DS
From: Golden California Girls on
David Schwartz wrote:
> On May 8, 1:11 pm, phil-news-nos...(a)ipal.net wrote:
>
>> Threads often are a performance win ... NOT because they allow faster
>> sharing between tasks (not always needed) ... BUT just because context
>> switches between threads of the same virtual memory space are faster.
>
> This is a common misconception. Threads are rarely, if ever, a
> performance win because they make context switches faster. Threads are
> primarily a performance win because they minimize the need for context
> switches. A threaded web server can do a little bit of work for each
> of a thousand clients without even a single context switch.

That depends upon what you call a context switch. Somehow I think that to
switch threads you have to save and restore a few registers, the
Program Counter for sure, unless you have more cores than threads. The
more registers that have to be exchanged, the longer the switching time.


From: David Schwartz on
On May 9, 12:37 am, Golden California Girls <gldncag...(a)aol.com.mil>
wrote:

> That depends upon what you call a context switch. Somehow I think that to
> switch threads you have to save and restore a few registers, the
> Program Counter for sure, unless you have more cores than threads. The
> more registers that have to be exchanged, the longer the switching time.

Compared to blowing out the code and data caches, the time it takes to
save and restore a few registers is meaningless.

DS
From: Scott Lurndal on
David Schwartz <davids(a)webmaster.com> writes:
>On May 8, 1:11 pm, phil-news-nos...(a)ipal.net wrote:
>
>> Threads often are a performance win ... NOT because they allow faster
>> sharing between tasks (not always needed) ... BUT just because context
>> switches between threads of the same virtual memory space are faster.
>
>This is a common misconception. Threads are rarely, if ever, a
>performance win because they make context switches faster. Threads are
>primarily a performance win because they minimize the need for context
>switches. A threaded web server can do a little bit of work for each
>of a thousand clients without even a single context switch.

Threads are a performance win because they don't need to flush the TLB's
on context switches between threads in the same process.

A thread context switch is enormously less
expensive than a process context switch. The larger the page size,
the better.

TLB misses are expensive. TLB misses are _really_ expensive in virtual
machines[*].

scott

[*] Up to 22 memory references when using nested page tables, depending on
the processor's page directory cache hit rate; this can be reduced to 11 if
the nested page table uses 1GB page sizes (versus 4 or fewer without SVM).