From: Terje Mathisen on
MitchAlsup wrote:
> Nobody has solved the synchronization problem that will embolden cores
> and threads to really deliver on their promise
> The closest solution to synchronization has been lobotomized by its
> implementation

Hmmm..., do I hear the bitter sound of an architect thwarted? :-)

> Thus the way forward is threads and cores with increasingly small gains
> as the count increases

Yes indeed.

> To a very large extent:
> There is no need for new instruction sets--extensions will occur
> naturally
> There is no need for new system organizations--software has enough
> trouble with the ones it already has
> There is no need for new storage organizations--DRAM and disks have a
> natural growth path
> So, evolution is bound to take place, and has been for a while.
>
> You see, the problem is not in the instruction sets, processor
> organization, cores and threads, nor system organization: The problem
> is in the power wall, memory wall, and synchronization wall. Until
> solutions to these are found, little can be done except squeeze
> another drop of blood from the stone.

Personally I am very partial to XADD, i.e. you can use it as a
building block to return a unique result to each of a bunch of competing
cores.

One idea would be to use the 0->1 transition as the Go! signal, and all
the others would go into a scaled (exponential?) backoff, depending upon
the return value they got.

I.e. if you have code that first tries to read the variable, then uses
LOCK XADD only after seeing a zero, the backoff path would require
seeing zero X times, with X a function of the previous XADD result.

This would at least guarantee forward progress, but you still have the
problem with N cores all trying to gain ownership of the same cache
line. :-(
>
> But what application is so compelling that it will actually change
> which stone is getting squeezed?

Graphics is the only one that comes to mind, and here you can mostly
program your way around the problem, at least up to ~1K fp lanes.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: hanukas on
On Sep 7, 11:54 pm, Robert Myers <rbmyers...(a)gmail.com> wrote:
> On Sep 7, 3:38 pm, n...(a)cam.ac.uk wrote:
>
> > Only a complete loon would
> > expect current software to do anything useful on large numbers of
> > processors, let alone with a new architecture!
>
> You've such a way with words, Nick.  Browsers, which are probably the
> OS of the future, are already multi-threaded or soon to be.  No longer
> does the browser freeze because of some JavaScript in an open tab.
> Browsers that don't seize that advantage will fall by the wayside.
> The same will happen all over software, and at increasing levels of
> fineness of division of labor.
>
> Robert.

So true. Fool that I am, I had a dual-core PentiumII system back in
1997 and, yes, wrote multi-threaded software back then. Sure, it's more
difficult to debug and trace problems, but the actual design and
implementation can be a very simple and painless process.

This has to be emphasized: In My Opinion, there hasn't really been
any great pressure to go multi-threaded for most things when the largest
benefit has been just more problems in the development process. The
reason is simple: most target hardware (for desktop x86) was single-core
until the last few years. There just hasn't been that much incentive
(or requirement) to do that sort of development, except for some very
specific software with embarrassingly parallelizable problems (video,
rendering and that sort of thing).

With a single-core system, the biggest reason to go multi-threaded has
been the possibility of atomizing operations that take a long time to
finish. If you have a user interface, you don't want it to freeze for
100-300 ms periodically when the program just wants to pre-cache some
resources in the background. Sure, you can split the task (=atomize)
into small chunks manually, but why, when the OS is well capable of
doing that for you (=launch a worker thread, or have a work queue and
one worker thread, whatever).

That's the kind of thing you want multi-threading for, even on a
single-core machine. Now that the path of least resistance for
increased computational power is adding units to do the computation,
Intel, AMD, Sun, IBM and the others have roadmaps full of junk using
this paradigm.

At this point someone should ask themselves this question: what's
wrong when some tasks take an uncomfortably long time to complete while
only a fraction of the computational power is utilized? If Mr.
Software Developer is smart, he finds ways to make his software more
responsive. Step 1: find smarter ways to do things: don't do stuff you
don't need. Do stuff in the order that gets the user his feedback as
soon as possible (reduce latency). Lower latency alone won't help if
there isn't enough bandwidth to back it up, so.. let's see what
multi-core can do for us:

Increasing bandwidth is easier: it's easier to, for example, read 4
JPEGs in their own threads than to parallelize reading one JPEG. While
the throughput is the same in both cases, the parallel-reading approach
is much simpler to program but has, in the worst case, 4x the latency.
To fix this, a cache with a prefetcher will do nicely. A cache is
needed anyway for this kind of application, and a prefetcher isn't a
bad idea either: trade some memory for perceived-latency compensation.
Problem solved.

IMO, utilizing more cores isn't difficult at all - there just hasn't
been much incentive to do so in mainstream software development. It
doesn't help much that there really isn't much need for that kind of
"solution" in most software. But whenever the developer sees the
hourglass icon when testing their own software, the question must be
asked: how to get rid of this waiting? Going multi-threaded shouldn't
be the answer. First do things smart. Reaching for many cores as the
first reaction to poor performance is for the weak. =)

Maybe there is a brute-force for-loop iterating through all
combinations when doing a search over a data set of hundreds of
thousands of objects? Oops! A simple algorithm change will cut
processing time from minutes to milliseconds. Throwing 16 cores at the
problem is a waste.

Just things everyone thinks, probably.

From: nmm1 on
In article <0c2a8ab3-630b-48b7-9e61-26e819e133f7(a)o10g2000yqa.googlegroups.com>,
MitchAlsup <MitchAlsup(a)aol.com> wrote:
>Reality check::(V 1.0)
>
>A handful (maybe two) of people are doing architecture as IBM defined
>the term with System/360
> . . .

Yes.

>All of the ILP that can be extracted with reasonable power has been
>extracted from the instruction sets and programming languages in vogue
>The memory wall and the power wall have made it impossible to scale as
>we have before
>Threads scale to a handful per core

Yes.

>Cores scale to the limits of pin bandwidth (and power envelope) Likely
>to be close to a handful of handfuls

Actually, not really, because of your next point.

>Nobody has solved the synchronization problem that will embolden cores
>and threads to really deliver on their promise
>The closest solution to synchronization has been lobotomized by its
>implementation
>Thus the way forward is threads and cores with increasingly small gains
>as the count increases

Yes.

>To a very large extent:
>There is no need for new instruction sets--extensions will occur
>naturally
>There is no need for new system organizations--software has enough
>trouble with the ones it already has
>There is no need for new storage organizations--DRAM and disks have a
>natural growth path
>So, evolution is bound to take place, and has been for a while.

Here I disagree. The current designs are blocking progress in the
most promising directions, which leads to an increasing dependence
on the communicating sequential process model and (God help us)
globally coherent shared memory.

Evolution in the natural world is notorious for heading into dead
ends, and major improvement tends to come from extinction and eventual
replacement.

>You see, the problem is not in the instruction sets, processor
>organization, cores and threads, nor system organization: The problem
>is in the power wall, memory wall, and synchronization wall. Until
>solutions to these are found, little can be done except squeeze
>another drop of blood from the stone.

The point here is that beating your head against a brick wall is not
productive; if you can't climb over it (and we can't), the solution
is to go round it. And the current architectures (software AND
hardware) are blocking that.

>But what application is so compelling that it will actually change
>which stone is getting squeezed?

THAT'S the right question, all right. No, I don't have an answer.


Regards,
Nick Maclaren.
From: MitchAlsup on
On Sep 10, 5:39 am, Terje Mathisen <Terje.Mathi...(a)tmsw.no> wrote:
> MitchAlsup wrote:
> > Nobody has solved the synchronization problem that will embolden cores
> > and threads to really deliver on their promise
> > The closest solution to synchronization has been lobotomized by its
> > implementation
>
> Hmmm..., do I hear the bitter sound of an architect thwarted? :-)

More like the sound of the architect ignored--with no actual
bitterness.

But then again, I could have been talking about transactional
memory.....

Mitch
From: Gavin Scott on
Mayan Moudgill <mayan(a)bestweb.net> wrote:
> That's completely different than working in a field like theoretical
> physics. When I look at the standard model and the mathematics involved
> (non-abelian gauge theory with *3* symmetry groups)....argghhh!!!

So I've been meaning for a while to make a post about this architecture
book I've been reading on and off for a couple months. It describes a
system architecture and its implementations which aren't quite your
typical comp.arch fare, but it's an area that has many parallels to
hardware and software systems in computing.

The architecture described is rather old, but new applications are now
becoming perhaps the leading area of technology growth in this century.

And unlike theoretical physics, it's *surprisingly* accessible to any
halfway intelligent reader.

I highly recommend this work to everyone here in comp.arch as it gives
fascinating insights into very different solutions to the same type of
information handling problems that comp.arch normally discusses.

I can pretty much guarantee you at least one mind-blowing revelation
per chapter based on what I've read so far.

http://www.amazon.com/dp/0815341059

G.