From: Quadibloc on
On Nov 9, 12:32 am, Robert Myers <rbmyers...(a)gmail.com> wrote:
> Most of what has been said here and elsewhere
> about the uselessness of multiple cores/ multi-threading has been "all
> computing is like the computing I'm used to, and it always will be."

Indeed, I feel this thread has been useful in making explicit my
hidden assumptions.

John Savard
From: nmm1 on
In article <5c7e1d81-06e6-4b50-a16e-82ece380cb2a(a)f20g2000prn.googlegroups.com>,
Quadibloc <jsavard(a)ecn.ab.ca> wrote:
>On Nov 9, 12:32 am, Robert Myers <rbmyers...(a)gmail.com> wrote:
>> Most of what has been said here and elsewhere
>> about the uselessness of multiple cores/ multi-threading has been "all
>> computing is like the computing I'm used to, and it always will be."
>
>Indeed, I feel this thread has been useful in making explicit my
>hidden assumptions.

However, almost everything that has been said about the inevitability
of the current semi-independent core designs is also along the lines
of "all hardware is like the hardware I'm used to, and it always will
be." There ARE alternatives, when people stop thinking that way, but
it's extremely unclear which would be generally worthwhile (which also
applies to the current designs).

We live in interesting times ....


Regards,
Nick Maclaren.
From: Quadibloc on
On Nov 9, 8:09 am, n...(a)cam.ac.uk wrote:

> However, almost everything that has been said about the inevitability
> of the current semi-independent core designs is also along the lines
> of "all hardware is like the hardware  I'm used to, and it always will
> be."  There ARE alternatives, when people stop thinking that way, but
> it's extremely unclear which would be generally worthwhile (which also
> applies to the current designs).

The hardware that people are used to isn't the same as the hardware
people are using even now.

What people are used to, of course, is one processor that gets more
powerful by being made faster. So what would be desired as the
successor to a single-core 3 GHz Pentium IV would be a single-core 6
GHz processor of the same type. That would be the most general and the
most useful way to double performance, since the doubling would be
applicable even to a program which can't be made to use more than a
single thread, and in which every instruction is dependent on the
last.

Such a program, though, wouldn't even make full use of the potential
performance of a Pentium IV, because that processor is pipelined: the
early stages of an instruction - not just the fetch and decode steps,
which rarely depend on previous instructions unless there is a branch,
but parts of the arithmetic itself - execute in parallel with the
later stages of the preceding instructions.
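
To make the cost of that serial dependency concrete, here is a toy C
program (purely illustrative - nothing about it is specific to the
Pentium IV) that performs the same number of floating-point additions
two ways: as a single dependent chain, and split across four
independent accumulators that a pipelined FPU can overlap. Compiled
without aggressive optimization, the second loop typically runs
several times faster:

#include <stdio.h>
#include <time.h>

#define N 100000000L

int main(void)
{
    volatile double seed = 1e-9;  /* volatile so the loops survive -O2 */
    double a, a0, a1, a2, a3;
    clock_t t0, t1;
    long i;

    /* One dependent chain: every add must wait for the previous one. */
    a = 0.0;
    t0 = clock();
    for (i = 0; i < N; i++)
        a += seed;
    t1 = clock();
    printf("dependent:   %.2f s (sum %g)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC, a);

    /* Same number of adds in four independent chains: a pipelined
       FPU can overlap them, despite the work being identical. */
    a0 = a1 = a2 = a3 = 0.0;
    t0 = clock();
    for (i = 0; i < N; i += 4) {
        a0 += seed; a1 += seed; a2 += seed; a3 += seed;
    }
    t1 = clock();
    printf("independent: %.2f s (sum %g)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC, a0 + a1 + a2 + a3);
    return 0;
}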

So we could speak of NUMA or we could speak of vector processing.

But I don't see the issue as people assuming "all hardware is like
the kind of hardware that I am used to", but rather "any hardware that
isn't like the kind of hardware that I am used to will largely go to
waste". The problem is not that people don't realize parallel hardware
exists. It is that they don't see how any method of improving the
coupling between parallel processors would help them: except in a few
specialized cases, extra power that is available only in parallel, and
not serially, is power people don't feel they can use.

Of course, outside of number-crunching and video game playing, it's
highly unclear that we _need_ vastly more computing power. Yes, we are
being sold ever more bloated operating systems to encourage us to buy
new PCs more often than we really want to... Windows 3.1 will still
turn out perfectly nice business letters on your laser printer, as far
as that goes, provided the printer is old enough to have a driver for
it.

More computing power is, of course, useful. But it is of value only
insofar as it is applied to doing useful work - although I define
"useful" liberally, not just meaning practical and serious work: a
video game entertains those who play it, and even that is useful for
the purpose of my statement here. Creating employment in Redmond - or
at Intel, for that matter - is all I seek to exclude.

Eventually, Moore's Law will peter out, but new developments will let
us keep making progress, at a slower rate but perhaps of a more useful
kind. Thus, one promising recent development is a process developed at
MIT for using gallium nitride in such a way that chips can be made on
established silicon processes and yet be given a faster kind of
transistor.

And perhaps we will make use of massive parallelism in new ways -
neural nets that can be implanted into the brains of stroke victims -
or even used to allow us to achieve immortality through uploading.

All kinds of useful things may happen. And Windows 7 is said to be
good at making use of the extra power of multicore chips. What I don't
necessarily see happening is the ordinary PC market settling on one
particular parallel topology as *the* solution; that could happen, if
one turns out to be a good fit, but I think it's perfectly possible
that none of them will be, and that interesting things will still
happen of other kinds.

John Savard
From: Terje Mathisen on
Chris Gray wrote:
> Terje Mathisen<Terje.Mathisen(a)tmsw.no> writes:
>
>> The usual programming paradigm for such a system is to have many
>> threads running the same algorithm, which means that training
>> information from one thread is likely to be useful for another, or at
>> least not detrimental.
>
> That doesn't mean that the ability to have multiple predictor states
> is bad. You need a way for the OS to tell the CPU "thread with key Y
> is going to be very similar to thread with key X". That means that
> the key Y state should be seeded with the key X state, or that the
> two states can be merged into one larger, more detailed state.
>
> I guess it comes down to the question of just how much value is a
> more accurate predictor - how many gates can you afford, and is it
> worthwhile to need a few extra instructions to initialize it?

Right.

I'm somewhat in love with the idea of a multi-level table:

A large, shared, but simple table (2-bit counters or similar),
augmented with a small per-core exception table which stores info only
for branches that have missed in the large table.

Is this even feasible? :-)
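
In code, the lookup and update path might look roughly like the
sketch below (plain C, with invented sizes - 64K shared counters, 256
direct-mapped exception entries per core - where real hardware would
of course be gates, with tag bits and parallel lookups):

#include <stdint.h>
#include <stdio.h>

#define SHARED_BITS 16
#define SHARED_SIZE (1u << SHARED_BITS)  /* 64K shared 2-bit counters */
#define EXC_BITS    8
#define EXC_SIZE    (1u << EXC_BITS)     /* 256 exception entries/core */
#define NCORES      4

static uint8_t shared_ctr[SHARED_SIZE];  /* saturating counters, 0..3 */
struct exc_entry { uint32_t tag; uint8_t ctr; uint8_t valid; };
static struct exc_entry exc[NCORES][EXC_SIZE];

static int predict(int core, uint32_t pc)
{
    struct exc_entry *e = &exc[core][pc & (EXC_SIZE - 1)];
    if (e->valid && e->tag == pc)        /* private override wins */
        return e->ctr >= 2;
    return shared_ctr[pc & (SHARED_SIZE - 1)] >= 2;
}

static void update(int core, uint32_t pc, int taken)
{
    uint8_t *s = &shared_ctr[pc & (SHARED_SIZE - 1)];
    struct exc_entry *e = &exc[core][pc & (EXC_SIZE - 1)];

    if (e->valid && e->tag == pc) {      /* already escalated: train */
        if (taken && e->ctr < 3) e->ctr++;   /* only the private copy */
        else if (!taken && e->ctr > 0) e->ctr--;
        return;
    }
    if ((*s >= 2) != taken) {            /* shared table mispredicted: */
        e->tag = pc;                     /* allocate an exception entry */
        e->ctr = taken ? 2 : 1;          /* weakly biased the right way */
        e->valid = 1;
    }
    if (taken && *s < 3) (*s)++;         /* train the shared table too */
    else if (!taken && *s > 0) (*s)--;
}

int main(void)
{
    /* One branch whose behaviour differs per core: core 0 always
       takes it, core 1 never does.  The shared counter cannot satisfy
       both, so whichever core it mispredicts escalates the branch to
       its own exception table. */
    for (int i = 0; i < 100; i++) {
        update(0, 0x1234, 1);
        update(1, 0x1234, 0);
    }
    printf("core 0: %d  core 1: %d\n",
           predict(0, 0x1234), predict(1, 0x1234));
    return 0;
}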

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: nmm1 on
In article <1555953f-4376-4ad9-bac1-3e8812f8f79f(a)t11g2000prh.googlegroups.com>,
Quadibloc <jsavard(a)ecn.ab.ca> wrote:
>
>> However, almost everything that has been said about the inevitability
>> of the current semi-independent core designs is also along the lines
>> of "all hardware is like the hardware =A0I'm used to, and it always will
>> be." =A0There ARE alternatives, when people stop thinking that way, but
>> it's extremely unclear which would be generally worthwhile (which also
>> applies to the current designs).
>
>The hardware that people are used to isn't the same as the hardware
>people are using even now.
>
>What people are used to, of course, is one processor that gets more
>powerful by being made faster. ...

Grrk. Maybe I have spent too long on the bleeding edge. That is
a viewpoint that many of us gave up on 20 years ago ....

In particular, when you are dealing with programs limited by memory
latency, as so many are, there has been very little improvement over
the years - 15%, if one is feeling generous, less if not.
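
The classic demonstration is a dependent pointer chase over a working
set much larger than the caches. A rough C sketch (sizes picked
arbitrarily for illustration): every load must wait for the previous
one to complete, so the loop runs at close to full DRAM latency no
matter how fast the core is.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)            /* 16M nodes, ~128 MB of pointers */

int main(void)
{
    size_t *next = malloc(N * sizeof *next);
    size_t i, j, tmp, p;
    clock_t t0, t1;

    if (!next) return 1;

    /* Build a random single-cycle permutation (Sattolo's algorithm;
       rand() is crude but adequate here), so the chase visits every
       node once in an order that defeats the prefetcher. */
    for (i = 0; i < N; i++) next[i] = i;
    srand(1);
    for (i = N - 1; i > 0; i--) {
        j = rand() % i;
        tmp = next[i]; next[i] = next[j]; next[j] = tmp;
    }

    t0 = clock();
    p = 0;
    for (i = 0; i < N; i++)    /* each load depends on the last */
        p = next[p];
    t1 = clock();

    printf("%.1f ns per dependent load (sink %zu)\n",
           1e9 * (t1 - t0) / CLOCKS_PER_SEC / N, p);
    free(next);
    return 0;
}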


Regards,
Nick Maclaren.