From: Nick Maclaren on

In article <%CESg.422$fP5.194(a)news.cpqcorp.net>,
Rick Jones <rick.jones2(a)hp.com> writes:
|> Casper H.S. Dik <Casper.Dik(a)sun.com> wrote:
|> > It all depends on the bandwidth. (Which means it ain't a pretty
|> > picture for Intel as long as they keep the FSB)
|>
|> Is it really just a question of bandwidth? I would have thought that
|> application (I'm assuming the system vendors deal with the OSes)
|> behaviour would be equally important.

Yes and no. The bandwidth is definitely the leading bottleneck.

|> How different is having an FSB for a single socket with N cores on the
|> chip than having a "link" for a single-socket with N cores on the
|> chip?

Not at all.

|> I would think that as the cores per chip increase, the issues that the
|> folks selling large SMP's deal with will become known to the
|> single-socket crowd.

Yup. They are already hitting them.


But, to answer the question:

For most workstations, the answer is probably 4 (at least in the near
future - see later), because few workloads have more than a few genuinely
active threads. Very fancy graphics is another matter.

For servers and embarrassingly parallel HPC, the answer is until you have
saturated the bandwidth (modified by the demands of the applications).
Multiple cores are merely a cheaper form of multiple sockets.
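That "until you have saturated the bandwidth" cutoff is easy to put
rough numbers on. A back-of-envelope sketch (the bandwidth and per-core
demand figures below are illustrative assumptions, not measurements of
any real part):

```python
def cores_before_saturation(mem_bandwidth_gbs, per_core_demand_gbs):
    """Rough count of cores a shared memory interface can feed
    before bandwidth, not core count, becomes the bottleneck."""
    return int(mem_bandwidth_gbs // per_core_demand_gbs)

# Illustrative numbers only: a ~10 GB/s shared bus feeding cores
# that each stream ~2 GB/s saturates at about 5 cores; adding a
# 6th core buys nothing.
print(cores_before_saturation(10, 2))
```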

For genuinely parallel, high-communication applications, the answer is:
how parallelisable is your application? And the answer to THAT (outside
HPC) is generally "2-way, when I am lucky".

The last is not a law of nature, but isn't going to change any time soon,
as it is caused by the programming paradigms that people use.
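That "2-way, when I am lucky" ceiling is just Amdahl's law in action:
the serial fraction bounds the speedup no matter how many cores you
throw at it. A minimal sketch of the standard formula (the fractions
used are illustrative):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: overall speedup is limited by the serial
    fraction of the work, regardless of core count."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# If only half the work parallelises, 4 cores give 1.6x, and even
# unlimited cores can never exceed 2x.
print(amdahl_speedup(0.5, 4))
```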


Regards,
Nick Maclaren.
From: Joe Seigh on
Jon Forrest wrote:
> Today I read that we're going to get quad-core processors
> in 2007, and 80-core processors in 5 years. This has
> got me to wondering where the point of diminishing returns
> is for processor cores.
....
> Where do you think the point of diminishing returns might
> be?
>

Rethinking this, the question should be what would you do
with an unlimited number of processors?

For one thing, the operating system would change. Interrupt
handlers for asynchronous interrupts would go away. You'd have
dedicated, possibly special purpose, processors to handle devices.
They're already talking about this with "coprocessors".

The scheduler would go away. No need for it when every thread has
its own dedicated hardware thread. This would affect realtime
programming. No need to play games with thread priorities and
any of the timeouts that could be caused by not being scheduled
quickly enough, i.e. no dispatch latency.

Polling and IPC mechanisms would have to be worked on a bit. E.g.
make things like MONITOR/MWAIT efficient. Possibly some new
instructions. The hw architects would have to be a little more
proactive here. The latest proposals from Intel seem to be a little
lacking here. What's with the architectural extensions? It seems to be
a "ready to fight the last war" kind of thing. Who cares if you can
run a 20-year-old application really fast?
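MONITOR/MWAIT are x86 instructions that let a core sleep until a
watched cache line is written, instead of spin-polling. As a user-space
stand-in for that wait-for-update pattern (a software analogue only,
using a threading.Event rather than the hardware mechanism):

```python
import threading

ready = threading.Event()
result = []

def worker():
    result.append(42)   # produce the value another thread is waiting on
    ready.set()         # analogous to writing the monitored line

t = threading.Thread(target=worker)
t.start()
ready.wait()            # block cheaply until signalled, no busy-polling
t.join()
print(result[0])
```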

Distributed algorithms would become more important. How do you
coordinate threads, and how do you do it efficiently from a hw
point of view?

Etc... (more stuff when I think of it)

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
From: Bill Todd on
Terje Mathisen wrote:
> Casper H.S. Dik wrote:
>> Jon Forrest <forrest(a)ce.berkeley.edu> writes:
>>
>>> Today I read that we're going to get quad-core processors
>>> in 2007, and 80-core processors in 5 years. This has
>>> got me to wondering where the point of diminishing returns
>>> is for processor cores.
>>
>> Sun has been shipping 8 core CPUs since, I think, late last year.
>>
>> It all depends on the bandwidth. (Which means it ain't a pretty
>> picture for Intel as long as they keep the FSB)
>
> That 80-core Intel demo chip has a vertically mounted SRAM chip as well,
> providing 20 MB (afair) directly to each core.
>
> For any problem where those 20 MB * 80 = 1.6 GB of SRAM can hold
> everything in a nicely distributed manner, you're going to see _very_
> impressive performance indeed, particularly since they also have a
> (presumably very fast) mesh network connecting the individual cores.

Well, since IIRC the processing cores are running at a princely 1.91 MHz
(allegedly not a typo) I'm not sure how truly impressive that demo's
performance would be: perhaps better to wait for the real thing in
around 5 years' time.

As for the SRAM, I rather suspect that the 20 MB is the *total* figure
shared among the 80 cores: if Intel could really get 1.6 GB of SRAM on
anything like a single chip, we'd be seeing a lot more cache in Itanics.

- bill
From: Terje Mathisen on
Joe Seigh wrote:
> Jon Forrest wrote:
>> Today I read that we're going to get quad-core processors
>> in 2007, and 80-core processors in 5 years. This has
>> got me to wondering where the point of diminishing returns
>> is for processor cores.
> ...
>> Where do you think the point of diminishing returns might
>> be?
>>
>
> Rethinking this, the question should be what would you do
> with an unlimited number of processors?
>
> For one thing, the operating system would change. Interrupt
> handlers for asynchronous interrupts would go away. You'd have
> dedicated, possibly special purpose, processors to handle devices.
> They're already talking about this with "coprocessors".

You still need some way to handle async inter-core communication! I.e.
I believe that you really don't have any choice here, except to make
most of your cores interruptible.

This leads back to the old thread about having multiple cores which are
compatible but not symmetrical: I.e. some of them are optimized for long
timeslots doing stream/HPC/serious number crunching, using a
microarchitecture like the P4, which really doesn't like to be interrupted.

Other cores could be much more Pentium-like: Possibly superscalar, but
in-order, with very low branch miss penalty, and optimized for
twisty/branchy/hard to predict code.

As long as these cpus are compatible, an OS which knows that some
processes prefer to run on a given kind of cpu could do quite well, and
the programming task becomes _much_ easier than for a disjoint set as
used in the PPC/Cell combination.
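A toy sketch of that OS policy: steer each process to the pool of cores
that suits it, but fall back to the other pool when needed, which is
only safe because the cores share one ISA (the pool names and core
names below are invented for illustration):

```python
# Invented core pools: "streaming" cores favour long uninterrupted
# number-crunching timeslots, "branchy" cores favour twisty code.
# Both run the same ISA, so any task can run on either.
POOLS = {"streaming": ["core0", "core1"],
         "branchy":   ["core2", "core3"]}

def place(task_kind, busy):
    """Pick an idle core, preferring the pool suited to the task,
    falling back to the other pool since the ISA is shared."""
    preferred = POOLS.get(task_kind, [])
    others = [c for pool in POOLS.values() for c in pool
              if c not in preferred]
    for core in preferred + others:
        if core not in busy:
            busy.add(core)
            return core
    return None  # everything busy: only now would a scheduler queue

busy = set()
print(place("streaming", busy))  # core0 (preferred pool)
print(place("streaming", busy))  # core1 (preferred pool)
print(place("streaming", busy))  # core2 (falls back to "branchy")
```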

> The scheduler would go away. No need for it when every thread has
>> its own dedicated hardware thread. This would affect realtime
> programming. No need to play games with thread priorities and
> any of the timeouts that could be caused by not being scheduled
> quickly enough, i.e. no dispatch latency.

I believe you'd still need it, but not for anything that's time-critical.
I.e. after sufficient time with tens of cores/hundreds of threads
available, programming patterns to use/abuse them all will turn up, and
you'll run out of resources anyway. :-(

Terje

--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
From: Terje Mathisen on
Bill Todd wrote:
> Terje Mathisen wrote:
>> That 80-core Intel demo chip has a vertically mounted SRAM chip as
>> well, providing 20 MB (afair) directly to each core.
>>
>> For any problem where those 20 MB * 80 = 1.6 GB of SRAM can hold
>> everything in a nicely distributed manner, you're going to see _very_
>> impressive performance indeed, particularly since they also have a
>> (presumably very fast) mesh network connecting the individual cores.
>
> Well, since IIRC the processing cores are running at a princely 1.91 MHz
> (allegedly not a typo) I'm not sure how truly impressive that demo's
> performance would be: perhaps better to wait for the real thing in
> around 5 years' time.

I don't think we have any other option than to wait, no matter what the
current speed is.

However, if they really run at 2 MHz, then the claimed TB/s total
bandwidth seems totally bogus, even if they also include 4 sets of
cpu-cpu mesh links.
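The implausibility is easy to check with arithmetic, taking the
reported 2 MHz clock and a claimed 1 TB/s aggregate at face value
(both figures as quoted upthread, not verified):

```python
def implied_bytes_per_core_cycle(total_bw_bytes_s, cores, clock_hz):
    """Bytes each core would have to move every clock cycle for the
    claimed aggregate bandwidth to hold."""
    return total_bw_bytes_s / (cores * clock_hz)

# 1 TB/s across 80 cores at 2 MHz would require ~6250 bytes per core
# per cycle -- orders of magnitude beyond any plausible datapath.
print(implied_bytes_per_core_cycle(1e12, 80, 2e6))
```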
>
> As for the SRAM, I rather suspect that the 20 MB is the *total* figure
> shared among the 80 cores: if Intel could really get 1.6 GB of SRAM on
> anything like a single chip, we'd be seeing a lot more cache in Itanics.

Oops, you're almost certainly right. :-(

Oh, well. It was fun as long as the fantasy lasted. :-)

Terje

--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"