How Many Processor Cores Are Enough? [Computer Architecture]

Prev: Trying to design low level hard disk manipulation program
Next: New information on POWER6

From: Bill Todd on 29 Sep 2006 15:19

Thomas Womack wrote:
> In article <7IKdnUrZObRC_4HYnZ2dnUVZ_oKdnZ2d(a)metrocastcablevision.com>,
> Bill Todd <billtodd(a)metrocast.net> wrote:
>> Tommy Thorn wrote:
>>> Bill Todd wrote:
>>>> Well, since IIRC the processing cores are running at a princely 1.91 MHz
>>> Well, you recall incorrectly. It's 3 GHz, cf.
>>> http://www.tomshardware.com/2006/09/27/idf_fall_2006/page2.html
>> You really need to work on your reading comprehension: there is nothing
>> on the page that you cite above that remotely suggests that the
>> prototype runs at 3 GHz (the only mention of that clock rate is in
>> reference to a current P4's FP performance).
>
> Go to the source

I don't have time to scrounge around looking for 'the source' for every
random comment I happen to encounter on the Internet: what I did was
look at the *reference* that was cited, and its content was exactly what
I reported it to be.

>
> http://www.intel.com/pressroom/kits/events/idffall_2006/pdf/IDF%2009-26-06%20Justin%20Rattner%20Keynote%20Transcript.pdf
>
> includes the paragraph
>
> 'We just got the silicon back earlier this week of our Terascale
> Research prototype. And as you can see in the accompanying diagram,
> each one of the 80 cores on this die consists of a simple
> processor. It has a simple instruction set -- not IA compatible; it's
> just has a simple instruction set that lets us do simple computations
> in floating point and basically push data across the on-die fabric
> that connects the 80-cores on this die together. Now, in aggregate,
> those 80 cores produce one teraflop of computing performance, so a
> single one of these Terascale Research prototypes is a one
> teraflop-class processor. It delivers an energy efficiency of 10
> gigaflops per watt at its nominal operating frequency of 3.1
> gigahertz. That's an order of magnitude better than anything available
> today.'

My limited experience with 'first silicon' of a product suggests that
the actual device is then likely running (if indeed it runs yet at all)
at a rather small fraction of 3.1 GHz (3.1 GHz being its 'nominal'
target, which anyone who remembers Itanic's early clock-rate targets
will understand sometimes bears rather little resemblance to reality).

>
> and
>
> 'For each core on that die, there's a 256-kilobyte static RAM, and the
> aggregate bandwidth between all of the cores and all of those SRAM
> arrays is one trillion bytes per second, truly an astonishing amount
> of memory bandwidth.'

An interesting comment, since the 256 KB of SRAM per core is quite
comparable to today's amount of per-core L2 cache, which has per-core
bandwidth comparable to that of each of those mini-cores.

I.e., sounds kind of uninspiring. One relevant observation is that,
while stacking the SRAM chip on top of the processor chip is
interesting, it only effectively doubles the total chip area (i.e., is
equivalent to about one process shrink). If one could stack multiple
layers of SRAM this might start to become more interesting (assuming
that the stack remained coolable and the additional layers didn't
increase access latency over-much).
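
For readers who want the per-core numbers behind the quoted keynote figures, here is a rough back-of-the-envelope sketch. It uses only the numbers in the transcript excerpts above; the function name and dictionary keys are made up for illustration, and it assumes "one trillion bytes" means 1e12 bytes and that a kilobyte of SRAM is 1024 bytes.

```python
# Figures taken from the keynote transcript quoted above.
CORES = 80
AGGREGATE_FLOPS = 1e12        # "one teraflop"
NOMINAL_CLOCK_HZ = 3.1e9      # "nominal operating frequency of 3.1 gigahertz"
FLOPS_PER_WATT = 10e9         # "10 gigaflops per watt"
AGGREGATE_SRAM_BW = 1e12      # "one trillion bytes per second" (assumed 1e12)
SRAM_PER_CORE_KB = 256        # "256-kilobyte static RAM" per core


def per_core_figures():
    """Derive per-core and whole-chip numbers from the quoted aggregates."""
    flops_per_core = AGGREGATE_FLOPS / CORES
    bw_per_core = AGGREGATE_SRAM_BW / CORES
    return {
        "gflops_per_core": flops_per_core / 1e9,
        "flops_per_cycle": flops_per_core / NOMINAL_CLOCK_HZ,
        "implied_watts": AGGREGATE_FLOPS / FLOPS_PER_WATT,
        "sram_gb_per_s_per_core": bw_per_core / 1e9,
        "sram_bytes_per_cycle": bw_per_core / NOMINAL_CLOCK_HZ,
        "total_sram_mb": CORES * SRAM_PER_CORE_KB / 1024,  # assumes 1 KB = 1024 B
    }


if __name__ == "__main__":
    for name, value in per_core_figures().items():
        print(f"{name}: {value:.2f}")
```

Under those assumptions each core works out to 12.5 GFLOPS and 12.5 GB/s of SRAM bandwidth, i.e. roughly 4 flops and 4 bytes per nominal clock cycle, with an implied power of about 100 W and 20 MB of SRAM in total. All of this holds only if the part actually runs at the nominal 3.1 GHz.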

>
>> The reference that gives the 1.91 MHz figure is
>> http://www.theinquirer.net/default.aspx?article=34623
>
> That's talking about a completely different project; if nothing else,
> it's explicitly described as IA-compatible (and is running WinXP),
> whilst the terascale chip is explicitly described as not.

And now that you've actually provided a source with such information in
it, that is clear - but it certainly wasn't earlier. In particular, the
Inq article describing the IA-compatible 'mini-core' effort does so
explicitly in the context of Intel's 'Tera-Scale' effort.

- bill

From: Eugene Miya on 29 Sep 2006 17:07

>|> >|> PIMs.
>|> >When are we going to see them, then?
>|> We? "What do you mean 'we?' white man?" --Tonto

In article <efil5m$r5j$1(a)gemini.csx.cam.ac.uk>,
Nick Maclaren <nmm1(a)cus.cam.ac.uk> wrote:
>Well, actually, many people have. The ICL DAP (and, I believe, the BBN
>Butterfly) could well be classified as prototypes. The issue is when
>(and if!) they will be available openly enough and cheaply enough for
>a wide range of people to experiment with. And 20+ years from being
>the next great thing to mere NDA isn't exactly rapid progress ....

I saved a DAP for the CHM, and I used the BBN and I think I have
succeeded in locating one surviving representative sitting out in a
field near Denver. No, the Butterfly, Monarch, and TC2000 could not be
classified as PIMs. Their 88Ks, etc. were much more heavyweight
processors. Similarly, while the DAPs were bit-serial, their numbers
were/are comparatively small and with a wider address space.

Jon sits in a unique position in that he can go and talk to Dave, who is
on a different floor. I have to ensure that, whatever happens to one
of the real PIMs, a representative machine gets preserved, even if
20 years or more after the fact. They aren't general purpose (yet, if
ever), and they aren't going to run Fortran or other conventional
languages just yet, but they are popular where they sit.

>|> >Seriously, they have been talked about as imminent for 20 years, so
>|> >either there is a major problem or the IT industry is suffering a
>|> >collective failure of nerve. Or both.
>|>
>|> You have to locate the knowledgeable in your country.
>
>Eh? Delivery is as delivery does. Damn the claims - let's see the
>products.

Go talk to your friends in that big, circular, round building NW of London
near that town of Ch.*m..... They surely must have asked for some.

--

From: Chris Thomasson on 29 Sep 2006 17:58

"Joe Seigh" <jseigh_01(a)xemaps.com> wrote in message
news:us2dnfMMPJUnGYHYnZ2dnUVZ_sWdnZ2d(a)comcast.com...
> Chris Thomasson wrote:
>> "Joe Seigh" <jseigh_01(a)xemaps.com> wrote in message
>> news:e7KdnXEkRJml04HYnZ2dnUVZ_tydnZ2d(a)comcast.com...

[...]

>> Any thoughts on my PDR w/ hardware assist design? My idea can scale to
>> any number of processors. Lock-free reader patterns can scale. Period.
>
> I don't know. I haven't even seen McKenney file any hardware patents in
> that area and he would have been the likely one to do that kind of stuff.

No kidding; he already has tons of RCU patents... In one of his bibliography
pages he even has links to your initial RCU+SMR hybrid idea. I wonder when
we are going to see patents for it...

> The IPC would be more than just PDR. The whole memory model could change
> and they go to something like Occam style message passing.

Not good. I don't want to be forced to use message passing. Especially when
we can create highly efficient virtually zero-overhead message passing
paradigms already:

http://groups.google.com/group/comp.programming.threads/msg/6c24995ab986d410

http://groups.google.com/group/comp.programming.threads/msg/301e9153bcecf97c
(this is a good one*)

http://appcore.home.comcast.net/

;)

I hope they don't implement a model that gets away from shared memory! The
current thinking seems to be that threading and shared memory in general are
just way too complicated for any programmer to even begin to grasp:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/b192c5ffe9b47926

http://groups.google.com/group/comp.programming.threads/msg/c3a0416b829b5dc4

What a shame! If people actually start to listen to nonsense like this,
they will always be selling themselves short... The argument that threading
and shared memory are too complex/fragile is utterly false!

http://groups.google.com/group/comp.lang.c++.moderated/msg/d07b79e9633f3e52

:)

> Because I don't
> think the current strongly coherent cache scheme will scale up.

I agree. Why do you think the current trend seems to involve strong cache
coherence? I could just imagine what the cache coherence protocol will look
like for HTM! It will probably have to be a bit stronger than what they have now:

http://groups.google.com/group/comp.programming.threads/msg/bacc295093eeb1fd

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/f6399b3b837b0a40

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/9c572b709248ae64

> Of course that PDR supports a more relaxed cache/memory model doesn't hurt
> things.

Yup. It is a major plus, IMHO... I would support partially hardware assisted
PDR over some hardware based message passing. Like I said before, we can
create our own forms of scalable IPC right now, we don't need hardware for
that... Do we? Na...

:)

From: Chris Thomasson on 29 Sep 2006 18:02

"Chris Thomasson" <cristom(a)comcast.net> wrote in message
news:lsqdnXA_RfTjCYDYnZ2dnUVZ_q-dnZ2d(a)comcast.com...
> "Joe Seigh" <jseigh_01(a)xemaps.com> wrote in message
> news:us2dnfMMPJUnGYHYnZ2dnUVZ_sWdnZ2d(a)comcast.com...
>> Chris Thomasson wrote:
>>> "Joe Seigh" <jseigh_01(a)xemaps.com> wrote in message
>>> news:e7KdnXEkRJml04HYnZ2dnUVZ_tydnZ2d(a)comcast.com...

[...]

> http://groups.google.com/group/comp.programming.threads/msg/301e9153bcecf97c
> (this is a good one*)

This particular message passing scheme works very well. It outperforms many
of the existing message passing designs by wide margins... The simple trick
is to augment unbounded virtually zero-overhead single-producer/consumer
queuing with an implementation of Peterson's algorithm...

You can't really beat this setup...

Any thoughts?
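
For readers unfamiliar with the two ingredients named above, here is a minimal sketch of one way they could fit together. It is not the code from the linked post. It assumes that Peterson's algorithm is used to let two producers take turns on the producer end of a single-producer/single-consumer queue, and all class and function names are hypothetical. It also assumes standard CPython, where the GIL makes each attribute load and store behave as if sequentially consistent; real lock-free code in C or C++ would need atomics and memory barriers instead.

```python
import threading
import time

EMPTY = object()  # sentinel returned by pop() when the queue is empty


class Node:
    __slots__ = ("value", "next")

    def __init__(self, value=None):
        self.value = value
        self.next = None


class SpscQueue:
    """Unbounded linked-list queue: one side pushes, the other side pops."""

    def __init__(self):
        dummy = Node()
        self.head = dummy  # touched only by the consumer
        self.tail = dummy  # touched only by the producer side

    def push(self, value):
        node = Node(value)
        self.tail.next = node  # publish the node to the consumer
        self.tail = node

    def pop(self):
        nxt = self.head.next
        if nxt is None:
            return EMPTY
        self.head = nxt  # the popped node becomes the new dummy
        value, nxt.value = nxt.value, None
        return value


class PetersonLock:
    """Peterson's two-thread mutual exclusion; thread ids are 0 and 1."""

    def __init__(self):
        self.flag = [False, False]
        self.turn = 0

    def acquire(self, me):
        other = 1 - me
        self.flag[me] = True   # announce interest
        self.turn = other      # give the other thread priority
        while self.flag[other] and self.turn == other:
            time.sleep(0)      # spin, yielding the GIL

    def release(self, me):
        self.flag[me] = False


def run_demo(items_per_producer=200):
    """Two producers share the push side via Peterson; one consumer pops."""
    queue, lock, received = SpscQueue(), PetersonLock(), []
    total = 2 * items_per_producer

    def producer(me):
        for i in range(items_per_producer):
            lock.acquire(me)
            try:
                queue.push((me, i))
            finally:
                lock.release(me)

    def consumer():
        while len(received) < total:
            item = queue.pop()
            if item is EMPTY:
                time.sleep(0)
            else:
                received.append(item)

    threads = [threading.Thread(target=producer, args=(0,)),
               threading.Thread(target=producer, args=(1,)),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    print(len(run_demo()))
```

The consumer never takes a lock and never touches `tail`, which is where the low overhead on the receive side comes from in this sketch; only the two producers pay for the Peterson handshake.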

From: rohit.nadig@gmail.com on 29 Sep 2006 23:10

I want to ask you guys whether it makes sense to take a slightly more
holistic perspective on the question.

Will software vendors ALWAYS build applications that harness the
horse-power of a CPU?

Microsoft's latest OS (Vista) is quite bulky, and I asked myself if I
needed the extra bells-and-whistles in the OS. The answer is a
resounding yes.

The number of people that spend 8 or more hours on a computer has gone
up. Workers in banks and businesses use a computer as the "tool of
their trade".

My own personal example is a testimony to the CPU. I use Google search
for EVERYTHING. Information on rashes or allergies, quotations,
technical articles, news...

On a typical weeknight, I am using my 1.8 GHz laptop to do all of the
following simultaneously:

- Reading news articles
- Working on a remote unix host using VNC
- Playing a video on YouTube in a minimized window (I listen to the
video if it's a talk show)
- Logged on to Google Talk and Yahoo Messenger
- Responding to work email on Microsoft Outlook

I probably have 500 to 600 threads on my laptop. I think we are
still far away from the utopia of computing. There are plenty of
CPU-hungry applications that haven't been born yet.

-Rohit
