From: MitchAlsup on
On Mar 1, 12:12 pm, Terje Mathisen <"terje.mathisen at tmsw.no">
wrote:
> Even with a very non-bleeding edge gpu, said gpu is far larger than any
> of those x86 cores which many people here claim to be too complicated.

A large number of pipelines all doing the same kinds of work are
actually simpler than a medium number of pipelines all doing different
kinds of work.

Mitch
From: Robert Myers on
On Mar 1, 8:05 pm, Del Cecchi <delcecchinospamoftheno...(a)gmail.com>
wrote:
> Robert Myers wrote:
> > On Mar 1, 1:12 pm, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
> >> MitchAlsup wrote:
> >>> Packages can be made in just about any aspect ratio without any
> >>> (unuseful) change in the cost structure of the package.
> >>> The other thing to note is that the graphics core is 3X as big as the
> >>> x86 core.
> >> That was the one really interesting thing on that die photo:
>
> >> Even with a very non-bleeding edge gpu, said gpu is far larger than any
> >> of those x86 cores which many people here claim to be too complicated.
>
> > Given the bandwidth wall, which, unlike the latency wall, can't be
> > fudged, what *can* you do with these chips besides graphics where you
> > pound the living daylights out of a small dataset.
>
> > Network and disk-bound applications can use many cores in server
> > applications, but that's obviously not what this chip is aimed at.
> .
>
> Isn't that backwards?  bandwidth costs money, latency needs miracles.

Those miracles (aggressive prefetch, out-of-order execution, huge
caches) have been served up for years. We crashed through the so-called
memory wall long ago. It was such a relatively minor problem that
Intel could keep the memory controller off the die for years after
Alpha had proven the enormous latency advantage of putting it on die.
More than a decade later, Intel finally had to use up that "miracle."
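
To make the "miracles" concrete, here is a minimal C sketch (my own
illustration, not anything Intel or DEC shipped): software prefetch
issues the memory request a fixed distance ahead of where the data is
consumed, so DRAM latency overlaps with arithmetic on data already in
flight. The prefetch distance of 64 elements is a made-up tuning number.

#include <stddef.h>

/* Sum an array, prefetching a fixed distance ahead so the memory
   latency overlaps with the adds on data that has already arrived. */
double sum_with_prefetch(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + 64 < n)
            __builtin_prefetch(&a[i + 64], 0, 0); /* read, low locality */
        s += a[i];
    }
    return s;
}

Note that none of this creates bandwidth; it only changes when the
requests are issued.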

There are no bandwidth-hiding tricks. Once the pipe is full, that's
it; that's as fast as things will go. And, as one of the architects
here has commented, once you have as many pins as you can get and you
wiggle them as fast as you can, there is no more bandwidth to be had.
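
A back-of-envelope calculation shows why. Peak bandwidth is just data
pins times transfer rate; the numbers below are illustrative (a
hypothetical part with two 64-bit DDR3-1600 channels), not taken from
any specific chip:

#include <stdio.h>

int main(void)
{
    double data_pins       = 2 * 64;   /* two 64-bit channels        */
    double transfers_per_s = 1.6e9;    /* 1600 MT/s on each data pin */
    double peak_bytes      = data_pins * transfers_per_s / 8.0;

    printf("peak bandwidth: %.1f GB/s\n", peak_bytes / 1e9); /* ~25.6 */
    return 0;
}

Latency can be overlapped with other work; this number cannot be
exceeded no matter how cleverly the requests are scheduled.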

Robert.

From: nik Simpson on
On 3/1/2010 12:12 PM, Terje Mathisen wrote:
> MitchAlsup wrote:
>> Packages can be made in just about any aspect ratio without any
>> (unuseful) change in the cost structure of the package.
>>
>> The other thing to note is that the graphics core is 3X as big as the
>> x86 core.
>
> That was the one really interesting thing on that die photo:
>
> Even with a very non-bleeding edge gpu, said gpu is far larger than any
> of those x86 cores which many people here claim to be too complicated.
>
> Terje
>
Isn't the GPU core still on a 45nm process, vs 32nm for the CPU and cache?

--
Nik Simpson
From: Terje Mathisen <"terje.mathisen at tmsw.no"> on
MitchAlsup wrote:
> On Mar 1, 12:12 pm, Terje Mathisen<"terje.mathisen at tmsw.no">
> wrote:
>> Even with a very non-bleeding edge gpu, said gpu is far larger than any
>> of those x86 cores which many people here claim to be too complicated.
>
> A large number of pipelines all doing the same kinds of work are
> actually simpler than a medium number of pipelines all doing different
> kinds of work.

Mitch, I do know that. :-)

Taken to its logical endpoint, you get SRAMs before logic in the
same process, new CPU models which are basically cache-size increases
on more or less the same core, as well as Larrabee-style heaps of
identical (smallish) cores.

Just like in sw, it is the amount of unique logic that pushes the
complexity limits of the engineers designing it.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: Terje Mathisen <"terje.mathisen at tmsw.no"> on
nik Simpson wrote:
> On 3/1/2010 12:12 PM, Terje Mathisen wrote:
>> Even with a very non-bleeding edge gpu, said gpu is far larger than any
>> of those x86 cores which many people here claim to be too complicated.
>>
>> Terje
>>
> Isn't the GPU core still on a 45nm process, vs 32nm for the CPU and cache?
>
That would _really_ amaze me, if they employed two different processes
on the same die!

Wouldn't that have implications for a lot of other stuff as well, like
required voltage levels?

Have you seen any kind of documentation for this?

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"