From: nedbrek on
Hello all,

"Robert Myers" <rbmyersusa(a)gmail.com> wrote in message
news:66eeb001-7f72-4ad6-afbb-7bdcb5a0275b(a)y11g2000yqh.googlegroups.com...
> On Mar 1, 8:05 pm, Del Cecchi <delcecchinospamoftheno...(a)gmail.com>
> wrote:
>> Isn't that backwards? bandwidth costs money, latency needs miracles.
>
> There are no bandwidth-hiding tricks. Once the pipe is full, that's
> it. That's as fast as things will go. And, as one of the architects
> here commented, once you have all the pins possible and you wiggle
> them as fast as you can, there is no more bandwidth to be had.

Robert is right. Bandwidth costs money, and the budget is fixed (no more
$1000 CPUs, well, not counting the Extreme/Expensive Editions).

Pin count is limited by die size, and die size stopped growing at Pentium 4. If
we don't keep using all the transistors, dies will start shrinking (cf. Atom).

That's all assuming CPUs available to mortals (Intel/AMD). If you're IBM,
then you can have all the bandwidth you want.
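
To put rough numbers on the pin argument, here is a back-of-the-envelope sketch
in C. The pin count and transfer rate are purely illustrative assumptions (they
roughly match a dual-channel DDR3-1600 interface), not figures for any specific
part:

/* Peak memory bandwidth is just data pins times per-pin signalling rate.
   Once a workload reaches this ceiling, no latency trick recovers more;
   only more pins or faster signalling will.                              */
#include <stdio.h>

int main(void)
{
    double data_pins       = 128;    /* assumed: two 64-bit channels      */
    double transfers_per_s = 1.6e9;  /* assumed: DDR3-1600 transfer rate  */

    double bytes_per_transfer = data_pins / 8.0;
    double peak = transfers_per_s * bytes_per_transfer;

    printf("peak: %.1f GB/s\n", peak / 1e9);   /* ~25.6 GB/s */
    return 0;
}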

Ned


From: nmm1 on
In article <hmiqg4$ads$1(a)news.eternal-september.org>,
nedbrek <nedbrek(a)yahoo.com> wrote:
>"Robert Myers" <rbmyersusa(a)gmail.com> wrote in message
>news:66eeb001-7f72-4ad6-afbb-7bdcb5a0275b(a)y11g2000yqh.googlegroups.com...
>> On Mar 1, 8:05 pm, Del Cecchi <delcecchinospamoftheno...(a)gmail.com>
>> wrote:
>
>>> Isn't that backwards? bandwidth costs money, latency needs miracles.
>>
>> There are no bandwidth-hiding tricks. Once the pipe is full, that's
>> it. That's as fast as things will go. And, as one of the architects
>> here commented, once you have all the pins possible and you wiggle
>> them as fast as you can, there is no more bandwidth to be had.
>
>Robert is right. Bandwidth costs money, and the budget is fixed (no more
>$1000 CPUs, well, not counting the Extreme/Expensive Editions).

Actually, Del is right. Robert is right that, given a fixed design, bandwidth
is pro rata to budget.

>Pin count is die size limited, and die stopped growing at Pentium 4. If we
>don't keep using all the transistors, it will start shrinking (cf. Atom).
>
>That's all assuming CPUs available to mortals (Intel/AMD). If you're IBM,
>then you can have all the bandwidth you want.

Sigh. Were I given absolute powers over Intel, I could arrange to
have a design produced with vastly more bandwidth (almost certainly
10x, perhaps much more), for the same production cost.

All that is needed is four things:

1) Produce low-power (watts) designs for the mainstream, to
enable the second point.

2) Put the memory back-to-back with the CPU, factory integrated,
thus releasing all existing memory pins for I/O use. Note that this
allows for VASTLY more memory pins/pads.

3) Lean on the memory manufacturers to deliver that, or buy up
one and return to that business.

4) Support a much simpler, fixed (large) block-size protocol to
the first-level I/O interface chip. Think HiPPI, taken to extremes.

The obstacles are mainly politics and marketdroids. Intel is big
enough to swing that, if it wanted to. Note that I am not saying
it should, as it is unclear that point (4) is the right direction
for a general-purpose chip. However, points (1) to (3) would work
without point (4).
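
To sanity-check the 10x claim against point (2), here is a rough sketch in C.
Every pad count and signalling rate in it is an assumption made up for
illustration, not a figure from Intel or anyone else:

/* Compare off-package memory (through the scarce pin budget) with
   factory-integrated, back-to-back memory on dense micro-pads.
   All numbers are illustrative assumptions.                        */
#include <stdio.h>

int main(void)
{
    double off_pkg_pins = 128;     /* assumed data pins off package      */
    double off_pkg_rate = 1.6e9;   /* assumed transfers/s per pin        */

    double on_pkg_pads  = 2048;    /* assumed data pads, back-to-back    */
    double on_pkg_rate  = 1.0e9;   /* assumed: slower per pad, far wider */

    double bw_off = off_pkg_pins * off_pkg_rate / 8.0;
    double bw_on  = on_pkg_pads  * on_pkg_rate  / 8.0;

    printf("off-package: %6.1f GB/s\n", bw_off / 1e9);   /*  25.6 GB/s */
    printf("on-package:  %6.1f GB/s (%.0fx)\n",
           bw_on / 1e9, bw_on / bw_off);                  /* 256, 10x   */
    return 0;
}

Whether 2048 pads is the right number is exactly the sort of thing points (2)
and (3) would have to settle; the arithmetic only shows the claim is not
outlandish.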

Also, note that I said "arrange to have a design produced". The
hardware experts I have spoken to all agree that is technically
feasible. Intel is big enough to do this, IF it wanted to.


Regards,
Nick Maclaren.
From: nik Simpson on
On 3/2/2010 1:17 AM, Terje Mathisen wrote:
> nik Simpson wrote:
>> On 3/1/2010 12:12 PM, Terje Mathisen wrote:
>>> Even with a very non-bleeding edge gpu, said gpu is far larger than any
>>> of those x86 cores which many people here claim to be too complicated.
>>>
>>> Terje
>>>
>> Isn't the GPU core still on a 45nm process, vs 32nm for the CPU and
>> cache?
>>
> That would _really_ amaze me, if they employed two different processes
> on the same die!
>
> Wouldn't that have implications for a lot of other stuff as well, like
> required voltage levels?
>
> Have you seen any kind of documentation for this?
>
> Terje
>
That's certainly the case for the Clarkdale/Westmere parts with
integrated graphics...

http://www.hardocp.com/article/2010/01/03/intel_westmere_32nm_clarkdale_core_i5661_review/

--
Nik Simpson
From: Stephen Fuld on
On 3/1/2010 7:15 PM, Robert Myers wrote:
> On Mar 1, 8:05 pm, Del Cecchi<delcecchinospamoftheno...(a)gmail.com>
> wrote:
>> Robert Myers wrote:
>>> On Mar 1, 1:12 pm, Terje Mathisen<"terje.mathisen at tmsw.no"> wrote:
>>>> MitchAlsup wrote:
>>>>> Packages can be made in just about any aspect ratio without any
>>>>> (unuseful) change in the cost structure of the package.
>>>>> The other thing to note is that the graphics core is 3X as big as the
>>>>> x86 core.
>>>> That was the one really interesting thing on that die photo:
>>
>>>> Even with a very non-bleeding edge gpu, said gpu is far larger than any
>>>> of those x86 cores which many people here claim to be too complicated.
>>
>>> Given the bandwidth wall, which, unlike the latency wall, can't be
>>> fudged, what *can* you do with these chips besides graphics where you
>>> pound the living daylights out of a small dataset.
>>
>>> Network and disk-bound applications can use many cores in server
>>> applications, but that's obviously not what this chip is aimed at.
>> .
>>
>> Isn't that backwards? bandwidth costs money, latency needs miracles.

I'm with Del here, though the way I heard it is that bandwidth is only
money, but latency is forever.

> Those miracles (aggressive prefetch, out of order, huge cache) have
> been being served up for years.

Yes, but they are running out of steam. Caches are a diminishing
returns game, and there seem to be limits on the others.
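
One way to see why they run out of steam: by Little's law, to keep the pipe
full you need bandwidth times latency bytes in flight, and the machinery that
tracks those outstanding lines (miss buffers, prefetchers, the reorder window)
is exactly what is getting hard to grow. A minimal sketch in C, with assumed
illustrative numbers rather than figures for any real part:

/* Little's law: sustained bandwidth B at latency L needs B*L bytes
   outstanding.  Numbers below are illustrative assumptions only.   */
#include <stdio.h>

int main(void)
{
    double bw_bytes_per_s = 25.6e9;  /* assumed peak memory bandwidth    */
    double latency_s      = 70e-9;   /* assumed DRAM load-to-use latency */
    double line_bytes     = 64;      /* cache line size                  */

    double in_flight = bw_bytes_per_s * latency_s;
    printf("in flight: %.0f bytes = %.0f cache lines\n",
           in_flight, in_flight / line_bytes);   /* ~1792 B, ~28 lines */
    return 0;
}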

> We crashed through the so-called
> memory wall long ago.

No, we just moved it out some.

> It was such a relatively minor problem that
> Intel could keep the memory controller off the die for years after
> alpha had proven the enormous latency advantage of putting it on die.
> More than a decade later, Intel had to use up that "miracle."
>
> There are no bandwidth-hiding tricks. Once the pipe is full, that's
> it. That's as fast as things will go. And, as one of the architects
> here commented, once you have all the pins possible and you wiggle
> them as fast as you can, there is no more bandwidth to be had.

But there are lots of things we could do given enough money. For
example, we could integrate the memory on chip or on an MCM to eliminate
the pin restrictions. We are also not near the limit of pin wiggling speed.
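
To make Robert's "once the pipe is full" point concrete at the keyboard, here
is a minimal streaming-triad-style sketch in C (nothing like the real STREAM
benchmark's rigour; the array size and the use of clock() are just conveniences
for illustration). Whatever it prints, it cannot exceed data pins times
signalling rate:

/* Minimal streaming-triad sketch.  Arrays must be much larger than the
   last-level cache so the loop is limited by memory, not hidden by
   latency tricks.  The size here is an assumption, not a tuned figure. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (32L * 1024 * 1024)    /* 32M doubles = 256 MB per array */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    clock_t t0 = clock();
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];            /* 2 reads + 1 write */
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    double bytes = 3.0 * N * sizeof(double);
    printf("sustained: %.1f GB/s\n", bytes / secs / 1e9);
    free(a); free(b); free(c);
    return 0;
}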

I cite as a counterexample that, if we had wanted more bandwidth and had been
willing to pay somewhat more and sacrifice some latency, we would all be using
more banks of FB-DIMMs.


--
- Stephen Fuld
(e-mail address disguised to prevent spam)
From: Stephen Fuld on
On 3/2/2010 6:45 AM, nik Simpson wrote:
> On 3/2/2010 1:17 AM, Terje Mathisen wrote:
>> nik Simpson wrote:
>>> On 3/1/2010 12:12 PM, Terje Mathisen wrote:
>>>> Even with a very non-bleeding edge gpu, said gpu is far larger than any
>>>> of those x86 cores which many people here claim to be too complicated.
>>>>
>>>> Terje
>>>>
>>> Isn't the GPU core still on a 45nm process, vs 32nm for the CPU and
>>> cache?
>>>
>> That would _really_ amaze me, if they employed two different processes
>> on the same die!
>>
>> Wouldn't that have implications for a lot of other stuff as well, like
>> required voltage levels?
>>
>> Have you seen any kind of documentation for this?
>>
>> Terje
>>
> That's certainly the case for the Clarkdale/Westmere parts with
> integrated graphics...
>
> http://www.hardocp.com/article/2010/01/03/intel_westmere_32nm_clarkdale_core_i5661_review/


I think the confusion here is that Clarkdale is a multi-chip module (CPU
chip plus graphics chip in one package) whereas Sandy Bridge is a single
chip.


--
- Stephen Fuld
(e-mail address disguised to prevent spam)