From: Tim McCaffrey
In article <4B8029F9.8020501(a)patten-glew.net>,
ag-news(a)patten-glew.net says...
>
>Tim McCaffrey wrote:
>> Some years ago I read an article in (Electronic Design?)
>> about a stacked DRAM device. They contended that DRAM is
>> basically an analog device, and if the digital/analog
>> interface was put on a separate chip the performance could
>> be improved considerably. Their product/proof-of-concept
>> was a device with one
>> "interface" chip and up to 32 stacked DRAM chips.
>> They claimed 4ns access time (IIRC).
>>
>> Now, put that stack on a processor, actually put several
>> so that you can have multiple memory channels.
>> The CDC 6000 series used 32 banks (now called
>> channels) of memory, the 7000 series used 16 banks,
>> and the 750/760 Cybers had 8.
>> The systems took a 15% performance hit going from 16 to 8.
>>
>> If you have 8 channels, with 4ns access, 64 bit wide,
>> you don't need L2 or L3 cache.
>>
>> - Tim
>
>
>I really wanted to talk briefly about point (0) below,
> and then the neat technical point (1).
> But since (0) grew, I'll do them backwards.
>
>(1) Now, if you don't have the L2 or L3 cache,
> what do you do with the extra silicon area
> on the CPU chip?
>1a) More CPUs? But we may be past the point
> of diminishing returns for all except GPUs.
>1b) Smaller chips? But for small systems,
> there is a minimum economic chip size.
>1c) How about, take these suddenly smaller CPUs,
> and put them on the DRAM digital/analog
> controller chip? But then we
> have to respin this chip every time the
> analog stack is tweaked.
>
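
Just to sanity-check the numbers I quoted above: 8 channels x 64
bits every 4 ns works out to roughly 16 GB/s of peak bandwidth,
and 4 ns is only about a dozen cycles at a few GHz, i.e. in
L2-hit territory.  Trivial back-of-the-envelope below; the 3 GHz
core clock is just my assumption for scale, not a measurement.

#include <stdio.h>

/* Peak bandwidth and latency-in-cycles for the quoted memory
 * configuration: 8 channels, 64 bits wide, 4 ns access.
 * The 3.0 GHz core clock is an assumed figure, for scale only. */
int main(void)
{
    const double channels    = 8.0;
    const double width_bytes = 8.0;   /* 64 bits */
    const double access_ns   = 4.0;
    const double core_ghz    = 3.0;   /* assumed core clock */

    double peak_gb_s      = channels * width_bytes / access_ns; /* B/ns == GB/s */
    double latency_cycles = access_ns * core_ghz;               /* ns * cycles/ns */

    printf("peak bandwidth: %.1f GB/s\n", peak_gb_s);
    printf("access latency: %.0f core cycles at %.1f GHz\n",
           latency_cycles, core_ghz);
    return 0;
}

As for what to do with the freed-up area: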

You could put more CPUs on (of course), you could put
specialized processors on it (like a bunch of ARMs for I/O
processing, something like the PPs on the CDC machines), or you
could use the space to enable architectures that really aren't
practical without lots of low-latency/high-bandwidth memory
(VLIW, stream processors, etc.).

If you have lots (like Larrabee lots) of CPUs, then L1/L2 cache
is still useful in cutting down the memory chatter. Having
faster main memory cuts down the window size for contention.
All big wins.
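
To put rough numbers on the "chatter" point, here is the same
kind of envelope math for a pile of simple cores sharing the
~16 GB/s channel setup above.  The core count, reference rate
and hit rates are made-up illustrative values, not measurements
of anything; the point is only how fast aggregate miss traffic
eats the pins when there is no L1/L2 in front of each core.

#include <stdio.h>

/* Aggregate line-fill traffic from many simple cores, with and
 * without caches in front of them.  Core count, per-core
 * reference rate and hit rates are illustrative guesses. */
int main(void)
{
    const double peak_gb_s   = 16.0;  /* 8 ch * 8 B / 4 ns, from above */
    const double line_bytes  = 64.0;
    const double cores       = 32.0;  /* "Larrabee lots", roughly */
    const double refs_per_ns = 0.5;   /* cacheable refs per core, assumed */

    double hit_rate[] = { 0.0, 0.90, 0.98 };
    const char *label[] = { "no cache", "L1 only (90%)", "L1+L2 (98%)" };

    for (int i = 0; i < 3; i++) {
        double traffic = cores * refs_per_ns * (1.0 - hit_rate[i])
                         * line_bytes;                /* B/ns == GB/s */
        printf("%-14s: %7.1f GB/s of fills (%.0f%% of peak)\n",
               label[i], traffic, 100.0 * traffic / peak_gb_s);
    }
    return 0;
}

Even with generous hit rates the demand is at or above the peak,
which is the contention-window point again: the faster the
memory, the less time each miss camps on a channel.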

>
>(0) This saying "if memory is fast enough, you don't
> need L2 or L3 cache" has been the kiss of death
> for many advanced memory projects. People were
> saying this for RAMbus or RAMbus-like memory
> wrt the '486.
>
> Yes, that early, and with more detail:
> if we have a new faster memory, not only do we not
> need the L2$ (and maybe not
> the L1$), but we may not even need fancy
> OOO stuff like P6.
>

It is interesting that EDO memory basically removed the need for
an L2 cache (for the 486), and it wasn't even that big a change
in the technology of the time. Unfortunately, it showed up in
the middle of the Pentium generation.
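
For a feel of where EDO's gain comes from (it can start driving
the next column address while the previous data is still valid
on the pins), here is a rough cycle count for a 4-beat line
fill.  The x-3-3-3 vs x-2-2-2 burst timings and the 33 MHz bus
clock are the usual textbook-style figures for the era, my
assumptions rather than measurements of any particular board.

#include <stdio.h>

/* 4-beat burst line fill, fast page mode vs EDO.  Burst timings
 * and the 33 MHz bus clock are assumed representative figures. */
int main(void)
{
    const double bus_mhz    = 33.0;
    const double ns_per_clk = 1000.0 / bus_mhz;

    int fpm[4] = { 5, 3, 3, 3 };   /* fast page mode burst */
    int edo[4] = { 5, 2, 2, 2 };   /* EDO burst            */

    int fpm_clks = 0, edo_clks = 0;
    for (int i = 0; i < 4; i++) {
        fpm_clks += fpm[i];
        edo_clks += edo[i];
    }

    printf("FPM line fill: %2d clocks = %4.0f ns\n",
           fpm_clks, fpm_clks * ns_per_clk);
    printf("EDO line fill: %2d clocks = %4.0f ns\n",
           edo_clks, edo_clks * ns_per_clk);
    return 0;
}

In this sketch that is roughly 20% off every line fill, from a
fairly small change to the parts.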

>Or perhaps not the kiss of death, since such projects
>seem to get funded. And last years, even decades.
>They just eventually die.
>Perhaps "kiss of eventual moribundity"? Zombie-dom?
>
>Trouble is, the saying is true. Like motherhood and
>apple pie. The problem is the conclusions that you
>take from the saying: an advanced memory project that
>only makes sense if it allows you to eliminate many
>levels of the cache hierarchy. Which doesn't make sense
>if you still have the cache. My takeaway is that you
>should try to create a new advanced memory technology
>that makes sense even with the cache, but which might
>make even more sense if it enables
>getting rid of the cache. Hedge your bets.

I agree with you (I think).

It would seem to me a weak argument to spend the resources to
develop a product that had only one selling point.  I could see
several advantages (to the company making the product) to
layering the memory on the CPU (or GPU) chip, such that the
performance would be a "freebie", if it materialized at all.

After you get the new configuration accepted, then you work on
optimizing for performance/power/etc. Note that Intel appears
to be using this approach with Atom.

- Tim