From: Paul Wallich on
Andy "Krazy" Glew wrote:
[...]
> However, some of my informants talk more favorably about lift-off: 3D
> VLSI achieved not by TSV, but by literally taking the "skin" off the top
> of a wafer, and placing it on another chip. Or sticking two wafers or
> chips face to face, and then pulling them apart, with all the good stuff
> on the second wafer sticking to the first. Multiple times.
>
> Blows my mind. I have trouble imagining good yields, but apparently...
>
> I've been going to a conference where a guy has been reporting on this
> for a decade or so. It started out as a specialty - GaAs or SiGe on top of
> Si for RF, etc. - but has been approaching the mass market over the years.
>
> - - -
>
> Hey, here's a comp-arch topic: "obviously" this greatly increases
> volumetric density - of devices, wires, heat, and, I suspect, flaws.
>
> How much more dense?
>
> Do you give up anything in terms of reliability? Maybe not: with
> multiple layers of metal you have something like a support scaffold. But
> heat...
>
> Does the increase in density provide enough redundancy to offset any
> loss in reliability?
>
> Personally I would bet on DRAM or other memory first. Already being
> stacked. Unclear whether we need more than edge connections... but if you
> can increase BW 16X you may be able to reduce power 4X.
>
> What applications (not SW, but systems) would benefit from such
> increased density? Nanobots sailing through your bloodstream?
>
> Really wonderful tiny computer/storage systems can be built. In a few
> years, maybe decades. But I suspect the boxes they go into will not look
> like PCs or laptops or even iPod or iPads.

Does this solve the problem of not being able to fab RAM and logic
optimally in the same process? How expensive are the connections? If
they're expensive, you could just get mind-boggling improvements in
bandwidth, as you say; but if they're cheap, you're talking about
circuits where the longest wire has become shorter by a factor of 100 or so.
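
A rough back-of-the-envelope on that wire-length claim (all numbers below are
my own illustrative assumptions, not from the thread): fold a fixed amount of
circuitry into N stacked layers and the lateral dimension shrinks roughly as
1/sqrt(N), but every layer crossed adds a short vertical hop. Something like:

    /* Longest-wire length vs. number of stacked layers (illustrative only). */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        const double die_side_mm = 20.0;   /* assumed 2D die edge              */
        const double via_mm      = 0.05;   /* assumed vertical hop per layer   */
        double flat = 2.0 * die_side_mm;   /* longest Manhattan wire, flat die */

        for (int layers = 1; layers <= 256; layers *= 4) {
            double side    = die_side_mm / sqrt((double)layers);
            double longest = 2.0 * side + via_mm * (layers - 1);
            printf("%3d layers: longest wire ~%5.2f mm (%.1fx shorter)\n",
                   layers, longest, flat / longest);
        }
        return 0;
    }

Under those assumptions the longest wire shrinks by maybe 3-5x, not 100x; the
factor-of-100 case needs the vertical connections to be very much cheaper than
the lateral ones, which is exactly the "how expensive are the connections"
question.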

I'd be willing to settle for nanobots embedded in all my physical tools
and materials, and not even nanobots, just nanosensors. First up would be
data gathering: imagine a fleet of cars impregnated with bots and memory
to find out what happens to those finite-element models in the real
world. Or a pile of 2x4s and sheetrock similarly set up, so that you
could build a bunch of instrumented houses. No, really instrumented.

The surveillance/memory prosthesis possibilities are obvious, but we
still need another set of breakthroughs for these brilliant pebbles
(ahem) or whatever to be able to act on the world.

paul
From: Del Cecchi on
Paul Wallich wrote:
> Andy "Krazy" Glew wrote:
> [...]
>> Personally I would bet on DRAM or other memory first. Already being
>> stacked.
> [...]
> Does this solve the problem of not being able to fab RAM and logic
> optimally in the same process? How expensive are the connections?
> [...]
>
> paul
This is one way to enhance embedded DRAM. However, I might note
that IBM's embedded DRAM is pretty good, although of course not
"optimal", and the logic part is excellent. See the POWER7 stuff, for example.
From: "Andy "Krazy" Glew" on
Del Cecchi wrote:
>> Andy "Krazy" Glew wrote:
>> [...]
>>> However, some of my informants talk more favorably about lift-off: 3D
....
>>>
>>> Personally I would bet on DRAM or other memory first. Already being
>>> stacked.

> This is one way to enhance embedded DRAM. However, I might note
> that IBM's embedded DRAM is pretty good, although of course not
> "optimal", and the logic part is excellent. See the POWER7 stuff, for
> example.

AFAICT POWER7 eDRAM (embedded DRAM - not to be confused
with the various other EDRAMs, e.g. Enhanced DRAM)
is basically a cache. Possibly configurable as some locked-down
memory. But certainly not even close to being a contender for
a full DRAM system.

Inevitably, this comes first. Because it is easier.

But as for "embedded" DRAM - if you believe the slides presented
by Shekhar Borkar of Intel and Tom Pawlowski of Micron
at SC09, then we are well on the way to the amount of memory
dwarfing the amount of processor / logic - in all metrics:
area, transistors, volume, number of chips, etc. Possibly even power.

It may not be DRAM, however.

This tempts me to say "It's not memory embedded on processor,
but processor embedded in memory". At some point the processor
will be only a corner of a die full of memory.

However, I think that is still a long way off. Most likely, we will
pack memory together as tightly as possible. Stacked.

Configurations such as 2, 4, or 16 DRAM and other memory chips stacked
with a single processor / logic chip will be attractive, not just because
of the need for separate processor and memory process technologies,
but also because 16 memory chips to 1 processor / logic chip is about
the right ratio. (Or maybe it needs to be higher.)
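
As a sanity check on that ratio, here is a tiny worked example with purely
illustrative, roughly era-appropriate assumptions of my own (die capacity,
core count, and GB per core are not figures from the thread):

    /* Why ~16 memory dies per logic die might be "about right" (illustrative). */
    #include <stdio.h>

    int main(void)
    {
        const double dram_die_gbit = 2.0;  /* assumed DRAM die capacity, Gbit */
        const int    cores         = 4;    /* assumed cores per logic die     */
        const double gb_per_core   = 1.0;  /* assumed desired DRAM per core   */

        double dram_die_gb = dram_die_gbit / 8.0;        /* Gbit -> GB */
        double wanted_gb   = cores * gb_per_core;
        double dies        = wanted_gb / dram_die_gb;

        printf("%.0f GB wanted / %.2f GB per die = %.0f DRAM dies per logic die\n",
               wanted_gb, dram_die_gb, dies);            /* prints 16  */
        return 0;
    }

Push the core count or the GB per core up and the ratio only climbs, which is
the "(or maybe it needs to be higher)" case.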

The processor will probably flower into more cores than anyone knows
how to use.

And at that point it will become interesting: when there's no value
in growing the processor die, but we still need more memory chips.
What will happen to the processor? We'll be looking for stuff to cram
onto the processor, just because it's there.

Pawlowski of Micron also described the desire to have an intelligent
or abstract interface to the DRAM and other memory chips. Not RAS/CAS,
but a protocol that can hide a more intelligent implementation.
Which will inevitably mean logic. But will that logic live on
the DRAM chips, or in some other chip in the stack?
The abstract DRAM interface chip?
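
To make the contrast concrete, here is a minimal sketch of the two styles of
interface; the names and fields are invented for illustration and are not
Micron's (or anyone's) actual protocol:

    /* Low-level style: the host sees banks, rows, and columns (RAS/CAS).
     * Abstract style: tagged transactions against a flat address space,
     * with scheduling, refresh, and repair hidden behind the interface. */
    #include <stdio.h>
    #include <stdint.h>

    enum lowlevel_cmd { CMD_ACTIVATE, CMD_READ, CMD_WRITE, CMD_PRECHARGE, CMD_REFRESH };
    struct lowlevel_op {
        enum lowlevel_cmd cmd;
        uint8_t  rank, bank;
        uint16_t row, col;      /* device geometry exposed to the host      */
    };

    struct abstract_req {
        uint64_t addr;          /* flat byte address                        */
        uint32_t len;           /* transfer length in bytes                 */
        uint16_t tag;           /* completions may return out of order      */
        uint8_t  is_write;
    };
    struct abstract_resp {
        uint16_t tag;
        uint8_t  status;        /* room for ECC / retry / repair results    */
    };

    int main(void)
    {
        struct lowlevel_op  op  = { CMD_READ, 0, 3, 0x1a2, 0x40 };
        struct abstract_req req = { 0x10000000ULL, 64, 7, 0 };
        printf("low-level op: bank %u row %u; abstract req: addr %#llx len %u\n",
               (unsigned)op.bank, (unsigned)op.row,
               (unsigned long long)req.addr, (unsigned)req.len);
        return 0;
    }

Whichever chip implements the abstract side is where the inevitable logic has
to live.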

If so, you might then have two logic chips in the stack:
the high-performance logic chip containing processors from Intel,
and the probably lower-performance logic chip containing the abstract
interface. And they will probably compete. Inevitably somebody
will consider putting a processor on the logic interface chip.
Inevitably somebody will want to avoid the middleman by having
the main processor control the DRAM chips directly,
eliminating the abstract interface in favor of low-level control.
Just not necessarily RAS/CAS.

Having two logic chips seems wasteful.

But it is probably desirable so long as the CPU and DRAM manufacturers
are separate companies. E.g. Intel vs. Samsung. (Note that Samsung
does both logic and DRAM, whereas Intel does not currently do DRAM,
except via proxies.)

This argument leads me to doubt Pawlowski's advocacy of an abstract DRAM
interface. Unless such an interface is simple enough to live on the DRAM
chips, it will be hard to justify an extra logic chip.

(Think about the AMB on FBDIMM.)

But... I have also learned that, in computers, we often arrive at a point
where performance is outweighed by other concerns. E.g. a good OS can
make a disk fly better when it knows the physical layout - but SCSI et al
won, with an abstract disk interface, because the physical layout was
changing too quickly, and the reliability issues evolving too quickly,
for OSes to manage in software.

Q: are we ever going to get to a similar point, where an abstract DRAM
interface makes sense? Certainly, we already have for flash/PCM.
But for DRAM?

I suspect that we will. For servers, certainly. But for consumer PCs?


Y'know, every few years I update a chart of the number of DRAM chips per system.
It was trending down for a very long time. Pawlowski and Borkar tend to
imply that it has trended up. Has it?
From: MitchAlsup on
On Feb 8, 11:14 pm, "Andy \"Krazy\" Glew" <ag-n...(a)patten-glew.net>
wrote:
> Q: are we ever going to get to a similar point, where an abstract DRAM
> interface makes sense? Certainly, we already have for flash/PCM.
> But for DRAM?

With the understanding that any RAM has three necessary* ports - a port
to insert addresses, a port to insert write data, and a port to remove
read data - I think that an abstract interface is undesirable; what is
desired is more direct access to these three necessarily existing
ports. See USPTO 5367494.
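
A toy model of that view (my own sketch, not the design in the patent): the
device exposes exactly those three ports and nothing else, with whatever goes
on behind them kept out of the host's sight:

    /* Three-port RAM model: insert address, insert write data, remove read data. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define WORDS 1024

    static uint64_t cells[WORDS];                       /* the storage array  */
    static uint64_t pend_addr;  static bool pend_write, addr_valid;
    static uint64_t rdata;      static bool rdata_valid;

    /* Port 1: insert an address, tagged as a read or a write. */
    static void port_addr(uint64_t addr, bool is_write)
    {
        pend_addr = addr % WORDS;
        pend_write = is_write;
        addr_valid = true;
        if (!is_write) {                   /* reads complete behind the ports */
            rdata = cells[pend_addr];
            rdata_valid = true;
            addr_valid = false;
        }
    }

    /* Port 2: insert write data; pairs with the pending write address. */
    static void port_wdata(uint64_t data)
    {
        if (addr_valid && pend_write) {
            cells[pend_addr] = data;
            addr_valid = false;
        }
    }

    /* Port 3: remove read data. */
    static bool port_rdata(uint64_t *out)
    {
        if (!rdata_valid) return false;
        *out = rdata;
        rdata_valid = false;
        return true;
    }

    int main(void)
    {
        uint64_t v;
        port_addr(42, true);  port_wdata(0xdeadbeefULL); /* write via ports 1+2 */
        port_addr(42, false);                            /* read request        */
        if (port_rdata(&v))                              /* drain port 3        */
            printf("read back %#llx\n", (unsigned long long)v);
        return 0;
    }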

> I suspect that we will.  For servers, certainly.  But for consumer PCs?

In the server space, more DRAM buys performance, since the job of this
DRAM is essentially to cache the portion of the database that has
been recently accessed. The cruel part of the equation is that it is
easier to make a 1-4TB DRAM system than it is to make the 1-4TB memory
system coherent across the number of server motherboards that want
access to it.

> Y'know, every few years I update a chart of the number of DRAM chips per system.
> It was trending down for a very long time. Pawlowski and Borkar tend to
> imply that it has trended up. Has it?

It has actually gone up recently. MS is "perhaps" the biggest part of
the problem. My previous PC was perfectly happy with 0.5GB running XP;
however, my new PC needs at least 4GB to run Vista to the same
"happiness" level (for me), even after I turned off many of the new
features in Vista. Conversely, an even older PC running Debian is
perfectly happy with 0.25GB.

Mitch
From: Tim McCaffrey on
In article <4B70EF27.6050105(a)patten-glew.net>, ag-news(a)patten-glew.net says...
>
>[...]
>
>Pawlowski of Micron also described the desire to have an intelligent
>or abstract interface to the DRAM and other memory chips. Not RAS/CAS,
>but a protocol that can hide a more intelligent implementation.
>Which will inevitably mean logic. But will that logic live on
>the DRAM chips, or in some other chip in the stack?
>The abstract DRAM interface chip?
>
>This argument leads me to doubt Pawlowski's advocacy of an abstract DRAM
>interface. Unless such an interface is simple enough to live on the DRAM
>chips, it will be hard to justify an extra logic chip.
>
>[...]


Some years ago I read an article in (Electronic Design?) about a stacked DRAM
device. They contended that DRAM is basically an analog device, and that if the
digital/analog interface were put on a separate chip, the performance could be
improved considerably. Their product/proof-of-concept was a device with one
"interface" chip and up to 32 stacked DRAM chips. They claimed 4ns access
time (IIRC).

Now put that stack on a processor; actually, put several, so that you can have
multiple memory channels. The CDC 6000 series used 32 banks (now called
channels) of memory, the 7000 series used 16 banks, and the 750/760 Cybers had
8. The systems took a 15% performance hit going from 16 banks to 8.

If you have 8 channels, each with 4ns access and 64 bits wide, you don't need
an L2 or L3 cache.
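
Quick arithmetic behind that claim, under illustrative assumptions of my own
(streaming peak only, no bank conflicts or queueing, and an assumed core
clock):

    /* "8 channels, 4 ns access, 64 bits wide": peak bandwidth and latency. */
    #include <stdio.h>

    int main(void)
    {
        const int    channels    = 8;
        const double access_ns   = 4.0;
        const int    width_bytes = 8;       /* 64 bits                  */
        const double core_ghz    = 3.0;     /* assumed core clock, GHz  */

        double bw_gbs = channels * width_bytes / access_ns;  /* GB/s    */
        double cycles = access_ns * core_ghz;                /* cycles  */

        printf("peak bandwidth: %.0f GB/s\n", bw_gbs);
        printf("access latency: %.0f core cycles at %.1f GHz\n", cycles, core_ghz);
        return 0;
    }

That works out to 16 GB/s of peak bandwidth and roughly a dozen core cycles
per access, i.e. latency already in L2-cache territory, which is the sense in
which the outer caches stop paying their way.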

- Tim