From: jgd on
In article
<0d23d5f4-3ea2-4976-9eff-4b0d2b2e3089(a)g10g2000yqh.googlegroups.com>,
alertjean(a)rediffmail.com (Jean) wrote:

> I was trying to see the impact of computer architecture improvements
> on CPU performance compared to technology.

Sadly, making this kind of study practical is not high on the priority
list of chip designers. The effects of the programmer-visible
architecture have become fairly unimportant in the last 10-15 years; the
quality of the implementation of superscalar execution, speculative
execution, and the other things that programmers prefer not to worry
about too much has become far more important, along with cache and
memory subsystem design.

> Has anyone studied/analyzed the performance impact when a latest CPU
> design is synthesized in an old design library (Say a 45nm in 1um ?) ?

I'll venture a guess of 100% performance loss. That is, it would not
work at all. You're increasing the feature size by a factor of 20 and
thus the area by a factor of 400, in round numbers. Making something
that large which works, even on an old process, sounds Much Too Hard for
practicality.
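
For a sense of scale, here's a back-of-the-envelope sketch in Python of
those round numbers; the 100 mm^2 die size is my own illustrative
assumption, not a figure from the original question:

    # Rough linear-shrink arithmetic for porting a 45 nm design to a 1 um
    # library; illustrative only, ignoring wire delay, metal stack, drive
    # strength and everything else a real port would run into.
    old_feature_nm = 45.0        # modern node, from the original question
    new_feature_nm = 1000.0      # 1 um legacy node

    linear_scale = new_feature_nm / old_feature_nm   # ~22x per side
    area_scale = linear_scale ** 2                   # ~490x in area

    die_mm2 = 100.0              # assumed modern die size (hypothetical)
    print(f"linear {linear_scale:.0f}x, area {area_scale:.0f}x, "
          f"die {die_mm2 * area_scale / 100:.0f} cm^2")
    # ~22x, ~494x, ~494 cm^2: bigger than an entire 150 mm wafer of that era.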

> Can this be simulated by running a latest CPU at lower clock frequency
> (Few MHz)?

Not really, because the memory-to-CPU speed ratio will almost certainly
be different from that of the old chip.
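
A toy CPI model makes the point; all the numbers below (base CPI,
memory references per instruction, miss rate, latencies) are
illustrative guesses, not measurements of any real chip:

    def effective_cpi(base_cpi, mem_refs_per_instr, miss_rate, mem_latency):
        # Toy model: every cache miss stalls the core for the full DRAM
        # latency, expressed in CPU clock cycles.
        return base_cpi + mem_refs_per_instr * miss_rate * mem_latency

    # Modern core at full speed: ~3 GHz with ~60 ns DRAM is ~180 cycles.
    print(effective_cpi(0.5, 0.3, 0.02, 180))   # ~1.6 CPI

    # The same core underclocked to 30 MHz: DRAM is now only ~2 cycles
    # away, so the memory stalls nearly vanish; that balance matches no
    # real machine, old or new.
    print(effective_cpi(0.5, 0.3, 0.02, 2))     # ~0.5 CPI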

--
John Dallman, jgd(a)cix.co.uk, HTML mail is treated as probable spam.
From: Ken Hagan on
On Wed, 10 Mar 2010 22:51:34 -0000, <jgd(a)cix.compulink.co.uk> wrote:

>> Has anyone studied/analyzed the performance impact when a latest CPU
>> design is synthesized in an old design library (Say a 45nm in 1um ?) ?
>
> [...] Making something that large which works, even on an old process,
> sounds Much Too Hard for practicality.

Would it be easier to make an old design in a new process?

>> Can this be simulated by running a latest CPU at lower clock frequency
>> (Few MHz)?
>
> Not really, because the memory-to-CPU speed ratio will almost certainly
> be different from that of the old chip.

That might be one of the things the OP wanted to investigate.

To wit: how many of the changes between the old and new designs are simply
coping with the difference in memory speeds, how many are taking advantage
of the new speeds to do something that wasn't previously possible, and how
many of the changes are truly clever things that could have been done 20
years ago if only someone had thought of trying it.
From: "Andy "Krazy" Glew" on
jgd(a)cix.compulink.co.uk wrote:
> In article
> <0d23d5f4-3ea2-4976-9eff-4b0d2b2e3089(a)g10g2000yqh.googlegroups.com>,
> alertjean(a)rediffmail.com (Jean) wrote:
>
>> I was trying to see the impact of computer architecture improvements
>> on CPU performance compared to technology.
>>
>> Has anyone studied/analyzed the performance impact when a latest CPU
>> design is synthesized in an old design library (Say a 45nm in 1um ?) ?
>
> I'll venture a guess of 100% performance loss. That is, it would not
> work at all. You're increasing the feature size by a factor of 20 and
> thus the area by a factor of 400, in round numbers. Making something
> that large which works, even on an old process, sounds Much Too Hard for
> practicality.
>
>> Can this be simulated by running a latest CPU at lower clock frequency
>> (Few MHz)?
>
> Not really, because the memory-to-CPU speed ratio will almost certainly
> be different from that of the old chip.


I think that it would be more practical to take the old design and simulate it in a new design library.

E.g. evaluate an 8086 with modern parameters. Or an i486, if that's what you care about.

Take the 8086. If I remember correctly, circa 1 instruction, 1 data memory reference per instruction. No cache.
Didn't matter much when memory was one cycle away, but now memory is circa 100 cycles away.

If I have done the math right, an 8086 on a modern process would be (a) tiny, less than 1 thousandth the size of modern
machines, but also (b) slow, less than 1/100th the performance.
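
A rough sketch of the arithmetic behind those figures, under the
assumptions above (one instruction fetch plus one data reference per
instruction, no cache, DRAM circa 100 cycles away); the modern-core CPI
is an illustrative guess of mine, not a measurement:

    # Uncached 8086-style core on a modern process: every instruction
    # pays for two trips to DRAM, one for the fetch and one for the data.
    mem_latency_cycles = 100       # "circa 100 cycles away"
    refs_per_instr = 2             # 1 instruction fetch + 1 data reference
    cpi_uncached = refs_per_instr * mem_latency_cycles   # ~200 cycles/instr

    cpi_modern = 0.5               # guess for a cached, superscalar core
    print(cpi_uncached / cpi_modern)   # ~400x slower, well under 1/100th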

The difference is architecture. However, the biggest architectural benefits are "obvious": increase the bus width;
read a cache line of instructions at a time, rather than one instruction at a time; add caches. All of this is
architecture; some of it is architecture that is now obvious, but it was not necessarily obvious at the time.

The old "architecture vs technology" debate is really not about architecture, but about what parts of architecture are
so obvious that even a process technologist or logic designer with little training in computer architecture would get
right - and what parts are not so obvious. It's a moving boundary.
From: Stephen Fuld on
On 3/11/2010 7:12 AM, Andy "Krazy" Glew wrote:

snip

> Take the 8086. If I remember correctly, circa 1 instruction, 1 data
> memory reference per instruction. No cache. Didn't matter much when
> memory was one cycle away, but now memory is circa 100 cycles away.
>
> If I have done the math right, an 8086 on a modern process would be (a)
> tiny, less than 1 thousandth the size of modern machines, but also (b)
> slow, less than 1/100th the performance.
>
> The difference is architecture. However, the biggest architectural
> benefits are "obvious": increase the bus width; read a cache line of
> instructions at a time, rather than one instruction at a time; add
> caches. All of this is architecture; some of it is architecture that is
> now obvious, but it was not necessarily obvious at the time.

"Not obvious at the time", or "obviously not needed with the technology
of the time"? That is, if memory is only one cycle away, no one would
want to spend a lot of silicon on a cache because a rational analysis
would say that even if you knew ow to do the cache, it wouldn't buy you
anything.

I view the near-ubiquitous inclusion of caches not as a "new invention"
that wasn't obvious at the time, but as a good reaction to a change in
technology: processors getting faster more quickly than memories did.
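
The same argument in average-memory-access-time form; the hit rate and
latencies below are made-up illustrative values:

    def amat(hit_time, miss_rate, miss_penalty):
        # Classic average memory access time, in CPU cycles.
        return hit_time + miss_rate * miss_penalty

    # Memory one cycle away, 8086 era: an uncached access already costs
    # 1 cycle, so a cache can only add hit latency. Rational to omit it.
    print(amat(hit_time=1, miss_rate=0.05, miss_penalty=1))    # ~1.05

    # Memory ~100 cycles away, today: the very same cache is a huge win.
    print(amat(hit_time=1, miss_rate=0.05, miss_penalty=100))  # ~6 vs. 100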

Similarly, going to, say, a 32-bit-wide interface on an 8086 would have
been a waste of money: more pins, and the requirement for more memory
chips, in return for essentially no performance gain. It was a good
engineering trade-off analysis, not lack of knowledge, that drove the
decision.


--
- Stephen Fuld
(e-mail address disguised to prevent spam)
From: Terje Mathisen on
Andy "Krazy" Glew wrote:
> I think that it would be more practical to take the old design and
> simulate it a new design library.
>
> E.g. evaluate an 8086 with modern parameters. Or a i486, if that's what
> you care about.

A 486 is still relevant: each half of a Pentium is quite similar to a
486 in performance, and both Atom and Larrabee have brought back a lot
of the original Pentium architecture, i.e. simple in-order pipelines
with very little power spent on housekeeping.
>
> Take the 8086. If I remember correctly, circa 1 instruction, 1 data
> memory reference per instruction. No cache. Didn't matter much when
> memory was one cycle away, but now memory is circa 100 cycles away.

Memory was 4 cycles away, per byte, including both opcode bytes and any
data reads and/or writes.

I.e. a complicated memory-updating instruction could be 5 instruction
bytes long, reading 2 bytes and writing the same two bytes back: since
the total was 5+2+2 = 9 bytes, the absolute minimum running time was 36
cycles.

The original IBM PC and clones all ran at 4.77 MHz, but DRAM refresh
stole a few percent of the available bandwidth, so for most practical
purposes you could guess that your code would take 1 us per byte.

It was only when you used the few _really_ slow instructions like DIV
(and MUL) that the 6-byte opcode prefetch buffer would have time to fill up.
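
Plugging those figures into a quick sketch (4 bus cycles per byte at
4.77 MHz; the refresh overhead is approximated as a few percent):

    # 8086 bus timing per the figures above: 4 clock cycles per byte
    # moved, 4.77 MHz clock, DRAM refresh stealing a few percent.
    clock_hz = 4.77e6
    cycles_per_byte = 4
    refresh_overhead = 0.06        # "a few percent"; illustrative guess

    us_per_byte = cycles_per_byte / clock_hz / (1 - refresh_overhead) * 1e6
    print(f"{us_per_byte:.2f} us per byte")   # ~0.9 us, roughly 1 us/byte

    # The memory-updating example: 5 opcode bytes + 2 read + 2 written.
    print((5 + 2 + 2) * cycles_per_byte)      # 36 cycles minimum, as above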

If you implemented the exact same design on a modern process today,
without adding any caches or increasing the bus width, the CPU would be
about an order of magnitude faster, still limited by the speed of DRAM.
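
Roughly checking that order of magnitude; the ~60 ns figure for an
uncached random DRAM access today is a rough assumption on my part:

    # Per-byte time then vs. now, if every byte still goes to DRAM.
    old_ns_per_byte = 4 / 4.77e6 * 1e9   # ~840 ns: 4 bus cycles at 4.77 MHz
    new_ns_per_byte = 60.0               # rough guess at a random DRAM
                                         # access today, with no cache
    print(old_ns_per_byte / new_ns_per_byte)   # ~14x, about an order of magnitude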

> If I have done the math right, an 8086 on a modern process would be (a)
> tiny, less than 1 thousandth the size of modern machines, but also (b)
> slow, less than 1/100th the performance.
>
> The difference is architecture. However, the biggest architectural
> benefits are "obvious": increase the bus width; read a cache line of
> instructions at a time, rather than one instruction at a time; add
> caches. All of this is architecture; some of it is architecture that is
> now obvious, but it was not necessarily obvious at the time.
>
> The old "architecture vs technology" debate is really not about
> architecture, but about what parts of architecture are so obvious that
> even a process technologist or logic designer with little training in
> computer architecture would get right - and what parts are not so
> obvious. It's a moving boundary.

Indeed.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"