RISC load-store versus x86 Add from memory. [Computer Architecture]


From: Terje Mathisen on 17 Jun 2010 07:02

Brett Davis wrote:
> RISC load-store versus x86 Add from memory.
> t = a->x + a->y;
>
> RISC
> load x,a[0]
> load y,a[1]
> add t = x,y
>
> x86
> load x,a[0]
> add t = x,a[1]

This should really be:

mov eax,[a.x]
add eax,[a.y]

> Same number of loads, so don't fall into the trap of saying
> more x86 instructions cause slow loads...
>
> What am I missing?

On an OoO machine, there is absolutely no difference between the two
methods: They both perform two load operations, allocate three (renamed)
target registers (one for each basic/micro operation), and have the same
total latency.
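The renaming argument can be made concrete with a minimal Python sketch, assuming an idealized renamer with an unlimited pool of physical registers and an x86 add-from-memory that has already been split into a load plus a register add. The (op, dst, srcs) tuples, the `tmp` register and the `rename` helper are hypothetical illustrations, not any real CPU's internals.

```python
from itertools import count

def rename(uops):
    """Rename (op, dst, srcs) micro-ops with a simple register alias table.

    Every destination gets a fresh physical register; every source reads
    the current mapping. Names never written (memory operands such as
    "[a.x]") pass through unchanged."""
    alias = {}       # architectural/temporary name -> physical register
    free = count()   # idealized, unlimited pool of physical registers
    renamed = []
    for op, dst, srcs in uops:
        phys_srcs = [alias.get(s, s) for s in srcs]  # read before writing dst
        alias[dst] = f"p{next(free)}"
        renamed.append((op, alias[dst], phys_srcs))
    return renamed

# RISC: load r1,[a.x] / load r2,[a.y] / add r3,r1,r2
risc = [("load", "r1", ["[a.x]"]),
        ("load", "r2", ["[a.y]"]),
        ("add", "r3", ["r1", "r2"])]

# x86: mov eax,[a.x] / add eax,[a.y], with the add-from-memory assumed to be
# split into a load into a hidden temporary followed by a register add
x86 = [("load", "eax", ["[a.x]"]),
       ("load", "tmp", ["[a.y]"]),
       ("add", "eax", ["eax", "tmp"])]

print(rename(risc) == rename(x86))
```

It prints True: after renaming, both sequences are the same two loads feeding one add, with three physical destination registers allocated.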

Assuming an L1 that can start one load per cycle with a 1-cycle latency,
both models need 3 cycles of total latency; with a 2-cycle L1 we end up
with 4 cycles.

If the L1 cache can supply two load ops/cycle, then we save one cycle of
latency for both CPU types.
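The latency arithmetic can be checked with a small Python model. It assumes two independent loads issued in order to the load port(s) starting at cycle 0, a 1-cycle add, and no other delays such as address generation; `total_latency` is a hypothetical helper written for this illustration.

```python
def total_latency(l1_latency, loads_per_cycle, add_latency=1):
    """Cycles from issuing the first load until t = a->x + a->y is ready.

    Both the RISC sequence and the x86 mov/add-from-memory sequence reduce
    to two independent loads feeding one add, so one model covers both."""
    num_loads = 2
    # Load i issues in cycle i // loads_per_cycle and returns l1_latency later.
    ready = [i // loads_per_cycle + l1_latency for i in range(num_loads)]
    # The add can start once its later operand has arrived.
    return max(ready) + add_latency

for ports in (1, 2):
    for l1 in (1, 2):
        print(f"{ports} load(s)/cycle, {l1}-cycle L1: {total_latency(l1, ports)} cycles")
```

The four combinations give 3, 4, 2 and 3 cycles, matching the numbers in the post: a second load port saves one cycle whatever the L1 latency.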

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From: jacko on 17 Jun 2010 07:14

You also imply RISC has to use multiporting. http://nibz.googlecode.com

From: MitchAlsup on 17 Jun 2010 10:34

On Jun 17, 12:33 am, Brett Davis <gg...(a)yahoo.com> wrote:
> What am I missing?

The measure of performance is not instructions, but the <lack of> time
it takes to execute the semantic contents of the program. Given the
program at hand, there will be more gate delays of logic to execute
the x86 program than to execute the RISC program. x86 instructions
take more gates to parse and decode, the x86 data path has an
additional operand-formatting multiplexer in the integer data path,
and some additional logic after integer computations to deal with
partial-word writes in the register files. So, instead of an
instruction pipeline of about 80 gate delays, the x86 has one of about
100 gate delays. Add an extra pipe stage and you can make this added
complexity almost vanish.
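The 80-versus-100 gate-delay figures can be turned into a rough cycle-time estimate. The Python sketch below assumes, purely for illustration, a hypothetical 5-stage pipeline whose logic splits evenly across stages, and it ignores the fixed latch overhead every stage adds, so the percentages are only indicative.

```python
RISC_GATES = 80    # rough gate-delay depth of the RISC pipeline (from the post)
X86_GATES = 100    # rough gate-delay depth of the x86 pipeline (from the post)
BASE_STAGES = 5    # hypothetical stage count, chosen only for illustration

def gates_per_stage(total_gates, stages):
    # Logic depth per stage, and hence cycle time, if the work splits evenly.
    # Ignores latch/flop overhead, which adds a fixed cost to every stage.
    return total_gates / stages

risc = gates_per_stage(RISC_GATES, BASE_STAGES)          # 16.0
x86_same = gates_per_stage(X86_GATES, BASE_STAGES)       # 20.0
x86_extra = gates_per_stage(X86_GATES, BASE_STAGES + 1)  # about 16.67

print(f"same depth: x86 cycle {x86_same / risc - 1:.0%} longer")
print(f"one extra stage: x86 cycle {x86_extra / risc - 1:.0%} longer")
```

With the same number of stages the x86 cycle comes out 25% longer; one extra stage brings that down to about 4%, which is the sense in which the added complexity almost vanishes.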

Where x86 wins is that they (now Intel; AMD used to do this too) can
throw billions at FAB development technology (i.e. making faster
transistors and faster interconnect wire).

Secondarily, once you microarchitect a fully out-of-order processor,
it really does not matter what the native instruction set is. Reread
that previous sentence until you understand. The native instruction
set no longer matters once the microarchitecture has gone fully OoO!

Mitch

From: Anton Ertl on 17 Jun 2010 11:24

Stephen Sprunk <stephen(a)sprunk.org> writes:
>On 17 Jun 2010 00:33, Brett Davis wrote:
>> RISC load-store versus x86 Add from memory.
>> t = a->x + a->y;
>>
>> RISC
>> load x,a[0]
>> load y,a[1]
>> add t = x,y
>
>load r1, a[0]
>load r2, a[1]
>add r3, r1, r2
>store t, r3
>
>> x86
>> load x,a[0]
>> add t = x,a[1]
>
>load r1, a[0]
>add r1, a[1]
>store t, r1

If t is a local variable, decent C compilers will usually allocate it
into a register, and no store is needed.

>> RISC shows its superiority by being 50% more instructions and 50% slower...

It's just as easy to find an example where IA-32 and AMD64 have 100%
more instructions:

x = y+z;

where x, y, and z are locals that live in registers, and y and z are
alive after this statement. On RISC:

add x<-y+z;

On IA-32/AMD64:

mov x<-y
add x<-x+z

So looking at one particular code fragment proves nothing.
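The counter-example comes down to the two-operand form of x86 arithmetic, where the destination is also a source. The hypothetical helpers below sketch how a code generator might lower the three-address statement x = y+z, assuming register assignment has already happened; they illustrate the constraint, not any particular compiler's algorithm.

```python
from typing import List

def lower_add_two_operand(dst: str, y: str, z: str) -> List[str]:
    """Lower dst = y + z for an ISA whose add overwrites its first operand.

    Assumes dst, y and z already name registers."""
    if dst == y:
        return [f"add {dst}, {z}"]
    if dst == z:
        return [f"add {dst}, {y}"]  # addition commutes, so swap the sources
    # dst has its own register (in the example above because y and z are
    # still live), so copy y first; otherwise the add would clobber a live value.
    return [f"mov {dst}, {y}", f"add {dst}, {z}"]

def lower_add_three_operand(dst: str, y: str, z: str) -> List[str]:
    """Lower dst = y + z for a RISC with a separate destination field."""
    return [f"add {dst}, {y}, {z}"]

print(lower_add_two_operand("eax", "ebx", "ecx"))
print(lower_add_three_operand("r3", "r1", "r2"))
```

The first call prints ['mov eax, ebx', 'add eax, ecx'] and the second ['add r3, r1, r2']: two instructions against one, the reverse of the add-from-memory case.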

As for speed, that depends on the actual CPU. Both the 386 and the
Phenom II can execute IA-32 code, yet they do it at vastly different
speeds; likewise, MIPS R2000 and Power7 are two RISC processors with
very different speeds.

>You are missing that a modern x86 chip is not a CISC chip; it is a RISC
>chip with a CISC decoder slapped on the front,

That statement does not make sense. CISC and RISC are instruction-set
styles. Modern IA-32/AMD64 chips only execute the IA-32, AMD64, and
maybe 8086 instruction sets, all of which are CISC instruction sets.

What you may be thinking of is that the microarchitectures of current
high-performance CISC and RISC CPUs are relatively similar, and quite
different from the microarchitectures of CISC and RISC CPUs when RISCs
were introduced.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton(a)mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

From: Stephen Sprunk on 18 Jun 2010 01:08

On 17 Jun 2010 10:24, Anton Ertl wrote:
> Stephen Sprunk <stephen(a)sprunk.org> writes:
>> On 17 Jun 2010 00:33, Brett Davis wrote:
>>> RISC load-store versus x86 Add from memory.
>>> t = a->x + a->y;
>>>
>>> RISC
>>> load x,a[0]
>>> load y,a[1]
>>> add t = x,y
>>
>> load r1, a[0]
>> load r2, a[1]
>> add r3, r1, r2
>> store t, r3
>>
>>> x86
>>> load x,a[0]
>>> add t = x,a[1]
>>
>> load r1, a[0]
>> add r1, a[1]
>> store t, r1
>
> If t is a local variable, decent C compilers will usually allocate it
> into a register, and no store is needed.

True, but if you're going to talk about compiler optimizations, then
the code is unlikely to resemble what you wrote in an HLL in the first
place, except for the most trivial of programs.

The point I was trying to make is that x86 has no 3-operand add
instruction like the one he used in his example, nor does RISC allow a
memory address as the destination of an add instruction, as his example
assumed. I corrected both to show a fairer comparison.

>> You are missing that a modern x86 chip is not a CISC chip; it is a RISC
>> chip with a CISC decoder slapped on the front,
>
> That statement does not make sense. CISC and RISC are instruction-set
> styles. Modern IA-32/AMD64 chips only execute the IA-32, AMD64, and
> maybe 8086 instruction sets, all of which are CISC instruction sets.
>
> What you may be thinking of is that the microarchitectures of current
> high-performance CISC and RISC CPUs are relatively similar, and quite
> different from the microarchitectures of CISC and RISC CPUs when RISCs
> were introduced.

Alternatively, one can look at a modern x86 chip as a core that runs a
model-specific RISC ISA hidden behind a decoder that translates x86 CISC
instructions into that ISA. That may offend purists, but IMHO it's
accurate enough for those of us who don't actually design CPUs.
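That mental model can be sketched as a toy decoder. This Python sketch handles only two-operand mov and add in Intel syntax, with register destinations, and cracks an add with a memory source into a load plus a register add. The micro-op format and the hidden `tmp` register are hypothetical; real micro-op encodings are model-specific and not part of the architecture.

```python
def crack(insn):
    """Translate one x86-style instruction (Intel syntax, tiny subset)
    into RISC-like micro-ops of the form (op, dst, srcs)."""
    mnemonic, operands = insn.split(None, 1)
    dst, src = [o.strip() for o in operands.split(",")]
    if dst.startswith("["):
        raise ValueError(f"memory destinations not modeled: {insn}")
    if mnemonic == "mov":
        # mov reg,[mem] is a plain load; mov reg,reg is a register copy
        return [("load" if src.startswith("[") else "move", dst, [src])]
    if mnemonic == "add":
        if src.startswith("["):
            # add reg,[mem]: load into a hidden temporary, then a register add
            return [("load", "tmp", [src]), ("add", dst, [dst, "tmp"])]
        return [("add", dst, [dst, src])]
    raise ValueError(f"unsupported instruction: {insn}")

def decode(program):
    """Crack every instruction and concatenate the resulting micro-ops."""
    return [uop for insn in program for uop in crack(insn)]

for uop in decode(["mov eax,[a.x]", "add eax,[a.y]"]):
    print(uop)
```

The two x86 instructions come out as load, load, add: the same three operations the RISC version spells out explicitly.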

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking
