RISC load-store versus x86 Add from memory. [Computer Architecture]

Prev: Call for benchmarks: proposals by 30 June
Next: Vaporizing dust during chip manufacturing ?

From: George Neuner on 22 Jun 2010 14:03

On Tue, 22 Jun 2010 07:35:30 -0700, Andy 'Krazy' Glew
<ag-news(a)patten-glew.net> wrote:

>More commonly: load with sign extension is usually slower than
>loading without sign extension [*], since in normal
>representation it involves waiting for the MSB and smearing it
>over many upper bits. So many new instruction proposals have
>proposed doing away with signed loads.

I guess the question is: is a sign-extended load faster than code that
zeros the register, performs a short load, tests the high bit of the
value and, if it is set, ORs ones into the upper bits of the register?

Depending on the ISA that's 4-7 instructions vs 1.
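
To make the comparison concrete, here is a minimal C sketch of both paths for a 16-bit value. The function names are made up for illustration, and the casts to a signed type rely on the usual modulo conversion (implementation-defined in ISO C, but what mainstream compilers do). Note that the fix-up step has to OR ones into the upper bits; ANDing with -1 would leave the value unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* One-step version: the cast requests a sign-extending conversion, which
   maps to a single sign-extending load or move on ISAs that have one. */
int32_t sext16_direct(uint16_t raw)
{
    return (int32_t)(int16_t)raw;
}

/* Multi-step version: zero-extend, test the MSB of the short value,
   then smear it over the upper 16 bits by ORing in ones. */
int32_t sext16_manual(uint16_t raw)
{
    uint32_t value = raw;          /* zeroed register + short load */
    if (value & 0x8000u) {         /* test the high bit */
        value |= 0xFFFF0000u;      /* fill the upper bits with ones */
    }
    return (int32_t)value;
}

/* Cross-check both versions on one input and return the agreed value. */
int32_t sext16_checked(uint16_t raw)
{
    int32_t a = sext16_direct(raw);
    int32_t b = sext16_manual(raw);
    assert(a == b);
    return a;
}
```

How many instructions each version costs depends on the ISA and the compiler; the sketch only shows the logical steps being compared.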

George

From: Anton Ertl on 22 Jun 2010 13:55

MitchAlsup <MitchAlsup(a)aol.com> writes:
>On Jun 22, 7:34 am, n...(a)cam.ac.uk wrote:
>> From the viewpoint of a high-level language, that is insane behaviour.
>> And, for better or worse, ISO C attempts to be a high-level language.
>
>This is one of those "for the worse" results.
>C is and was supposed to be a portable assembler.

I would put only a minor part of the blame on the standard itself.

This kind of standard necessarily only standardizes a subset of the
language. And they also try to identify what's portable to funny
kinds of machines that are irrelevant for most programmers (such as
machines with sign-magnitude representation for integers), and define
a subset of the language that's portable even to that kind of machine.

The mistake is when other people, especially compiler writers (in
particular gcc maintainers), see such a standard as defining the whole
of the language, and feel free to miscompile everything outside that
subset.

Example: The behaviour on integer overflow is different for machines
with 2s-complement, 1s-complement, and sign-magnitude integers, so the
C standard does not define what happens on overflow for signed
integers. Now the gcc maintainers take this as an excuse to miscompile
"x-1>x" into "0", even for programs that were only ever intended to
run on machines with wraparound on overflow, and on machines that most
naturally work that way.
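
A minimal C sketch of the two readings of that test follows. The function names are invented for illustration, and the second function assumes a 2s-complement int of the same size as unsigned int.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* The test as written. For x == INT_MIN the subtraction overflows, which
   ISO C leaves undefined, so a compiler may fold the whole test to 0. */
int dec_is_greater_ub(int x)
{
    return x - 1 > x;
}

/* Wraparound made explicit: unsigned arithmetic is defined to wrap, and
   the bits are then reinterpreted as a signed int (assumes 2s-complement). */
int dec_is_greater_wrap(int x)
{
    unsigned int ux = (unsigned int)x - 1u;
    int wrapped;
    assert(sizeof wrapped == sizeof ux);
    memcpy(&wrapped, &ux, sizeof wrapped);
    return wrapped > x;
}
```

With these definitions, dec_is_greater_wrap(INT_MIN) yields 1 because INT_MIN - 1 wraps to INT_MAX, while the result of dec_is_greater_ub(INT_MIN) is up to the compiler. gcc also offers the -fwrapv option, which asks it to treat signed overflow as wrapping.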

- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton(a)mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

From: Tim McCaffrey on 22 Jun 2010 14:12

In article
<dd4f0201-da56-4baa-acfd-9798fa72e059(a)x27g2000yqb.googlegroups.com>,
MitchAlsup(a)aol.com says...
>
>On Jun 17, 12:33 am, Brett Davis <gg...(a)yahoo.com> wrote:
>> What am I missing.
>
>The measure of performance is not instructions, but the <lack of> time
>it takes to execute the semantic contents of the program. Given the
>program at hand, there will be more gate delays of logic to execute
>the x86 program than to execute the RISC program. x86 instructions
>take more gates to parse and decode, the x86 data path has an
>additional operand formatting multiplexer in the integer data path,
>and some additional logic after integer computations to deal with
>partial word writes in the register files. So, instead of having about
>an 80 gate delay per instruction pipeline, the x86 has about a 100
>gate delay instruction pipeline. An extra pipe stage and you can make
>this added complexity almost vanish.
>
>Where x86 wins is that they (now Intel; AMD used to do this too) can
>throw billions at FAB development technology (i.e. making faster
>transistors and faster interconnect wire).
>
>Secondarily, once you microarchitect a fully out-of-order processor,
>it really does not matter what the native instruction set is. Reread
>that previous sentence until you understand. The native instruction
>set no longer matters once the microarchitecture has gone fully OoO!
>

You say ISA doesn't matter, but you note several cases where extra gates were
added to handle x86isms.

If an ISA were designed to make the underlying microarchitecture fast and easy
to build, what would be its characteristics? I would think something like the
following:

1) Reduce code size: increases the I-cache hit rate (which, as Andy noted,
also reduces power consumption, because off-chip accesses are costly).

2) Easy to decode: reduces gate count, which reduces power consumption, and
potentially removes a pipeline stage (maybe). AFAICT, every x86 is limited
to decoding/issuing one instruction at a time for code it hasn't executed
before. It appears all x86 implementations use the I-cache to mark
instruction boundaries for parallel decoding on the following passes.

3) No PSW: removes the need to merge flag values, and the logic that goes with it.

4) No partial register updates.

(I'm sure there are more).

In general, if you reduce the number of gates, you reduce power consumption
and allow the possibility of increasing clock speed. If you reduce the number
of pipeline stages, you reduce the cost of each branch misprediction.
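
The pipeline-depth point can be put as a rough first-order model, sketched here in C. The function name and every number in the usage note are hypothetical, and the model assumes the misprediction penalty grows with the number of stages between fetch and branch resolution.

```c
#include <assert.h>

/* First-order estimate: each mispredicted branch adds penalty_cycles of
   lost work on top of the base cycles per instruction. */
double effective_cpi(double base_cpi, double branch_fraction,
                     double mispredict_rate, double penalty_cycles)
{
    assert(base_cpi >= 0.0 && penalty_cycles >= 0.0);
    assert(branch_fraction >= 0.0 && branch_fraction <= 1.0);
    assert(mispredict_rate >= 0.0 && mispredict_rate <= 1.0);
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles;
}
```

For instance, with made-up inputs of 20% branches and a 5% misprediction rate, cutting the penalty from 20 cycles to 15 lowers the estimate from about 1.20 to about 1.15 cycles per instruction on a base of 1.0. Real cores overlap some of this cost, so treat it as an upper-bound style estimate only.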

I'm not sure how many (ISA level) registers are useful for an OoO microarch.
Too few and you have lots of instructions just moving stuff back and forth
from memory (probably the stack), too many and it increases the code size
without really adding performance.

- Tim

From: Anton Ertl on 22 Jun 2010 14:13

MitchAlsup <MitchAlsup(a)aol.com> writes:
>On Jun 22, 6:47 am, Andrew Reilly <areilly...(a)bigpond.net.au> wrote:
>> No: I want the 2's complement, fixed-point integers to wrap, just like
>> the hardware does.
>
>Note this only when using 'unsigned' arithmetic.

I don't understand what you mean here, but 2s-complement is a
representation for signed numbers, so Andrew Reilly obviously had
signed numbers in mind.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton(a)mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

From: nmm1 on 22 Jun 2010 14:31

In article <fcf2275e-29ab-46bb-8588-fd4b07f7e4fc(a)8g2000vbg.googlegroups.com>,
MitchAlsup <MitchAlsup(a)aol.com> wrote:
>On Jun 22, 7:34 am, n...(a)cam.ac.uk wrote:
>> From the viewpoint of a high-level language, that is insane behaviour.
>> And, for better or worse, ISO C attempts to be a high-level language.
>
>This is one of those "for the worse" results.
>C is and was supposed to be a portable assembler.

No, it wasn't. It was a semi-portable assembler - i.e. it was
syntactically portable, and semantically portable PROVIDED that
you stayed away from problematic areas (like overflow). There
have been truly portable assemblers, dating from the 1970s and
onwards.

In the late 1980s, there was massive pressure from commercial
application developers to improve C for use as a portable high
level language. You may think that it was a mistake for ISO
to accept that as a criterion, but the fact is that is what it
did. I am not going to disagree with your view - merely your
claimed facts.

Regards,
Nick Maclaren.
