From: Anton Ertl on
nmm1(a)cam.ac.uk writes:
>Yes, OF COURSE, you see problems on current systems only when you
>enable significant optimisation - but that's been true of most such
>constraints for at least 50 years - yes, 50. The reason that Java
>Grande flopped, big time, is precisely because Java forbids any
>optimisations that might change the results. Do you REALLY want
>a C with the performance of Java?

There are many reasons why Java performs as it does; requiring defined
results is only one of them (and, I think, a rather minor one). A more
important reason is that it is a higher-level language.

This reason would not apply to a lower-level C; on the contrary,
programmers could use low-level performance tricks to achieve higher
speed.

E.g., if I write "x>x+1" to check whether x is the largest signed
value, and gcc did not miscompile it into "0", it would generate 2
instructions (one addition, one comparison) on typical architectures.
But because gcc miscompiles this code (including, in some gcc
versions, code that does an unsigned addition followed by a signed
comparison), we have to write something like (x==INT_MAX), which is
admittedly easier to understand, but also produces bigger, and on some
machines, slower code (big literal numbers take quite a bit of space
and time). In theory the C compiler could optimize the latter code to
use the former trick, but in practice it doesn't.
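
To make the trade-off concrete, here is a minimal C sketch (assuming a
32-bit two's-complement int; the function names are made up for
illustration):

#include <limits.h>

/* Relies on wraparound: true exactly when x == INT_MAX, but signed
   overflow is undefined behaviour in ISO C, so gcc may fold the
   comparison to 0. */
int is_maxint_wrap(int x)
{
    return x > x + 1;
}

/* The portable spelling: correct, but needs the large literal INT_MAX. */
int is_maxint_portable(int x)
{
    return x == INT_MAX;
}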

>This is exactly like the 1970s, when many people used to say that
>the compilers should preserve the details of the IBM System/360
>arithmetic, because almost all systems were like that and people
>relied on it.

Byte addressed. Yup.
8-bit bytes. Yup.
2s-complement signed integers. Yup.

Yes, they were right: current general-purpose systems are like that.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton(a)mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
From: Terje Mathisen "terje.mathisen at tmsw.no" on
George Neuner wrote:
> On Tue, 22 Jun 2010 12:06:02 +0200, Terje Mathisen<"terje.mathisen at
> tmsw.no"> wrote:
>
>> Watcom allowed you to define pretty much any operation yourself, in the
>> form of inline macro operations where you specified to the compiler
>> where the inputs needed to be, where the result would end up and exactly
>> which registers/parts of memory would be modified.
>
> Does Open-Watcom still have that ability?

Probably, but I haven't even looked at these compilers since about 1995.
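
If memory serves, the facility in question was the #pragma aux
mechanism; a rough sketch of the style (from memory, so treat the
exact syntax as approximate):

/* Tell the compiler: the operand arrives in EAX, the result comes
   back in EAX, and exactly EAX is modified. */
unsigned long my_bswap( unsigned long x );
#pragma aux my_bswap =  \
    "bswap eax"         \
    parm   [eax]        \
    value  [eax]        \
    modify exact [eax];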

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: jacko on
On Jun 22, 7:16 pm, an...(a)mips.complang.tuwien.ac.at (Anton Ertl)
wrote:
> n...(a)cam.ac.uk writes:
> >Yes, OF COURSE, you see problems on current systems only when you
> >enable significant optimisation - but that's been true of most such
> >constraints for at least 50 years - yes, 50.  The reason that Java
> >Grande flopped, big time, is precisely because Java forbids any
> >optimisations that might change the results.  Do you REALLY want
> >a C with the performance of Java?
>
> There are many reasons why Java performs as it does; requiring defined
> results is only one of them (and I think, a rather minor one), but a
> more important reason is that it is a higher-level language.

I've seen the word undefined in the Java spec. My main concern at
present is conversion of String to byte[]. I think it should be UTF-8,
but who knows?

Cheers Jacko
From: MitchAlsup on
On Jun 22, 1:12 pm, timcaff...(a)aol.com (Tim McCaffrey) wrote:
> In article
> <dd4f0201-da56-4baa-acfd-9798fa72e...(a)x27g2000yqb.googlegroups.com>,
> MitchAl...(a)aol.com says...
> >The measure of performance is not instructions, but the <lack of> time
> >it takes to execute the semantic contents of the program. Given the
> >program at hand, there will be more gate delays of logic to execute
> >the x86 program than to execute the RISC program. x86 instructions
> >take more gates to parse and decode, the x86 data path has an
> >additional operand formatting multiplexer in the integer data path,
> >and some additional logic after integer computations to deal with
> >partial word writes in the register files. So, instead of having about
> >an 80 gate delay per instruction pipeline, the x86 has about a 100
> >gate delay instruction pipeline. An extra pipe stage and you can make
> >this added complexity almost vanish.
>
> >Where x86 wins, is that they (now Intel; AMD used to do this too) can
> >throw billions at FAB development technology (i.e. making faster
> >transistors and faster interconnect wire).
>
> >Secondarily, once you microarchitect a fully out-of-order processor,
> it really does not matter what the native instruction set is. Reread
> that previous sentence until you understand. The native instruction
> set no longer matters once the microarchitecture has gone fully OoO!
>
> You say ISA doesn't matter, but you note several cases where extra gates were
> added to handle x86isms.

You are assuming that the few extra gates in an x86 and the
availability of an Intel-quality FAB to produce the chips are
independent variables. They are not. Intel can afford another gate or
two in the logic delay of a pipe stage (or of all pipe stages). They
can afford this because they have access to FABs that can produce the
fastest gates on the planet and can produce half a million chips per
day.

So, x86 has an operand multiplexer in the logic cycle of the integer
units. Big deal: 2 gate delays out of 16-18 total computational gates
(plus flop jitter and skew). The thing that more than completely
compensates for this is that the FAB can achieve 2X the frequency with
these 2 extra gates of delay compared to what the non-big-guys' FABs
can produce without those 2 little gates of added delay. In the load
aligner, you have another 2 gates to deal with misalignment. None of
these gates matters unless the competition has access to similar FAB
technology.
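
(To put numbers on it: 2 gate delays out of a 16-18 gate stage is
roughly an 11-12.5% longer cycle; a FAB that clocks 2X faster swamps
that difference.)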

As to the complexity of instruction decode: it takes more area, and
burns more power, but overall it costs on the order of 1% at the
architectural-figure-of-merit level compared to a RISC-like
instruction set. Another "big deal"? I think not. (Except on the power
end of things. I have a solution for that end if anyone cares.)

Mitch
From: MitchAlsup on
On Jun 22, 1:13 pm, an...(a)mips.complang.tuwien.ac.at (Anton Ertl)
wrote:
> MitchAlsup <MitchAl...(a)aol.com> writes:
> >On Jun 22, 6:47 am, Andrew Reilly <areilly...(a)bigpond.net.au> wrote:
> >> No: I want the 2's complement, fixed-point integers to wrap, just like
> >> the hardware does.
>
> >Note this only when using 'unsigned' arithmetic.
>
> I don't understand what you mean here, but 2s-complement is a
> representation for signed numbers, so Andrew Reilly obviously had
> signed numbers in mind.

My point was that wraparound IS the overflow/underflow case for signed
numbers that started this thread. Signed numbers should not be counted
on to wrap around (they might raise exceptions). Unsigned numbers can
be counted on to wrap around (and not to raise exceptions). Of course
the ALU will naturally wrap the resulting numbers.
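
A small C sketch of the distinction (assuming a 32-bit int/unsigned;
the function names are made up):

#include <limits.h>

/* Signed overflow is undefined behaviour in ISO C: the ALU will wrap,
   but the compiler may assume it never happens, and an implementation
   may even trap. */
int signed_inc(int x)
{
    return x + 1;           /* undefined when x == INT_MAX */
}

/* Unsigned arithmetic is defined to wrap modulo 2**N, so this can be
   counted on. */
unsigned unsigned_inc(unsigned x)
{
    return x + 1u;          /* UINT_MAX + 1u is defined to be 0u */
}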

Mitch