RISC load-store versus x86 Add from memory. [Computer Architecture]


From: Andy 'Krazy' Glew on 20 Jun 2010 01:08

On 6/19/2010 12:50 PM, Niels Jørgen Kruse wrote:
> Andy 'Krazy' Glew<ag-news(a)patten-glew.net> wrote:
>
>> Unfortunately, there are several different types of integer overflow.
>> E.g. the overflow conditions for each of the following are different
>>
>> unsigned + unsigned
>> signed + signed
>> signed + unsigned
>
> Are you talking about D or some other language that is not C?

D is a language that espouses throwing exceptions for errors.

But, I am not talking about languages. I do not mean "unsigned" and "signed" as C datatypes. I am talking about generic
arithmetic. Languages should embed rules that model arithmetic, not vice versa.

What I mean, in longhand, is something like

"When two N-bit numbers, for any N, are both interpreted as unsigned numbers, then overflow is reported if there is any
carry out of the most significant bit (for add). Or, if the calculation is performed in infinite (unsigned) precision,
if the infinite precision result changes in value when truncated to N bits."

Similarly for the others.
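
As an illustration of the "infinite precision" definition above, here is a minimal sketch in C for N = 32, using 64-bit arithmetic as a stand-in for infinite precision. The function names are hypothetical, and the post does not say how the result of the mixed case is interpreted; the sketch assumes it is read as unsigned.

```c
#include <stdbool.h>
#include <stdint.h>

/* unsigned + unsigned: overflow iff there is a carry out of bit 31,
   i.e. the exact sum does not fit in 32 unsigned bits. */
bool add_overflows_uu(uint32_t a, uint32_t b)
{
    uint64_t exact = (uint64_t)a + (uint64_t)b;  /* cannot wrap in 64 bits */
    return exact > UINT32_MAX;
}

/* signed + signed: overflow iff the exact sum leaves
   the range [INT32_MIN, INT32_MAX]. */
bool add_overflows_ss(int32_t a, int32_t b)
{
    int64_t exact = (int64_t)a + (int64_t)b;
    return exact < INT32_MIN || exact > INT32_MAX;
}

/* signed + unsigned, with the result interpreted as unsigned
   (an assumption of this sketch): overflow iff the exact sum
   is negative or exceeds UINT32_MAX. */
bool add_overflows_su_to_u(int32_t a, uint32_t b)
{
    int64_t exact = (int64_t)a + (int64_t)b;
    return exact < 0 || exact > UINT32_MAX;
}
```

The same bit patterns give different verdicts. With 0xFFFFFFFF and 1, the unsigned check reports overflow, while the signed and mixed checks see -1 + 1 = 0 and report none. With 0x7FFFFFFF and 1, only the signed check reports overflow.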

From: nmm1 on 20 Jun 2010 05:39

In article <885boqFa10U1(a)mid.individual.net>,
Andrew Reilly <areilly---(a)bigpond.net.au> wrote:
>On Sat, 19 Jun 2010 21:29:17 +0100, nmm1 wrote:
>> In article <1jkcsdg.28nffa169emgN%nospam(a)ab-katrinedal.dk>,
>> =?ISO-8859-1?Q?Niels_J=F8rgen_Kruse?= <nospam(a)ab-katrinedal.dk> wrote:
>>>Andy 'Krazy' Glew <ag-news(a)patten-glew.net> wrote:
>>>
>>>> Unfortunately, there are several different types of integer overflow.
>>>> E.g. the overflow conditions for each of the following are different
>>>>
>>>> unsigned + unsigned
>>>> signed + signed
>>>> signed + unsigned
>>>
>>>Are you talking about D or some other language that is not C?
>>
>> He doesn't need to. They're all different in C.
>
>No, the last one doesn't exist (as a different kind of add) in C: it's an
>implicit type conversion (to unsigned, usually, because apparently it's
>more important to preserve the positive range than the negative) followed
>by an unsigned add. ...

It's still different :-) Your description of what happens assumes
that the integers have the same conversion rank - the situation is
a lot more complicated when they don't (see C99 6.3.1.8).
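
A small sketch of how conversion rank changes a mixed signed/unsigned add under the usual arithmetic conversions. The function names are hypothetical, and the second function assumes long long can represent every unsigned int value, which the preprocessor check enforces.

```c
#include <limits.h>

#if LLONG_MAX < UINT_MAX
#error "sketch assumes long long can represent every unsigned int value"
#endif

/* Same rank: the int operand is converted to unsigned int,
   and the add is an unsigned add that wraps modulo UINT_MAX + 1. */
unsigned int sum_same_rank(int x, unsigned int y)
{
    return x + y;
}

/* Different rank: long long outranks unsigned int and can hold all
   of its values, so the unsigned operand is converted to long long
   and the add is a signed add. */
long long sum_higher_rank(long long x, unsigned int y)
{
    return x + y;
}
```

With the arguments -2 and 1u, sum_same_rank yields UINT_MAX while sum_higher_rank yields -1, so "signed + unsigned" is not a single operation even within C.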

>The reference to C is interesting, because I've recently had the
>experience of encountering a C compiler that actively thwarted the usual
>idiom for signed overflow detection. That is something like:
>
>where x and y are int:
>if (y >= 0) { if (x + y < x) signed_overflow(); } else { if (x + y > x)
>signed_underflow(); }
>
>The compiler in question (can't remember whether it was a recent gcc or
>one of the ARM compilers) came up with this beauty:
>
>warning: assuming signed overflow does not occur when assuming that (X +
>c) < X is always false
>
>Apparently wording in the C standard lets them do that.

Well, yes, of course. So do Fortran and most comparable languages.
On a better implementation, you would get a run-time error, of course.

> The "best"
>alternative I've found so far is to use extra arithmetic precision.
>Quite a small value of "best". When even addition requires the use of in-
>line assembly to produce useful results, the language is dead.

Eh? That code's erroneous - wrong, buggy, defective. You shouldn't
write code that overflows in C or Fortran. End of story. Sorry,
but that is the situation, both de jure and de facto. Even ignoring
that, it relies on C90 semantics, and C99 is incompatible - you need
to do the following to generate the code that you expect:

if (y >= 0) { if ((<integer type>)(x + y) < x) signed_overflow(); }
else { if ((<integer type>)(x + y) > x) signed_underflow(); }

Also, that's a trivial example. I give a slightly more complicated
one in my arithmetic course, and point out that undetected overflow
can cause ANY effect once you enable serious levels of optimisation.
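
One way to stay inside defined behaviour is to test the operands before adding, so the overflowing sum is never evaluated. A minimal sketch, assuming plain int operands; checked_add and the add_status enum are hypothetical names, not something from this thread:

```c
#include <limits.h>

enum add_status { ADD_OK, ADD_OVERFLOW, ADD_UNDERFLOW };

/* Adds x and y only when the exact sum fits in int.
   The checks use INT_MAX - y and INT_MIN - y, which cannot
   overflow for the sign of y under which each is evaluated. */
enum add_status checked_add(int x, int y, int *sum)
{
    if (y > 0 && x > INT_MAX - y)
        return ADD_OVERFLOW;    /* x + y would exceed INT_MAX */
    if (y < 0 && x < INT_MIN - y)
        return ADD_UNDERFLOW;   /* x + y would fall below INT_MIN */
    *sum = x + y;               /* safe: the sum is in range */
    return ADD_OK;
}
```

Because no overflowing expression is ever evaluated, the compiler has no undefined behaviour to exploit, whatever the optimisation level.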

Sorry, but ....

Regards,
Nick Maclaren.

From: nedbrek on 20 Jun 2010 07:34

Hello all,

"Brett Davis" <ggtgp(a)yahoo.com> wrote in message
news:ggtgp-E06A37.15294419062010(a)news.isp.giganews.com...
> In article <hvi9bp$1v7$1(a)news.eternal-september.org>,
> "nedbrek" <nedbrek(a)yahoo.com> wrote:
>> The advantage CISC has is that the uop sequence looks like:
>> ld tmp = [r2]
>> add r1 += tmp
>>
>> Since tmp is not an architected register, it does not have to be preserved
>> for an interrupt, or seen past the use in add (it is known dead). Thus, it
>> can exist strictly in the bypass network (it is not allocated a rename
>> register, it is not visible to later instructions [does not participate in
>> renaming], and has no architected effects at retirement).
>>
>> The RISC sequence will always be (ld r3 = [r2]; add r1 += r3). r3 is live
>> out, and must be architecturally visible. You can smash ops together,
>> giving you r3,r1 = load-op [r2] + r1
>>
>> You can't say just "need an extra write port" unless you have a simple 5
>> stage pipeline. In a modern machine, this means extra decode bits (in the
>> scheduler and ROB), extra RAT ports, extra complexity come retirement time
>> (do you allow every instruction to update two entries in the retirement
>> register table?)
>
> I forgot that on both PowerPC and x86 the Load Unit has its own register
> write port, so you do have an extra write port, and I assume that on x86
> the Load Unit is involved in the Add from Memory, if only for the bypass
> data.

Is there an arch where load does not produce a reg result?

> So the real savings is having the Add instruction issue the [r2] register
> read instead of a r3 register read, saving an issue slot and cycle?

- There is a saving in your internal instruction format (uop encoding):
the X86 fused uop has {op, dst, src} while the fused RISC has {op, dst1,
dst2, src}

- This impacts the renamer, the X86 has one dest per uop, while the RISC has
two. Every dest in rename equates to one write port (assuming a table based
renamer, like the RAT in P6).

- It also impacts retirement. At retirement time, you must update the
architected registers (usual implementations are an architected register
file [P6] or an architected rename map [P4]). Either way, that is another
write port...
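
The three points above can be put into a toy model. This is only a sketch: the struct layouts, field widths and function name are hypothetical, and it models nothing beyond "one rename-table write port per destination renamed per cycle".

```c
#include <stdint.h>

/* Hypothetical fused x86 uop for "add r1 += [r2]": one destination. */
struct x86_fused_uop {
    uint8_t op;
    uint8_t dst;   /* r1 (also read as a source) */
    uint8_t src;   /* r2, the address register */
};

/* Hypothetical fused RISC uop for "r3,r1 = load-op [r2] + r1":
   the loaded value is live out, so it needs its own destination. */
struct risc_fused_uop {
    uint8_t op;
    uint8_t dst1;  /* r3, the loaded value */
    uint8_t dst2;  /* r1, the sum */
    uint8_t src;   /* r2, the address register */
};

enum { X86_DESTS_PER_UOP = 1, RISC_DESTS_PER_UOP = 2 };

/* Table-based renamer: every destination renamed in a cycle
   needs one write port into the rename table. */
unsigned rat_write_ports(unsigned uops_per_cycle, unsigned dests_per_uop)
{
    return uops_per_cycle * dests_per_uop;
}
```

For an assumed rename width of 4 uops per cycle, rat_write_ports(4, X86_DESTS_PER_UOP) gives 4 and rat_write_ports(4, RISC_DESTS_PER_UOP) gives 8. The same doubling applies to whatever structure is updated at retirement.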

Ned

From: nedbrek on 20 Jun 2010 07:36

Hello all,

"Andy 'Krazy' Glew" <ag-news(a)patten-glew.net> wrote in message
news:4C1DA045.1020507(a)patten-glew.net...
> On 6/19/2010 5:23 AM, nedbrek wrote:
>
>> The advantage CISC has is that the uop sequence looks like:
>> ld tmp = [r2]
>> add r1 += tmp
>>
>> Since tmp is not an architected register, it does not have to be preserved
>> for an interrupt, or seen past the use in add (it is known dead). Thus, it
>> can exist strictly in the bypass network (it is not allocated a rename
>> register, it is not visible to later instructions [does not participate in
>> renaming], and has no architected effects at retirement).
>
> Anecdote: at Intel Mike Haertel and I thought that AMD K7 was taking advantage
> of this, to get an effectively larger instruction window. Intel was not.
> When we moved to AMD, we learned that they were not. Perhaps now they are.

A lot of innovation at Intel is driven by paranoia about what AMD might be
doing. That is a good thing.

If only the Itanium guys had been more paranoid. They predicted IBM would
ship parts with lower performance over time!

Ned

From: Andrew Reilly on 20 Jun 2010 09:12

On Sun, 20 Jun 2010 06:34:32 -0500, nedbrek wrote:

> Is there an arch where load does not produce a reg result?

"Is" is perhaps too strong a qualifier, but certainly Vax, 32x32, PDP-11
and others of that ilk had arbitrary addressing modes available for all
operands, only one flavor of which was "register". So memory sources and
memory destinations were possible. Not that you would necessarily call
those "load" instructions, I suppose...

Cheers,

--
Andrew
