RISC load-store versus x86 Add from memory. [Computer Architecture]

Prev: Call for benchmarks: proposals by 30 June
Next: Vaporizing dust during chip manufacturing ?

From: Andrew Reilly on 28 Jun 2010 08:11

Hi Jean-Marc,

On Mon, 28 Jun 2010 11:10:16 +0200, Jean-Marc Bourguet wrote:

> I think this is a case of different people wanting different things for
> C. The end result in gcc (options -fwrapv/-ftrapv allowing to ask for
> wrap around/trapping instead of letting it use all the latitude open
> by the standard) would be good if only -ftrapv worked reliably... It is
> more difficult to test if -fwrapv works -- on the one hand it is probably
> more tested than -ftrapv (it is implicit for Java according to the
> documentation), on the other hand the fact that -ftrapv doesn't isn't a
> confidence builder.

Thank-you, thank-you! I can't imagine what the optimiser is doing to
code without these flags, by default, but as long as you can make it
behave "sanely" (by my standards!) then I'm happy indeed. I apologise
for not having found this switch in the manuals myself, before now.

I guess "fast and mostly working" is good enough for most people/
applications...
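
For readers who want to see what is at stake, here is a minimal sketch in C. The function names are made up for illustration. The first function tests for overflow without ever performing the overflowing addition, so it should behave the same with or without -fwrapv. The second is the tempting wrap-based test that the standard's latitude lets an optimiser fold away unless -fwrapv is given.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if a + b would overflow int. The overflowing addition is
   never executed, so the result does not depend on -fwrapv or -ftrapv. */
int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    return a < INT_MIN - b;
}

/* The tempting wrap-based test. Signed overflow is undefined, so without
   -fwrapv a compiler may assume a + 1 never wraps and reduce this to
   "return 0". Shown for illustration only; do not rely on it. */
int increment_wraps(int a)
{
    return a + 1 < a;
}

/* Small self-check; call it from main() to try the sketch. */
void check_add_would_overflow(void)
{
    assert(add_would_overflow(INT_MAX, 1));
    assert(add_would_overflow(INT_MIN, -1));
    assert(!add_would_overflow(INT_MAX, 0));
    assert(!add_would_overflow(INT_MIN, 0));
    assert(!add_would_overflow(-5, 5));
}
```

Building the same file with and without -fwrapv and comparing the generated code for increment_wraps is one way to see what the flag changes.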

Cheers,

--
Andrew

From: EricP on 29 Jun 2010 16:24

Andy 'Krazy' Glew wrote:
> On 6/26/2010 10:40 AM, EricP wrote:
>> Andy 'Krazy' Glew wrote:
>>>
>>> Some examples:
>>>
>>
>> You are really doing signed 9 bit arithmetic there,
>> then casting the s9 result back to either a u8 or s8 type.
>> Whether there is an overflow or not depends on the result
>> type and the value.
>
> Exactly.
>
> And if I was doing the same in 16, 32, 64 bit arithmetic, I would be
> essentially doing the intermediate calculations in 17, 33, 65 bits, and
> casting back.
>
> And, since support for 9, 17, 33, 65 bits is not ubiquitous, and since
> doubling the width to 16, 32, 64 bits, which is ubiquitous, tends to cost
> a lot in performance (let alone the fact that extending precision from
> 64 bits, whether to 65 bits or 128 bits, is not ubiquitous), I am
> looking for expressions that allow detection of overflow based on modulo
> arithmetic.
>
> Although I tend to use a C-like notation to express this, I am NOT
> thinking in terms of C.
>
> Or, if you will - imagine that everything has been cast to the
> appropriate unsigned width. Since C defines unsigned as modulo
> arithmetic, that should not be subject to compiler transformations that
> make signed overflows undefined.
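
As an illustration of the approach described in the quoted text, here is a minimal sketch for the signed + signed 32-bit case, with everything cast to unsigned so that the arithmetic is modulo 2^32. The function names are hypothetical, and the 64-bit version is there only as a reference standing in for the "33-bit" intermediate.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 32-bit add checked entirely in unsigned (modulo 2^32) arithmetic.
   Overflow occurred iff both operands have the same sign and the wrapped
   sum has the opposite sign, i.e. the top bit of
   (a ^ sum) & (b ^ sum) is set. */
int sadd32_overflows(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t sum = ua + ub;                  /* wraps, never undefined */
    return (int)(((ua ^ sum) & (ub ^ sum)) >> 31);
}

/* Reference: widen to 64 bits and range-check the exact sum. */
int sadd32_overflows_wide(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;
    return wide < INT32_MIN || wide > INT32_MAX;
}

/* Compares the two on a handful of boundary values. */
void check_sadd32(void)
{
    static const int32_t samples[] = {
        INT32_MIN, INT32_MIN + 1, -1, 0, 1, INT32_MAX - 1, INT32_MAX
    };
    const int n = (int)(sizeof samples / sizeof samples[0]);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            assert(sadd32_overflows(samples[i], samples[j]) ==
                   sadd32_overflows_wide(samples[i], samples[j]));
}
```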

So we only have to calculate a single ninth sign bit value manually.
If you want the result type unsigned u_sum = unsigned u_a + signed s_b
then the result overflows if the sign is set at the end of the calculation.

u_a always zero extends so the initial value of the sign bit
is the sign of s_b, so sign = (s_b < 0);
The sign will toggle if there is a carry out of the sum, so
carry = (u_a + s_b) < u_a;

If s_b < 0 the sign stays set when there is no carry;
if s_b >= 0 it becomes set when there is a carry.
Putting it together gives

overflow = (s_b < 0) != ((u_a + s_b) < u_a);

or alternatively

overflow = ((s_b < 0) ^ ((u_a + s_b) < u_a)) != 0;
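
The carry-based form above can be checked exhaustively for the 8-bit case. This is a sketch with made-up function names. One C-specific point is assumed throughout: the sum has to be cast back to uint8_t, because otherwise the operands are promoted to int and the addition never wraps.

```c
#include <assert.h>
#include <stdint.h>

/* u8 = u8 + s8, overflow test built from the sign of s_b and the carry
   out of the wrapped 8-bit sum. */
int mixed_add_overflows(uint8_t u_a, int8_t s_b)
{
    uint8_t u_sum = (uint8_t)(u_a + s_b);    /* wrapped 8-bit sum */
    int carry = u_sum < u_a;                 /* carry out of bit 7 */
    return (s_b < 0) != carry;
}

/* Reference: do the sum in a wider type and range-check it. */
int mixed_add_overflows_wide(uint8_t u_a, int8_t s_b)
{
    int wide = (int)u_a + (int)s_b;
    return wide < 0 || wide > UINT8_MAX;
}

/* Returns the number of (u_a, s_b) pairs on which the two disagree. */
int count_mismatches(void)
{
    int bad = 0;
    for (int a = 0; a <= UINT8_MAX; a++)
        for (int b = INT8_MIN; b <= INT8_MAX; b++)
            bad += mixed_add_overflows((uint8_t)a, (int8_t)b) !=
                   mixed_add_overflows_wide((uint8_t)a, (int8_t)b);
    return bad;
}

/* Small self-check; call it from main() to try the sketch. */
void check_mixed_add(void)
{
    assert(count_mismatches() == 0);
}
```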

But ((u_a + s_b) < u_a) is the same as (((u_a + s_b) - u_a) < 0)
so just using the sign of the result we get

u_sum = u_a + s_b;
overflow = (s_b ^ (u_sum - u_a)) < 0;

(unless I've made a boo boo someplace)
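
A sketch of how the width of the intermediate affects the last form, assuming two's complement and using hypothetical function names. With 8-bit operands, C's integer promotion computes u_sum - u_a in int, so the extra bits are there and the expression agrees with a widened reference on every input. When the operands are already at full width (32 bits here) and nothing wider is used, u_sum - u_a wraps back to the bit pattern of s_b, the XOR is zero, and the test never fires. The carry-based form looks like the safer choice in that setting.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit operands: (u_sum - u_a) is evaluated in int after promotion, so
   its sign differs from the sign of s_b exactly when the sum overflowed. */
int xor_form_u8(uint8_t u_a, int8_t s_b)
{
    uint8_t u_sum = (uint8_t)(u_a + s_b);
    return (s_b ^ (u_sum - u_a)) < 0;
}

/* Same expression kept at 32 bits: u_sum - u_a is s_b modulo 2^32, so the
   XOR is always zero and the top bit is never set. */
int xor_form_u32(uint32_t u_a, int32_t s_b)
{
    uint32_t u_sum = u_a + (uint32_t)s_b;
    return (int)((((uint32_t)s_b) ^ (u_sum - u_a)) >> 31);
}

/* Reference for the 8-bit case: exact sum in int, then range check. */
int u8_add_overflows_wide(uint8_t u_a, int8_t s_b)
{
    int wide = (int)u_a + (int)s_b;
    return wide < 0 || wide > UINT8_MAX;
}

void check_xor_forms(void)
{
    /* The 8-bit form matches the reference on all 65536 input pairs. */
    for (int a = 0; a <= UINT8_MAX; a++)
        for (int b = INT8_MIN; b <= INT8_MAX; b++)
            assert(xor_form_u8((uint8_t)a, (int8_t)b) ==
                   u8_add_overflows_wide((uint8_t)a, (int8_t)b));

    /* Both of these sums overflow a u32 result, yet the form reports 0. */
    assert(xor_form_u32(UINT32_MAX, 1) == 0);
    assert(xor_form_u32(0, -1) == 0);
}
```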
Eric

From: EricP on 30 Jun 2010 12:03

Tim McCaffrey wrote:
>
> It is the same as the 32 bit version.
>
> What I found annoying when I was doing code generation for x64, is that 99% of
> the documentation is wrong about where the REX byte goes. I had to
> disassemble the code via Eclipse to figure out what was correct. (Then I had
> to argue with a co-worker about it...)

Both Intel and AMD documentation says REX comes after
any optional legacy prefixes and before the opcode.
Is that not correct?

Eric

From: Tim McCaffrey on 30 Jun 2010 15:21

In article <8WJWn.3131$cO.321(a)newsfe09.iad>,
ThatWouldBeTelling(a)thevillage.com says...
>
>Tim McCaffrey wrote:
>>
>> It is the same as the 32 bit version.
>>
>> What I found annoying when I was doing code generation
>> for x64, is that 99% of
>> the documentation is wrong about where the REX byte
>> goes. I had to
>> disassemble the code via Eclipse to figure out what was
>> correct. (Then I had
>> to argue with a co-worker about it...)
>
>Both Intel and AMD documentation says REX comes after
>any optional legacy prefixes and before the opcode.
>Is that not correct?
>
>Eric
>
>
My problem was that it says (somewhere, can't find it
right now) that 0x66, 0xF2, & 0xF3 should be considered
part of the opcode for those instructions that use it, not
a prefix. IOW, for MOV AX,BX 0x66 is a prefix, but for
MOVQ mem64,xmm0 0x66 is part of the opcode.

Well, that isn't the way it works. If you wrote your code
generator to just emit 0x66 0x0F 0xD6 <mod r/m bytes>, and
you just want to prefix it with the REX byte when you use
xmm8..15, too bad. You have to stick the REX byte between
0x66 and 0x0F.
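
To make the byte order concrete, here is a minimal sketch of an emitter for the store form of MOVQ (66 0F D6 /r). The function name and the restriction to a plain [base] addressing mode are assumptions made for illustration; base registers that would need a SIB byte or a displacement are simply rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Emits MOVQ [base], xmmN into out, which must have room for 5 bytes.
   Returns the number of bytes written, or 0 for inputs this sketch does
   not handle (rsp/r12 need a SIB byte, rbp/r13 need a displacement). */
size_t emit_movq_store(uint8_t *out, unsigned xmm_src, unsigned base)
{
    if (xmm_src > 15 || base > 15)
        return 0;
    if ((base & 7) == 4 || (base & 7) == 5)
        return 0;

    size_t n = 0;
    uint8_t rex = 0x40;
    if (xmm_src >= 8)
        rex |= 0x04;                 /* REX.R extends ModRM.reg */
    if (base >= 8)
        rex |= 0x01;                 /* REX.B extends ModRM.rm */

    out[n++] = 0x66;                 /* 0x66 is emitted first */
    if (rex != 0x40)
        out[n++] = rex;              /* REX goes between 0x66 and 0x0F */
    out[n++] = 0x0F;
    out[n++] = 0xD6;
    out[n++] = (uint8_t)(((xmm_src & 7) << 3) | (base & 7)); /* mod = 00 */
    return n;
}

/* Small self-check; call it from main() to try the sketch. */
void check_emit_movq_store(void)
{
    uint8_t buf[5];
    static const uint8_t no_rex[]   = { 0x66, 0x0F, 0xD6, 0x00 };
    static const uint8_t with_rex[] = { 0x66, 0x44, 0x0F, 0xD6, 0x00 };

    assert(emit_movq_store(buf, 0, 0) == 4);      /* movq [rax], xmm0 */
    assert(memcmp(buf, no_rex, 4) == 0);
    assert(emit_movq_store(buf, 8, 0) == 5);      /* movq [rax], xmm8 */
    assert(memcmp(buf, with_rex, 5) == 0);
}
```

Disassembling the emitted bytes with a known-good tool is a quick way to confirm the placement.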

Again, I can't find it right now, but I remember examples in
both the Intel & AMD documentation that showed (mostly) the
wrong way to add the REX prefix, and once where they had it
correct.

Anyway, GCC & Linux figured it out before I had to, so the
disassembly showed me very quickly when I had it wrong.

- Tim

From: Tim McCaffrey on 1 Jul 2010 15:57

In article
<30028ecd-f025-4c05-bd8a-93c99e00a8a8(a)a30g2000yqn.googlegroups.com>,
MitchAlsup(a)aol.com says...

>No assumption is needed on 1s-complement or 2s-complement machines.
>{Does anyone know of a machine using integer signed-magnitude that is
>still existing?}
>

Why, yes, the Unisys Clearpath Libra systems.
(aka. MCP systems).

And they do all this stuff.

Bounds checking.
Integer overflow detection.

And (x+c) < x will never work on an MCP system, except in C, because the
compiler emulates 2s-complement.

- Tim
