RISC load-store versus x86 Add from memory. [Computer Architecture]


From: Terje Mathisen on 26 Jun 2010 15:55

Anton Ertl wrote:
> Terje Mathisen<"terje.mathisen at tmsw.no"> writes:
>> sum = a+b;
>> possible_overflow = (a ^ b)> 0; // I.e. same sign
>
> That must be (a^b)>=0, or you get false for a=b.
>
>> sum = a+b;
>> no_overflow = ((a ^ b) | (~sum ^ a))< 0;
>
> Faster on some machines, more elegant, and probably easier to explain:
>
> no_overflow = ((a^sum)&(b^sum))>=0

This might indeed be a cycle faster, even if you have to wait for the
sum result first.

overflow_mask = ((a^sum)&(b^sum)) >> (INT_BITS-1); // all ones (0xff...) or 0

The line above should take 3 cycles, in addition to the single cycle for
the sum.

Next we have to use this mask to either leave the result as is, or
convert it to either SAT_MIN (0x80...) or SAT_MAX (0x7f...), depending
upon the sign of either a or b:

saturated_result = SAT_MAX - (a >> (INT_BITS-1));
// 0x7f... - (-1) -> 0x80... if a was negative

This will take 2 cycles, and it can be done in parallel with the line
above (and the calculation of the sum), so there's no added latency.

sum = saturated_result & overflow_mask | sum & ~overflow_mask;

The end result is 7 cycles total, i.e. just a single cycle more than the
fancy CMOVcc asm version.
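
The steps above can be sketched as a self-contained C function. This is only an illustration under assumptions that are not in the post: a fixed 32-bit width, the hypothetical name sat_add32, and all arithmetic done on uint32_t so that neither the add nor the shifts touch signed overflow or signed right shift.

```c
#include <stdint.h>

/* Branchless saturating add for 32-bit values, following the steps above.
   sat_add32 is a hypothetical name; the work is done in uint32_t so the
   add wraps and the shifts are logical. */
static int32_t sat_add32(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;                          /* wraps modulo 2^32 */

    /* Top bit is set iff a and b share a sign that sum does not have. */
    uint32_t ovf  = ((ua ^ sum) & (ub ^ sum)) >> 31; /* 1 or 0 */
    uint32_t mask = (uint32_t)0 - ovf;               /* all ones or 0 */

    /* 0x7fffffff when a >= 0, 0x80000000 when a < 0. */
    uint32_t sat = UINT32_C(0x7fffffff) + (ua >> 31);

    uint32_t r = (sat & mask) | (sum & ~mask);

    /* Converting a value above INT32_MAX back to int32_t is
       implementation-defined in ISO C; mainstream compilers wrap. */
    return (int32_t)r;
}
```

For example, sat_add32(INT32_MAX, 1) stays at INT32_MAX and sat_add32(INT32_MIN, -1) stays at INT32_MIN, while ordinary sums pass through unchanged.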

> But of course, all of this is outside the standardized subset of C.

In my not so humble opinion, this is simply broken.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From: MitchAlsup on 27 Jun 2010 11:25

On Jun 26, 3:46 pm, Andy 'Krazy' Glew <ag-n...(a)patten-glew.net> wrote:
> On 6/26/2010 1:36 PM, Chris Gray wrote:
>
> > Andrew Reilly<areilly...(a)bigpond.net.au> writes:
>
> >> In my book, using signed int where it is not necessary/appropriate
> >> *creates* whole classes of bugs, but I'm not fanatical about it. I know
> >> that there are several worthwhile languages that simply don't have built-
> >> in types for unsigned integers, so it seems likely that you can get away
> >> without them...
>
> > I agree with the first part. When working on co-operative distributed code,
> > we often had situations where we were waiting for responses from all nodes.
> > Rather than having an array of booleans saying who we had heard from,
> > we used a count of expected responses. If that counter ever went negative,
> > we had a timing/race bug of some kind, and wanted to know about it. We put
> > lots of asserts in.
>
> Note that the code you describe does NOT simply work with unsigneds in C, whereas it works with signed.
>
> I.e. you cannot simply say
>
> unsigned response_count;
> response_count++;
> assert( response_count > 0 );
>
> since C defines wraparound arithmetic for unsigneds, and does not do overflow. (Again, I may be misreading, I don't
> have my copy of the C standard (where'd I put it?), and I am sure that I will be told.)
>
> you must do assert(response_count < some_max_count)
> and assume that underflow wraps to a large number.

No assumption is needed on 1s-complement or 2s-complement machines.
{Does anyone know of a machine using integer signed-magnitude that
still exists?}

And note:
assert( uindex < MAX )
is simpler than the signed equivalent:
assert( sindex >=0 && sindex < MAX )
and checks for the same index ranges (in terms of untyped bit
patterns).

After the assert fails, one can observe (i.e. print) the index. If it
is just a little above the MAX then it can be assumed to have overrun
the buffer, and if it is massively larger than MAX it can be assumed
to have underrun the buffer.
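
A minimal sketch of the two checks in C. The buffer size MAX, the function names, and the SIZE_MAX / 2 threshold used to separate "a little above MAX" from "massively larger" are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX 16u   /* hypothetical buffer size */

/* One unsigned compare rejects both overruns and "negative" indices,
   because an index that wrapped below zero is a very large value. */
static int index_ok_unsigned(size_t uindex)
{
    return uindex < MAX;
}

/* The signed version needs two compares to accept the same range. */
static int index_ok_signed(long sindex)
{
    return sindex >= 0 && sindex < (long)MAX;
}

/* Rough diagnosis after a failed check: a value near the top of the
   unsigned range most likely came from decrementing past zero. */
static const char *diagnose(size_t uindex)
{
    if (uindex < MAX)
        return "in range";
    return (uindex > SIZE_MAX / 2) ? "underrun" : "overrun";
}
```

With these definitions, an index of (size_t)0 - 1 fails the unsigned check and is reported as an underrun, while MAX + 1 is reported as an overrun.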

> > One of the things that using "unsigned" gives you is the ability to tell
> > the compiler/system that it is an error to try to decrement (or have a
> > subtract result) below 0. To me, this is equivalent to overflow detection.
>
> Agreed. It would be nice to have the system tell you that you are trying to make an unsigned number negative.

However, there is no hardware that will detect and raise an exception
on this condition automatically. One has to go and look for the carry
condition.
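
In C, "looking for the carry" has to be spelled out by hand. A minimal sketch, assuming a hypothetical helper named checked_sub_u; the comparison a >= b is the portable way to express "the subtract produced no borrow".

```c
#include <stdbool.h>

/* Subtract b from a. Returns false when the true result would be below
   zero, i.e. when the hardware subtract would have produced a borrow.
   checked_sub_u is a hypothetical name used only for this sketch. */
static bool checked_sub_u(unsigned a, unsigned b, unsigned *out)
{
    *out = a - b;    /* well defined: wraps modulo UINT_MAX + 1 */
    return a >= b;   /* true iff there was no borrow */
}
```

A caller would then write something like `if (!checked_sub_u(count, 1, &count)) report_bug();`, where report_bug is again a placeholder. GCC and Clang also offer `__builtin_sub_overflow`, which typically compiles down to a test of the carry flag, but that is a compiler extension rather than standard C.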

Mitch

From: Andy 'Krazy' Glew on 26 Jun 2010 22:07

On 6/26/2010 6:41 PM, MitchAlsup wrote:
> On Jun 26, 3:36 pm, Chris Gray<c...(a)GraySage.com> wrote:
>> One of the things that using "unsigned" gives you is the ability to tell
>> the compiler/system that it is an error to try to decrement (or have a
>> subtract result) below 0. To me, this is equivalent to overflow detection.
>
> It is BETTER than overflow detection (for detecting array access
> problems). And importantly, no integer overflow/underflow detection is
> going to catch this one--especially if people continue to use signed
> integers for variables that should never contain negative values. Here
> is where bounds checking is important and has to be placed there by
> the actual software instruction stream.
>
>> With signed values, it is very tempting for folks in the last couple of
>> generations of programmers to want to use small negative values as special
>> flag values. If those flag values make it into places where they are not
>> expected, incorrect execution can silently result. I use unsigned values
>> where at all possible.
>
> I only use signed variables when I know of a case where the variable
> must contain a negative value at some point in its life.

SYNOPSIS

#include <unistd.h>
#include <sys/types.h>
#include <sys/uio.h>

int
read(int d, char *buf, int nbytes)

DESCRIPTION

Read() attempts to read nbytes of data from the object referenced by the file descriptor d into the buffer pointed to by
buf.

RETURN VALUES

If successful, the number of bytes actually read is returned. Upon reading end-of-file, zero is returned. Otherwise, a
-1 is returned and the global variable errno is set to indicate the error.

From: Andy 'Krazy' Glew on 27 Jun 2010 12:47

On 6/26/2010 12:55 PM, Terje Mathisen wrote:
> Anton Ertl wrote:
>> Terje Mathisen<"terje.mathisen at tmsw.no"> writes:
>>> sum = a+b;
>>> possible_overflow = (a ^ b)> 0; // I.e. same sign
>>
>> That must be (a^b)>=0, or you get false for a=b.
>>
>>> sum = a+b;
>>> no_overflow = ((a ^ b) | (~sum ^ a))< 0;
>>
>> Faster on some machines, more elegant, and probably easier to explain:
>>
>> no_overflow = ((a^sum)&(b^sum))>=0
>
> This might indeed be a cycle faster, even if you have to wait for the
> sum result first.
>
> overflow_mask = ((a^sum)&(b^sum)) >> (INT_BITS-1); // all ones (0xff...) or 0
>
> The line above should take 3 cycles, in addition to the single cycle for
> the sum.
>
> Next we have to use this mask to either leave the result as is, or
> convert it to either SAT_MIN (0x80...) or SAT_MAX (0x7f...), depending
> upon the sign of either a or b:
>
> saturated_result = SAT_MAX - (a >> (INT_BITS-1));
> // 0x7f... - (-1) -> 0x80... if a was negative
>
> This will take 2 cycles, and it can be done in parallel with the line
> above (and the calculation of the sum), so there's no added latency.
>
> sum = saturated_result & overflow_mask | sum & ~overflow_mask;
>
> The end result is 7 cycles total, i.e. just a single cycle more than the
> fancy CMOVcc asm version.
>
>> But of course, all of this is outside the standardized subset of C.
>
> In my not so humble opinion, this is simply broken.

What is broken? The code snippets, or the fact that this is outside the standard subset of C?

I posited earlier that appropriate casting to unsigned and back again might make some such operations standards
compliant. I'm waiting for someone to tell me not so. Apart from this, the biggest problem I see would be typing in
macros - what type would you be casting to and from, since types include both size and signedness. In C++, appropriate
generics might make things work - you can use templates to compute signed_type_corresponding_to<T> and
unsigned_type_corresponding_to<T> (and give good error messages along the way). (I think that is in Boost; should be,
if not.)

Also I don't see the non-compliances above. E.g. I don't see possibly overflowing signed arithmetic. There's lots of
mixing of (signed) integer and logical. Is that also undefined?
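
A sketch of the cast-to-unsigned approach in plain C, for the int type only. The function names are hypothetical, and the code assumes int and unsigned have the same width with no padding bits. For reference, in ISO C an overflowing signed a+b is undefined behaviour and >> on a negative value is implementation-defined, which is why both are avoided here.

```c
#include <limits.h>

/* Wrapping add: the addition happens in unsigned, where wraparound is
   defined. wrapping_add is a hypothetical name. */
static int wrapping_add(int a, int b)
{
    unsigned r = (unsigned)a + (unsigned)b;
    /* Converting back is implementation-defined when r > INT_MAX;
       mainstream compilers give the two's-complement value. */
    return (int)r;
}

/* Overflow predicate computed on unsigned operands. The sign bit is
   tested with a mask, so no signed compare or signed shift is needed. */
static int add_overflows(int a, int b)
{
    unsigned ua = (unsigned)a, ub = (unsigned)b;
    unsigned sum = ua + ub;
    unsigned sign = ~(UINT_MAX >> 1);   /* only the top bit set */
    return (((ua ^ sum) & (ub ^ sum)) & sign) != 0;
}
```

The remaining portability question is the final conversion back to int in wrapping_add; add_overflows never leaves the unsigned domain. The macro typing problem is untouched by this sketch, since every other integer width would need its own pair of functions.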

From: Robert Myers on 27 Jun 2010 12:51

Terje Mathisen wrote:
> Anton Ertl wrote:

>
>> But of course, all of this is outside the standardized subset of C.
>
> In my not so humble opinion, this is simply broken.
>

What this thread has demonstrated to me, yet again, is what a fatally
flawed enterprise C is, and yet, I know that, without C or something
like it, we would never have anything like Linux.

A defensive maneuver for someone like me is to read threads like this
just to get a clue as to all the C-goop that has leaked out of systems
programming and into applications software. The virtue of reading these
threads in a hardware forum (as opposed to a software forum) is that the
discussion is sometimes about real issues, rather than theology.

Robert.
