RISC load-store versus x86 Add from memory. [Computer Architecture]

From: Andrew Reilly on 26 Jun 2010 01:39

On Fri, 25 Jun 2010 17:57:48 -0700, Andy 'Krazy' Glew wrote:

> Actually, in the programming circles (newsgroups and wiki pages) that I
> lurk on - the sorts of places where people debate Scott Meyers
> "Effective C++: 55 Specific Ways to Improve Your Programs and Designs"
> the trend is to say "Never use unsigned - always use signed."
>
> Although I can't remember which side of this argument Scott was on.
>
> Overall, unsigned just seems to be a way to save a bit. Now that we are
> not short of bits, just make all integers signed. It eliminates a whole
> class of bugs.

I used to just use "int"---it seemed to be what the language designers
intended---but ever since (I think) Terje advocated it, here, years ago,
I've been using unsigned int wherever it seemed appropriate, and I
believe that my code has improved significantly as a result. Nothing to
do with saving a bit; more to do with thinking precisely and deliberately
about what is required. Also, two big plusses: unsigned behaviour is
actually fairly well defined by the standard (so no need for discussions
like this one), and range checking generally only needs to be single
ended, which (in my opinion) makes it more likely to happen.
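
To make the single-ended range check concrete, here is a minimal sketch in C. The function names are hypothetical and chosen only for illustration; the point is just that the unsigned version needs one comparison where the signed version needs two.

```c
#include <stdbool.h>
#include <stddef.h>

/* Signed index: both ends of the range have to be checked. */
bool in_range_signed(int i, int n)
{
    return i >= 0 && i < n;
}

/* Unsigned index: one comparison is enough. A "negative" value that has
 * been converted to an unsigned type wraps to a very large number, so it
 * fails the i < n test on its own. */
bool in_range_unsigned(size_t i, size_t n)
{
    return i < n;
}
```

For example, in_range_unsigned((size_t)-1, 10) is false without any explicit lower-bound test.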

In my book, using signed int where it is not necessary/appropriate
*creates* whole classes of bugs, but I'm not fanatical about it. I know
that there are several worthwhile languages that simply don't have built-
in types for unsigned integers, so it seems likely that you can get away
without them...

Cheers,

--
Andrew

From: jacko on 26 Jun 2010 06:12

Placing the buffer is the first thing to decide.

On the heap is bad.
In its own protected page is good, but wasteful.
As a global or static context array is about halfway, as you're not
undermining other tasks' security via the heap.

Other things that could work are AND mask modulo effective addresses,
which would need an aligned type.
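
A rough software analogue of the AND-mask idea, sketched in C under the assumption that the buffer size is a power of two. It masks an index rather than a hardware effective address, and all names (BUF_SIZE, buf_store, buf_load) are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_SIZE 16u                  /* assumption: must be a power of two */
#define BUF_MASK (BUF_SIZE - 1u)

/* A static array, as in the "global or static context array" option. */
static uint8_t buf[BUF_SIZE];

/* Any index, however large, is folded back into the buffer by the AND,
 * so an access can never land outside buf. */
void buf_store(size_t index, uint8_t value)
{
    buf[index & BUF_MASK] = value;
}

uint8_t buf_load(size_t index)
{
    return buf[index & BUF_MASK];
}
```

An out-of-range index such as 19 simply aliases slot 3 instead of touching neighbouring memory. Doing the same masking on the address itself is what would require the buffer to be aligned to its size.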

On arithmetic overflow and underflow, a set vector for each register,
or register group would work.

Cheers Jacko

From: Andy 'Krazy' Glew on 26 Jun 2010 11:58

On 6/25/2010 5:30 PM, Andy 'Krazy' Glew wrote:
> On 6/25/2010 12:44 AM, Terje Mathisen wrote:
>> Andy 'Krazy' Glew wrote:
>>> Overflow ::
>>>
>>> a >= 0 && b >= 0 ::
>>>
>>> a+b >= a ==> no overflow
>>> a+b < a ==> overflow
>>>
>>> a < 0 && b >= 0 ::
>>> a >= 0 && b < 0 ::
>>> no overflow
>>> (although you can have underflow,
>>> negative overflow, which is handled
>>> similarly)
>>
>> No, you cannot get that: Opposite signs are totally safe.
>
> Sorry, nope. You're thinking in terms of C, specifically C signed.
>
> Think about unsigned + signed.
>
> Consider
>
> uint8_t image_pixel = 1;
> int8_t delta = -4;
> image_pixel += delta;
>
> It took me a while to realize this in MMX. Images are usually unsigned,
> and are often 8 bits. But image differences are signed. They want to be
> extended precision, for example nine bits rather than eight bits, but
> for the usual reasons we may want to do saturating arithmetic and stay
> in 8 bits (like, if we widen to 16 bits we cut performance in half).

Some examples:

(unsigned) 1111_1110 + (signed) 0000_0001 = (unsigned) 1111_1111 (no overflow/saturation)

(unsigned) 1111_1111 + (signed) 0000_0001 =
= 0000_0000 (normal result, overflowing)
= (unsigned) 1111_1111 (overflow/saturation)

(unsigned) 1111_1111 + (unsigned) 0000_0001 =
= 0000_0000 (normal result, overflowing)
= (unsigned) 1111_1111 (overflow/saturation)

(signed) 1111_1111 + (signed) 0000_0001 =
= (signed)0000_0000 (normal result. no saturation)

(unsigned) 1111_1111 + (signed) 1111_1111 =
= (unsigned)1111_1110 (normal result, no overflow/saturation)

(unsigned) 1111_1111 + (unsigned) 1111_1111 =
= 1111_1110 (normal result, overflowing)
= (unsigned) 1111_1111 (overflow/saturation)

(signed) 1111_1111 + (signed) 1111_1111 =
= (signed)1111_1110 (normal result. no saturation)

(unsigned) 0111_1111 + (signed) 0000_0001 =
= (unsigned)1000_0000 (normal result, no overflow/saturation)

(unsigned) 0111_1111 + (unsigned) 0000_0001 =
= (unsigned)1000_0000 (normal result, no saturation)

(signed) 0111_1111 + (signed) 0000_0001 =
= 1000_0000 (normal result, overflowing)
= (signed) 0111_1111 (overflow/saturation)

Hmmm.... this will look better on the wiki as a table. I think that I will also make it 8-bit + 4-bit.
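
The unsigned + signed rows above can be reproduced with a small C sketch. This is only an illustration, not how MMX does it: the function names are hypothetical, and the sum is simply formed in int, which is wide enough for the full -128..382 range.

```c
#include <stdint.h>

/* Add a signed 8-bit delta to an unsigned 8-bit pixel and saturate the
 * result to 0..255. */
uint8_t add_u8_s8_sat(uint8_t pixel, int8_t delta)
{
    int wide = (int)pixel + (int)delta;
    if (wide < 0)
        return 0;            /* negative overflow: clamp to 0000_0000 */
    if (wide > 255)
        return 255;          /* positive overflow: clamp to 1111_1111 */
    return (uint8_t)wide;
}

/* Same operands with the "normal" wrap-around result (modulo 256). */
uint8_t add_u8_s8_wrap(uint8_t pixel, int8_t delta)
{
    return (uint8_t)((int)pixel + (int)delta);
}
```

With image_pixel = 1 and delta = -4, the wrapping version gives 253 while the saturating version gives 0.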

From: Andy 'Krazy' Glew on 26 Jun 2010 12:17

On 6/25/2010 5:51 PM, mac wrote:
>>> Particularly if your code wants to be able to do different fix-ups
>>> for
>>> overflows from different-sized arguments...
>>
>> Ouch indeed:
>>
>> Reach back and disassemble the ADD/SUB/whatever instruction that
>> generated the INTO, figure out the operand size and target register,
>> and then fix it all: Neither easy nor fast.
>
> Which points out another cost of CISCy instruction encodings. It's not
> just the processor that has to parse them. It's any binutil.
> Which might also include native code secure sandboxing.

I have run into this problem many times. Indeed, it is often used as (a) an argument not to add a new instruction, or
(b) a reason to complicate the interrupt or trap or exception or event handling sequence - just to make this easier.
E.g. some of the uglier aspects of SMM and VMX arise because of this.

I sometimes call this the "debugger and disassembler and emulator barrier to new instructions", since debuggers and
disassemblers and emulators are a big example.

Note that it is not just a problem for complex instruction sets. It is also a problem for simple instruction sets - if
they have added new instructions that differ in format from old, and sometimes just if they have added a new
instruction at all.

The worst thing that can happen is if the disassembling code does an incomplete disassembly. E.g. if it treats some
undefined fields as don't cares.
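
A minimal sketch of the difference, in C, using a made-up 16-bit encoding: bits 15..12 are the opcode, bits 11..8 are reserved and must be zero, bits 7..0 are an immediate. The opcode value and the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define OPCODE_MASK   0xF000u
#define RESERVED_MASK 0x0F00u
#define OPCODE_ADDI   0x1000u    /* hypothetical opcode for this example */

/* Incomplete decode: the reserved field is treated as don't-care, so a
 * future instruction that reuses those bits is misread as ADDI. */
bool is_addi_lax(uint16_t insn)
{
    return (insn & OPCODE_MASK) == OPCODE_ADDI;
}

/* Complete decode: the reserved field must be zero, otherwise the word is
 * reported as "not an ADDI" and the caller can treat it as unknown. */
bool is_addi_strict(uint16_t insn)
{
    return (insn & (OPCODE_MASK | RESERVED_MASK)) == OPCODE_ADDI;
}
```

The word 0x1305 passes the lax test but fails the strict one, which is the safer answer for a tool that may meet newer encodings.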

I think that we may need an instruction, or an API implemented in something like PALcode, to decode other instructions.
Mainly, we need to distribute the knowledge about how to decode instructions in an executable manner, under control of
the CPU vendor.

Having the CPU vendor write the binutils implementation for his machine helps. But, then the CPU vendor has to do this
for every different OS and system that has a different library.

Shipping it as library code separate from the CPU works - up until the point when you want to run old code on a new
machine, and handle the new instruction. Methinks it's not so much microcode space, and it can leverage your existing
hardware.

Format - both of an API, and/or an instruction:

Decode_Instruction
    Pointer to Context
        - probably a memory buffer describing all of the various mode bit settings
        - 32 bit, 64 bit, virtual, ...
        - since often the emulated mode is not the current mode
    Return value
        - possibly a pointer to a memory buffer, although may fit in a register
    Instruction to Decode
        I was originally going to say just a single pair of
            (instruction_address,instruction_bytes,length)
        but then I remembered
            a) boundary conditions like wrapping around the end of a segment or 32 bits
            b) some GPU and VLIW instruction sets that are not really contiguous
               - they have bits of the instruction in two totally different places.

The Return value would probably be a set of predicates such as
    is_memory_reference
    is_control_flow
    ...
as well as stuff like
    number of operands
    number of register operands
    register operand #1, type ...
    ...
and, for the lazy among you
    evaluated virtual address
    ...
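
As a rough sketch of what such an interface might look like in C: every type, field, and function name below is hypothetical, and the decoder body handles only a made-up one-byte toy ISA (0x01 = load through a register, 0x02 = jump through a register) so that the example is complete. It is not any vendor's API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mode bits of the code being decoded, which need not be the current mode. */
struct decode_context {
    bool mode_64bit;
    bool virtualized;
};

/* One contiguous piece of an instruction. Passing a list of these covers
 * segment wrap-around and encodings whose bits live in two places. */
struct insn_fragment {
    uint64_t       address;
    const uint8_t *bytes;
    size_t         length;
};

/* Predicates and counts returned to the caller. */
struct decode_result {
    bool     valid;                  /* false: unknown or incomplete encoding */
    bool     is_memory_reference;
    bool     is_control_flow;
    unsigned num_operands;
    unsigned num_register_operands;
    size_t   insn_length;
};

/* Toy decoder for the made-up one-byte ISA described above. */
struct decode_result decode_instruction(const struct decode_context *ctx,
                                        const struct insn_fragment *frags,
                                        size_t nfrags)
{
    struct decode_result r = {0};

    (void)ctx;   /* the toy ISA has no mode-dependent encodings */
    if (frags == NULL || nfrags == 0 ||
        frags[0].bytes == NULL || frags[0].length == 0)
        return r;

    switch (frags[0].bytes[0]) {
    case 0x01:                       /* toy LOAD rd, [rs] */
        r.is_memory_reference = true;
        r.num_operands = 2;
        r.num_register_operands = 2;
        break;
    case 0x02:                       /* toy JMP rs */
        r.is_control_flow = true;
        r.num_operands = 1;
        r.num_register_operands = 1;
        break;
    default:                         /* unknown: report it, do not guess */
        return r;
    }
    r.valid = true;
    r.insn_length = 1;
    return r;
}
```

A trap handler or debugger would fill in a decode_context for the faulting code, point one insn_fragment at the instruction bytes, and branch on the returned predicates instead of carrying its own disassembler.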

The Pin interface, http://www.pintool.org/, may be a good start.

Placing this on

https://semipublic.comp-arch.net/wiki/API_or_instruction_to_decode_an_instruction

From: EricP on 26 Jun 2010 13:40

Andy 'Krazy' Glew wrote:
>
> Some examples:
>

You are really doing signed 9 bit arithmetic there,
then casting the s9 result back to either a u8 or s8 type.
Whether there is an overflow or not depends on the result
type and the value.

e.g.

0_1010_0101 => u8 = 1010_0101 (ok)
0_1010_0101 => s8 = 1010_0101 (signed overflow, sign changed)

(Bits are named b8...b0).
The unsigned cast overflows if bit_8 is 1,
whereas the signed cast overflows if bit_8 != bit_7.

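  u8_result = u8_a + s8_b;             // low 8 bits of the s9 sum
  carry = (u8_result < u8_a);          // carry out of bit 7
  bit_8 = carry != (s8_b < 0);         // sign extension of s8_b flips bit 8
  bit_7 = (u8_result & 0x80) != 0;     // top bit of the 8-bit result
  if (result_type == s8)
    overflow = bit_8 != bit_7;
  else
    overflow = bit_8;
  endif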
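
The two rules can be cross-checked in C by forming the sum in a wider type. This is a sketch with hypothetical function names: the range-based versions compare the wide sum against the limits of the result type, and the bit-based versions apply the bit_8 / bit_7 rules to the low nine bits of the same sum.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low nine bits (b8..b0) of u8 + s8, computed in int. */
static unsigned sum_s9(uint8_t a, int8_t b)
{
    return (unsigned)((int)a + (int)b) & 0x1FFu;
}

/* Range-based checks: does a + b fit in the result type? */
bool overflows_u8(uint8_t a, int8_t b)
{
    int wide = (int)a + (int)b;
    return wide < 0 || wide > UINT8_MAX;
}

bool overflows_s8(uint8_t a, int8_t b)
{
    int wide = (int)a + (int)b;
    return wide < INT8_MIN || wide > INT8_MAX;
}

/* Bit-based checks: the unsigned cast overflows if bit 8 is set, the
 * signed cast overflows if bit 8 differs from bit 7. */
bool overflows_u8_bits(uint8_t a, int8_t b)
{
    return ((sum_s9(a, b) >> 8) & 1u) != 0;
}

bool overflows_s8_bits(uint8_t a, int8_t b)
{
    unsigned s9 = sum_s9(a, b);
    return ((s9 >> 8) & 1u) != ((s9 >> 7) & 1u);
}
```

For the value 0_1010_0101 above (for instance a = 165, b = 0), the u8 checks report no overflow and the s8 checks report overflow.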

Eric
