From: Andreas Buschmann on 23 Aug 2008 15:34

Hello,

on small 32bit architectures, like the original m68000 and the original sparc, would an accumulator of three times the integer register size = 96bit have been helpful?

Similarly, can it be helpful on today's small 32bit architectures in embedded and multicore designs?

A somewhat vague description of what I mean:

A 32bit x 32bit multiplication is problematic, as it requires a 64bit register for the result. This problem has been handled in different ways on different architectures, but all of the solutions have some problems.

My idea borrows the accumulator from the DSP world, defined as follows:
- either drop all multiplication, or just keep the 32bit x 32bit -> 32bit multiplication for address operations.
- drop all division
- add a 96bit wide accumulator A
- this accumulator A is the target of a MAC: A := A + R1 * R2
- this accumulator can be used in moves and in add operations using the three partial registers A0, A1, A2 with A == A2 * 2^64 + A1 * 2^32 + A0.
- there are shift and rotate operations on A
- there is an extract operation R1 := A[R2+31..R2].
- there is an operation to zero A.

optionally add:
- bit operations to extract and insert any number of bits.
- bitcount and most significant one
- other bit-mangling operations
- a helper function for software division / sqrt

Small implementations would use microcode or state machines, as operations on this register take multiple cycles. Bigger implementations would use a two-cycle implementation. I do not think that a single-cycle implementation would be reasonable outside the DSP world.

Targets:
- Bignums
- software floating point instead of an FPU
- bit mangling

Has it been done? Does it have a name? Would it be reasonable in the extra area size? Would it be helpful at all?

Regards
Andreas
--
/|) Andreas Buschmann buschman(at)kalahari.han.de
/-|) Hannover Germany
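For readers who want to experiment with the proposal, the accumulator described above can be modelled in plain C. This is only a rough sketch under assumptions the post does not state: the type `acc96` and the function names are made up for illustration, the MAC is treated as unsigned, and overflow out of A2 simply wraps modulo 2^96.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the proposed 96-bit accumulator:
 * A == a2 * 2^64 + a1 * 2^32 + a0. Names are illustrative only. */
typedef struct {
    uint32_t a0, a1, a2;
} acc96;

/* Zero the accumulator. */
static void acc96_clear(acc96 *a)
{
    a->a0 = a->a1 = a->a2 = 0;
}

/* Unsigned MAC: A := A + r1 * r2, wrapping modulo 2^96 (an assumption). */
static void acc96_mac(acc96 *a, uint32_t r1, uint32_t r2)
{
    uint64_t p   = (uint64_t)r1 * r2;                 /* 32x32 -> 64 product */
    uint64_t lo  = (uint64_t)a->a0 + (uint32_t)p;     /* add low word        */
    uint64_t mid = (uint64_t)a->a1 + (p >> 32) + (lo >> 32); /* plus carry   */
    a->a0 = (uint32_t)lo;
    a->a1 = (uint32_t)mid;
    a->a2 += (uint32_t)(mid >> 32);                   /* carry into A2       */
}

/* Extract: returns A[pos+31 .. pos] for 0 <= pos <= 64. */
static uint32_t acc96_extract(const acc96 *a, unsigned pos)
{
    assert(pos <= 64);
    uint32_t w[4] = { a->a0, a->a1, a->a2, 0 };       /* zero pad above A2   */
    unsigned i = pos / 32, s = pos % 32;
    uint64_t pair = ((uint64_t)w[i + 1] << 32) | w[i];
    return (uint32_t)(pair >> s);
}
```

A dot product of 32-bit values would call `acc96_clear` once and then `acc96_mac` per element. With a 96-bit sum, 2^32 maximal products can be accumulated before the wrap-around becomes a concern.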
From: Wilco Dijkstra on 24 Aug 2008 11:51

"Andreas Buschmann" <buschman(a)kalahari.han.de> wrote in message news:k62jqb.5n2(a)kalahari.han.de...
> Hello,
>
> on small 32bit architectures, like the original m68000 and the original
> sparc, would an accumulator of three times the integer register size
> = 96bit have been helpful?
>
> Similar, can it be helpful on todays small 32bit architectures in
> embedded and multicore designs?
>
> A somewhat vague description of what I mean:
> A 32bit x 32bit multiplication is problematic, as it requires a 64bit
> register for the result. This problem has been handled in different ways
> on different architectures, but all of the solutions have some problems.

Explicit accumulators are a bad idea in general. They always require extra instructions to move values to/from, and are hard for a compiler to use effectively. So more often than not, they actually slow things down. The MIPS HI/LO registers were considered its worst mistake by one of the original designers.

Most 32-bit CPUs define multiplies that can write two 32-bit results. Some have 32x32+32+32->64 to speed up bignum arithmetic. These instructions are more general and can be implemented depending on the performance goals. For example, some implementations use an internal accumulator to speed up repeated MACs to the same register, thus saving on register ports.

Floating point emulation is already fast on 32-bit CPUs (less than 30 cycles for 32-bit IEEE float mul/add/sub on the smallest ARM). If you want it to be faster than that, you'll need to use floating point hardware.

Wilco
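To illustrate why a 32x32+32+32->64 operation suits bignum arithmetic, here is a hedged C sketch. The key property is that the result can never overflow 64 bits, because (2^32-1)^2 + 2*(2^32-1) == 2^64-1. That lets one operation absorb both the running carry and the existing partial sum in a schoolbook multiply. The names `mul_add_add` and `bignum_mul` are invented for this example, and limbs are assumed to be stored least significant first. ARM's UMAAL is one instruction of this shape.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32x32+32+32 -> 64. Cannot overflow:
 * (2^32-1)^2 + 2*(2^32-1) == 2^64-1. */
static uint64_t mul_add_add(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    return (uint64_t)a * b + c + d;
}

/* Schoolbook multiply: r[0..n+m-1] = x[0..n-1] * y[0..m-1].
 * Limbs are little-endian (least significant limb first). */
static void bignum_mul(uint32_t *r, const uint32_t *x, size_t n,
                       const uint32_t *y, size_t m)
{
    assert(r != x && r != y);           /* result must not alias an input */
    for (size_t i = 0; i < n + m; i++)
        r[i] = 0;
    for (size_t j = 0; j < m; j++) {
        uint32_t carry = 0;
        for (size_t i = 0; i < n; i++) {
            /* product + existing partial sum + carry in one step */
            uint64_t t = mul_add_add(x[i], y[j], r[i + j], carry);
            r[i + j] = (uint32_t)t;
            carry = (uint32_t)(t >> 32);
        }
        r[j + n] = carry;               /* this limb is still zero here */
    }
}
```

The inner loop needs only general-purpose registers for the product, the partial sum, and the carry, which is the point being made above about such instructions being more general than a dedicated accumulator.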
From: MitchAlsup on 24 Aug 2008 13:16

One of the problems to be overcome is that there are no benchmarks that computer architects can use to study the large scale behavior of whether 32*32->64 is even a good idea, let alone whether 32*32+96->96 is a good idea. Notice that almost no high level language even supports the NOTION of 32*32 gives more than 32-bits !?!

And this problem is endemic with the very notion that 99% of computer architecture is not research, but pure development with specific goals in mind {speed, performance, power consumption, size, schedule}, and way more than 90% of this is targeted at making some architecture that already exists better by some metric.

Some issues to be solved:

Exceptions: what happens if the 96-bit accumulator overflows or underflows?
Signedness: do you implement both signed and unsigned forms for the multiplier and multiplicand? What about the accumulator?
Memory: How does one store and reload the accumulator (so compilers have some hope of using it)?
Number: What happens if you run into an algorithm that simply requires 2 of these things? (3,4,...)
printf: how do you print such a thing?
scanf: how do you read something like this in?
citizenship: how do you make these things first class citizens of data structures?
parameters: how do you make one of these things passable as an argument in HLLs?
varargs: What happens if you don't know the size of the argument before one of these things arrives as an argument?
thinking bigger: How do you work this into 64-bit architectures? 64*64->128 and 64*64+128->192 ???

What kind of problem are you trying to solve that is not adequately solved by bignums?

Sorry for being so down on the basic notion.

Mitch
From: Bernd Paysan on 24 Aug 2008 15:46

MitchAlsup wrote:
> Notice that almost no high level language even
> supports the NOTION of 32*32 gives more than 32-bits !?!

For 32x32->64, it's actually not that bad. C99 has long long as part of the standard, but unfortunately, it only works for 32 bit - for 64 bit, long long is 64 bit as well, so good luck if your compiler has a 128 bit type (GCC has one for x86_64, but it's "hidden"; you need something like

    typedef int int128_t __attribute__((__mode__(TI)));

to access it).

Originally, GCC's long long was intended to be twice as long as long int, and the purpose was to give access to these widening multiplication instructions (and the corresponding division instructions). However, when the first 64 bit architectures arrived, the GCC maintainer thought "better support broken programs instead of keeping it as intended", and (after Anton Ertl filed a bug report) changed that definition to "twice as long as int".

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
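As a concrete illustration of the point above, the following sketch shows how a widening 32x32->64 multiply is usually spelled in standard C99, and how 64x64->128 can be assembled portably from four 32x32->64 products when no 128 bit type is available. The function names are invented for this example. Whether a given compiler turns the casts into a single widening multiply instruction is an assumption about the code generator, not something the language guarantees.

```c
#include <assert.h>
#include <stdint.h>

/* Widening multiplies in C99: cast one operand up before multiplying. */
static uint64_t mul32x32_64(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

static int64_t smul32x32_64(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}

/* Portable 64x64 -> 128 built from four 32x32 -> 64 partial products.
 * The result is returned as two 64-bit halves via *hi and *lo. */
static void mul64x64_128(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    assert(hi && lo);
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;

    uint64_t p00 = a0 * b0;   /* contributes to bits   0..63  */
    uint64_t p01 = a0 * b1;   /* contributes to bits  32..95  */
    uint64_t p10 = a1 * b0;   /* contributes to bits  32..95  */
    uint64_t p11 = a1 * b1;   /* contributes to bits  64..127 */

    /* Sum of three 32-bit quantities: fits easily in 64 bits. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    *lo = (mid << 32) | (uint32_t)p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```

On a compiler that does offer a 128 bit integer type, the same result could be written as a single cast-and-multiply, in the same way as `mul32x32_64` above.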
From: MitchAlsup on 24 Aug 2008 18:57
With all due respect to your entry in this thread, with which I have no disagreement: GCC is not equal to the HLL known as 'C'.

Mitch