From: Andreas Buschmann on
Hello,

on small 32bit architectures, like the original m68000 and the original
sparc, would an accumulator of three times the integer register size
= 96bit have been helpful?

Similarly, could it be helpful on today's small 32bit architectures in
embedded and multicore designs?

A somewhat vague description of what I mean:
A 32bit x 32bit multiplication is problematic, as it requires a 64bit
register for the result. This problem has been handled in different ways
on different architectures, but all of the solutions have some problems.

My idea borrows from the dsp world the accumulator, defined as follows:
- either drop all multiplication, or just keep the 32bit x 32bit -> 32bit
multiplication for address operations.
- drop all division
- add a 96bit wide accumulator A
- this accumulator A is the target of a MAC: A := A + R1 * R2
- this accumulator can be used in moves and in add operations using the
three partial registers A0, A1, A2 with A == A2 * 2^64 + A1 * 2^32 + A0 .
- there are shift and rotate operations on A
- there is an extract operation R1 := A[R2+31..R2] .
- there is an operation to zero A.
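
To make the list above concrete, here is a minimal software model in C of the accumulator as described. It is only a sketch under assumptions: the names acc96, acc_zero, acc_mac and acc_extract are made up for illustration, the MAC is taken as unsigned, and overflow simply wraps modulo 2^96 (the description does not say what should happen).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 96-bit accumulator:
   A == a2 * 2^64 + a1 * 2^32 + a0. */
typedef struct {
    uint32_t a0, a1, a2;
} acc96;

/* Zero the accumulator. */
static void acc_zero(acc96 *a)
{
    a->a0 = a->a1 = a->a2 = 0;
}

/* MAC: A := A + r1 * r2 (assumed unsigned, wrapping modulo 2^96). */
static void acc_mac(acc96 *a, uint32_t r1, uint32_t r2)
{
    uint64_t p   = (uint64_t)r1 * r2;                 /* 32x32 -> 64 product */
    uint64_t lo  = (uint64_t)a->a0 + (uint32_t)p;     /* low word plus carry */
    uint64_t mid = (uint64_t)a->a1 + (p >> 32) + (lo >> 32);

    a->a0 = (uint32_t)lo;
    a->a1 = (uint32_t)mid;
    a->a2 += (uint32_t)(mid >> 32);                   /* carry into the top word */
}

/* Extract: returns A[pos+31 .. pos], for 0 <= pos <= 64. */
static uint32_t acc_extract(const acc96 *a, unsigned pos)
{
    assert(pos <= 64);
    uint32_t w[4] = { a->a0, a->a1, a->a2, 0 };       /* padded with a zero word */
    unsigned i = pos / 32, s = pos % 32;
    uint64_t pair = ((uint64_t)w[i + 1] << 32) | w[i];
    return (uint32_t)(pair >> s);
}
```

With this model, a dot product is acc_zero followed by one acc_mac per element pair, and the 32 extra bits in a2 absorb the carries of up to 2^32 such products before anything can wrap.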

optionally add:
- bit operations to extract and insert any number of bits.
- bitcount and most significant one
- other bitmangling operations
- a helper function for software division / sqrt


Small implementations would use microcode or state machines, as operations
on this register take multiple cycles.

Bigger implementations would take two cycles.
I do not think that a single-cycle implementation would be reasonable outside
the DSP world.

Targets:
- Bignums
- software floating point instead of an FPU
- bitmangling

Has it been done?
Does it have a name?

Would it be reasonable in the extra area size?
Would it be helpful at all?


Regards
Andreas
--
/|) Andreas Buschmann buschman(at)kalahari.han.de
/-|) Hannover
Germany
From: Wilco Dijkstra on

"Andreas Buschmann" <buschman(a)kalahari.han.de> wrote in message news:k62jqb.5n2(a)kalahari.han.de...
> Hello,
>
> on small 32bit architectures, like the original m68000 and the original
> sparc, would an accumulator of three times the integer register size
> = 96bit have been helpful?
>
> Similar, can it be helpful on todays small 32bit architectures in
> embedded and multicore designs?
>
> A somewhat vague description of what I mean:
> A 32bit x 32bit multiplication is problematic, as it requires a 64bit
> register for the result. This problem has been handled in different ways
> on different architectures, but all of the solutions have some problems.

Explicit accumulators are a bad idea in general. They always require
extra instructions to move values to/from, and are hard to use effectively
by a compiler. So more often than not, they actually slow things down.
The MIPS HI/LO registers were considered its worst mistake by one of
the original designers.

Most 32-bit CPUs define multiplies that can write 2 32-bit results. Some
have 32x32+32+32->64 to speed up bignum arithmetic. These instructions
are more general and can be implemented depending on the performance
goals. For example, some implementations use an internal accumulator to
speed up repeated MACs to the same register, thus saving on register
ports.
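
As an illustration of why the 32x32+32+32->64 form suits bignum code, here is a sketch in C of one row of a schoolbook multiplication. The function name mul_add_row and the little-endian 32-bit limb layout are assumptions made for this example, not taken from any particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: r[0..n-1] += x[0..n-1] * y, returning the carry-out limb.
   Limbs are 32-bit words, least significant first. */
static uint32_t mul_add_row(uint32_t *r, const uint32_t *x, size_t n, uint32_t y)
{
    assert(r != NULL && x != NULL);
    uint32_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        /* 32x32+32+32 -> 64: the largest possible value is
           (2^32-1)^2 + 2*(2^32-1) == 2^64 - 1, so t never overflows. */
        uint64_t t = (uint64_t)x[i] * y + r[i] + carry;
        r[i]  = (uint32_t)t;           /* low half stays in place */
        carry = (uint32_t)(t >> 32);   /* high half feeds the next limb */
    }
    return carry;
}
```

The loop body maps onto a single instruction of that form per limb, with the carry kept in an ordinary register instead of a dedicated accumulator.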

Floating point emulation is already fast on 32-bit CPUs (less than 30 cycles
for 32-bit IEEE float mul/add/sub on the smallest ARM). If you want it to be
faster than that, you'll need to use floating point hardware.

Wilco


From: MitchAlsup on
One of the problems to be overcome is that there are no benchmarks
that computer architects can use to study the large scale behavior of
whether 32*32->64 is even a good idea, let alone whether 32*32+96->96
is a good idea. Notice that almost no high level language even
supports the NOTION of 32*32 gives more than 32-bits !?!

And this problem is endemic with the very notion that 99% of computer
architecture is not research, but pure development with specific goals
in mind {speed, performance, power consumption, size, schedule} and
way more than 90% of this is targeting making some architecture that
already exists better by some metric.

Some issues to be solved:
Exceptions: what happens if the 96-bit accumulator overflows or underflows?
Signedness: do you implement both signed and unsigned forms for the
multiplier and multiplicand? what about the accumulator?
Memory: How does one store and reload the accumulator (so compilers
have some hope of using it)?
Number: What happens if you run into an algorithm that simply requires
2 of these things? (3,4,...)
printf: how do you print such a thing?
scanf: how do you read something like this in?
citizenship: how do you make these things first class citizens of data
structures?
parameters: how do you make one of these things passable as an
argument in HLLs?
varargs: What happens if you don't know the size of the argument
before one of these things arrives as an argument?
thinking bigger: How do you work this into 64-bit architectures?
64*64->128 and 64*64+128->192 ???
what kind of problem are you trying to solve that are not adequately
solved by bignums?

Sorry for being so down on the basic notion.

Mitch
From: Bernd Paysan on
MitchAlsup wrote:

> Notice that almost no high level language even
> supports the NOTION of 32*32 gives more than 32-bits !?!

For 32x32->64, it's actually not that bad. C99 has long long as part of the
standard, but unfortunately that only helps on 32 bit targets: on 64 bit
targets, long long is 64 bit as well, so you need luck that your compiler
has a 128 bit type (GCC has one for x86_64, but it's "hidden", you need
something like
typedef int int128_t __attribute__((__mode__(TI)));

to access it). Originally, GCC's long long was intended to be twice as long
as long int, and the purpose was to give access to these widening
multiplication instructions (and the corresponding division instructions).
However, when the first 64 bit architectures arrived, the GCC maintainer
thought "better support broken programs instead of keeping it as intended",
and (after Anton Ertl filed a bug report) changed that definition to "twice
as long as int".
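
For illustration, here is how the widening multiplies look in portable C99 with the <stdint.h> types. This is a sketch only: mul32_wide and mul64_wide are names invented for this example, and the 64x64->128 case is built from four 32x32->64 partial products so that it does not depend on any compiler-specific 128 bit type.

```c
#include <assert.h>
#include <stdint.h>

/* 32x32 -> 64: casting one operand is enough to get the full product. */
static uint64_t mul32_wide(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* 64x64 -> 128 without a 128 bit type (hypothetical helper). */
static void mul64_wide(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    assert(hi != 0 && lo != 0);
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;

    uint64_t p00 = a0 * b0, p01 = a0 * b1;
    uint64_t p10 = a1 * b0, p11 = a1 * b1;

    /* Sum of the three terms that land in bits 32..63; at most 3*(2^32-1). */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    *lo = (mid << 32) | (uint32_t)p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```

A compiler will usually turn mul32_wide into a single widening multiply where the target has one; whether it recognises the pattern in mul64_wide depends on the compiler and target.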

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
From: MitchAlsup on
With all due respect to your entry in this thread, with which I have
no disagreement:

GCC is not equal to the HLL known as 'C'.

Mitch