From: "Andy "Krazy" Glew" on
The two hardware datastructures supporting out of order execution:

Reservation stations.

And, less beautifully, the register renaming map.

But then I am biased.

--

Really, I do think that the reservation stations are beautiful. Even the naive CAM implementation. Especially since
there are more efficient implementations that are logically equivalent.

I am also pretty high on bit matrix schedulers.

From: "Andy "Krazy" Glew" on
On 3/29/2010 7:39 PM, MitchAlsup wrote:
> The most memorable hardware structure is the vector indirect
> addressing mode.
>
> Mitch


Aagh! No! Although work I did on that veered towards reservation stations, which I like.

Nvidia has shown that vector indirect is unnecessary on a SIMT.

Although^2, it turns out that very similar hardware is needed for SIMT scalar indirect.
From: "Andy "Krazy" Glew" on
On 3/30/2010 9:15 PM, glen herrmannsfeldt wrote:
> In comp.arch.fpga "Andy \"Krazy\" Glew"<ag-news(a)patten-glew.net> wrote:
>> The two hardware datastructures supporting out of order execution:
>
>> Reservation stations.
>
>> And, less beautifully, the register renaming map.
>
> Both from the IBM 360/91, as far as I know.
>
> S/360 has only four floating point registers, so register
> renaming was pretty important for out-of-order execution.
>
> OK, how about imprecise interrupts?
>
> -- glen


I never really knew how the 360/91 did register renaming. I don't think it used a RAM style map. I think it used CAMs.

I actually asked Tomasulo this, but he never really answered the question.
From: "Andy "Krazy" Glew" on
> In article<houi8s$rdm$1(a)naig.caltech.edu>,
>> OK, how about imprecise interrupts?

Not a good idea.
From: "Andy "Krazy" Glew" on
On 4/1/2010 11:07 AM, glen herrmannsfeldt wrote:
> In comp.arch.fpga "Andy \"Krazy\" Glew"<ag-news(a)patten-glew.net> wrote:
> (snip)
>
>> I never really knew how the 360/91 did register renaming.
>> I don't think it used a RAM style map. I think it used CAMs.
>
>> I actually asked Tomasulo this, but he never really answered
>> the question.
>
> Never having had anyone to ask, but only read about it in books,
> that sounds about right.

All I know is that I proposed having a separate pipestage to rename registers, using a RAM (SRAM) table indexed by
logical register number returning physical register number, in 1986 or 1987 - in Wen-mei Hwu's microprocessor design
class - after he had taken us through Tomasulo and HPSm.

I.e. I proposed eliminating the CAMs, replacing them by a RAM and an additional pipestage.
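
For readers who want the RAM-style rename concrete, here is a minimal Python sketch of a table indexed by logical register number that returns a physical register number. The names (RenameMap, rename, free_list) and the sizes are invented for illustration and do not come from any actual design; freeing physical registers at retirement and recovery from mispredictions are left out.

```python
class RenameMap:
    """Toy RAM-style rename table: index by logical register, read out a physical register."""

    def __init__(self, num_logical, num_physical):
        assert num_physical >= num_logical
        # Assumption: logical register i starts out mapped to physical register i.
        self.table = list(range(num_logical))
        # Remaining physical registers are free for allocation, lowest number first.
        self.free_list = list(range(num_logical, num_physical))

    def rename(self, dest, srcs):
        # Sources are read with a plain indexed lookup: no associative search.
        phys_srcs = [self.table[s] for s in srcs]
        # The destination gets a fresh physical register, and the table is updated
        # so that later instructions reading `dest` see the new mapping.
        phys_dest = self.free_list.pop(0)
        self.table[dest] = phys_dest
        return phys_dest, phys_srcs
```

The sources are read before the destination is written, so an instruction such as r1 = r1 + r2 picks up the old mapping of r1 as its source.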

The idea seemed new to everyone who encountered it. It was not universally accepted as good. Indeed, I remember arguing
with Tom Olson of AMD (if memory serves), who said that spending an extra pipestage was not a good idea.

I also talked to Mitch about it at around that time, although he was preoccupied with spreadsheets for the



> The explanation I have seen for the CDB, common data bus, was
> that results come out broadcast to all possible destinations.
> Those destinations expecting a result from that source accept it.
> Possible destinations are registers, reservation stations
> (for adders or multiply/divide), or to be written to main memory.
> Sources are results from arithmetic units, or data read from
> (750 ns, 16 way interleaved) main memory.

Many people say that the CDB was an important invention. I think it was a bad idea - long wires, CAMs.

Conceptually it is elegant, but implementation-wise it is a bad idea.

The important thing is taking that conceptually elegant CAM-ful idea, and implementing it in an efficient non-CAM manner.

The modern style of register renaming accomplishes this - certainly for the registers, but also, depending on the
system, for the reservation stations (if those are still being used).
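
To make the broadcast-and-match behavior concrete, here is a small Python model of the conceptual CDB scheme as described in the quoted text: every waiting operand compares its tag against the broadcast tag, which is the CAM part. It is only a software sketch of the concept, not a description of the 360/91 hardware; Operand, Station and broadcast are names made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Operand:
    tag: Optional[int] = None      # tag of the producer still awaited, None once the value is in
    value: Optional[float] = None  # captured value, valid when tag is None


@dataclass
class Station:
    op: str
    a: Operand
    b: Operand

    def ready(self):
        # A station can issue once neither operand is waiting on a tag.
        return self.a.tag is None and self.b.tag is None


def broadcast(stations, tag, value):
    """Broadcast (tag, value) to every station; each waiting operand compares tags."""
    for st in stations:
        for operand in (st.a, st.b):
            if operand.tag == tag:   # the associative (CAM-style) match
                operand.value = value
                operand.tag = None
    # Return every station that is ready after this broadcast.
    return [st for st in stations if st.ready()]
```

The loop over every operand of every station is what becomes long wires and a comparator per entry in hardware. A RAM-indexed scheme does a direct lookup instead of the search.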




> Among the not so obvious ones, if you store to memory and then
> refetch, register renaming will detect the same address is
> being used and go directly to the source. (No cache on the
> 360/91, it originated on the 360/85.)

I'd love to see a reference for this.

I believe that a UWisc patent on this was one of the things that resulted in a big payment from Intel to UWisc.

Myself, I thought it was obvious.
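
For what it is worth, the basic idea in the quoted paragraph (a load to the same address as an earlier pending store takes its data straight from that store) can be sketched in a few lines of Python. This is a generic illustration under simplifying assumptions: exact whole-word address matches, and a made-up StoreBuffer class. It is not a claim about how the 360/91 or any patented scheme actually works.

```python
class StoreBuffer:
    """Toy store buffer that forwards pending store data to later loads."""

    def __init__(self):
        self.pending = []  # (address, value) pairs in program order, oldest first

    def store(self, address, value):
        self.pending.append((address, value))

    def load(self, address, memory):
        # Search youngest-first so the most recent store to this address wins.
        for addr, value in reversed(self.pending):
            if addr == address:
                return value, True           # forwarded from a pending store
        return memory.get(address, 0), False  # no match: read memory

    def drain(self, memory):
        # Write pending stores to memory in program order, then empty the buffer.
        for addr, value in self.pending:
            memory[addr] = value
        self.pending.clear()
```

The second element of the result of load says whether the data was forwarded (True) or read from memory (False).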