From: EricP on
glen herrmannsfeldt wrote:
>
> There is an issue of the IBM Journal of Research and
> Development pretty much devoted to the 91. I believe
> it is in there. The 91 is pretty much a favorite for
> books on pipelined processor design, mostly referencing
> that journal issue.

An Efficient Algorithm for Exploiting Multiple Arithmetic Units
http://www.csd.uoc.gr/~hy425/lectures/tomasulo.pdf


From: "Andy "Krazy" Glew" on
On 4/1/2010 7:48 PM, MitchAlsup wrote:
> On Apr 1, 9:05 pm, "Andy \"Krazy\" Glew"<ag-n...(a)patten-glew.net>
> wrote:
>> I also talked to Mitch about it at around that time, although he was preoccupied with spreadsheets for the
>
> Any chance you could complete this sentence?
>
> Perhaps from {88100, 88110, 88120, crazy, insane, Asilomar
> participants, Hot Chips participants, all of the preceding?}

Got distracted, forgot to finish. Wasn't exactly sure I remembered what you were working on.

Remember the first time I met you, Mitch, and Willie Anderson? What were you working on? Memory bandwidth spreadsheets
for the 88110? SIMD vectors? I remember we talked about DRAM bank structure, and you made your usual "If DRAMs were
designed the way I want them to be designed..." speech. I remember that you were interested in Linpack, while I was
interested in OOO and GCC.


From: "Andy "Krazy" Glew" on
On 4/1/2010 9:31 PM, glen herrmannsfeldt wrote:
> In comp.arch.fpga "Andy \"Krazy\" Glew"<ag-news(a)patten-glew.net> wrote:
> (snip)
>
>> All I know is that I proposed having a separate pipestage
>> to rename registers, using a RAM (SRAM) table indexed by
>> logical register number returning physical register number,
>> in 1986 or 1987 - in Wen-mei Hwu's microprocessor design
>> class - after he had taken us through Tomasulo and HPSm.
>
>> I.e. I proposed eliminating the CAMs, replacing them by a
>> RAM and an additional pipestage.
>
> With the 360/91 system, though, values can easily have more than
> one destination. I suppose that could be done other ways,
> too, but it is especially convenient that way.

That's basically why P6 both renamed to physical registers, and had an RS with CAMs.

RAM style indexing for the big data structure.

CAMs for the relatively smaller RS, broadcast.
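
A minimal sketch of that split, for readers who want something concrete: a rename table that is plainly indexed by logical register number (the RAM part), and a small reservation station where a completing result's tag is compared against every waiting entry (the CAM/broadcast part). All class and method names are hypothetical and chosen for illustration; this is not a description of the actual P6 structures.

```python
# Illustrative sketch only; all names here are hypothetical, not P6 internals.

class RenameTable:
    """RAM-style map: indexed by logical register, returns physical register."""

    def __init__(self, num_logical, num_physical):
        # Assumption: logical register i initially maps to physical register i.
        self.table = list(range(num_logical))
        self.free = list(range(num_logical, num_physical))

    def rename(self, dest, srcs):
        # Sources are read by direct indexing: no associative search needed.
        phys_srcs = [self.table[s] for s in srcs]
        # Allocate a fresh physical register for the destination.
        phys_dest = self.free.pop(0)
        self.table[dest] = phys_dest
        return phys_dest, phys_srcs


class ReservationStation:
    """Small CAM-style structure: every entry is compared to a broadcast tag."""

    def __init__(self):
        self.entries = []  # each entry: {"dest": int, "waiting": set of tags}

    def insert(self, phys_dest, phys_srcs, ready_regs):
        waiting = {s for s in phys_srcs if s not in ready_regs}
        self.entries.append({"dest": phys_dest, "waiting": waiting})

    def broadcast(self, tag):
        # The associative match: one result can wake any number of consumers.
        woken = []
        for entry in self.entries:
            if tag in entry["waiting"]:
                entry["waiting"].discard(tag)
                if not entry["waiting"]:
                    woken.append(entry["dest"])
        return woken


rt = RenameTable(num_logical=4, num_physical=8)
rs = ReservationStation()
ready = {0, 1, 2, 3}                 # initial physical registers hold values
d1, s1 = rt.rename(1, [2, 3])        # r1 = r2 op r3
d2, s2 = rt.rename(2, [1, 1])        # r2 = r1 op r1 (reads the new r1)
d3, s3 = rt.rename(3, [1, 0])        # r3 = r1 op r0 (second consumer of r1)
rs.insert(d2, s2, ready)
rs.insert(d3, s3, ready)
woken = rs.broadcast(d1)             # one broadcast reaches both consumers
```

The point of the broadcast is that the producer does not need to know how many consumers it has, which is the "more than one destination" convenience mentioned in the quoted text.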

I've always regretted not totally eliminating the CAMs in the RS. Always meant to get around to it in P6 v2.0, but that
never happened.

(BTW, no, Willamette did not eliminate the CAMs. The bitmap scheduler is CAMs, but decoded CAMs rather than encoded CAMs.
Many people think that the term "CAM" only applies to encoded CAMs, but don't really have a name for the decoded CAMs,
e.g. 1-hots. Me, I think encoded vs. decoded is just a circuit trick.)
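
A small sketch of the encoded vs. decoded distinction, under the assumption that "encoded" means each entry stores a tag number and compares it for equality, while "decoded" means each entry stores a bit vector with one bit per possible tag. The function names are hypothetical, and this says nothing about how Willamette's circuits were actually built.

```python
# Illustrative sketch only; function names are hypothetical.

def encoded_match(entry_tags, broadcast_tag):
    """Encoded CAM: each entry holds a tag number; match is an equality compare."""
    return [tag == broadcast_tag for tag in entry_tags]


def decoded_match(entry_masks, broadcast_tag):
    """Decoded (1-hot / bitmap) CAM: each entry holds a bit per possible tag."""
    one_hot = 1 << broadcast_tag
    return [(mask & one_hot) != 0 for mask in entry_masks]


tags = [3, 5, 3]
masks = [1 << t for t in tags]             # the same dependencies, decoded
enc = encoded_match(tags, 3)
dec = decoded_match(masks, 3)

# A decoded entry can also record several dependencies in a single word.
multi = [(1 << 3) | (1 << 5)]
multi_hits = (decoded_match(multi, 3), decoded_match(multi, 5), decoded_match(multi, 4))
```

Both forms answer the same question ("does this entry depend on the tag being broadcast?"), which is the sense in which the difference is a representation choice.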




>> The modern style of register renaming accomplishes this -
>> certainly for the registers, but also, depending on the
>> system, for the reservation stations (if those are still
>> being used).
>
> Logic was much more expensive then, than now, so the
> tradeoffs are likely different. If you used RAM tables
> with more than one entry for each source, you could do
> multiple destinations easily.

Right. The problem then has always been "how many destinations", and "how do you handle exceeding the number of
destinations without (a) falling off a cliff, and (b) complexity".
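
To make the "how many destinations" problem concrete, here is a minimal sketch of a RAM table indexed by producer, with a fixed number of consumer slots per row. The class name, the slot count, and the overflow policy (refuse the insert so the caller can stall) are all assumptions for illustration; stalling is only one possible policy and is exactly the kind of cliff mentioned above.

```python
# Illustrative sketch only; names and the overflow policy are hypothetical.

class ConsumerTable:
    """RAM indexed by producer; each row has a fixed number of consumer slots."""

    def __init__(self, num_producers, slots_per_producer):
        self.slots = slots_per_producer
        self.consumers = [[] for _ in range(num_producers)]

    def add_consumer(self, producer, consumer):
        row = self.consumers[producer]
        if len(row) >= self.slots:
            # Overflow: the row is full, so the caller must stall or fall back.
            return False
        row.append(consumer)
        return True

    def complete(self, producer):
        # Read the row by index and wake exactly the recorded consumers.
        woken, self.consumers[producer] = self.consumers[producer], []
        return woken


table = ConsumerTable(num_producers=4, slots_per_producer=2)
first = table.add_consumer(0, 10)
second = table.add_consumer(0, 11)
third = table.add_consumer(0, 12)    # exceeds the two slots for producer 0
woken = table.complete(0)
```

With a broadcast CAM the third consumer costs nothing extra; with the indexed table it needs a policy, which is where the complexity comes in.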



>
>>> Among the not so obvious ones, if you store to memory and then
>>> refetch, register renaming will detect the same address is
>>> being used and go directly to the source. (No cache on the
>>> 360/91, it originated on the 360/85.)
>
>> I'd love to see a reference for this.
>
> There is an issue of the IBM Journal of Research and
> Development pretty much devoted to the 91. I believe
> it is in there. The 91 is pretty much a favorite for
> books on pipelined processor design, mostly referencing
> that journal issue.

I practically memorized that issue. Not there that I remember. Likely we are talking about different things.