Multi-star [Computer Architecture]


From: "Andy "Krazy" Glew" on 24 Jan 2010 15:15

Andy "Krazy" Glew wrote:
>>> However, I think that I did note in the Multistar writeup that you
>>> could execute straight from the S2, even though not necessarily all
>>> inputs are ready. E.g. by using Wmt-style replay, detecting
>>> non-readiness, and sending the non-ready instructions back to either
>>> the S1 or S2. (Actually, I guess that is not Wmt-style replay - they
>>> did not replay through the scheduler. Silly people.)

Ed Brekelbaum:
>> That's probably our biggest area of disagreement.
>>
>> We used to sit next to the Tejas guys. We heard so many horror
>> stories about replay, we were dead set against it. HSW was very much
>> oriented towards making it possible to have little or no replay (the
>> L1 scheduler was "snatch-back", while the L2 would eat the full L1
>> latency on dependent ops).
>>
>> I actually had a slide in the Micro presentation showing relative
>> replay rates. It didn't go well for the P4 :)

Not such a big area of disagreement. I didn't bring y'all replay: Sager and Upton and Hinton brought y'all replay.

However, it is possible to do replay in a much better, more stable, way than Willamette (and ... Tejas) did. Replay
through the scheduler. Transitive closure to propagate replay cancellations. At least one regular poster to this
newsgroup almost shipped a replay microarchitecture that IMHO was significantly better than Willamette's. If he can
talk about stuff, e.g. stuff that was patented and is hence public, I hope that he will.
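
To illustrate what "transitive closure to propagate replay cancellations" could mean, here is a minimal sketch. It is not taken from any shipped or patented design; the names `replay_set`, `consumers`, and `issued` are hypothetical, and it assumes the scheduler can enumerate each op's consumers. When one op turns out not to have been ready, everything already issued that depends on it, directly or indirectly, is cancelled as well.

```python
from collections import deque

def replay_set(consumers, root, issued):
    """Return the ops to cancel and replay when `root` was not really ready.

    consumers: dict mapping each op to the ops that read its result
               (hypothetical representation of scheduler dependency info).
    root:      the op detected as not ready at execute time.
    issued:    ops that have already been sent to execute speculatively.
    """
    to_cancel = set()
    work = deque([root])
    while work:
        op = work.popleft()
        if op in to_cancel:
            continue
        to_cancel.add(op)
        # Only consumers that were already issued need cancelling;
        # the rest are still sitting in the scheduler.
        for c in consumers.get(op, ()):
            if c in issued:
                work.append(c)
    return to_cancel

# Hypothetical example: load -> add -> {mul, store}; sub is independent.
consumers = {"load": ["add"], "add": ["mul", "store"],
             "mul": [], "store": [], "sub": []}
issued = {"load", "add", "mul", "sub"}
cancelled = replay_set(consumers, "load", issued)
```

In this example `cancelled` contains `load`, `add`, and `mul`. `store` had not issued yet, and `sub` does not depend on the load, so neither is touched.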

I just refer to replay because it is the only way that I know of to do incomplete dependency tracking in the scheduler,
and then recovery.

I.e. either you do what Multistar proposed: incomplete dependency tracking in the S2, complete dependency tracking in
the S1, and no replay. Or you do incomplete dependency tracking in some scheduler Si, send stuff to execute, and then
have a way of recovering when it wasn't ready. Replay is what I call all such "I wasn't really ready to execute"
recovery mechanisms. Even runahead can be considered replay.
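
The second option above (incomplete dependency tracking, issue, then recover) can be sketched as a toy cycle-level model. This is an illustration under stated assumptions, not the Multistar, Willamette, or Tejas mechanism: `run`, `ops`, and `tracked` are hypothetical names, every op has unit latency, issue width is unlimited, and the dependency graph is assumed acyclic. The scheduler watches only the sources listed in `tracked`; an op that issues while an untracked source is still outstanding is sent back to the scheduler and counted as a replay.

```python
def run(ops, tracked):
    """Simulate a scheduler with (possibly incomplete) dependency tracking.

    ops:     dict op -> list of all producer ops it truly depends on.
    tracked: dict op -> the subset of producers the scheduler watches.
    Returns (cycles, replays). Assumes an acyclic dependency graph.
    """
    done = set()
    waiting = list(ops)          # scheduler contents
    cycles = 0
    replays = 0
    while waiting:
        cycles += 1
        ready_at_start = set(done)   # results visible this cycle
        still_waiting = []
        for op in waiting:
            if not all(s in ready_at_start for s in tracked[op]):
                still_waiting.append(op)      # scheduler holds it back
            elif all(s in ready_at_start for s in ops[op]):
                done.add(op)                  # really ready: it executes
            else:
                replays += 1                  # issued but was not ready
                still_waiting.append(op)      # replay through the scheduler
        waiting = still_waiting
    return cycles, replays

# Hypothetical example: c needs both a and b, but only a is tracked for c.
ops = {"a": [], "b": ["a"], "c": ["a", "b"]}
incomplete = {"a": [], "b": ["a"], "c": ["a"]}

print(run(ops, incomplete))  # incomplete tracking: c issues early once
print(run(ops, ops))         # complete tracking: no replay
```

With incomplete tracking, `c` issues one cycle too early and is replayed once; with complete tracking it simply waits. In this small example both finish in the same number of cycles, so the cost of the replay is the wasted issue slot rather than added latency.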

From: "Andy "Krazy" Glew" on 24 Jan 2010 15:18

nedbrek wrote:
>>> I see you discovered the key to HSW (pages 36 and 37), combining the P4
>>> and P3 register file styles :) I got all excited when I read what you said above
>>
>> Ah, wait... I see on page 31 you say "register read is performed before
>> instructions are inserted into the fast window making it a data capture
>> device".
>
> I was thinking of the ppt from the Micro presentation, it is made explicit
> there (slide 5 is P3 vs. P4, slide 7 is "Stop! You're both right!")

Can you provide me a link to this presentation? I can't seem to find it.
