Transitive Closure Propagation of Pipeline Replays [Computer Architecture]

Prev: x86 instruction set usage-difference between windows 95 andwindows xp ?
Next: Probably a range/overflow checking bug in Delphi 2007 compiler.

From: Andy 'Krazy' Glew on 25 May 2010 03:16

http://semipublic.comp-arch.net/wiki/Transitive_Closure_Propagation

== AMD Patent Application ==

See patent application
http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PG01&s1=transitive.TTL.&s2=Advanced.AS.&OS=TTL/transitive+AND+AN/Advanced&RS=TTL/transitive+AND+AN/Advanced

United States Patent Application 20080028193
Dhodapkar; Ashutosh S.
Transitive suppression of instruction replay
Filed July 31, 2006

== Other Work ==

Mike Haertel and I realized that replay was a dynamically unstable system as soon as we were introduced to the concept
of replay, by Sager at Upton at the beginning of Willamette 1994-1996. The concept of transitive closure to suppress
replay was the obvious next step. We explored concepts such as bitmasks attached to older instructions indicating what
younger instructions might need to be canceled on a replay, and vice versa.

Complete transitive closure is best suited to bitmask schedulers, because of the high fan-out.

Incomplete transitive closure is another possibility: rather than indicating all children, or all ancestors, one might
indicate, for example, the next N descendants. Indicating children can be done by iterating through the scheduler,
whereas indicating ancestors tends to be best suited to CAMing.

Instantaneous transitive closure cancellation is not necessary. It is only necessary that the cancellation wavefront
propagate faster than the true execution wavefront. As mentioned above, listing N children or other descendants,
compared to a smaller number M of immediate children. Particularly suited to schedulers of limited fan-out. Indeed,
one might have a fixed size list of N children, and use any excess N-M for grandchildren, etc. In case the number of
immediate descendants is too small, overflow lists, possibly with a different, flattened, dependency structure for
cancellations.

Some have observed that not making any special efforts to propagate the cancellation wave faster is usually okay: so
long as the cancellation wave does not take cache misses.

We might arrange for all of the cancellation wave operations to be unit latency, lower than true computation.

It may be desirable to place the cancellation propagation in a different scheduler - aka a recovery scheduler.

Hybrid systems, with bitmask transitive closure in an inner scheduler and tag oriented iterative cancellation in an
outer scheduler, is a possibility.

My (Glew's) multistar paper microarchitecture,
http://docs.google.com/fileview?id=0B5qTWL2s3LcQZDIyZDVmN2EtYjY4MC00YjU2LWE4ZGMtYzk2MmU4M2U2NDQ5&hl=en,
designed August 2004, disclosed in patent applications
United States Patent Application 20070083735
Glew; Andrew Forsyth,
Hierarchical processor
Filed: August 29, 2005
describe replay schedulers and anti-schedulers with partial transitive closure.

This was not claimed as novel in this patent application because
transitive closure was already understood to be well-known.

|
Pages: 1
Prev: x86 instruction set usage-difference between windows 95 andwindows xp ?
Next: Probably a range/overflow checking bug in Delphi 2007 compiler.