From: Paul A. Clayton on
Given that L2 (and farther) reads are usually transmitted in about
four transfers (e.g., a 64B block as four 16B transfers), would it be
profitable to place the predicted critical 16B chunk of each block in
a nearby (lower latency) area? (Even a predictor as simple as "first
chunk read on the previous access" might provide some benefit--at the
cost of two tag bits per block and some miss handling complexity.) An
extension of this might be used in an L2 shared by two cores: critical
chunks could be placed near the appropriate core. (Obviously, this
would involve more complex allocation and placement issues.)
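
To make the simple predictor concrete, here is a rough sketch of the
"first read on previous access" idea: two bits per 64B block remember
which 16B piece was requested first the last time the block was
fetched, and that piece is the one a placement policy would try to
keep in the low-latency area. The class and method names, the
dictionary standing in for the tag array, and the default guess of
piece 0 are all assumptions made for illustration; nothing here models
real latencies or the placement hardware itself.

```python
BLOCK_BYTES = 64
CHUNK_BYTES = 16
CHUNKS_PER_BLOCK = BLOCK_BYTES // CHUNK_BYTES  # 4 chunks -> 2 bits of state


class CriticalChunkPredictor:
    """Hypothetical 'first read on previous access' predictor."""

    def __init__(self):
        # Stand-in for two extra tag bits per block:
        # block number -> index (0..3) of the chunk requested first last time.
        self._last_first_chunk = {}

    @staticmethod
    def split(addr):
        """Return (block number, chunk index within the block) for a byte address."""
        return addr // BLOCK_BYTES, (addr % BLOCK_BYTES) // CHUNK_BYTES

    def predict(self, addr):
        """Chunk to place in the near area; assumed default is chunk 0."""
        block, _ = self.split(addr)
        return self._last_first_chunk.get(block, 0)

    def access(self, addr):
        """Record the first read of a fetch; return True if the prediction matched."""
        block, chunk = self.split(addr)
        correct = self.predict(addr) == chunk
        self._last_first_chunk[block] = chunk  # update the two bits
        return correct


def hit_rate(first_read_addrs):
    """Fraction of fetches whose critical chunk was predicted correctly."""
    predictor = CriticalChunkPredictor()
    results = [predictor.access(a) for a in first_read_addrs]
    return sum(results) / len(results) if results else 0.0
```

Feeding hit_rate() a list of the first-read address of each L2 fetch
from a trace would give a first-order feel for whether the two bits
pay for themselves; a misprediction would simply see the ordinary
(far) latency.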

While I have not read many NUCA papers, I have not yet seen any that
use predictability of access (prefetchability) to bias placement.

(It looks like the POWER7 L3 cache might be something like a Reactive
NUCA [Hardavellas et al., 2009] cache!)


Paul A. Clayton
just a technophile