Intel cache inclusion [Computer Architecture]


From: Quadibloc on 6 May 2010 13:26

I'm glad you found a reference for it, as that's one thing I couldn't
find.

The same search that led me to that page led me to other items about
cache inclusion which make it reasonable that the outermost-level
cache, if it is much bigger than the other caches, would be inclusive.

Basically, a fully-inclusive cache makes sense when:

- the cache is very large, and not direct-mapped,
- thus, any items in that cache that are in more low-level caches must
be locked in that cache.

This also lets cache coherency be maintained more easily, since
multiple cores on the chip don't have to snoop each other's L1 (or
L2!) caches, so the L3 cache on a multi-core chip probably should act
this way, explaining Intel's choice.

A fully-exclusive cache makes sense when:

- the cache is not much larger than the lower-level cache it supplies
data to,
- thus, the two caches together provide more cache space when they
don't repeat any data.

If the L2 caches are big enough, though, instead of being fully-
exclusive, they probably would follow the mixed-inclusion strategy
that Intel has experience with on its earlier chips.

John Savard

From: Stephen Fuld on 6 May 2010 14:07

On 5/6/2010 10:26 AM, Quadibloc wrote:
> I'm glad you found a reference for it, as that's one thing I couldn't
> find.
>
> The same search that led me to that page led me to other items about
> cache inclusion which make it reasonable that the outermost-level
> cache, if it is much bigger than the other caches, would be inclusive.
>
> Basically, a fully-inclusive cache makes sense when:
>
> - the cache is very large, and not direct-mapped,
> - thus, any items in that cache that are in more low-level caches must
> be locked in that cache.
>
> This also lets cache coherency be maintained more easily, since
> multiple cores on the chip don't have to snoop each other's L1 (or
> L2!) caches, so the L3 cache on a multi-core chip probably should act
> this way, explaining Intel's choice.

Isn't that only true if the lower-level caches are write-through? I think
Intel's L1 is, but I am not sure about the L2.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From: Jeremy Linton on 6 May 2010 14:12

On 5/6/2010 7:00 AM, Mark Brehob wrote:

> I'm not seeing the claim that the Intel Core i7 is fully inclusive
> there (I may be blind) but that did end up leading me to
> http://software.intel.com/en-us/articles/who-moved-the-goal-posts-the-rapidly-changing-world-of-cpus/
> where the claim is clearly made for the L3. By the tone of the
> presentation I'm guessing that previous processors didn't have an
> inclusive L3/L2, but I can't prove it...
My understanding is that previous Intel parts were neither, as compared
with AMD's, which have been exclusive for a long time. I've seen the
Phenom IIs advertised as "8M effective cache" size because they are
adding the L3 (6M) to the L2s (4x512k on the 4-core).
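
The "8M effective" arithmetic can be checked with a small Python sketch. The function name and the inclusive comparison case are illustrative assumptions, not anything from AMD's or Intel's documentation; the sizes are the ones quoted above.

```python
KIB = 1024
MIB = 1024 * KIB

def effective_capacity(l3_bytes, l2_bytes_per_core, cores, exclusive):
    """Upper bound on distinct bytes held across all L2s plus the L3.

    exclusive: L2 and L3 never duplicate a line, so capacities add.
    inclusive: every L2 line is also in L3, so L3 alone is the bound.
    """
    total_l2 = l2_bytes_per_core * cores
    if exclusive:
        return l3_bytes + total_l2
    return l3_bytes

# Sizes quoted in the post: 6M L3, 4 x 512k L2.
phenom_exclusive = effective_capacity(6 * MIB, 512 * KIB, 4, exclusive=True)
# Hypothetical comparison: the same sizes under a fully inclusive policy.
same_sizes_inclusive = effective_capacity(6 * MIB, 512 * KIB, 4, exclusive=False)

print(phenom_exclusive // MIB, same_sizes_inclusive // MIB)  # prints "8 6"
```

A policy that is "neither" would land somewhere between the two bounds, depending on how much duplication happens to exist at a given moment.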

BTW, this exclusive policy apparently hurt AMD's power usage until
recently, when they added a flush to L3 when a core powers down. Before
that they had to keep the caches powered up to respond to snoops.

From: Andy 'Krazy' Glew on 6 May 2010 22:28

On 5/6/2010 9:32 AM, MitchAlsup wrote:
> On May 5, 9:34 pm, Mark Brehob<bre...(a)gmail.com> wrote:
>> Hello,
>>
>> Does anyone know the history of cache inclusion on Intel processors?

Well, I was in a conference room when Wen-Hann Wang - he got some ACM or IEEE award for his papers on cache inclusion -
proposed the accidentally inclusive policy.

I.e. the guy who invented[*] cache inclusion also invented Intel's non-inclusive policy.

[*] I add this note to "invented cache inclusion" since I am sure that somebody at IBM or elsewhere did similar work.
How about "is well known for publishing cache inclusion"?

> I suspect that if the sets of associativity of the interior caches
> are larger than the sets of associativity of the outermost cache, then
> it is unwise to try to guarantee inclusion.

Mitch has identified one of the glass jaws of inclusive caches, wrt Multi/ManyCore and associativity.

An alternative to using inclusive caches to reduce the need to snoop probe interior caches is to use a separate snoop
filter - a cache-like structure that might store only 1 bit per cache line (or maybe log2(Ncores) bits per cache line),
possibly with tags not per cache line, but at a coarser granularity - perhaps one tag per 4KiB page. Such a snoop
filter can cover a much larger set of cache line addresses than an L3 cache, and is much less likely to suffer the
above-mentioned glass jaw.
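
A minimal Python sketch of one possible coarse-grained snoop filter, under loud assumptions: it keeps one entry per 4 KiB page holding a bitmask of cores (a variant of the per-line bits described above), it is unbounded (a real filter has finite capacity and needs its own replacement policy), and it never clears bits, so it can report false positives but not false negatives. The class and method names are hypothetical.

```python
PAGE_SHIFT = 12  # 4 KiB regions, the granularity suggested in the post


class CoarseSnoopFilter:
    """Tracks, per 4 KiB page, which cores may hold lines from that page."""

    def __init__(self, n_cores):
        self.n_cores = n_cores
        self.presence = {}  # page number -> bitmask of cores

    def record_fill(self, core, addr):
        """Note that `core` has filled a line somewhere in addr's page."""
        page = addr >> PAGE_SHIFT
        self.presence[page] = self.presence.get(page, 0) | (1 << core)

    def cores_to_snoop(self, requester, addr):
        """Return the cores, other than the requester, that must be probed."""
        mask = self.presence.get(addr >> PAGE_SHIFT, 0) & ~(1 << requester)
        return [c for c in range(self.n_cores) if mask & (1 << c)]


sf = CoarseSnoopFilter(n_cores=4)
sf.record_fill(core=1, addr=0x1040)
sf.record_fill(core=3, addr=0x1F80)
print(sf.cores_to_snoop(requester=0, addr=0x1000))  # [1, 3]
print(sf.cores_to_snoop(requester=0, addr=0x2000))  # []
```

One tag here covers 64 lines of 64 bytes, which is where the extra reach over an L3 tag array would come from; the cost is that a single resident line makes the whole page look occupied.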

---

Another issue with inclusive caches is maintaining inclusion. Typically we use an LRU or pseudo-LRU policy to determine
what cache line should be evicted when a new cache line fill must be done. However, LRU bits in the L3 cache will not
be updated for lines that are constantly hitting in the L1 and L2 caches.

Some old systems (older than P6 in 1991) identified the victim on the fill request, relying on particular configurations
of associativity so that the inner caches could model the outer. This doesn't work so well for MultiCore.

One common mechanism, I believe due to Wen-Hann, is backwards invalidate. This doesn't change the LRU problem; it just
means that when the outer cache evicts a line, it tells the inner cache to do the same.
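
The stale-LRU problem and backwards invalidate can be illustrated with a toy Python model. Everything here is an assumption for illustration: a single fully associative set per level, true LRU, clean lines only (no writebacks), and a 2-way inner cache under a 4-way inclusive outer cache. It is a sketch, not a model of any real Intel part.

```python
from collections import OrderedDict


class LRUSet:
    """One fully associative set with true LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # oldest (LRU) first, newest (MRU) last

    def touch(self, tag):
        """Return True on a hit and mark the line most recently used."""
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return True
        return False

    def fill(self, tag):
        """Insert tag; return the evicted tag, or None if there was room."""
        victim = None
        if len(self.lines) >= self.ways:
            victim, _ = self.lines.popitem(last=False)
        self.lines[tag] = True
        return victim

    def invalidate(self, tag):
        self.lines.pop(tag, None)


def run(trace, inner_ways=2, outer_ways=4):
    inner, outer = LRUSet(inner_ways), LRUSet(outer_ways)
    back_invalidated = []
    for tag in trace:
        if inner.touch(tag):
            continue  # inner hit: the outer cache's LRU state is NOT updated
        if not outer.touch(tag):
            victim = outer.fill(tag)
            if victim is not None:
                # Backwards invalidate: inclusion forces the inner copy out too.
                if victim in inner.lines:
                    back_invalidated.append(victim)
                inner.invalidate(victim)
        inner.fill(tag)
    return back_invalidated, inner, outer


# "A" is hot in the inner cache; B..E stream through.
hot_killed, inner, outer = run(["A", "B", "A", "C", "A", "D", "A", "E"])
print(hot_killed)  # ['A']
```

The hottest line is the one thrown out, because the outer cache last saw "A" on its initial fill and therefore considers it the oldest line in the set.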

Some systems will "trickle through" LRU updates from an inner cache to an outer inclusive cache, piggybacking on other
traffic, or occasionally "just because".

I suppose that you could do a handshake like the following:

Processor i's inner cache $i sends a miss to the outer cache.
Outer cache misses, determines it needs an eviction,
    and then chooses a processor j and/or inner cache $j
    to send an eviction request to, without specifying the victim.
    (Note: a cache line may be in multiple caches; may need to multicast.)
Inner cache $j sees the eviction request,
    which probably specifies an outer cache set that $j can use to determine
    which of its own sets it should choose a victim from.
$j chooses a victim, and then tells the outer cache, as well as
    any peer inner caches that need to be invalidated.
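
A rough Python sketch of the handshake above, following the steps literally. All names are hypothetical, the choice of j is passed in by the caller because the post leaves it open, and the fallback when $j holds nothing from the outer set is my own assumption.

```python
from collections import OrderedDict


class InnerCache:
    def __init__(self, name):
        self.name = name
        self.lines = OrderedDict()  # LRU first, MRU last

    def access(self, tag):
        """Fill or hit the line and mark it most recently used."""
        self.lines[tag] = True
        self.lines.move_to_end(tag)

    def choose_victim(self, outer_set_tags):
        """Pick this cache's least recently used line among the outer set."""
        for tag in self.lines:  # iterates from LRU to MRU
            if tag in outer_set_tags:
                return tag
        return None

    def invalidate(self, tag):
        self.lines.pop(tag, None)


def evict_via_inner(outer_set, inners, j):
    """Outer asks inner cache j to name the victim, then multicasts the invalidate."""
    victim = inners[j].choose_victim(outer_set)
    if victim is None:
        # Assumption: $j holds nothing from this set, so the outer cache
        # falls back to its own choice (here simply the first entry).
        victim = outer_set[0]
    outer_set.remove(victim)
    for cache in inners:  # peers holding the line must drop it too
        cache.invalidate(victim)
    return victim


c0, c1 = InnerCache("$0"), InnerCache("$1")
for t in ["A", "B", "A"]:
    c0.access(t)
c1.access("B")
outer_set = ["A", "B", "C", "D"]
victim = evict_via_inner(outer_set, [c0, c1], j=0)
print(victim, outer_set)  # B ['A', 'C', 'D']
```

Here $0 names "B" because it used "A" more recently, and $1 loses its copy of "B" through the multicast. A real design might instead prefer lines such as "C" or "D" that no inner cache holds; the sketch does not attempt that.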

Hmm... this may be novel. I should probably write up an invention disclosure for it, and/or do a prior art search.
Normally if I think I have invented something while doing a comp.arch post, I stop posting immediately. (This
sometimes explains the abrupt stops in my posts.)
I won't stop here because, although I am not aware of prior art, I would not be at all surprised if something like
this has already been done. Plus, I expect that some other comp.arch'er will quickly tell me what systems invented the
above policy.

From: Andy 'Krazy' Glew on 7 May 2010 01:41

On 5/6/2010 7:28 PM, Andy 'Krazy' Glew wrote:
>> On May 5, 9:34 pm, Mark Brehob<bre...(a)gmail.com> wrote:
>>> Does anyone know the history of cache inclusion on Intel processors?
>
> Well, I was in a conference room when Wen-Hann Wang - he got some ACM or
> IEEE award for his papers on cache inclusion - proposed the accidentally
> inclusive policy.
>
> I.e. the guy who invented[*] cache inclusion also invented Intel's
> non-inclusive policy.
>
> [*] I add this note to "invented cache inclusion" since I am sure that
> somebody at IBM or elsewhere did similar work. How about "is well known
> for publishing cache inclusion"?

Actually, I'm NOT sure that somebody invented cache inclusion before Wen-Hann. He's a friend, and I want to keep him
that way.

I am sure that somebody will SAY that somebody at IBM invented cache inclusion.

For that matter - Wen-Hann was at IBM before Intel.
