From: prasad on 21 Jun 2006 09:44

Hi,

1. Why do we require two stages of cache (L1 and L2)?
2. What are the advantages or disadvantages of having two stages of cache instead of a single, larger L1 cache?
3. One paper (Level 2 Cache for High-performance ARM Core-based SoC Systems) mentions that "The larger the cache is, the slower it is too, and so a large L1 cache will limit the CPU clock rate". How are cache size and operating frequency interrelated?

Thanks and Regards,
Anjaneya Prasad.

From: JJ on 21 Jun 2006 10:58

prasad wrote:
> Hi,
>
> 1. Why do we require two stages of caches(L1 and L2).
> 2. What are the advantages or disadvantages of having two stages of
> cache instead of having single, larger L1cache.
> 3. In one paper(Level 2 Cache for High-performance ARM Core-based SoC
> Systems) it is mentioned that "The larger the cache is, the slower it
> is too, and so a large L1 cache will
> limit the CPU clock rate"..... How does cache size and operating
> frequencies are interrelated?
>
> Thanks and Regards,
> Anjaneya Prasad.

It boils down to basic circuit design, since caches are just collections of particularly well-structured circuits: how do you drive a large capacitance C starting from a buffer of fixed size? Each time C grows by a factor of e, an extra buffer stage is needed and the delay increases by one stage delay. If every stage has a fan-out close to e, the delay path is near optimal (according to 80s-style circuit design with a simpler metal interconnect model).

In SRAMs, though, things generally grow by powers of 2, and blocks are replaced by similar blocks that are both larger and slower, with extra inputs and outputs. So a cache might double the height of its SRAM array; now the bitlines have twice the C and require either more sensitive (slower) sense amps, or added complexity to mux the upper and lower halves. Either the same number of levels of hierarchy must run slower, or more levels of hierarchy must be added. Delay follows the log of the size. The art is to minimize these incremental costs; that's called circuit design.

Now the same staging idea applies to a processor design that runs, say, 1000 times faster than DRAM. If each memory stage is 10x slower and, say, 16x bigger than the one before it, then we might have caches of L1 16K at 1x, L2 256K at 10x, and possibly L3 4M at 100x slower than CPU speed. The third level might actually be faster off-chip DRAM rather than SRAM. In practice L2 might be far less than 10x slower than L1.

As long as locality is high, L1 satisfies most requests, L2 satisfies as much of the remainder as possible, and so on; hopefully you rarely go to L3 or main memory. Now if only all this were true!

John Jakson

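The "delay follows the log of the size" point in JJ's post can be sketched numerically. This is only an illustration under stated assumptions: the function names `buffer_stages` and `extra_stage_delays` are made up for this note, each stage is assumed to drive e times its own input capacitance, and the load capacitance is treated as simply proportional to the array size in kB, which real SRAM layouts only approximate.

```python
import math

def buffer_stages(c_load, c_in=1.0):
    """Number of stages in a tapered buffer chain with fan-out e per stage.

    Hypothetical helper: c_load and c_in are capacitances in the same
    (arbitrary) units; at least one stage is always needed.
    """
    ratio = c_load / c_in
    return max(1, math.ceil(math.log(ratio)))

def extra_stage_delays(growth_factor):
    """Stage delays added when the load grows by growth_factor (continuous form)."""
    return math.log(growth_factor)

# Assumption: load capacitance scales linearly with cache size in kB.
for size_kb in (16, 256, 4096):
    print(size_kb, "kB ->", buffer_stages(size_kb), "stages")

# Doubling the array adds the same small delay increment every time.
print("extra stage delays per doubling:", extra_stage_delays(2))
```

Each 16x step in size adds the same number of stage delays (about ln 16) no matter how large the array already is, which is the logarithmic behaviour described in the post.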
From: Stephen Sprunk on 21 Jun 2006 17:59

"prasad" <maprasad(a)gmail.com> wrote in message
news:1150897444.517183.152020(a)r2g2000cwb.googlegroups.com...
> 1. Why do we require two stages of caches(L1 and L2).
> 2. What are the advantages or disadvantages of having two stages of
> cache instead of having single, larger L1cache.

The larger you make the cache, the more clocks it takes to access. If you can have a 16kB L1 cache at 2 cycles and a 512kB L2 cache at 20 cycles, that's nearly always going to provide better performance than just one 512kB cache at 18 cycles.

The disadvantage of having two levels of cache is that discovering a miss in the L1 means those cycles are wasted, adding to the time it takes to access the L2. However, L1 hit rates tend to be extremely high for most applications, so this is a very minor complaint.

> 3. In one paper(Level 2 Cache for High-performance ARM Core-based SoC
> Systems) it is mentioned that "The larger the cache is, the slower it
> is too, and so a large L1 cache will
> limit the CPU clock rate"..... How does cache size and operating
> frequencies are interrelated?

It doesn't necessarily limit the CPU clock rate; accessing a larger area of memory takes more _time_, and that means either more cycles at the same clock rate or the same number of cycles at a lower clock rate.

Generally, the L1 is made as large as can be accessed within a few clocks at the CPU's maximum speed (a few tens of kB, usually), whereas the L2 is made as large as you have space for on the die (and many clocks slower).

S

--
Stephen Sprunk      "Stupid people surround themselves with smart
CCIE #3723           people. Smart people surround themselves with
K5SSS                smart people who disagree with them." --Aaron Sorkin

--
Posted via a free Usenet account from http://www.teranews.com

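The 2/20/18-cycle comparison in the post above can be worked through with a small average-access-time calculation. This is a sketch, not a model of any real CPU: the function names are invented for this note, the 20-cycle L2 figure is assumed to already include the 2 cycles wasted on the L1 miss, and accesses that miss in 512kB are ignored because they would cost the same in both designs.

```python
def two_level_cycles(l1_hit_rate, l1_cycles=2, l2_cycles=20):
    """Average cycles per access for a small L1 backed by a large L2.

    Assumes l2_cycles is the total latency seen on an L1 miss.
    """
    return l1_hit_rate * l1_cycles + (1.0 - l1_hit_rate) * l2_cycles

def break_even_hit_rate(single_cycles=18, l1_cycles=2, l2_cycles=20):
    """L1 hit rate at which the two-level design matches one big cache."""
    return (l2_cycles - single_cycles) / (l2_cycles - l1_cycles)

# Hypothetical L1 hit rates, chosen only to show the trend.
for hit_rate in (0.5, 0.9, 0.95):
    print(f"L1 hit rate {hit_rate:.2f}: "
          f"{two_level_cycles(hit_rate):.2f} cycles vs 18 for the single cache")

print(f"break-even L1 hit rate: {break_even_hit_rate():.3f}")
```

With these numbers the two-level design wins as soon as the L1 hit rate exceeds 2/18, roughly 11%, which is far below the hit rates the post describes as typical.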
From: Nishu on 21 Jun 2006 23:30

Stephen Sprunk wrote:
> "prasad" <maprasad(a)gmail.com> wrote in message
> news:1150897444.517183.152020(a)r2g2000cwb.googlegroups.com...
> > 1. Why do we require two stages of caches(L1 and L2).
> > 2. What are the advantages or disadvantages of having two stages of
> > cache instead of having single, larger L1cache.
>
> The larger you make the cache, the more clocks it takes to access. If you
> can have a 16kB L1 cache at 2 cycles and a 512kB L2 cache at 20 cycles,
> that's nearly always going to provide better performance than just one 512kB
> cache at 18 cycles.
>

I have a doubt. Are you saying that the 16kB L1 cache at 2 cycles degrades to an L1' cache at 18 cycles if its size is increased to 512kB?

If yes, could you explain how it gets degraded? Is it a delay in address translation?

If no, what if we have a 512kB L1 cache at 2 cycles instead of a two-level cache system? Even if I allow for more _time_ to search the bigger cache, wouldn't the former still be faster?

Unless it comes down to cutting cost and saving die space (considering that L1 is not economical in cost and space), I doubt the need for a two-level cache system.

-Nishu

From: Stephen Sprunk on 22 Jun 2006 00:22

"Nishu" <naresh.attri(a)gmail.com> wrote in message
news:1150947012.892894.240170(a)u72g2000cwu.googlegroups.com...
> Stephen Sprunk wrote:
>> "prasad" <maprasad(a)gmail.com> wrote in message
>> news:1150897444.517183.152020(a)r2g2000cwb.googlegroups.com...
>> > 1. Why do we require two stages of caches(L1 and L2).
>> > 2. What are the advantages or disadvantages of having two stages of
>> > cache instead of having single, larger L1cache.
>>
>> The larger you make the cache, the more clocks it takes to access.
>> If you can have a 16kB L1 cache at 2 cycles and a 512kB L2 cache
>> at 20 cycles, that's nearly always going to provide better performance
>> than just one 512kB cache at 18 cycles.
>
> I've a doubt. Are you considering that 16kB L1 cache at 2 cycles to get
> degraded to L1' cache at 18 cycles if the size is increased to 512kB?
>
> If yes, Would you guide how it can be degraded.... delay in address
> translation?

That's what I'm saying. I don't know enough of the transistor-level details of why that is, but I've seen it stated over and over by people who _do_ know the details.

> If No, What if we have 512kB of L1 cache at 2 cycles instead of
> two-level cache system? Even if I consider more _time_ delay for search
> in bigger cache, wouldn't the former be still faster?

No. The bigger the cache, the longer it takes to access. If you took an existing L2 cache and moved it to L1 (eliminating the current L1), it would take just as long to access -- and it'd kill performance. You can't make one cache fast and big at the same time, so the next best thing is to have a fast, small cache (L1) backed up by a slower, bigger cache (L2).

> If it doesnt come down to cost cutting, and saving the die space
> (considering L1 is not economic in cost and space), i doubt the need of
> having two-level cache system.

It's not about cost-cutting. Chipmakers today are desperate to find ways to use up the wealth of transistors available in a manner that actually improves performance. They throw hundreds of millions of transistors at making L2 bigger (or adding L3); they'd certainly throw a few tens of thousands at making L1 bigger if it helped -- but it actually slows things down because it increases the access time. It's a balancing act.

S

--
Stephen Sprunk      "Stupid people surround themselves with smart
CCIE #3723           people. Smart people surround themselves with
K5SSS                smart people who disagree with them." --Aaron Sorkin
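The "balancing act" can be illustrated with a toy sweep over L1 sizes. Everything numeric here is an assumption made for the sketch: hit latency is taken to grow linearly in log2(size), anchored to the thread's figures of 2 cycles at 16kB and 18 cycles at 512kB; the miss rate is assumed to be 5% at 16kB and to shrink by a factor of sqrt(2) per doubling; and a miss is charged a fixed 18 extra cycles to reach L2. None of these come from measurements.

```python
import math

def l1_hit_cycles(size_kb):
    """Assumed hit latency: 2 cycles at 16 kB, 18 cycles at 512 kB."""
    return 2.0 + 3.2 * math.log2(size_kb / 16.0)

def l1_miss_rate(size_kb, base_miss=0.05):
    """Assumed miss rate: base_miss at 16 kB, divided by sqrt(2) per doubling."""
    return base_miss / math.sqrt(size_kb / 16.0)

def average_cycles(size_kb, l2_penalty=18.0):
    """Average cycles per access: L1 hit time plus the expected L2 penalty."""
    return l1_hit_cycles(size_kb) + l1_miss_rate(size_kb) * l2_penalty

sizes_kb = [16, 32, 64, 128, 256, 512]
for size_kb in sizes_kb:
    print(f"{size_kb:4d} kB L1: {average_cycles(size_kb):.2f} cycles on average")

best_size_kb = min(sizes_kb, key=average_cycles)
print("lowest average under these assumptions:", best_size_kb, "kB")
```

Under these assumptions the extra hit latency of a bigger L1 outweighs the misses it saves at every step, so the smallest L1 gives the lowest average. Different assumed miss rates or latencies would move the balance point, which is the trade-off the post describes.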