From: prasad on
Hi,

1. Why do we need two levels of cache (L1 and L2)?
2. What are the advantages and disadvantages of having two levels of
cache instead of a single, larger L1 cache?
3. One paper ("Level 2 Cache for High-performance ARM Core-based SoC
Systems") says that "The larger the cache is, the slower it
is too, and so a large L1 cache will
limit the CPU clock rate". How are cache size and operating
frequency related?

Thanks and Regards,
Anjaneya Prasad.

From: JJ on

prasad wrote:
> Hi,
>
> 1. Why do we need two levels of cache (L1 and L2)?
> 2. What are the advantages and disadvantages of having two levels of
> cache instead of a single, larger L1 cache?
> 3. One paper ("Level 2 Cache for High-performance ARM Core-based SoC
> Systems") says that "The larger the cache is, the slower it
> is too, and so a large L1 cache will
> limit the CPU clock rate". How are cache size and operating
> frequency related?
>
> Thanks and Regards,
> Anjaneya Prasad.

It boils down to basic circuit design, since caches are just collections of
particularly well-structured circuits: how do you drive a large capacitance C
starting from a buffer of fixed size?

Each time C increases by a factor of e, an extra buffer stage is needed and
the delay increases by one more stage. If every stage has a fan-out close to
e, then the delay path is near optimal (according to 80s-style circuit design
with a simpler metal interconnect model).
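
As a rough illustration of the staging rule above, here is a minimal sketch under a simplified model in which each buffer stage costs a delay proportional to its fan-out (the load it drives divided by its own input capacitance). The function names and the unit delay are assumptions made for the example, not part of any real design flow.

```python
import math

def chain_delay(load_ratio, stages):
    """Delay of a buffer chain driving a load `load_ratio` times the
    first buffer's input capacitance, split evenly over `stages` stages.
    Simplified model: each stage costs one delay unit per unit of fan-out."""
    fanout = load_ratio ** (1.0 / stages)
    return stages * fanout

def best_stage_count(load_ratio):
    """Stage count suggested by the fan-out-of-e rule: about ln(load_ratio)."""
    return max(1, round(math.log(load_ratio)))

# Load ratio, suggested stage count, and resulting delay in model units.
for ratio in (10, 100, 1000, 10000):
    n = best_stage_count(ratio)
    print(ratio, n, round(chain_delay(ratio, n), 2))
```

In this model the load grows by a factor of 10 per row while the stage count, and with it the delay, only grows by a roughly constant amount, which is the logarithmic behaviour described above.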

In SRAMs, though, sizes generally grow by powers of 2, and blocks
are replaced by similar blocks that are both larger and slower, with extra
inputs and outputs. So a cache might double the height of its SRAM array, so
that the bitlines have twice the C and require either more sensitive
(slower) sense amps or added complexity to mux the upper and
lower halves. Either the same number of levels of hierarchy must run slower,
or more levels of hierarchy must be added. Delay follows the log of the size.
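
To make "delay follows the log of the size" concrete, here is a toy model in which every doubling of the array adds one fixed increment of delay (one more mux or decode level). The baseline size, baseline delay, and per-doubling cost are all assumed values chosen for illustration; real arrays also pay for longer wires.

```python
import math

BASE_SIZE = 16 * 1024      # assumed baseline array size, in bytes
BASE_DELAY = 1.0           # assumed delay of the baseline array (arbitrary units)
DELAY_PER_DOUBLING = 0.3   # assumed cost of each extra mux/decode level

def access_delay(size_bytes):
    """Delay grows by a fixed step for every doubling of the array size."""
    doublings = math.log2(size_bytes / BASE_SIZE)
    return BASE_DELAY + DELAY_PER_DOUBLING * doublings

for kb in (16, 32, 64, 128, 256, 512):
    print(f"{kb:4d} kB: {access_delay(kb * 1024):.2f}")
```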

The art is to minimize these incremental costs; that's called circuit
design.

Now the same staging idea applies to the design of a processor that runs, say,
1000 times faster than DRAM. If each memory stage is 10x slower and, say,
16x bigger than the one before it, then we might have caches of L1 16K at 1x,
L2 256K at 10x, and possibly L3 4M at 100x slower than CPU speed. The 3rd
level might actually be faster off-chip DRAM rather than SRAM. In practice, L2
might be much less than 10x slower than L1.

As long as locality is high, L1 satisfies most requests, L2 as much of
the remainder as possible, and so on, so hopefully you rarely go to L3 or
main memory. Now if only all this were true!
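
The effect of locality on such a hierarchy can be sketched with an average-latency calculation. The latencies (1x, 10x, 100x, 1000x) are the ones from the example above; the hit rates are purely assumed numbers, and each latency is treated as the total time to get data from that level.

```python
# (name, latency in CPU cycles) from the example above; DRAM at 1000x.
LEVELS = [("L1", 1), ("L2", 10), ("L3", 100), ("DRAM", 1000)]

def average_latency(hit_rates, levels=LEVELS):
    """Average access latency in CPU cycles.

    hit_rates[i] is the fraction of accesses reaching cache level i
    that hit there; the last level (DRAM) is assumed to always hit.
    """
    total = 0.0
    reach = 1.0  # fraction of all accesses that get this far down
    for (_name, latency), hit in zip(levels, list(hit_rates) + [1.0]):
        total += reach * hit * latency
        reach *= 1.0 - hit
    return total

print(average_latency((0.95, 0.9, 0.9)))  # high locality (assumed rates)
print(average_latency((0.5, 0.5, 0.5)))   # poor locality (assumed rates)
```

With the high-locality rates the average stays within a few cycles, while the poor-locality rates push it above a hundred, which is why the hierarchy only pays off when locality holds.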

John Jakson

From: Stephen Sprunk on
"prasad" <maprasad(a)gmail.com> wrote in message
news:1150897444.517183.152020(a)r2g2000cwb.googlegroups.com...
> 1. Why do we need two levels of cache (L1 and L2)?
> 2. What are the advantages and disadvantages of having two levels of
> cache instead of a single, larger L1 cache?

The larger you make the cache, the more clocks it takes to access. If you
can have a 16kB L1 cache at 2 cycles and a 512kB L2 cache at 20 cycles,
that's nearly always going to provide better performance than just one 512kB
cache at 18 cycles.

The disadvantage of having two levels of cache is that discovering a miss in
the L1 means those cycles are wasted, adding to the time it takes to access
the L2. However, L1 hit rates tend to be extremely high for most
applications, so this is a very minor complaint.
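
A quick way to check the numbers above is to compute the average access time as a function of the L1 hit rate. This is a sketch under two assumptions that are not stated in the example itself: the 20-cycle L2 figure is taken as the total latency on an L1 miss (the wasted L1 cycles included), and every access is assumed to hit somewhere in the 512kB in both designs.

```python
L1_CYCLES = 2        # 16kB L1 from the example
L2_CYCLES = 20       # 512kB L2; assumed to be the total latency on an L1 miss
SINGLE_CYCLES = 18   # single 512kB cache from the example

def two_level_average(l1_hit_rate):
    """Average cycles per access for the L1 + L2 design."""
    return l1_hit_rate * L1_CYCLES + (1.0 - l1_hit_rate) * L2_CYCLES

def break_even_hit_rate():
    """L1 hit rate at which both designs average the same latency."""
    return (L2_CYCLES - SINGLE_CYCLES) / (L2_CYCLES - L1_CYCLES)

print(break_even_hit_rate())
for rate in (0.5, 0.9, 0.99):
    print(rate, two_level_average(rate), SINGLE_CYCLES)
```

Under these assumptions the two-level design wins whenever the L1 hit rate is above 1/9 (about 11%), which is far below the hit rates mentioned above.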

> 3. One paper ("Level 2 Cache for High-performance ARM Core-based SoC
> Systems") says that "The larger the cache is, the slower it
> is too, and so a large L1 cache will
> limit the CPU clock rate". How are cache size and operating
> frequency related?

It doesn't necessarily limit the CPU clock rate; accessing a larger area of
memory takes more _time_, and that means either more cycles at the same
clock rate or the same number of cycles at a lower clock rate.
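
The trade-off between cycles and clock rate can be written down directly. In this sketch the access times and the 2 GHz clock are made-up numbers chosen only to show the arithmetic.

```python
import math

def cycles_needed(access_time_ns, clock_ghz):
    """Whole clock cycles needed to cover a fixed access time."""
    return math.ceil(access_time_ns * clock_ghz)

def max_clock_ghz(access_time_ns, cycles):
    """Highest clock rate at which the access still fits in `cycles` cycles."""
    return cycles / access_time_ns

# Assumed figures: a small array at 1 ns, a large one at 9 ns.
print(cycles_needed(1.0, 2.0))   # small cache at 2 GHz
print(cycles_needed(9.0, 2.0))   # large cache at 2 GHz
print(max_clock_ghz(9.0, 2))     # clock needed to reach the large cache in 2 cycles
```

The access time in nanoseconds is fixed by the array; the designer only chooses whether to pay for it in extra cycles or in a slower clock.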

Generally, the L1 is made as large as can be accessed within a few clocks at
the CPU's maximum speed (a few tens of kB, usually), whereas the L2 is made
as large as you have space for on the die (and many clocks slower).

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin


--
Posted via a free Usenet account from http://www.teranews.com

From: Nishu on

Stephen Sprunk wrote:

> "prasad" <maprasad(a)gmail.com> wrote in message
> news:1150897444.517183.152020(a)r2g2000cwb.googlegroups.com...
> > 1. Why do we need two levels of cache (L1 and L2)?
> > 2. What are the advantages and disadvantages of having two levels of
> > cache instead of a single, larger L1 cache?
>
> The larger you make the cache, the more clocks it takes to access. If you
> can have a 16kB L1 cache at 2 cycles and a 512kB L2 cache at 20 cycles,
> that's nearly always going to provide better performance than just one 512kB
> cache at 18 cycles.
>

I have a question. Are you assuming that the 16kB L1 cache at 2 cycles would
degrade to an L1' cache at 18 cycles if its size were increased to 512kB?

If yes, could you explain how it gets degraded? Is it delay in address
translation?

If no, what if we had 512kB of L1 cache at 2 cycles instead of a
two-level cache system? Even allowing for more _time_ to search
a bigger cache, wouldn't the former still be faster?
If it doesn't come down to cost cutting and saving die space
(considering that L1 is not economical in cost and space), I doubt the need
for a two-level cache system.

-Nishu

From: Stephen Sprunk on
"Nishu" <naresh.attri(a)gmail.com> wrote in message
news:1150947012.892894.240170(a)u72g2000cwu.googlegroups.com...
> Stephen Sprunk wrote:
>> "prasad" <maprasad(a)gmail.com> wrote in message
>> news:1150897444.517183.152020(a)r2g2000cwb.googlegroups.com...
>> > 1. Why do we need two levels of cache (L1 and L2)?
>> > 2. What are the advantages and disadvantages of having two levels of
>> > cache instead of a single, larger L1 cache?
>>
>> The larger you make the cache, the more clocks it takes to access.
>> If you can have a 16kB L1 cache at 2 cycles and a 512kB L2 cache
>> at 20 cycles, that's nearly always going to provide better performance
>> than just one 512kB cache at 18 cycles.
>
> I have a question. Are you assuming that the 16kB L1 cache at 2 cycles would
> degrade to an L1' cache at 18 cycles if its size were increased to 512kB?
>
> If yes, could you explain how it gets degraded? Is it delay in address
> translation?

That's what I'm saying. I don't know enough of the transistor-level details
of why that is, but I've seen it stated over and over by people who _do_
know the details.

> If no, what if we had 512kB of L1 cache at 2 cycles instead of a
> two-level cache system? Even allowing for more _time_ to search
> a bigger cache, wouldn't the former still be faster?

No. The bigger the cache, the longer it takes to access. If you took an
existing L2 cache and moved it to L1 (eliminating the current L1), it would
take just as long to access -- and it'd kill performance. You can't make
one cache fast and big at the same time, so the next best thing is to have a
fast, small cache (L1) backed up by a slower, bigger cache (L2).

> If it doesn't come down to cost cutting and saving die space
> (considering that L1 is not economical in cost and space), I doubt the need
> for a two-level cache system.

It's not about cost-cutting. Chipmakers today are desperate to find ways to
use up the wealth of transistors available in a manner that actually
improves performance. They throw hundreds of millions of transistors at
making L2 bigger (or adding L3); they'd certainly throw a few tens of
thousands at making L1 bigger if it helped -- but it actually slows things
down because it increases the access time. It's a balancing act.

S
