From: Tim Little
On 2010-07-11, dhruvbird <dhruvbird(a)gmail.com> wrote:
> How is the access count (frequency of access) different from the
> number of accesses of a particular object?

Frequency of access is how often an object is accessed *per unit time*.
A page accessed 20 times in the millisecond that it has been in cache
is more frequently accessed (on average) than one that has been in
cache for an hour and accessed 2000 times over that period.

The former has an access frequency of 20,000 accesses per second, the
latter has an access frequency of less than 1 per second.
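
To spell out the arithmetic (a throwaway Python snippet, nothing more):

# frequency = accesses / time resident in cache
print(20 / 0.001)      # 20 accesses in 1 millisecond -> 20000.0 per second
print(2000 / 3600.0)   # 2000 accesses in 1 hour      -> ~0.56 per second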


> As it currently stands, it does track accesses for all pages in the
> cache, which is assumed to be all the pages since all accesses are
> supposed to go through the cache.

Once a page leaves the cache its access count is lost. Your algorithm
says that it re-enters the cache with an access count of 1, and the
O(1) property depends upon that.

Here is a practical example of how it fails to be LFU: suppose you
have a cache of 5 pages, with 6 objects accessed as follows:

(A B C D E B C D E), then (F A) repeated indefinitely.

The first pattern fills your cache with pages BCDE having 2 accesses
each and A having 1 access. When F is accessed, A is spilled. When A
is next accessed, F is spilled and A comes back in with a count of 1.
Despite the frequency of access for pages BCDE going to zero, they
pollute the cache indefinitely because their count of accesses while
in cache is higher than that of the other two pages.
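
Here is a small Python simulation of that behaviour as I understand
your policy (evict the resident page with the smallest in-cache count;
a page re-enters with a count of 1). It is a sketch of the policy, not
of your actual code:

def simulate(accesses, capacity=5):
    counts = {}                      # page -> accesses while resident
    for page in accesses:
        if page in counts:
            counts[page] += 1
        else:
            if len(counts) >= capacity:
                victim = min(counts, key=counts.get)
                del counts[victim]   # the count leaves with the page
            counts[page] = 1
    return counts

pattern = list("ABCDEBCDE") + list("FA") * 10
print(simulate(pattern))
# -> {'B': 2, 'C': 2, 'D': 2, 'E': 2, 'A': 1}: BCDE are never evicted,
#    while A and F keep throwing each other out.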


- Tim
From: dhruvbird
On Jul 12, 7:37 am, Tim Little <t...(a)little-possums.net> wrote:
> On 2010-07-11, dhruvbird <dhruvb...(a)gmail.com> wrote:
>
> > How is the access count (frequency of access) different from the
> > number of accesses of a particular object?
>
> Frequency of access is how often an object is accessed *per unit time*.
> A page accessed 20 times in the millisecond that it has been in cache
> is more frequently accessed (on average) than one that has been in
> cache for an hour and accessed 2000 times over that period.
>
> The former has an access frequency of 20,000 accesses per second, the
> latter has an access frequency of less than 1 per second.

Suppose this is the sequence of accesses to objects A and B (one
access occurring every second):

A, B, B, B, B, B, A, B, B, B, B, B, A, B, B, B, B, B, A, B, B, B, B,
B, A, B, B, B, B, B

Then what, according to you, are the access frequencies of A and B,
respectively?

I would say A's access frequency is 5/30 == 1/6
and B's access frequency is 25/30 == 5/6

Do you have some other way of computing the access frequencies?
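
In code (assuming, as above, one access per second):

accesses = ("A B B B B B " * 5).split()   # the 30-second trace above
for obj in ("A", "B"):
    print(obj, accesses.count(obj) / float(len(accesses)))
# A 0.1666...   B 0.8333...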


>
> > As it currently stands, it does track accesses for all pages in the
> > cache, which is assumed to be all the pages since all accesses are
> > supposed to go through the cache.
>
> Once a page leaves the cache its access count is lost.  Your algorithm
> says that it re-enters the cache with an access count of 1, and the
> O(1) property depends upon that.

Not really. I have made the assumption that when you evict the object,
you evict the count along with it.
You can very easily modify the algorithm to evict only the object and
keep the count (see below for more details on your example).
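
Something along these lines, for example (a rough sketch of the idea
only, not the structure from the write-up; the eviction here is a
linear scan just to keep the sketch short):

class CountPreservingLFU:
    """Sketch: evict only the object; keep its access count so that a
    page which re-enters the cache resumes from its old count."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}    # key -> value, resident pages only
        self.counts = {}   # key -> access count, kept across evictions

    def get(self, key):
        if key not in self.cache:
            return None
        self.counts[key] += 1
        return self.cache[key]

    def put(self, key, value):
        if key not in self.cache and len(self.cache) >= self.capacity:
            # Evict the resident page with the smallest count, but keep
            # its entry in self.counts.
            victim = min(self.cache, key=lambda k: self.counts[k])
            del self.cache[victim]
        self.cache[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1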

>
> Here is a practical example of how it fails to be LFU: suppose you
> have a cache of 5 pages, with 6 objects accessed as follows:
>
> (A B C D E B C D E), then (F A) repeated indefinitely.
>
> The first pattern fills your cache with pages BCDE having 2 accesses
> each and A having 1 access.  When F is accessed, A is spilled.  When A
> is next accessed, F is spilled and A comes back in with a count of 1.
> Despite the frequency of access for pages BCDE going to zero, they
> pollute the cache indefinitely because their count of accesses while
> in cache is higher than that of the other two pages.

The above problem cannot be solved simply, since it is impossible for
the algorithm to know what accesses will be made in the future. If you
want optimal caching, all accesses need to be known in advance, which
is not possible for practical algorithms.
http://en.wikipedia.org/wiki/Page_replacement_algorithm#The_theoretically_optimal_page_replacement_algorithm


While I agree with the point you make about LFU working in a certain
way, this is exactly how all currently implemented LFU algorithms
work.

The point I'm trying to make is that I believe that, before this work,
LFU (with access counts) could only be done in O(log n) per operation.
If you know of any paper/article mentioning work that claims to do
better than O(log n), please do point me to it, since I believe this
is something new.
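
For reference, the usual trick for getting O(1) per operation is to
bucket resident pages by their current access count and keep track of
the smallest non-empty bucket. A minimal Python sketch of that bucket
idea (an illustration only, not the exact data structure from the
paper):

from collections import defaultdict, OrderedDict

class O1LFU:
    """Sketch of an O(1)-per-operation LFU cache using per-count
    buckets. Assumes capacity > 0."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.vals = {}                            # key -> value
        self.count = {}                           # key -> access count
        self.buckets = defaultdict(OrderedDict)   # count -> keys at that count
        self.min_count = 0

    def _touch(self, key):
        c = self.count[key]
        del self.buckets[c][key]
        if not self.buckets[c] and self.min_count == c:
            self.min_count = c + 1
        self.count[key] = c + 1
        self.buckets[c + 1][key] = None

    def get(self, key):
        if key not in self.vals:
            return None
        self._touch(key)
        return self.vals[key]

    def put(self, key, value):
        if key in self.vals:
            self.vals[key] = value
            self._touch(key)
            return
        if len(self.vals) >= self.capacity:
            # Evict from the least-frequently-used bucket (oldest entry first).
            victim, _ = self.buckets[self.min_count].popitem(last=False)
            del self.vals[victim]
            del self.count[victim]
        self.vals[key] = value
        self.count[key] = 1
        self.buckets[1][key] = None
        self.min_count = 1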

Regards,
-Dhruv.

From: Tim Little
On 2010-07-12, dhruvbird <dhruvbird(a)gmail.com> wrote:
> The above problem cannot be solved simply, since it is impossible
> for the algorithm to know what accesses will be made in the future.

I'm not talking about the future. By the second iteration in my
example, your algorithm has the wrong access frequencies for the
*past*. Despite A and F being accessed many times, they are
continually evicted in favour of pages that were accessed twice and
never touched again.


> While I agree with the point you make about LFU working in a certain
> way, this is exactly how all currently implemented LFU algorithms
> work.

No, LFU caches periodically age entries so that the frequencies
are updated.
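
A common form of ageing (purely illustrative; the details vary between
implementations) is to periodically halve every count, so a page that
was hot long ago decays toward zero and becomes evictable:

AGE_INTERVAL = 1000   # illustrative value

class AgedCounts:
    """Access counters that are periodically halved ("aged")."""
    def __init__(self):
        self.counts = {}
        self.ticks = 0

    def record(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        self.ticks += 1
        if self.ticks % AGE_INTERVAL == 0:
            # Ageing step: counts of pages not touched recently drift to zero.
            self.counts = {k: c // 2 for k, c in self.counts.items()}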


- Tim
From: dhruvbird
On Jul 12, 4:04 pm, Tim Little <t...(a)little-possums.net> wrote:
> On 2010-07-12, dhruvbird <dhruvb...(a)gmail.com> wrote:
>
>
> > While I agree with the point you make about LFU working in a certain
> > way, this is exactly how all currently implemented LFU algorithms
> > work.
>
> No, LFU caches periodically age entries so that the frequencies
> are updated.

On the contrary, pure LFU is based on access count only and the
algorithm you mention is actually a variation which involves ageing.
http://en.wikipedia.org/wiki/Least_Frequently_Used
http://en.wikipedia.org/wiki/Cache_algorithms#Least-Frequently_Used

Regards,
-Dhruv.

From: Tim Little
On 2010-07-12, dhruvbird <dhruvbird(a)gmail.com> wrote:
> On Jul 12, 4:04 pm, Tim Little <t...(a)little-possums.net> wrote:
>> On 2010-07-12, dhruvbird <dhruvb...(a)gmail.com> wrote:
>>
>>
>> > While I agree with the point you make about LFU working in a certain
>> > way, this is exactly how all currently implemented LFU algorithms
>> > work.
>>
>> No, LFU caches periodically age entries so that the frequencies
>> are updated.
>
> On the contrary, pure LFU is based on access count only and the
> algorithm you mention is actually a variation which involves ageing.
> http://en.wikipedia.org/wiki/Least_Frequently_Used

"This computer science article is a stub. You can help Wikipedia by
expanding it." That rather markedly lowers my estimation of the
reliability of that article.

But even there it refers to use *frequency*, not just use count while
in cache. The single reference for that page states "correct
implementation of LFU requires that replacement decisions are made
based on frequency access information (popularity), *for all the
objects accessed since the beginning of a cache's operation*."
[Emphasis mine]

Your algorithm does not do this, and so is not LFU.


> http://en.wikipedia.org/wiki/Cache_algorithms#Least-Frequently_Used

A vague 1-line summary that points to the above stub.


- Tim