Linux/Guest cooperative unmapped page cache control [Kernel]


From: Avi Kivity on 14 Jun 2010 11:40

On 06/14/2010 06:12 PM, Dave Hansen wrote:
> On Mon, 2010-06-14 at 14:18 +0530, Balbir Singh wrote:
>
>> 1. A slab page will not be freed until the entire page is free (all
>> slabs have been kfree'd so to speak). Normal reclaim will definitely
>> free this page, but a lot of it depends on how frequently we are
>> scanning the LRU list and when this page got added.
>>
> You don't have to be freeing entire slab pages for the reclaim to have
> been useful. You could just be making space so that _future_
> allocations fill in the slab holes you just created. You may not be
> freeing pages, but you're reducing future system pressure.
>

Depends. If you've evicted something that will be referenced soon,
you're increasing system pressure.

> If unmapped page cache is the easiest thing to evict, then it should be
> the first thing that goes when a balloon request comes in, which is the
> case this patch is trying to handle. If it isn't the easiest thing to
> evict, then we _shouldn't_ evict it.
>

Easy to evict is just one measure. There's benefit (size of data
evicted), cost to refill (seeks, cpu), and likelihood that the cost to
refill will be incurred (recency).

It's all very complicated. We need better information to make these
decisions. For one thing, I'd like to see age information tied to
objects. We may have two pages that were referenced at wildly different
times sitting next to each other in LRU order. We have many LRUs, but no
idea of the relative recency of the tails of those LRUs.

If each page or object had an age, we could scale those ages by the
benefit from reclaim and cost to refill and make a better decision as to
what to evict first. But of course page->age means increasing sizeof
struct page, and we can only approximate its value by scanning the
accessed bit, not determine it accurately (unlike the other objects
managed by the cache).
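
To make the scaling idea above concrete, here is a minimal sketch. It is
not code from the patch set or from the kernel; the Candidate fields and
the formula age * benefit / refill_cost are assumptions chosen only to
illustrate weighing recency against benefit and cost to refill.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A reclaimable object. All fields are hypothetical estimates."""
    name: str
    age: float          # seconds since last reference (assumed to be known)
    benefit: float      # bytes freed if this object is evicted
    refill_cost: float  # estimated cost to bring it back (seeks, cpu), > 0


def eviction_score(c: Candidate) -> float:
    # Older objects, larger objects, and cheap-to-refill objects score
    # higher. This weighting is an illustrative assumption, not a tuned
    # policy.
    return c.age * c.benefit / c.refill_cost


def pick_victim(candidates):
    # Evict the candidate with the highest score first.
    return max(candidates, key=eviction_score)


if __name__ == "__main__":
    pool = [
        Candidate("pagecache page", age=30.0, benefit=4096, refill_cost=10.0),
        Candidate("dentry", age=300.0, benefit=192, refill_cost=1.0),
    ]
    print(pick_victim(pool).name)
```

The point of the sketch is that a single comparable score only works if
every object carries an age, which is exactly what pages lack today.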

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Avi Kivity on 14 Jun 2010 11:50

On 06/14/2010 06:33 PM, Dave Hansen wrote:
> On Mon, 2010-06-14 at 16:01 +0300, Avi Kivity wrote:
>
>> If we drop unmapped pagecache pages, we need to be sure they can be
>> backed by the host, and that depends on the amount of sharing.
>>
> You also have to set the host up properly, and continue to maintain
> it in a way that finds and eliminates duplicates.
>
> I saw some benchmarks where KSM was doing great, finding lots of
> duplicate pages. Then, the host filled up, and guests started
> reclaiming. As memory pressure got worse, so did KSM's ability to find
> duplicates.
>

Yup. KSM needs to be backed up by ballooning, swap, and live migration.

> At the same time, I see what you're trying to do with this. It really
> can be an alternative to ballooning if we do it right, since ballooning
> would probably evict similar pages. Although it would only work in idle
> guests, what about a knob that the host can turn to just get the guest
> to start running reclaim?
>

Isn't the knob in this proposal the balloon? AFAICT, the idea here is
to change how the guest reacts to being ballooned, but the trigger
itself would not change.

My issue is that changing the type of object being preferentially
reclaimed just changes the type of workload that would prematurely
suffer from reclaim. In this case, workloads that use a lot of unmapped
pagecache would suffer.

btw, aren't /proc/sys/vm/swappiness and vfs_cache_pressure similar knobs?
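
For reference, both knobs are plain integer files under /proc/sys/vm on
Linux. A small sketch for inspecting them follows; the helper name
read_vm_knob and its root parameter are made up for this example, and it
returns None where the file is missing or unreadable.

```python
from pathlib import Path


def read_vm_knob(name, root="/proc/sys/vm"):
    """Return the integer value of a vm sysctl file, or None if unavailable.

    `root` is a parameter only so the helper can be pointed at a test
    directory; on a real system the default is what you want.
    """
    try:
        return int((Path(root) / name).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    for knob in ("swappiness", "vfs_cache_pressure"):
        print(knob, read_vm_knob(knob))
```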

--
error compiling committee.c: too many arguments to function


From: Balbir Singh on 14 Jun 2010 13:00

* Dave Hansen <dave(a)linux.vnet.ibm.com> [2010-06-14 08:12:56]:

> On Mon, 2010-06-14 at 14:18 +0530, Balbir Singh wrote:
> > 1. A slab page will not be freed until the entire page is free (all
> > slabs have been kfree'd so to speak). Normal reclaim will definitely
> > free this page, but a lot of it depends on how frequently we are
> > scanning the LRU list and when this page got added.
>
> You don't have to be freeing entire slab pages for the reclaim to have
> been useful. You could just be making space so that _future_
> allocations fill in the slab holes you just created. You may not be
> freeing pages, but you're reducing future system pressure.
>
> If unmapped page cache is the easiest thing to evict, then it should be
> the first thing that goes when a balloon request comes in, which is the
> case this patch is trying to handle. If it isn't the easiest thing to
> evict, then we _shouldn't_ evict it.
>

Like I said earlier, a lot of that works correctly as you said, but it
is also an idealization. If you've got duplicate pages and you know
that they are duplicated and can be retrieved at a lower cost, why
wouldn't we go after them first?

--
Three Cheers,
Balbir

From: Balbir Singh on 14 Jun 2010 13:20

* Dave Hansen <dave(a)linux.vnet.ibm.com> [2010-06-14 10:09:31]:

> On Mon, 2010-06-14 at 22:28 +0530, Balbir Singh wrote:
> > If you've got duplicate pages and you know
> > that they are duplicated and can be retrieved at a lower cost, why
> > wouldn't we go after them first?
>
> I agree with this in theory. But, the guest lacks the information about
> what is truly duplicated and what the costs are for itself and/or the
> host to recreate it. "Unmapped page cache" may be the best proxy that
> we have at the moment for "easy to recreate", but I think it's still too
> poor a match to make these patches useful.
>

That is why the policy (in the next set) will come from the host. As
to whether the data is truly duplicated, my experiments show up to 60%
of the page cache is duplicated. The first patch today is again
enabled by the host. Both of them are expected to be useful in the
cache != none case.

The data I have shows more details including the performance and
overhead.

--
Three Cheers,
Balbir

From: Avi Kivity on 14 Jun 2010 13:40

On 06/14/2010 06:55 PM, Dave Hansen wrote:
> On Mon, 2010-06-14 at 18:44 +0300, Avi Kivity wrote:
>
>> On 06/14/2010 06:33 PM, Dave Hansen wrote:
>>
>>> At the same time, I see what you're trying to do with this. It really
>>> can be an alternative to ballooning if we do it right, since ballooning
>>> would probably evict similar pages. Although it would only work in idle
>>> guests, what about a knob that the host can turn to just get the guest
>>> to start running reclaim?
>>>
>> Isn't the knob in this proposal the balloon? AFAICT, the idea here is
>> to change how the guest reacts to being ballooned, but the trigger
>> itself would not change.
>>
> I think the patch was made on the following assumptions:
> 1. Guests will keep filling their memory with relatively worthless page
> cache that they don't really need.
> 2. When they do this, it hurts the overall system with no real gain for
> anyone.
>
> In the case of a ballooned guest, they _won't_ keep filling memory. The
> balloon will prevent them. So, I guess I was just going down the path
> of considering if this would be useful without ballooning in place. To
> me, it's really hard to justify _with_ ballooning in place.
>

There are two decisions that need to be made:

- how much memory a guest should be given
- given some guest memory, what's the best use for it

The first question can perhaps be answered by looking at guest I/O rates
and giving more memory to more active guests. The second question is
hard, but not any different than running non-virtualized - except if we
can detect sharing or duplication. In this case, dropping a duplicated
page is worthwhile, while dropping a shared page provides no benefit.
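
A rough sketch of both points, under stated assumptions: split_memory
hands each guest a floor plus a share of the remainder proportional to
its I/O rate, and host_pages_freed models a "duplicated" page as a
private guest copy of data the host also holds, and a "shared" page as
one host page backing several guests. The function names, the floor_mb
parameter, and this simplified model are all hypothetical, not taken
from the patches.

```python
def split_memory(total_mb, io_rates, floor_mb=256):
    """Give each guest floor_mb plus a share proportional to its I/O rate.

    io_rates maps guest name -> observed I/O rate (any consistent unit).
    """
    n = len(io_rates)
    spare = total_mb - floor_mb * n
    if spare < 0:
        raise ValueError("not enough memory to give every guest the floor")
    total_rate = sum(io_rates.values())
    if total_rate == 0:
        # No guest is doing I/O: split the remainder evenly.
        return {g: floor_mb + spare / n for g in io_rates}
    return {g: floor_mb + spare * r / total_rate for g, r in io_rates.items()}


def host_pages_freed(kind, other_sharers=0):
    """Host pages released when one guest drops a single page.

    'duplicated': the guest holds its own copy, so dropping it frees a page.
    'shared': one host page backs several guests, so dropping this guest's
    reference frees nothing while other sharers remain.
    """
    if kind == "duplicated":
        return 1
    if kind == "shared":
        return 0 if other_sharers > 0 else 1
    raise ValueError("kind must be 'duplicated' or 'shared'")


if __name__ == "__main__":
    print(split_memory(4096, {"guest-a": 300, "guest-b": 100}, floor_mb=512))
    print(host_pages_freed("duplicated"), host_pages_freed("shared", 3))
```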

How the patch helps answer either question, I'm not sure. I don't think
preferential dropping of unmapped page cache is the answer.

>> My issue is that changing the type of object being preferentially
>> reclaimed just changes the type of workload that would prematurely
>> suffer from reclaim. In this case, workloads that use a lot of unmapped
>> pagecache would suffer.
>>
>> btw, aren't /proc/sys/vm/swappiness and vfs_cache_pressure similar knobs?
>>
> Those tell you how to balance going after the different classes of
> things that we can reclaim.
>
> Again, this is useless when ballooning is being used. But, I'm thinking
> of a more general mechanism to force the system to both have MemFree
> _and_ be acting as if it is under memory pressure.
>

If there is no memory pressure on the host, there is no reason for the
guest to pretend it is under pressure. If there is memory pressure on
the host, it should share the pain among its guests by applying the
balloon. So I don't think voluntarily dropping cache is a good direction.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

