Linux/Guest cooperative unmapped page cache control [Kernel]

From: Avi Kivity on 14 Jun 2010 04:20

On 06/11/2010 07:56 AM, Balbir Singh wrote:
>
>> Just to be clear, let's say we have a mapped page (say of /sbin/init)
>> that's been unreferenced since _just_ after the system booted. We also
>> have an unmapped page cache page of a file often used at runtime, say
>> one from /etc/resolv.conf or /etc/passwd.
>>
>> Which page will be preferred for eviction with this patch set?
>>
>>
> In this case the order is as follows
>
> 1. First we pick free pages if any
> 2. If we don't have free pages, we go after unmapped page cache and
> slab cache
> 3. If that fails as well, we go after regular memory
>
> In the scenario that you describe, we'll not be able to easily free up
> the frequently referenced page from /etc/*. The code will move on to
> step 3 and do its regular reclaim.
>

Still it seems to me you are subverting the normal order of reclaim. I
don't see why an unmapped page cache or slab cache item should be
evicted before a mapped page. Certainly the cost of rebuilding a dentry,
compared to the gain from evicting it, is much higher than that of
reestablishing a mapped page.
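
To make the ordering above concrete, here is a minimal Python sketch. It is
a toy model, not code from the patch set: the `Page` class, the
`pick_victim_proposed` function and the two-flag page state are all
assumptions made for illustration, and slab cache is left out of step 2 for
brevity.

```python
from dataclasses import dataclass


@dataclass
class Page:
    name: str
    mapped: bool      # mapped into some process address space
    referenced: bool  # used recently


def pick_victim_proposed(free_pages, pages):
    """Toy model of the three-step order quoted above (hypothetical)."""
    # Step 1: take a free page if there is one.
    if free_pages > 0:
        return "free page"
    # Step 2: prefer unmapped page cache that has not been used recently.
    for p in pages:
        if not p.mapped and not p.referenced:
            return p.name
    # Step 3: regular reclaim, modelled as "any page not used recently".
    for p in pages:
        if not p.referenced:
            return p.name
    return None


pages = [
    Page("/sbin/init (mapped, idle since boot)", mapped=True, referenced=False),
    Page("/etc/resolv.conf (unmapped, hot)", mapped=False, referenced=True),
]
print(pick_victim_proposed(0, pages))

# Add a cold unmapped page: step 2 now picks it ahead of the idle mapped page.
pages_with_cold = pages + [Page("old log file (unmapped, cold)", False, False)]
print(pick_victim_proposed(0, pages_with_cold))
```

With only the two pages from the example, step 2 finds nothing cold and
step 3 takes the idle mapped page. The concern shows up once any cold
unmapped page exists: in this model it is chosen ahead of the mapped page
no matter how long the mapped page has been idle.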

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Balbir Singh on 14 Jun 2010 04:50

* Avi Kivity <avi(a)redhat.com> [2010-06-14 11:09:44]:

> On 06/11/2010 07:56 AM, Balbir Singh wrote:
> >
> >>Just to be clear, let's say we have a mapped page (say of /sbin/init)
> >>that's been unreferenced since _just_ after the system booted. We also
> >>have an unmapped page cache page of a file often used at runtime, say
> >>one from /etc/resolv.conf or /etc/passwd.
> >>
> >>Which page will be preferred for eviction with this patch set?
> >>
> >In this case the order is as follows
> >
> >1. First we pick free pages if any
> >2. If we don't have free pages, we go after unmapped page cache and
> >slab cache
> >3. If that fails as well, we go after regular memory
> >
> >In the scenario that you describe, we'll not be able to easily free up
> >the frequently referenced page from /etc/*. The code will move on to
> >step 3 and do its regular reclaim.
>
> Still it seems to me you are subverting the normal order of reclaim.
> I don't see why an unmapped page cache or slab cache item should be
> evicted before a mapped page. Certainly the cost of rebuilding a
> dentry compared to the gain from evicting it, is much higher than
> that of reestablishing a mapped page.
>

Subverting to avoid memory duplication. The word "subverting" is
overloaded, so let me try to reason a bit. First, let me explain the
problem.

Memory is a precious resource in a consolidated environment.
We don't want to waste memory via page cache duplication
(cache=writethrough and cache=writeback mode).

Now here is what we are trying to do

1. A slab page will not be freed until the entire page is free (all
slabs have been kfree'd so to speak). Normal reclaim will definitely
free this page, but a lot of it depends on how frequently we are
scanning the LRU list and when this page got added.
2. In the case of page cache (specifically unmapped page cache), there
is duplication already, so why not go after unmapped page caches when
the system is under memory pressure?

In the case of 1, we don't force a dentry to be freed, but rather a
freed page in the slab cache to be reclaimed ahead of forcing reclaim
of mapped pages.
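
Point 1 can be illustrated with a small Python sketch. It is a toy model
under stated assumptions, not the kernel's slab allocator: the `SlabPage`
class and its methods are hypothetical names. It only shows that a slab
page becomes reclaimable once every object in it has been freed, and that
reclaiming such a page evicts nothing live.

```python
class SlabPage:
    """Toy model: one page holding a fixed number of object slots."""

    def __init__(self, slots):
        self.in_use = [False] * slots

    def alloc(self):
        # Take the first free slot; raises ValueError if the page is full.
        i = self.in_use.index(False)
        self.in_use[i] = True
        return i

    def free(self, i):
        self.in_use[i] = False

    def reclaimable(self):
        # The page can be handed back only when no object lives in it.
        return not any(self.in_use)


page = SlabPage(slots=4)
a, b = page.alloc(), page.alloc()
page.free(a)
print(page.reclaimable())  # False: object b still pins the page
page.free(b)
print(page.reclaimable())  # True: the page is empty, freeing it costs no live object
```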

Does the problem statement make sense? If so, do you agree with 1 and
2? Is there major concern about subverting regular reclaim? Does
subverting it make sense in the duplicated scenario?

--
Three Cheers,
Balbir

From: Avi Kivity on 14 Jun 2010 08:50

On 06/14/2010 11:48 AM, Balbir Singh wrote:
>>>
>>> In this case the order is as follows
>>>
>>> 1. First we pick free pages if any
>>> 2. If we don't have free pages, we go after unmapped page cache and
>>> slab cache
>>> 3. If that fails as well, we go after regular memory
>>>
>>> In the scenario that you describe, we'll not be able to easily free up
>>> the frequently referenced page from /etc/*. The code will move on to
>>> step 3 and do its regular reclaim.
>>>
>> Still it seems to me you are subverting the normal order of reclaim.
>> I don't see why an unmapped page cache or slab cache item should be
>> evicted before a mapped page. Certainly the cost of rebuilding a
>> dentry compared to the gain from evicting it, is much higher than
>> that of reestablishing a mapped page.
>>
>>
> Subverting to avoid memory duplication, the word subverting is
> overloaded,

Right, should have used a different one.

> let me try to reason a bit. First let me explain the
> problem
>
> Memory is a precious resource in a consolidated environment.
> We don't want to waste memory via page cache duplication
> (cache=writethrough and cache=writeback mode).
>
> Now here is what we are trying to do
>
> 1. A slab page will not be freed until the entire page is free (all
> slabs have been kfree'd so to speak). Normal reclaim will definitely
> free this page, but a lot of it depends on how frequently we are
> scanning the LRU list and when this page got added.
> 2. In the case of page cache (specifically unmapped page cache), there
> is duplication already, so why not go after unmapped page caches when
> the system is under memory pressure?
>
> In the case of 1, we don't force a dentry to be freed, but rather a
> freed page in the slab cache to be reclaimed ahead of forcing reclaim
> of mapped pages.
>

Sounds like this should be done unconditionally, then. An empty slab
page is worth less than an unmapped pagecache page at all times, no?

> Does the problem statement make sense? If so, do you agree with 1 and
> 2? Is there major concern about subverting regular reclaim? Does
> subverting it make sense in the duplicated scenario?
>
>

In the case of 2, how do you know there is duplication? You know the
guest caches the page, but you have no information about the host.
Since the page is cached in the guest, the host doesn't see it
referenced, and is likely to drop it.

If there is no duplication, then you may have dropped a recently-used
page and will likely cause a major fault soon.
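
A rough Python sketch of that risk follows. It is a toy cost model, not a
measurement: the function name `refault_cost`, the page names and the cost
units are assumptions chosen for illustration.

```python
def refault_cost(page, host_cache, host_hit_cost=1, disk_cost=100):
    """Cost (arbitrary units) for the guest to read back a page it dropped."""
    if page in host_cache:
        # Duplicated: the host still holds a copy, so the read is cheap.
        return host_hit_cost
    # Not duplicated: the host already dropped it, so the read goes to disk.
    return disk_cost


# The guest cannot see this set; that is exactly the problem.
host_cache = {"libc.so:0"}

print(refault_cost("libc.so:0", host_cache))  # 1
print(refault_cost("passwd:0", host_cache))   # 100
```

The guest makes the drop decision without knowing which branch it will
hit later.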

--
error compiling committee.c: too many arguments to function

From: Balbir Singh on 14 Jun 2010 09:00

* Avi Kivity <avi(a)redhat.com> [2010-06-14 15:40:28]:

> On 06/14/2010 11:48 AM, Balbir Singh wrote:
> >>>
> >>>In this case the order is as follows
> >>>
> >>>1. First we pick free pages if any
> >>>2. If we don't have free pages, we go after unmapped page cache and
> >>>slab cache
> >>>3. If that fails as well, we go after regular memory
> >>>
> >>>In the scenario that you describe, we'll not be able to easily free up
> >>>the frequently referenced page from /etc/*. The code will move on to
> >>>step 3 and do its regular reclaim.
> >>Still it seems to me you are subverting the normal order of reclaim.
> >>I don't see why an unmapped page cache or slab cache item should be
> >>evicted before a mapped page. Certainly the cost of rebuilding a
> >>dentry compared to the gain from evicting it, is much higher than
> >>that of reestablishing a mapped page.
> >>
> >Subverting to avoid memory duplication, the word subverting is
> >overloaded,
>
> Right, should have used a different one.
>
> >let me try to reason a bit. First let me explain the
> >problem
> >
> >Memory is a precious resource in a consolidated environment.
> >We don't want to waste memory via page cache duplication
> >(cache=writethrough and cache=writeback mode).
> >
> >Now here is what we are trying to do
> >
> >1. A slab page will not be freed until the entire page is free (all
> >slabs have been kfree'd so to speak). Normal reclaim will definitely
> >free this page, but a lot of it depends on how frequently we are
> >scanning the LRU list and when this page got added.
> >2. In the case of page cache (specifically unmapped page cache), there
> >is duplication already, so why not go after unmapped page caches when
> >the system is under memory pressure?
> >
> >In the case of 1, we don't force a dentry to be freed, but rather a
> >freed page in the slab cache to be reclaimed ahead of forcing reclaim
> >of mapped pages.
>
> Sounds like this should be done unconditionally, then. An empty
> slab page is worth less than an unmapped pagecache page at all
> times, no?
>

In a consolidated environment, even at the cost of some CPU to run
shrinkers, I think potentially yes.

> >Does the problem statement make sense? If so, do you agree with 1 and
> >2? Is there major concern about subverting regular reclaim? Does
> >subverting it make sense in the duplicated scenario?
> >
>
> In the case of 2, how do you know there is duplication? You know
> the guest caches the page, but you have no information about the
> host. Since the page is cached in the guest, the host doesn't see
> it referenced, and is likely to drop it.

True, that is why the first patch is controlled via a boot parameter
that the host can pass. For the second patch, I think we'll need
something like a balloon <size> <cache?> with the cache argument being
optional.
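
As a sketch of what such a command could look like, the Python snippet
below parses a line of the form `balloon <size> [cache]`. Everything here
is hypothetical: no such syntax exists in any monitor today, the function
name `parse_balloon_command` is made up, and treating the optional argument
as the literal word `cache` is an assumption.

```python
def parse_balloon_command(line):
    """Parse a hypothetical 'balloon <size> [cache]' command line.

    Returns (size, reclaim_cache). Raises ValueError on malformed input.
    """
    parts = line.split()
    if len(parts) not in (2, 3) or parts[0] != "balloon":
        raise ValueError("expected: balloon <size> [cache]")
    size = int(parts[1])  # int() raises ValueError for a non-numeric size
    if size < 0:
        raise ValueError("size must be non-negative")
    reclaim_cache = False
    if len(parts) == 3:
        if parts[2] != "cache":
            raise ValueError("unknown argument: " + parts[2])
        reclaim_cache = True
    return size, reclaim_cache


print(parse_balloon_command("balloon 512"))        # (512, False)
print(parse_balloon_command("balloon 512 cache"))  # (512, True)
```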

>
> If there is no duplication, then you may have dropped a
> recently-used page and will likely cause a major fault soon.
>

Yes, agreed.

--
Three Cheers,
Balbir

From: Avi Kivity on 14 Jun 2010 09:10

On 06/14/2010 03:50 PM, Balbir Singh wrote:
>
>>
>>> let me try to reason a bit. First let me explain the
>>> problem
>>>
>>> Memory is a precious resource in a consolidated environment.
>>> We don't want to waste memory via page cache duplication
>>> (cache=writethrough and cache=writeback mode).
>>>
>>> Now here is what we are trying to do
>>>
>>> 1. A slab page will not be freed until the entire page is free (all
>>> slabs have been kfree'd so to speak). Normal reclaim will definitely
>>> free this page, but a lot of it depends on how frequently we are
>>> scanning the LRU list and when this page got added.
>>> 2. In the case of page cache (specifically unmapped page cache), there
>>> is duplication already, so why not go after unmapped page caches when
>>> the system is under memory pressure?
>>>
>>> In the case of 1, we don't force a dentry to be freed, but rather a
>>> freed page in the slab cache to be reclaimed ahead of forcing reclaim
>>> of mapped pages.
>>>
>> Sounds like this should be done unconditionally, then. An empty
>> slab page is worth less than an unmapped pagecache page at all
>> times, no?
>>
>>
> In a consolidated environment, even at the cost of some CPU to run
> shrinkers, I think potentially yes.
>

I don't understand. If you're running the shrinkers then you're
evicting live entries, which could cost you an I/O each. That's
expensive, consolidated or not.

If you're not running the shrinkers, why does it matter if you're
consolidated or not? Drop that page unconditionally.

>>> Does the problem statement make sense? If so, do you agree with 1 and
>>> 2? Is there major concern about subverting regular reclaim? Does
>>> subverting it make sense in the duplicated scenario?
>>>
>>>
>> In the case of 2, how do you know there is duplication? You know
>> the guest caches the page, but you have no information about the
>> host. Since the page is cached in the guest, the host doesn't see
>> it referenced, and is likely to drop it.
>>
> True, that is why the first patch is controlled via a boot parameter
> that the host can pass. For the second patch, I think we'll need
> something like a balloon <size> <cache?> with the cache argument being
> optional.
>

Whether a page is duplicated on the host or not is a per-page property;
it cannot be a boot parameter.

If we drop unmapped pagecache pages, we need to be sure they can be
backed by the host, and that depends on the amount of sharing.

Overall, I don't see how a user can tune this. If I were a guest admin,
I'd play it safe by not assuming the host will back me, and disabling
the feature.

To get something like this to work, we need to reward cooperating guests
somehow.

>> If there is no duplication, then you may have dropped a
>> recently-used page and will likely cause a major fault soon.
>>
> Yes, agreed.
>

So how do we deal with this?

--
error compiling committee.c: too many arguments to function
