Linux/Guest cooperative unmapped page cache control [Kernel]

From: Avi Kivity on 16 Jun 2010 07:40

On 06/15/2010 05:47 PM, Dave Hansen wrote:
>
>> That's a bug that needs to be fixed. Eventually the host will come
>> under pressure and will balloon the guest. If that kills the guest, the
>> ballooning is not effective as a host memory management technique.
>>
> I'm not convinced that it's just a bug that can be fixed. Consider a
> case where a host sees a guest with 100MB of free memory at the exact
> moment that a database app sees that memory. The host tries to balloon
> that memory away at the same time that the app goes and allocates it.
> That can certainly lead to an OOM very quickly, even for very small
> amounts of memory (much less than 100MB). Where's the bug?
>
> I think the issues are really fundamental to ballooning.
>

There are two issues involved.

One is, can the kernel accurately determine the amount of memory it
needs to work? We have resources such as RAM and swap. We have
liabilities in the form of swappable userspace memory, mlocked userspace
memory, kernel memory to support these, and various reclaimable and
non-reclaimable kernel caches. Can we determine the minimum amount of
RAM to support our workload at a point in time?

If we had this, we could modify the balloon to refuse to balloon if it
takes the kernel beneath the minimum amount of RAM needed.
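
As a rough illustration of that refusal rule, here is a minimal sketch. It
assumes the kernel could somehow produce a minimum-RAM estimate, which is
exactly the open question above. The function and parameter names are
hypothetical and do not correspond to any existing balloon driver interface.

```python
def balloon_grant(request_mb: int, usable_ram_mb: int, min_required_mb: int) -> int:
    """Return how many MB the balloon may take without crossing the floor.

    usable_ram_mb   -- RAM the guest currently has (after earlier ballooning)
    min_required_mb -- hypothetical estimate of the minimum RAM the workload needs
    """
    if request_mb < 0:
        raise ValueError("request_mb must be non-negative")
    # Headroom is whatever sits above the estimated minimum; never negative.
    headroom = max(0, usable_ram_mb - min_required_mb)
    # Grant the request only up to the headroom; refuse the rest.
    return min(request_mb, headroom)


# A guest with 1024 MB usable and an estimated floor of 900 MB
# can give up at most 124 MB, regardless of what the host asks for.
granted = balloon_grant(200, usable_ram_mb=1024, min_required_mb=900)
```

Note that this only addresses the accounting side. It does nothing about the
race Dave describes, where an application allocates the same memory between
the check and the inflate.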

In fact, this is similar to allocating memory with overcommit_memory =
0. The difference is the balloon allocates mlocked memory, while normal
allocations can be charged against swap. But fundamentally it's the same.
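
To make the difference concrete, here is a deliberately simplified accounting
model. It is not the kernel's actual overcommit heuristic, and all names in it
are hypothetical. A normal allocation only has to fit within RAM plus swap,
while a balloon (pinned) allocation must additionally fit within RAM alone.

```python
def can_charge(request_mb: int, *, ram_mb: int, swap_mb: int,
               committed_mb: int, pinned_mb: int, pinned: bool) -> bool:
    """Simplified admission check for a new allocation.

    committed_mb -- everything already charged (pinned and swappable)
    pinned_mb    -- the part of committed_mb that can never be swapped out
    pinned       -- True for balloon-style (mlocked) memory
    """
    # Every allocation must be backed by RAM or swap.
    if committed_mb + request_mb > ram_mb + swap_mb:
        return False
    # Pinned memory cannot be charged against swap, so it must fit in RAM.
    if pinned and pinned_mb + request_mb > ram_mb:
        return False
    return True


# Same request, same guest: fine as a swappable allocation,
# refused as a balloon inflation because RAM alone cannot hold it.
normal_ok = can_charge(100, ram_mb=1000, swap_mb=500,
                       committed_mb=1300, pinned_mb=950, pinned=False)
balloon_ok = can_charge(100, ram_mb=1000, swap_mb=500,
                        committed_mb=1300, pinned_mb=950, pinned=True)
```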

>>> If all the guests do this, then it leaves that much more free memory on
>>> the host, which can be used flexibly for extra host page cache, new
>>> guests, etc...
>>>
>> If the host detects lots of pagecache misses it can balloon guests
>> down. If pagecache is quiet, why change anything?
>>
> Page cache misses alone are not really sufficient. This is the classic
> problem where we try to differentiate streaming I/O (which we can't
> effectively cache) from I/O which can be effectively cached.
>

True. Random I/O across a very large dataset is also difficult to cache.

>> If the host wants to start new guests, it can balloon guests down. If
>> no new guests are wanted, why change anything?
>>
> We're talking about an environment which we're always trying to
> optimize. Imagine that we're always trying to consolidate guests on to
> smaller numbers of hosts. We're effectively in a state where we
> _always_ want new guests.
>

If this came at no cost to the guests, you'd be right. But at some
point guest performance will be hit by this, so the advantage gained
from freeing memory will be balanced by the disadvantage.

Also, memory is not the only resource. At some point you become cpu
bound; at that point freeing memory doesn't help and in fact may
increase your cpu load.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Balbir Singh on 17 Jun 2010 02:10

* Avi Kivity <avi(a)redhat.com> [2010-06-16 14:39:02]:

> >We're talking about an environment which we're always trying to
> >optimize. Imagine that we're always trying to consolidate guests on to
> >smaller numbers of hosts. We're effectively in a state where we
> >_always_ want new guests.
>
> If this came at no cost to the guests, you'd be right. But at some
> point guest performance will be hit by this, so the advantage gained
> from freeing memory will be balanced by the disadvantage.
>
> Also, memory is not the only resource. At some point you become cpu
> bound; at that point freeing memory doesn't help and in fact may
> increase your cpu load.
>

We'll probably need control over other resources as well, but IMHO
memory is the most precious because it is non-renewable.

--
Three Cheers,
Balbir
