From: Andrew Morton
On Sun, 4 Apr 2010 14:52:02 -0700
Dmitry Torokhov <dtor(a)vmware.com> wrote:

> This is a standalone version of the VMware Balloon driver. Unlike the
> previous version, which tried to integrate the VMware ballooning
> transport into the virtio subsystem and use the stock virtio_balloon
> driver, this one implements both the controlling thread/algorithm and
> the hypervisor transport.
>
> We are submitting a standalone driver because the KVM maintainer (Avi
> Kivity) expressed the opinion (rightly) that our transport does not fit
> well into the virtqueue paradigm, and thus it does not make much sense
> to integrate it with virtio.
>

I think I've forgotten what balloon drivers do. Are they as nasty a
hack as I remember believing them to be?

A summary of what this code sets out to do, and how it does it would be
useful.

Also please explain the applicability of this driver. Will xen use it?
kvm? Out-of-tree code?

The code implements a user-visible API (in /proc, at least). Please
fully describe the proposed interface(s) in the changelog so we can
review and understand that proposal.

>
> ...
>
> +static bool vmballoon_send_start(struct vmballoon *b)
> +{
> +	unsigned long status, dummy;
> +
> +	STATS_INC(b->stats.start);
> +
> +	status = VMWARE_BALLOON_CMD(START, VMW_BALLOON_PROTOCOL_VERSION, dummy);
> +	if (status == VMW_BALLOON_SUCCESS)
> +		return true;
> +
> +	pr_debug("%s - failed, hv returns %ld\n", __func__, status);

The code refers to something called "hv". I suspect that's stale?

> +	STATS_INC(b->stats.start_fail);
> +	return false;
> +}
> +
> +static bool vmballoon_check_status(struct vmballoon *b, unsigned long status)
> +{
> +	switch (status) {
> +	case VMW_BALLOON_SUCCESS:
> +		return true;
> +
> +	case VMW_BALLOON_ERROR_RESET:
> +		b->reset_required = true;
> +		/* fall through */
> +
> +	default:
> +		return false;
> +	}
> +}
> +
> +static bool vmballoon_send_guest_id(struct vmballoon *b)
> +{
> +	unsigned long status, dummy;
> +
> +	status = VMWARE_BALLOON_CMD(GUEST_ID, VMW_BALLOON_GUEST_ID, dummy);
> +
> +	STATS_INC(b->stats.guest_type);
> +
> +	if (vmballoon_check_status(b, status))
> +		return true;
> +
> +	pr_debug("%s - failed, hv returns %ld\n", __func__, status);
> +	STATS_INC(b->stats.guest_type_fail);
> +	return false;
> +}

The lack of comments makes it all a bit hard to take in.

>
> ...
>
> +static int __init vmballoon_init(void)
> +{
> +	int error;
> +
> +	/*
> +	 * Check if we are running on VMware's hypervisor and bail out
> +	 * if we are not.
> +	 */
> +	if (!vmware_platform())
> +		return -ENODEV;
> +
> +	vmballoon_wq = create_freezeable_workqueue("vmmemctl");
> +	if (!vmballoon_wq) {
> +		pr_err("failed to create workqueue\n");
> +		return -ENOMEM;
> +	}
> +
> +	/* initialize global state */
> +	memset(&balloon, 0, sizeof(balloon));

The memset seems to be unneeded: balloon is a file-scope variable, so
it is already zero-initialized.

> +	INIT_LIST_HEAD(&balloon.pages);
> +	INIT_LIST_HEAD(&balloon.refused_pages);
> +
> +	/* initialize rates */
> +	balloon.rate_alloc = VMW_BALLOON_RATE_ALLOC_MAX;
> +	balloon.rate_free = VMW_BALLOON_RATE_FREE_MAX;
> +
> +	INIT_DELAYED_WORK(&balloon.dwork, vmballoon_work);
> +
> +	/*
> +	 * Start balloon.
> +	 */
> +	if (!vmballoon_send_start(&balloon)) {
> +		pr_err("failed to send start command to the host\n");
> +		error = -EIO;
> +		goto fail;
> +	}
> +
> +	if (!vmballoon_send_guest_id(&balloon)) {
> +		pr_err("failed to send guest ID to the host\n");
> +		error = -EIO;
> +		goto fail;
> +	}
> +
> +	error = vmballoon_procfs_init(&balloon);
> +	if (error)
> +		goto fail;
> +
> +	queue_delayed_work(vmballoon_wq, &balloon.dwork, 0);
> +
> +	return 0;
> +
> +fail:
> +	destroy_workqueue(vmballoon_wq);
> +	return error;
> +}
>
> ...
>

Oh well, ho hum. Help is needed on working out what to do about this,
please.

Congrats on the new job, btw ;)

From: Jeremy Fitzhardinge
On 04/05/2010 02:24 PM, Andrew Morton wrote:
> I think I've forgotten what balloon drivers do. Are they as nasty a
> hack as I remember believing them to be?
>

(I haven't looked at Dmitry's patch yet, so this is from the Xen
perspective.)

In the simplest form, they just look like a driver which allocates a
pile of pages, and the underlying memory gets returned to the
hypervisor. When you want the memory back, the hypervisor reattaches
memory to the pageframes and the driver releases the pages back to the
kernel. This allows a virtual machine to shrink with respect to its
original size.
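
Concretely, the simple form boils down to something like the sketch
below; hypervisor_release_page()/hypervisor_reclaim_page() are made-up
stand-ins for whatever channel a particular hypervisor provides (a
hypercall, a backdoor command, a virtio queue):

#include <linux/gfp.h>
#include <linux/list.h>
#include <linux/mm.h>

static LIST_HEAD(ballooned_pages);

static int balloon_inflate_one(void)
{
	struct page *page;

	page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY | __GFP_NOWARN);
	if (!page)
		return -ENOMEM;

	/* Tell the host it may reuse the frame backing this page. */
	hypervisor_release_page(page_to_pfn(page));
	list_add(&page->lru, &ballooned_pages);
	return 0;
}

static void balloon_deflate_one(void)
{
	struct page *page;

	if (list_empty(&ballooned_pages))
		return;

	page = list_first_entry(&ballooned_pages, struct page, lru);
	list_del(&page->lru);

	/* Have the host reattach memory to the pageframe, then give
	 * the page back to the kernel allocator. */
	hypervisor_reclaim_page(page_to_pfn(page));
	__free_page(page);
}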

Going the other way - expanding beyond the original memory allocation -
is a bit trickier because you need to get some new page structures from
somewhere. We don't do this in Xen yet, but I've done some experiments
with hotplug memory to implement this. A simpler approach is to fake up
some reserved E820 ranges to grow into.


> A summary of what this code sets out to do, and how it does it would be
> useful.
>
> Also please explain the applicability of this driver. Will xen use it?
> kvm? Out-of-tree code?
>
The basic idea of the driver is to allow a guest system to give up
memory it isn't using so it can be reused by other virtual machines (or
the host itself).

Xen and KVM already have equivalents in the kernel. Now that I've had a
quick look at Dmitry's patch, it's certainly along the same lines as the
Xen code, but it isn't clear to me how much code they could end up
sharing. There are a couple of similar-looking loops, but the bulk of
the code appears to be VMware-specific.

One area where common code would be very useful is some kind of policy
engine to drive the balloon driver. That is, something that can look at
the VM's state and say "we really have a couple hundred MB of excess
memory we could happily give back to the host". And - very important -
"don't go below X MB, because then we'll die in a flaming swap storm".

At the moment this is driven by vendor-specific tools with heuristics of
varying degrees of sophistication (which could be as simple as
absolutely manual control). The problem has two sides: the guest must
decide how much memory it can afford to give up, while the host is the
one that knows what the system-wide memory pressures are. And it can be
affected by hypervisor-specific features, such as whether pages can be
transparently shared between domains, demand-faulted from swap, etc.

And Dan Magenheimer is playing with a more fine-grained mechanism where
a guest kernel can draw on spare host memory without actually committing
that memory to the guest, which allows memory to be reallocated on the
fly with more fluidity.

> The code implements a user-visible API (in /proc, at least). Please
> fully describe the proposed interface(s) in the changelog so we can
> review and understand that proposal.
>

It seems to me that sysfs would be a better match. It would be nice to
try to avoid gratuitous differences.
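
A single writable attribute would cover the basic knob, something like
the sketch below; target_pages and the vmballoon_set_target() helper
are illustrative names, not anything from the posted patch:

#include <linux/kobject.h>
#include <linux/sysfs.h>

static ssize_t target_pages_store(struct kobject *kobj,
				  struct kobj_attribute *attr,
				  const char *buf, size_t count)
{
	unsigned long target;

	if (strict_strtoul(buf, 10, &target))
		return -EINVAL;

	/* Hand the new target to the balloon worker. */
	vmballoon_set_target(&balloon, target);
	return count;
}

static struct kobj_attribute target_pages_attr =
	__ATTR(target_pages, 0200, NULL, target_pages_store);

/* registered at init time with something like:
 *	sysfs_create_file(kernel_kobj, &target_pages_attr.attr);
 */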

>> ...
>>
>> +static bool vmballoon_send_start(struct vmballoon *b)
>> +{
>> +	unsigned long status, dummy;
>> +
>> +	STATS_INC(b->stats.start);
>> +
>> +	status = VMWARE_BALLOON_CMD(START, VMW_BALLOON_PROTOCOL_VERSION, dummy);
>> +	if (status == VMW_BALLOON_SUCCESS)
>> +		return true;
>> +
>> +	pr_debug("%s - failed, hv returns %ld\n", __func__, status);
>>
> The code refers to something called "hv". I suspect that's stale?
>

hv = hypervisor

J
From: Andrew Morton
On Mon, 05 Apr 2010 15:03:08 -0700
Jeremy Fitzhardinge <jeremy(a)goop.org> wrote:

> On 04/05/2010 02:24 PM, Andrew Morton wrote:
> > I think I've forgotten what balloon drivers do. Are they as nasty a
> > hack as I remember believing them to be?
> >
>
> (I haven't looked at Dmitry's patch yet, so this is from the Xen
> perspective.)
>
> In the simplest form, they just look like a driver which allocates a
> pile of pages, and the underlying memory gets returned to the
> hypervisor. When you want the memory back, the hypervisor reattaches
> memory to the pageframes and the driver releases the pages back to the
> kernel. This allows a virtual machine to shrink with respect to its
> original size.
>
> Going the other way - expanding beyond the original memory allocation -
> is a bit trickier because you need to get some new page structures from
> somewhere. We don't do this in Xen yet, but I've done some experiments
> with hotplug memory to implement this. A simpler approach is to fake up
> some reserved E820 ranges to grow into.
>

Lots of stuff for Dmitry to add to his changelog ;)

> > A summary of what this code sets out to do, and how it does it would be
> > useful.
> >
> > Also please explain the applicability of this driver. Will xen use it?
> > kvm? Out-of-tree code?
> >
> The basic idea of the driver is to allow a guest system to give up
> memory it isn't using so it can be reused by other virtual machines (or
> the host itself).

So... does this differ in any fundamental way from what hibernation
does, via shrink_all_memory()?

From: Avi Kivity
On 04/06/2010 01:17 AM, Andrew Morton wrote:
>> The basic idea of the driver is to allow a guest system to give up
>> memory it isn't using so it can be reused by other virtual machines (or
>> the host itself).
>>
> So... does this differ in any fundamental way from what hibernation
> does, via shrink_all_memory()?
>

Just the _all_ bit, and the fact that we need to report the freed page
numbers to the hypervisor.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

From: Andrew Morton
On Tue, 06 Apr 2010 01:26:11 +0300
Avi Kivity <avi(a)redhat.com> wrote:

> On 04/06/2010 01:17 AM, Andrew Morton wrote:
> >> The basic idea of the driver is to allow a guest system to give up
> >> memory it isn't using so it can be reused by other virtual machines (or
> >> the host itself).
> >>
> > So... does this differ in any fundamental way from what hibernation
> > does, via shrink_all_memory()?
> >
>
> Just the _all_ bit, and the fact that we need to report the freed page
> numbers to the hypervisor.
>

So... why not tweak that, rather than implementing some parallel thing?