From: Chetan Loke on
Hello Dmitry,

On Wed, Jun 30, 2010 at 3:27 PM, Dmitry Torokhov <dtor(a)vmware.com> wrote:
> Hi Chetan,
>
> On Wednesday, June 30, 2010 11:42:53 am Chetan Loke wrote:
>> Q1)Does vmtools handle pvscsi correctly?
>>
>
> Yes, as long as it is compiled as a module. The installer will not
> overwrite the distribution-supplied version unless the user explicitly
> asks the installer to clobber it.
>
perfect.

> So far distributions have not tried building their kernels with pvscsi
> or vmxnet3 built-in, but did so with our balloon driver, which prompted
> this particular change.
>
We are building ISOs which will then be used to build/create an ESX
appliance. So we would need the pvscsi driver from the start. vNICs
will be populated post-install, at which point vmxnet[2/3] will
kick in via vmtools.

>> Q2) If a VM wants to be a good citizen, is there a way for a
>> guest to know about the balloon event?
>
> I am not sure I follow. Ballooning is supposed to be as transparent as
> possible...
>
This is too product specific. I will send you an email separately.


>> Q3) What if an app mlocks its memory resources and drivers have
>> pinned down their pages? How does inflation work then?
>
> We will inflate as much as we can. Obviously, if there is no more
> memory, the balloon may not grow to its full target size.
>
> The balloon driver communicates to the hypervisor the total amount of
> memory in the guest. We may want to adjust that number by subtracting
> memory allocated by the kernel, mlocked memory and so on, but that is
> not done currently.
Ok.
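
The adjustment described above (total guest memory minus kernel-allocated and mlocked memory) is not implemented in the driver, but the arithmetic can be sketched from user space. This is only an illustration under stated assumptions: it reads /proc/meminfo-style text, and the choice of fields to subtract (Mlocked, SUnreclaim, KernelStack, PageTables) is a guess at what "memory allocated by the kernel, mlocked memory and so on" might cover. The function names are hypothetical.

```python
# Assumption: these /proc/meminfo fields approximate memory that a
# balloon could not reclaim. The thread does not name specific fields.
PINNED_FIELDS = ("Mlocked", "SUnreclaim", "KernelStack", "PageTables")


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def balloonable_kib(text, subtract=PINNED_FIELDS):
    """Return MemTotal minus the listed fields, floored at zero (kiB)."""
    info = parse_meminfo(text)
    pinned = sum(info.get(name, 0) for name in subtract)
    return max(info["MemTotal"] - pinned, 0)
```

On a Linux guest this could be fed with `open("/proc/meminfo").read()`; fields missing from the input are simply treated as zero.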

I'm stuck with one question -

A) Ballooning will trigger the guest's native memory management policy.
A.1) So this could mean the guest might swap its pages to its vdisk, correct?

Consider this setup -
B) VM1..VMn have backing store (data and OS partitions) on LUNs (SAN).
Further, data LUNs are mounted as RDMs. I chose RDMs just to keep it
simple.
C) Say there's memory pressure. How? Well, a few VMs are blasting I/O
to the LUNs. Plus, a backup triggered. Plus, whatever else happened.
C.1) VMs now seem to need more and more memory.
C.2) The hypervisor's block layer/other layers also need more memory.
C.3) The hypervisor's memory-management algorithm kicks in.
......
C.3.x) Ballooning triggers - now some VMs (excluding the ones
from C.1) are giving up memory and if A.1) above is true then the
guest's pages will be swapped out to the LUNs via the
hypervisor's SCSI-LLDD. But look at C.2) above. Is
this a soft-deadlock?

Oh, it's a Linux guest and if C.1) times out then the guest will send
aborts and eventually a LUN reset ;).

In this particular case, if my suspicion is valid and if all the
signatures match (swap is out on the SAN, block congestion, etc.) then
the balloon driver could just bail out.

> Thanks.
> --
> Dmitry

Thanks
Chetan Loke
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Dmitry Torokhov on
On Wednesday, June 30, 2010 02:26:40 pm Chetan Loke wrote:
> Hello Dmitry,
>
> On Wed, Jun 30, 2010 at 3:27 PM, Dmitry Torokhov <dtor(a)vmware.com> wrote:
> > Hi Chetan,
> >
> > On Wednesday, June 30, 2010 11:42:53 am Chetan Loke wrote:
> >> Q1)Does vmtools handle pvscsi correctly?
> >
> > Yes, as long as it is compiled as a module. The installer will not
> > overwrite the distribution-supplied version unless the user explicitly
> > asks the installer to clobber it.
>
> perfect.
>
> > So far distributions have not tried building their kernels with pvscsi
> > or vmxnet3 built-in, but did so with our balloon driver, which prompted
> > this particular change.
>
> We are building ISOs which will then be used to build/create an ESX
> appliance. So we would need the pvscsi driver from the start.

Well, with a typical setup, even though pvscsi is a module, as long as it
is in the initramfs it will still be loaded automatically. If you are
building a truly custom appliance and require pvscsi built-in, you'll have
to modify the tools installer script.

> vNICs
> will be populated post-install, at which point vmxnet[2/3] will
> kick in via vmtools.

Depending on what you base your appliance on, vmxnet3 might already be in
the kernel along with pvscsi.

>
> >> Q2) If a VM wants to be a good citizen, is there a way for a
> >> guest to know about the balloon event?
> >
> > I am not sure I follow. Ballooning is supposed to be as transparent as
> > possible...
>
> This is too product specific. I will send you an email separately.
>

OK.

> >> Q3) What if an app mlocks its memory resources and drivers have
> >> pinned down their pages? How does inflation work then?
> >
> > We will inflate as much as we can. Obviously, if there is no more
> > memory, the balloon may not grow to its full target size.
> >
> > The balloon driver communicates to the hypervisor the total amount of
> > memory in the guest. We may want to adjust that number by subtracting
> > memory allocated by the kernel, mlocked memory and so on, but that is
> > not done currently.
>
> Ok.
>
> I'm stuck with one question -
>
> A) Ballooning will trigger the guest's native memory management policy.
> A.1) So this could mean the guest might swap its pages to its vdisk,
> correct?
>

Yes.

> Consider this setup -
> B) VM1..VMn have backing store (data and OS partitions) on LUNs (SAN).
> Further, data LUNs are mounted as RDMs. I chose RDMs just to keep it
> simple.
> C) Say there's memory pressure. How? Well, a few VMs are blasting I/O
> to the LUNs. Plus, a backup triggered. Plus, whatever else happened.
> C.1) VMs now seem to need more and more memory.
> C.2) The hypervisor's block layer/other layers also need more memory.
> C.3) The hypervisor's memory-management algorithm kicks in.
> ......
> C.3.x) Ballooning triggers - now some VMs (excluding the ones
> from C.1) are giving up memory and if A.1) above is true then the
> guest's pages will be swapped out to the LUNs via the
> hypervisor's SCSI-LLDD. But look at C.2) above. Is
> this a soft-deadlock?
>

If there is no memory, something will have to give. If you look at
the balloon driver you will see that when it switches from non-sleeping
to sleeping allocations, or otherwise starts getting allocation errors,
it throttles the inflation rate to give the box a "breather" and
not choke it completely right then and there.
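
The throttling behaviour described above can be illustrated with a toy model. This is not the driver's code: the function `inflate`, the callback `try_alloc(can_sleep)` and the rate values are all hypothetical, chosen only to show the shape of "inflate fast, switch to sleeping allocations and slow down on the first error, stop short of the target if even those fail".

```python
def inflate(target_pages, try_alloc, fast_rate=16, slow_rate=1, max_cycles=1000):
    """Toy model of a rate-throttled balloon inflation loop.

    try_alloc(can_sleep) is a hypothetical callback returning True when
    one page could be allocated. Returns (pages_held, cycles_used).
    """
    held, rate, can_sleep, cycles = 0, fast_rate, False, 0
    while held < target_pages and cycles < max_cycles:
        cycles += 1
        batch = min(rate, target_pages - held)
        got = 0
        while got < batch and try_alloc(can_sleep):
            got += 1
        held += got
        if got == batch:
            continue  # no allocation error in this cycle
        if not can_sleep:
            # First error: fall back to sleeping allocations and
            # throttle the rate to give the guest a breather.
            can_sleep, rate = True, slow_rate
        elif got == 0:
            # Sleeping allocations fail too: stop short of the target.
            break
    return held, cycles
```

With an allocator that runs dry after 20 pages and a target of 100, the loop ends holding 20 pages rather than spinning, which matches the point that the balloon is not guaranteed to reach its target.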

> Oh, it's a Linux guest and if C.1) times out then the guest will send
> aborts and eventually a LUN reset ;).
>
> In this particular case, if my suspicion is valid and if all the
> signatures match (swap is out on the SAN, block congestion, etc.) then
> the balloon driver could just bail out.
>

Yes, it is not guaranteed that the balloon will reach its target, and in
this case the host itself might start swapping, causing severe performance
issues.

Realistically it all boils down to this: even though you may overcommit,
you still have to provision your hosts adequately so they can handle
the load.

Thanks.

--
Dmitry
From: Alexander Clouter on
Dmitry Torokhov <dtor(a)vmware.com> wrote:
>
>> > Now we have 2 drivers fighting. There is no backing device and so the
>> > driver core will not save us by refusing to bind to an already claimed
>> > device.
>>
>> If vmware_balloon is present in /sys/module or is loaded, don't load
>> vmmemctl. And vice versa.
>>
>> I dunno - it's silly for me to sit here proposing solutions. it's
>> better that you do it!
>
> Unfortunately I do not have a good solution at the moment. I guess we'll
> have to work with distributions to make sure they keep it as a module
> (it also makes most sense for them since not everyone runs on our
> platform).
>
I cannot seriously believe you consider a viable solution to be
"everyone[1] must abide by these rules otherwise our installer might
barf". The only beneficiary of this patch is your installer and the
effect is an undocumented and peculiar constraint on a kernel module.

Seriously, add something so that you get an entry in /sys/module (maybe
it's time for something in /sys/class?) or maybe do something so that
you have: VMWARE_BALLOON_CMD(STATUS, ...) where the guest can say if
there is already something ballooning for it. Surely the guest should
be aware if there is more than one balloon driver at play?
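
The check suggested earlier in the thread (do not load one balloon driver when the other is already present) might look roughly like this from an installer's point of view. It is a sketch only: it assumes the sysfs directory is /sys/module, uses the two module names mentioned in this thread, and the helper names `present_balloon_drivers` and `safe_to_load` are hypothetical. A built-in driver may have no entry there at all, which is the gap being argued about.

```python
import os

# Module names as given in this thread; treat them as assumptions.
BALLOON_DRIVERS = ("vmware_balloon", "vmmemctl")


def present_balloon_drivers(names=BALLOON_DRIVERS, sysfs_root="/sys/module"):
    """Return the balloon drivers that have a directory under sysfs_root."""
    return [n for n in names if os.path.isdir(os.path.join(sysfs_root, n))]


def safe_to_load(candidate, names=BALLOON_DRIVERS, sysfs_root="/sys/module"):
    """True if no *other* balloon driver is visible under sysfs_root."""
    others = [n for n in present_balloon_drivers(names, sysfs_root)
              if n != candidate]
    return not others
```

An installer could call `safe_to_load("vmmemctl")` before installing its own module; the `sysfs_root` parameter exists only so the check can be exercised against a scratch directory.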

I think a friend of mine summed it up rather well: "Fixing the kernel
instead of fixing the VMWare installer is an inspired move".

Cheers

[1] the dropdown menu on distrowatch lists 319 distributions

--
Alexander Clouter
..sigmonster says: May the bluebird of happiness twiddle your bits.
