From: Jeremy Fitzhardinge on
On 07/25/2010 11:14 PM, Srivatsa Vaddagiri wrote:
> Add KVM hypercall for yielding vcpu timeslice.

Can you do a directed yield?

J

> Signed-off-by: Srivatsa Vaddagiri <vatsa(a)linux.vnet.ibm.com>
>
> ---
> arch/x86/include/asm/kvm_para.h |    1 +
> arch/x86/kvm/x86.c              |    8 +++++++-
> include/linux/kvm.h             |    1 +
> include/linux/kvm_para.h        |    1 +
> 4 files changed, 10 insertions(+), 1 deletion(-)
>
> Index: current/arch/x86/include/asm/kvm_para.h
> ===================================================================
> --- current.orig/arch/x86/include/asm/kvm_para.h
> +++ current/arch/x86/include/asm/kvm_para.h
> @@ -16,6 +16,7 @@
>  #define KVM_FEATURE_CLOCKSOURCE 0
>  #define KVM_FEATURE_NOP_IO_DELAY 1
>  #define KVM_FEATURE_MMU_OP 2
> +#define KVM_FEATURE_YIELD 4
>  /* This indicates that the new set of kvmclock msrs
>   * are available. The use of 0x11 and 0x12 is deprecated
>   */
> Index: current/arch/x86/kvm/x86.c
> ===================================================================
> --- current.orig/arch/x86/kvm/x86.c
> +++ current/arch/x86/kvm/x86.c
> @@ -1618,6 +1618,7 @@ int kvm_dev_ioctl_check_extension(long e
>          case KVM_CAP_PCI_SEGMENT:
>          case KVM_CAP_DEBUGREGS:
>          case KVM_CAP_X86_ROBUST_SINGLESTEP:
> +        case KVM_CAP_YIELD_HYPERCALL:
>                  r = 1;
>                  break;
>          case KVM_CAP_COALESCED_MMIO:
> @@ -1993,7 +1994,8 @@ static void do_cpuid_ent(struct kvm_cpui
>                  entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
>                               (1 << KVM_FEATURE_NOP_IO_DELAY) |
>                               (1 << KVM_FEATURE_CLOCKSOURCE2) |
> -                             (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
> +                             (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
> +                             (1 << KVM_FEATURE_YIELD);
>                  entry->ebx = 0;
>                  entry->ecx = 0;
>                  entry->edx = 0;
> @@ -4245,6 +4247,10 @@ int kvm_emulate_hypercall(struct kvm_vcp
>          case KVM_HC_MMU_OP:
>                  r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2), &ret);
>                  break;
> +        case KVM_HC_YIELD:
> +                ret = 0;
> +                yield();
> +                break;
>          default:
>                  ret = -KVM_ENOSYS;
>                  break;
> Index: current/include/linux/kvm.h
> ===================================================================
> --- current.orig/include/linux/kvm.h
> +++ current/include/linux/kvm.h
> @@ -524,6 +524,7 @@ struct kvm_enable_cap {
>  #define KVM_CAP_PPC_OSI 52
>  #define KVM_CAP_PPC_UNSET_IRQ 53
>  #define KVM_CAP_ENABLE_CAP 54
> +#define KVM_CAP_YIELD_HYPERCALL 55
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> Index: current/include/linux/kvm_para.h
> ===================================================================
> --- current.orig/include/linux/kvm_para.h
> +++ current/include/linux/kvm_para.h
> @@ -17,6 +17,7 @@
>
>  #define KVM_HC_VAPIC_POLL_IRQ 1
>  #define KVM_HC_MMU_OP 2
> +#define KVM_HC_YIELD 3
>
> /*
> * hypercalls use architecture specific
>
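
For reference, the guest side of this hypercall could look roughly like the
sketch below. This is not part of the patch: kvm_spin_yield() is a made-up
name, while kvm_para_has_feature() and kvm_hypercall0() are the existing
guest helpers.

/* Guest side (sketch): give up our timeslice once a lock has stayed
 * contended for too long, instead of burning the rest of it spinning. */
#include <linux/kvm_para.h>
#include <asm/processor.h>

static void kvm_spin_yield(void)
{
        if (kvm_para_has_feature(KVM_FEATURE_YIELD))
                kvm_hypercall0(KVM_HC_YIELD);   /* donate the timeslice */
        else
                cpu_relax();                    /* no host support: spin */
}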

From: Srivatsa Vaddagiri on
On Mon, Jul 26, 2010 at 10:19:41AM -0700, Jeremy Fitzhardinge wrote:
> On 07/25/2010 11:14 PM, Srivatsa Vaddagiri wrote:
> >Add KVM hypercall for yielding vcpu timeslice.
>
> Can you do a directed yield?

We don't have that support yet in the Linux scheduler. Also, I feel it
would be more useful when the target vcpu and the yielding vcpu are on the
same physical cpu, rather than when they are on separate cpus. In the
latter case, yielding (or donating) a timeslice need not ensure that the
target vcpu runs immediately, and I suspect fairness issues need to be
tackled as well (a large number of waiters shouldn't boost a lock holder's
timeslice so much that it gets a larger share).

- vatsa
From: Avi Kivity on
On 07/26/2010 08:19 PM, Jeremy Fitzhardinge wrote:
> On 07/25/2010 11:14 PM, Srivatsa Vaddagiri wrote:
>> Add KVM hypercall for yielding vcpu timeslice.
>
> Can you do a directed yield?
>

A problem with directed yield is figuring out who to yield to. One idea
is to look for a random vcpu that is not running and donate some runtime
to it. In the best case, it's the lock holder and we cause it to start
running. In the middle case, it's not the lock holder, but we lose enough
runtime to stop running, so at least we don't waste cpu. In the worst case,
we keep running without having woken the lock holder; we spin again and
yield again, hoping to find the right vcpu.
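
In host-side terms, the search might look like the sketch below;
vcpu_is_running() and donate_runtime() are hypothetical helpers, not
existing KVM code.

/* Host side (sketch): pick a random vcpu of the same guest that is not
 * currently running and donate some runtime to it. */
static void yield_to_random_vcpu(struct kvm_vcpu *me)
{
        struct kvm *kvm = me->kvm;
        int i, start = random32() % KVM_MAX_VCPUS;

        for (i = 0; i < KVM_MAX_VCPUS; i++) {
                struct kvm_vcpu *vcpu = kvm->vcpus[(start + i) % KVM_MAX_VCPUS];

                if (!vcpu || vcpu == me || vcpu_is_running(vcpu))
                        continue;
                donate_runtime(me, vcpu);  /* best case: it holds the lock */
                break;
        }
}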

--
error compiling committee.c: too many arguments to function

From: Avi Kivity on
On 07/28/2010 05:55 PM, Srivatsa Vaddagiri wrote:
> On Mon, Jul 26, 2010 at 10:19:41AM -0700, Jeremy Fitzhardinge wrote:
>> On 07/25/2010 11:14 PM, Srivatsa Vaddagiri wrote:
>>> Add KVM hypercall for yielding vcpu timeslice.
>> Can you do a directed yield?
> We don't have that support yet in the Linux scheduler.

If you think it's useful, it would be good to design it into the
interface, and fall back to ordinary yield if the host doesn't support it.
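
Concretely, the guest could probe a separate feature bit and degrade
gracefully. In the sketch below, KVM_FEATURE_DIRECTED_YIELD and
KVM_HC_YIELD_TO are hypothetical names; kvm_hypercall0() and
kvm_hypercall1() are the existing helpers.

/* Guest side (sketch): prefer a directed yield, fall back to plain yield. */
static void kvm_yield_to(unsigned long holder_vcpu)
{
        if (kvm_para_has_feature(KVM_FEATURE_DIRECTED_YIELD))
                kvm_hypercall1(KVM_HC_YIELD_TO, holder_vcpu); /* hypothetical */
        else if (kvm_para_has_feature(KVM_FEATURE_YIELD))
                kvm_hypercall0(KVM_HC_YIELD);                 /* this patch */
}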

A big advantage of directed yield over plain yield is that you conserve
resources within a VM; a simple yield will cause the guest to drop its
share of cpu to other guests.

Made up example:

- 2 vcpu guest with 10% contention
- lock hold time 10us every 100us
- timeslice 1ms

Ideally this guest can consume 190% cpu (sleeping whenever there is
contention). But if we yield when we detect contention, then we sleep for
1ms, and utilization drops to around 100%-150% (a vcpu will usually hit
contention, and thus go to sleep, within a few 100us periods).
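
A rough reconstruction of the arithmetic (my numbers, so treat the exact
figures loosely):

    ideal:       each vcpu loses at most ~10us per contended 100us window,
                 so the pair stays close to 2 x ~95% ~= 190%
    plain yield: each contention event now costs a ~1ms sleep (a full
                 timeslice) rather than ~10us of waiting; a vcpu that hits
                 contention every few 100us periods alternates roughly 1ms
                 running with 1ms sleeping, i.e. ~50% per vcpu (~100%
                 total), approaching 150% as contention gets rarer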

> Also, I feel it would be more useful when the target vcpu and the
> yielding vcpu are on the same physical cpu, rather than when they are on
> separate cpus. In the latter case, yielding (or donating) a timeslice
> need not ensure that the target vcpu runs immediately

Donate at least the amount needed to wake up the other vcpu; we can
calculate it during wakeup.

> and I suspect fairness issues need to be tackled as well (a large number
> of waiters shouldn't boost a lock holder's timeslice so much that it gets
> a larger share).

I feel ordinary yield suffers from fairness problems a lot more.

--
error compiling committee.c: too many arguments to function

From: Ryan Harper on
* Avi Kivity <avi(a)redhat.com> [2010-08-02 03:33]:
> On 07/26/2010 08:19 PM, Jeremy Fitzhardinge wrote:
> > On 07/25/2010 11:14 PM, Srivatsa Vaddagiri wrote:
> >>Add KVM hypercall for yielding vcpu timeslice.
> >
> >Can you do a directed yield?
> >
>
> A problem with directed yield is figuring out who to yield to. One idea
> is to look for a random vcpu that is not running and donate some runtime
> to it. In the best case, it's the lock holder and we cause it to start
> running. In the middle case, it's not the lock holder, but we lose enough
> runtime to stop running, so at least we don't waste cpu. In the worst
> case, we keep running without having woken the lock holder; we spin again
> and yield again, hoping to find the right vcpu.

It's been quite some time, but we played with directed yielding for Xen[1],
modelled on the POWER directed yield (H_CONFER), where the lock-holding
vcpu is indicated in the spinlock itself. When acquiring the lock, record
the vcpu id. When another vcpu attempts to acquire the lock and can't, it
can yield its time to the lock holder (sketched after the reference below).

1. http://lists.xensource.com/archives/html/xen-devel/2005-05/msg00776.html
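
In C, that scheme might look like the sketch below; the field layout,
pv_try_lock(), hypercall_confer() and my_vcpu_id() are illustrative names,
not the actual Xen or POWER interfaces.

/* Confer-style pv spinlock (sketch): the holder records its vcpu id in
 * the lock, and a contending vcpu donates its timeslice to that holder. */
struct pv_spinlock {
        u16 lock;       /* the lock word itself */
        u16 holder;     /* vcpu id of the current holder */
};

static void pv_spin_lock(struct pv_spinlock *lp)
{
        while (!pv_try_lock(lp)) {
                /* The recorded holder may be stale; conferring to the
                 * wrong vcpu wastes a little time but is harmless. */
                hypercall_confer(lp->holder);
        }
        lp->holder = my_vcpu_id();      /* record the new holder */
}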

--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh(a)us.ibm.com