TSC reset compensation [Kernel]

From: Zachary Amsden on 16 Jun 2010 18:40

On 06/16/2010 03:52 AM, Glauber Costa wrote:
> On Mon, Jun 14, 2010 at 09:34:18PM -1000, Zachary Amsden wrote:
>
>> Attempt to synchronize TSCs which are reset to the same value. In the
>> case of a reliable hardware TSC, we can just re-use the same offset, but
>> on non-reliable hardware, we can get closer by adjusting the offset to
>> match the elapsed time.
>>
>> Signed-off-by: Zachary Amsden <zamsden(a)redhat.com>
>> ---
>> arch/x86/kvm/x86.c | 34 ++++++++++++++++++++++++++++++++--
>> 1 files changed, 32 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 8e836e9..cedb71f 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -937,14 +937,44 @@ static inline void kvm_request_guest_time_update(struct kvm_vcpu *v)
>>  	set_bit(KVM_REQ_CLOCK_SYNC, &v->requests);
>>  }
>>
>> +static inline int kvm_tsc_reliable(void)
>> +{
>> +	return (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
>> +		boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
>> +		!check_tsc_unstable());
>> +}
>> +
>>
> why can't we re-use vmware TSC_RELIABLE flag?
>

It's only set for VMware. Basically, it means "you are running in a
VMware hypervisor, and the TSC is reliable", which KVM never will be, at
least not in production use, so the flag doesn't make sense here.
Besides, a system with a reliable TSC can become a system without a
reliable TSC: CPU hotplug can always bring this about.

We could, however, have the guest set the TSC_RELIABLE flag for itself
if KVM somehow makes that promise (currently, it does not).
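A rough userspace analogue of the kvm_tsc_reliable() check, as a sketch only: it assumes the space-separated "flags" line from Linux /proc/cpuinfo as input (where the two CPU features appear as constant_tsc and nonstop_tsc) and a caller-supplied boolean standing in for check_tsc_unstable(). The names has_flag and tsc_reliable are hypothetical. Because hotplug can invalidate the answer, the sketch recomputes it on every call rather than caching it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `flag` appears as a whole word in the
 * space-separated list `flags` (the format of /proc/cpuinfo "flags"). */
static bool has_flag(const char *flags, const char *flag)
{
	size_t n = strlen(flag);
	const char *p = flags;

	if (n == 0)
		return false;	/* an empty needle would never advance */
	while ((p = strstr(p, flag)) != NULL) {
		bool starts = (p == flags) || (p[-1] == ' ');
		bool ends = (p[n] == '\0') || (p[n] == ' ') || (p[n] == '\n');

		if (starts && ends)
			return true;
		p += n;		/* partial match: keep searching */
	}
	return false;
}

/* Mirrors the shape of kvm_tsc_reliable(): both features must be
 * present AND the TSC must not have been marked unstable.
 * `tsc_marked_unstable` is an assumed stand-in for check_tsc_unstable(),
 * which can flip at runtime (e.g. after CPU hotplug), so do not cache. */
static bool tsc_reliable(const char *flags, bool tsc_marked_unstable)
{
	return has_flag(flags, "constant_tsc") &&
	       has_flag(flags, "nonstop_tsc") &&
	       !tsc_marked_unstable;
}
```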

Zach
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Rik van Riel on 13 Jul 2010 18:20

On 07/12/2010 10:25 PM, Zachary Amsden wrote:
> Attempt to synchronize TSCs which are reset to the same value. In the
> case of a reliable hardware TSC, we can just re-use the same offset, but
> on non-reliable hardware, we can get closer by adjusting the offset to
> match the elapsed time.
>
> Signed-off-by: Zachary Amsden <zamsden(a)redhat.com>

Acked-by: Rik van Riel <riel(a)redhat.com>

--
All rights reversed

From: Avi Kivity on 18 Jul 2010 10:40

On 07/13/2010 05:25 AM, Zachary Amsden wrote:
> Attempt to synchronize TSCs which are reset to the same value. In the
> case of a reliable hardware TSC, we can just re-use the same offset, but
> on non-reliable hardware, we can get closer by adjusting the offset to
> match the elapsed time.
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3b4efe2..4b42893 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -396,6 +396,9 @@ struct kvm_arch {
>  	unsigned long irq_sources_bitmap;
>  	s64 kvmclock_offset;
>  	spinlock_t tsc_write_lock;
> +	u64 last_tsc_nsec;
> +	u64 last_tsc_offset;
> +	u64 last_tsc_write;
>

So that we know what the lock protects, let's have

struct kvm_global_tsc {
	spinlock_t lock;
	...
} tsc;
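The point of the suggestion is to nest the lock together with exactly the fields it protects. A userspace analogue, assuming pthread_mutex_t as a stand-in for spinlock_t and hypothetical names (struct global_tsc, global_tsc_init, global_tsc_record) in place of the proposed struct kvm_global_tsc, might look like this:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical analogue of the proposed struct kvm_global_tsc:
 * the lock lives next to exactly the fields it protects.
 * pthread_mutex_t stands in for the kernel's spinlock_t. */
struct global_tsc {
	pthread_mutex_t lock;	/* protects every field below */
	uint64_t last_nsec;	/* host time of the last guest TSC write */
	uint64_t last_offset;	/* offset chosen for that write */
	uint64_t last_write;	/* value the guest wrote */
};

static void global_tsc_init(struct global_tsc *t)
{
	int rc = pthread_mutex_init(&t->lock, NULL);

	assert(rc == 0);
	(void)rc;	/* avoid an unused-variable warning under NDEBUG */
	t->last_nsec = 0;
	t->last_offset = 0;
	t->last_write = 0;
}

/* All updates go through the embedded lock. */
static void global_tsc_record(struct global_tsc *t, uint64_t nsec,
			      uint64_t offset, uint64_t write)
{
	pthread_mutex_lock(&t->lock);
	t->last_nsec = nsec;
	t->last_offset = offset;
	t->last_write = write;
	pthread_mutex_unlock(&t->lock);
}
```

The benefit is mostly for review: any access to one of these fields outside the embedded lock stands out, which is harder to spot when the lock and the fields are loose members of struct kvm_arch.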

> @@ -896,10 +896,39 @@ static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz);
>  void guest_write_tsc(struct kvm_vcpu *vcpu, u64 data)
>  {
>  	struct kvm *kvm = vcpu->kvm;
> -	u64 offset;
> +	u64 offset, ns, elapsed;
> +	struct timespec ts;
>
>  	spin_lock(&kvm->arch.tsc_write_lock);
>  	offset = data - native_read_tsc();
> +	ktime_get_ts(&ts);
> +	monotonic_to_bootbased(&ts);
> +	ns = timespec_to_ns(&ts);
> +	elapsed = ns - kvm->arch.last_tsc_nsec;
> +
> +	/*
> +	 * Special case: identical write to TSC within 5 seconds of
> +	 * another CPU is interpreted as an attempt to synchronize
> +	 * (the 5 seconds is to accommodate host load / swapping).
> +	 *
> +	 * In that case, for a reliable TSC, we can match TSC offsets,
> +	 * or make a best guess using kernel_ns value.
> +	 */
> +	if (data == kvm->arch.last_tsc_write && elapsed < 5ULL * NSEC_PER_SEC) {
> +		if (!check_tsc_unstable()) {
> +			offset = kvm->arch.last_tsc_offset;
> +			pr_debug("kvm: matched tsc offset for %llu\n", data);
> +		} else {
> +			u64 tsc_delta = elapsed * __get_cpu_var(cpu_tsc_khz);
> +			tsc_delta = tsc_delta / USEC_PER_SEC;
> +			offset += tsc_delta;
> +			pr_debug("kvm: adjusted tsc offset by %llu\n", tsc_delta);
> +		}
> +		ns = kvm->arch.last_tsc_nsec;
> +	}
> +	kvm->arch.last_tsc_nsec = ns;
> +	kvm->arch.last_tsc_write = data;
> +	kvm->arch.last_tsc_offset = offset;
>

We'd have a false alarm here during a reset within 5 seconds of boot.
Does it matter? Easy to work around by forgetting the state during reset.

>  	kvm_x86_ops->write_tsc_offset(vcpu, offset);
>  	spin_unlock(&kvm->arch.tsc_write_lock);
>
>
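
The offset decision in guest_write_tsc() can be modeled outside the kernel as a plain function, which makes the unit conversion easy to check: elapsed nanoseconds times the TSC rate in kHz, divided by 10^6 (USEC_PER_SEC), gives elapsed TSC cycles, because 1 kHz is 10^3 cycles per second and one second is 10^9 ns. This is a sketch under assumptions: tsc_sync_state and choose_offset are hypothetical names, host_tsc and now_ns stand in for native_read_tsc() and the boot-based monotonic time, tsc_stable stands in for !check_tsc_unstable(), and the state is assumed to start zeroed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define USEC_PER_SEC 1000000ULL

/* Hypothetical model of the per-VM fields added by the patch. */
struct tsc_sync_state {
	uint64_t last_nsec;	/* host ns of the write the window is anchored to */
	uint64_t last_offset;
	uint64_t last_write;
};

/*
 * Sketch of the offset decision in guest_write_tsc().
 * host_tsc:   stands in for native_read_tsc()
 * now_ns:     stands in for the boot-based monotonic time
 * khz:        host TSC rate in kHz (cpu_tsc_khz)
 * tsc_stable: stands in for !check_tsc_unstable()
 */
static uint64_t choose_offset(struct tsc_sync_state *s, uint64_t data,
			      uint64_t host_tsc, uint64_t now_ns,
			      uint64_t khz, bool tsc_stable)
{
	uint64_t offset = data - host_tsc;	/* unsigned, wraps mod 2^64 */
	uint64_t ns = now_ns;
	uint64_t elapsed = ns - s->last_nsec;

	if (data == s->last_write && elapsed < 5ULL * NSEC_PER_SEC) {
		if (tsc_stable) {
			/* Reliable TSC: reuse the earlier offset as-is. */
			offset = s->last_offset;
		} else {
			/* ns * kHz / 10^6 = TSC cycles since the earlier write. */
			uint64_t tsc_delta = elapsed * khz / USEC_PER_SEC;

			offset += tsc_delta;
		}
		/* Keep the window anchored at the earlier write. */
		ns = s->last_nsec;
	}
	s->last_nsec = ns;
	s->last_write = data;
	s->last_offset = offset;
	return offset;
}
```

With a 2 GHz host (khz = 2000000), a write of 0 at t = 10 s followed by another write of 0 one second later yields the same offset either way: the stable path reuses it, and the unstable path computes (0 - 12e9) + 2e9 cycles, which equals the first offset provided the host TSC really advanced 2e9 cycles. With zeroed initial state, a write of 0 within 5 s of time zero also "matches", which is the false alarm described above.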

--
error compiling committee.c: too many arguments to function

From: Zachary Amsden on 19 Jul 2010 16:10

On 07/18/2010 04:34 AM, Avi Kivity wrote:
> We'd have a false alarm here during a reset within 5 seconds of boot.
> Does it matter? Easy to work around by forgetting the state during
> reset.
>

Not forgetting, but ignoring: a reset within 5 seconds will not reset
the TSC, which normally is fine. The problem is that one CPU could reset
within the 5 seconds and another slightly after, leaving the two with
different offsets. Forgetting the state during reset is a good solution.
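The split-reset problem can be reproduced with a small model. This sketch assumes a reliable TSC (offset reuse only), a 1 GHz host so that host TSC cycles equal nanoseconds, and an explicit valid flag so that a reset can truly forget the remembered write; tsc_sync_state, tsc_write, and tsc_sync_forget are hypothetical names, not KVM functions. Simply zeroing last_nsec would not be enough, since a write within 5 s of time zero would still match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical per-VM state with an explicit validity bit. */
struct tsc_sync_state {
	bool valid;		/* false after reset: nothing to match against */
	uint64_t last_nsec;
	uint64_t last_offset;
	uint64_t last_write;
};

/* Hypothetical reset hook: forget the remembered write entirely. */
static void tsc_sync_forget(struct tsc_sync_state *s)
{
	s->valid = false;
}

/* Reliable-TSC case only: reuse the previous offset when the same value
 * is written within the 5 s window, otherwise take a fresh offset. */
static uint64_t tsc_write(struct tsc_sync_state *s, uint64_t data,
			  uint64_t host_tsc, uint64_t now_ns)
{
	uint64_t offset = data - host_tsc;
	uint64_t ns = now_ns;

	if (s->valid && data == s->last_write &&
	    now_ns - s->last_nsec < 5ULL * NSEC_PER_SEC) {
		offset = s->last_offset;
		ns = s->last_nsec;	/* window stays anchored */
	}
	s->valid = true;
	s->last_nsec = ns;
	s->last_write = data;
	s->last_offset = offset;
	return offset;
}
```

In the test scenario, two vCPUs write 0 at 0.1 s and 0.2 s during boot, then again at 4 s and 5.5 s after a guest reset. Without forgetting, the 4 s write matches the boot entry (elapsed 3.9 s) and keeps the boot offset, while the window stays anchored at 0.1 s, so the 5.5 s write (elapsed 5.4 s) gets a fresh offset and the vCPUs diverge. After tsc_sync_forget(), the 4 s write starts a new entry and the 5.5 s write matches it.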
