From: Zachary Amsden on
On 04/19/2010 12:54 AM, Avi Kivity wrote:
> On 04/19/2010 01:51 PM, Peter Zijlstra wrote:
>>
>>>> Right, so on x86 we have:
>>>>
>>>> X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency
>>>> independent, not that it doesn't stop in C states and similar fun
>>>> stuff.
>>>>
>>>> X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is
>>>> constant
>>>> and synced between cores.
>>>>
>>>>
>>> Sockets and boards too? (IOW, how reliable is TSC_RELIABLE)?
>> Not sure, IIRC we clear that when the TSC sync test fails, eg when we
>> mark the tsc clocksource unusable.
>
> Worrying. By the time we detect this the guest may already have
> gotten confused by clocks going backwards.

Upstream, we are marking the TSC unstable preemptively when hardware
which will eventually fail the sync test is detected, so this should be fine.

Zach
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Marcelo Tosatti on
On Mon, Apr 19, 2010 at 03:25:43PM -0300, Glauber Costa wrote:
> On Mon, Apr 19, 2010 at 09:19:38AM -0700, Jeremy Fitzhardinge wrote:
> > On 04/19/2010 07:26 AM, Glauber Costa wrote:
> > >> Is the problem that the tscs are starting out of sync, or that they're
> > >> drifting relative to each other over time? Do the problems become worse
> > >> the longer the uptime? How large are the offsets we're talking about here?
> > >>
> > > The offsets usually seem pretty small, under a microsecond. So I don't think
> > > it has anything to do with tscs starting out of sync. Especially because the
> > > delta-based calculation has the exact purpose of handling that case.
> > >
> >
> > So you think they're drifting out of sync from an initially synced
> > state? If so, what would bound the drift?
> I think delta calculation introduces errors.

Yes.

> Marcelo can probably confirm it, but he has a Nehalem with an apparently
> very good tsc source. Even this machine warps.
>
> It stops warping if we only write the pvclock data structure once and forget it
> (which only updates tsc_timestamp once), according to him.

Yes. So it's not as if the guest visible TSCs go out of sync (they don't
on this machine Glauber mentioned, or even on a multi-core Core 2 Duo),
but the delta calculation is very hard (if not impossible) to get right.

The timewarps I've seen were in the 0-200ns range, and very rare (once
every 10 minutes or so).

> Obviously, we can't do that everywhere.
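
A minimal sketch of how a delta-based read can warp even when the TSCs themselves agree. The struct layout, field names, and helper names below are illustrative assumptions, not the actual pvclock ABI; the point is only that each record carries a truncated base time plus a truncated scaled delta, so two records written at different instants can disagree by a small amount at the same TSC value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified per-vCPU time record (field names are illustrative). */
struct pv_time_info {
    uint64_t tsc_timestamp;   /* TSC value when the host wrote the record */
    uint64_t system_time_ns;  /* host time in ns at that instant, truncated */
    uint32_t tsc_to_ns_mul;   /* ns per cycle as a 32.32 fixed-point fraction */
    int8_t   tsc_shift;       /* power-of-two pre-scale applied to the delta */
};

/* (delta scaled by 2^shift) * mul / 2^32, rounded down, without 128-bit math. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    return (delta >> 32) * mul + (((delta & 0xffffffffu) * mul) >> 32);
}

/* Guest-side read: base time plus the scaled TSC delta since the record. */
static uint64_t pv_clock_read(const struct pv_time_info *ti, uint64_t tsc_now)
{
    assert(tsc_now >= ti->tsc_timestamp);
    return ti->system_time_ns +
           scale_delta(tsc_now - ti->tsc_timestamp,
                       ti->tsc_to_ns_mul, ti->tsc_shift);
}

/* How many ns record b reads behind record a at the same TSC value. */
static int64_t pv_clock_warp(const struct pv_time_info *a,
                             const struct pv_time_info *b, uint64_t tsc_now)
{
    return (int64_t)(pv_clock_read(a, tsc_now) - pv_clock_read(b, tsc_now));
}
```

For example, at 0.5 ns per cycle (mul = 0x80000000, shift = 0), a record written at TSC 0 with base time 0 reads 2 ns at TSC 4, while a record written at TSC 3 with its base time truncated to 1 ns reads only 1 ns at the same TSC. Rounding is just one possible source of error; it is not claimed to account for the full 0-200ns range reported above.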
From: Jeremy Fitzhardinge on
On 04/20/2010 11:54 AM, Avi Kivity wrote:
> On 04/20/2010 09:23 PM, Jeremy Fitzhardinge wrote:
>> On 04/20/2010 02:31 AM, Avi Kivity wrote:
>>
>>> btw, do you want this code in pvclock.c, or shall we keep it kvmclock
>>> specific?
>>>
>> I think it's a pvclock-level fix. I'd been hoping to avoid having
>> something like this, but I think it's ultimately necessary.
>>
>
> Did you observe drift on Xen, or is this "ultimately" pointing at the
> future?

People are reporting weirdnesses that "clocksource=jiffies" apparently
resolves. Xen and KVM are faced with the same hardware constraints, and
it wouldn't surprise me if there were small measurable
non-monotonicities in the PV clock under Xen. May as well be safe.

Of course, it kills any possibility of being able to usefully export
this interface down to usermode.

My main concern about this kind of simple fix is that if there's a long
term systematic drift between different CPUs' TSCs, then this will
somewhat mask the problem while giving really awful time measurement on
the "slow" CPU(s). In that case it really needs to adjust the scaling
factor to correct for the drift (*not* update the offset). But if we're
definitely only talking about fixed, relatively small time offsets then
it is fine.

J
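
A minimal sketch of the kind of "simple fix" under discussion, assuming it takes the form of a single shared last-returned value that every CPU clamps against. The names `last_value` and `monotonic_clamp` are hypothetical, and C11 atomics stand in for whatever primitive the kernel code would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Largest timestamp handed out so far by any CPU (shared, writable). */
static _Atomic uint64_t last_value;

/*
 * Return a timestamp that never goes backwards across callers.
 * If this CPU's reading is behind the last one reported, return the
 * last one instead; otherwise publish the new reading.
 */
static uint64_t monotonic_clamp(uint64_t now)
{
    uint64_t last = atomic_load(&last_value);

    do {
        if (now < last)
            return last;  /* someone already reported a later time */
        /* On failure, 'last' is reloaded and the check is repeated. */
    } while (!atomic_compare_exchange_weak(&last_value, &last, now));

    return now;
}
```

The clamp needs a writable variable shared by all readers, which is why it gets in the way of exporting the interface to usermode. It also only hides backward steps: a CPU whose TSC runs systematically slow would keep returning the clamped value, which is the masking concern raised above.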
From: Zachary Amsden on
On 04/20/2010 09:42 AM, Jeremy Fitzhardinge wrote:
> On 04/20/2010 11:54 AM, Avi Kivity wrote:
>
>> On 04/20/2010 09:23 PM, Jeremy Fitzhardinge wrote:
>>
>>> On 04/20/2010 02:31 AM, Avi Kivity wrote:
>>>
>>>
>>>> btw, do you want this code in pvclock.c, or shall we keep it kvmclock
>>>> specific?
>>>>
>>>>
>>> I think it's a pvclock-level fix. I'd been hoping to avoid having
>>> something like this, but I think it's ultimately necessary.
>>>
>>>
>> Did you observe drift on Xen, or is this "ultimately" pointing at the
>> future?
>>
> People are reporting weirdnesses that "clocksource=jiffies" apparently
> resolves. Xen and KVM are faced with the same hardware constraints, and
> it wouldn't surprise me if there were small measurable
> non-monotonicities in the PV clock under Xen. May as well be safe.
>

Does the drift only occur on SMP VMs?

From: Avi Kivity on
On 04/23/2010 04:44 AM, Zachary Amsden wrote:
> Or apply this patch.
> time-warp.patch
>
>
> diff -rup a/time-warp-test.c b/time-warp-test.c
> --- a/time-warp-test.c 2010-04-15 16:30:13.955981607 -1000
> +++ b/time-warp-test.c 2010-04-15 16:35:37.777982377 -1000
> @@ -91,7 +91,7 @@ static inline unsigned long long __rdtsc
>  {
>      DECLARE_ARGS(val, low, high);
>
> -    asm volatile("cpuid; rdtsc" : EAX_EDX_RET(val, low, high));
> +    asm volatile("cpuid; rdtsc" : EAX_EDX_RET(val, low, high) :: "ebx", "ecx");
>
>

Plus, replace cpuid by lfence/mfence. cpuid will trap.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
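
A minimal sketch of the fence-based variant, assuming an x86 compiler with GCC-style inline asm. The function name `rdtsc_ordered` is hypothetical and this is not the code from time-warp-test.c. Whether lfence or mfence is the right barrier depends on the CPU vendor and configuration, so the choice of lfence here is an assumption.

```c
#include <stdint.h>

#if !defined(__x86_64__) && !defined(__i386__)
#error "this sketch is x86-only"
#endif

/*
 * Read the TSC after a fence so the read is not reordered ahead of
 * earlier instructions. Unlike cpuid, a fence does not trap when run
 * inside a guest, and it does not clobber ebx/ecx.
 * Note: lfence requires SSE2, which is always present on x86-64.
 */
static inline uint64_t rdtsc_ordered(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__("lfence\n\trdtsc"
                         : "=a"(lo), "=d"(hi)
                         :
                         : "memory");
    return ((uint64_t)hi << 32) | lo;
}
```

Because rdtsc writes only eax and edx, the extra "ebx", "ecx" clobbers from the patch above are needed only while cpuid remains in the sequence.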
