From: Peter Zijlstra on
On Mon, 2010-04-19 at 17:33 +0300, Avi Kivity wrote:
> On 04/19/2010 05:21 PM, Glauber Costa wrote:
> >
> >> Oh yes, just trying to avoid a patch with both atomic64_read() and
> >> ACCESS_ONCE().
> >>
> > you're mixing the private version of the patch you saw with this one.
> > there aren't any atomic reads in here. I'll use a barrier then.
> >
>
> This patch writes last_value atomically, but reads it non-atomically. A
> barrier is insufficient.

What Avi says! :-)

On a 32-bit machine a 64-bit read is two 32-bit reads, so

last = last_value;

becomes:

last.high = last_value.high;
last.low = last_value.low;

(or the reverse of course)

Now imagine a write getting interleaved with that ;-)
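
For illustration only (not the posted patch), a minimal sketch of the kind of
fix being discussed, assuming last_value becomes an atomic64_t; kvm_clock_read()
is just a stand-in for whatever computes the raw per-cpu timestamp:

#include <linux/types.h>
#include <asm/atomic.h>

static atomic64_t last_value = ATOMIC64_INIT(0);

static u64 monotonic_clock_read(void)
{
	u64 ret = kvm_clock_read();	/* stand-in for the raw timestamp */
	u64 last;

	do {
		/* a single atomic 64-bit load, so no tearing on 32-bit */
		last = atomic64_read(&last_value);
		if (ret <= last) {
			ret = last;	/* never report a step backwards */
			break;
		}
		/* publish the new maximum; retry if another CPU beat us */
	} while (atomic64_cmpxchg(&last_value, last, ret) != last);

	return ret;
}

On 32-bit x86 the atomic64_* ops are typically implemented with cmpxchg8b, so
both the load and the compare-and-swap above are tear-free there as well.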



From: Avi Kivity on
On 04/20/2010 03:59 PM, Glauber Costa wrote:
>
>> Might be due to NMIs or SMIs interrupting the rdtsc(); ktime_get()
>> operation which establishes the timeline. We could limit it by
>> having a loop doing rdtsc(); ktime_get(); rdtsc(); and checking for
>> some bound, but it isn't worthwhile (and will break nested
>> virtualization for sure). Better to have the option to calibrate
>> kvmclock just once on machines with
>> X86_FEATURE_NONSTOP_TRULY_RELIABLE_TSC_HONESTLY.
>>
> For the record, we can only even do that on those machines. If we try to update
> time structures only once on machines with the
> X86_FEATURE_TSC_SAYS_IT_IS_OKAY_BUT_IN_REALITY_IS_NOT_OKAY feature flag, guests
> won't even boot.
>
> We can detect that, and besides doing the calculation only once, also export some
> bit indicating it to the guest. Hmm... I'm thinking now that, because of
> migration, we should check this bit every time, because we might have changed hosts.
> So instead of using an expensive cpuid check, we should probably use some bit in
> the vcpu_time_info structure, and use a cpuid bit just to say it is enabled.
>

Right, we need a bit in the structure itself that says you can trust the
time not to go backwards, and a bit in cpuid that says you can trust the
bit in the structure (unless we can be sure no implementations leak
garbage into the padding field).
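
Purely as an illustration of that split (the field and bit names here are made
up, and host_has_stable_flag() is a hypothetical stand-in for the cpuid check),
one byte of the existing padding in pvclock_vcpu_time_info could become a flags
field that the guest only trusts when cpuid says the host fills it in:

#include <linux/types.h>

struct pvclock_vcpu_time_info {
	u32	version;
	u32	pad0;
	u64	tsc_timestamp;
	u64	system_time;
	u32	tsc_to_system_mul;
	s8	tsc_shift;
	u8	flags;				/* carved out of the old padding */
	u8	pad[2];
} __attribute__((__packed__));

#define PVCLOCK_TSC_STABLE_BIT	(1 << 0)	/* time never goes backwards */

static bool pvclock_time_is_stable(const struct pvclock_vcpu_time_info *ti)
{
	/*
	 * Only trust the flag if the (hypothetical) cpuid bit says the
	 * host actually sets it; otherwise an old host may leak whatever
	 * was in the padding into it.
	 */
	return host_has_stable_flag() && (ti->flags & PVCLOCK_TSC_STABLE_BIT);
}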

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

From: Jeremy Fitzhardinge on
On 04/19/2010 07:26 AM, Glauber Costa wrote:
>> Is the problem that the tscs are starting out of sync, or that they're
>> drifting relative to each other over time? Do the problems become worse
>> the longer the uptime? How large are the offsets we're talking about here?
>>
> The offsets usually seem pretty small, under a microsecond. So I don't think
> > it has anything to do with tscs starting out of sync. Especially because the
> delta-based calculation has the exact purpose of handling that case.
>

So you think they're drifting out of sync from an initially synced
state? If so, what would bound the drift?

J
From: Jeremy Fitzhardinge on
On 04/19/2010 07:33 AM, Avi Kivity wrote:
> On 04/19/2010 05:21 PM, Glauber Costa wrote:
>>
>>> Oh yes, just trying to avoid a patch with both atomic64_read() and
>>> ACCESS_ONCE().
>>>
>> you're mixing the private version of the patch you saw with this one.
>> there aren't any atomic reads in here. I'll use a barrier then.
>>
>
> This patch writes last_value atomically, but reads it non-atomically.
> A barrier is insufficient.

Well, on a 32-bit system you can explicitly order the updates of low and
high, then do a high-low-checkhigh read. That would be much more
efficient than atomic64, if we really care about 32-bit.
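
That high-low-checkhigh scheme is essentially a hand-rolled sequence counter.
A sketch of the same idea with an explicit sequence word (which avoids having
to reason about the carry from low into high), assuming writers are already
serialized against each other; illustrative only, not a concrete proposal:

static u64 last_value;		/* the shared 64-bit timestamp */
static unsigned last_seq;	/* even: stable, odd: update in flight */

/* writer side; assumes writers are serialized, e.g. by a spinlock */
static void last_value_set(u64 val)
{
	last_seq++;			/* odd: readers will retry */
	smp_wmb();
	last_value = val;		/* may be two 32-bit stores, fine now */
	smp_wmb();
	last_seq++;			/* even again: value is consistent */
}

static u64 last_value_get(void)
{
	unsigned seq;
	u64 val;

	do {
		seq = ACCESS_ONCE(last_seq);
		smp_rmb();
		val = last_value;	/* possibly torn; detected below */
		smp_rmb();
	} while ((seq & 1) || ACCESS_ONCE(last_seq) != seq);

	return val;
}

The reader never needs an atomic 64-bit load; the cost is a retry whenever it
races with an update.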

J

From: Glauber Costa on
On Mon, Apr 19, 2010 at 09:19:38AM -0700, Jeremy Fitzhardinge wrote:
> On 04/19/2010 07:26 AM, Glauber Costa wrote:
> >> Is the problem that the tscs are starting out of sync, or that they're
> >> drifting relative to each other over time? Do the problems become worse
> >> the longer the uptime? How large are the offsets we're talking about here?
> >>
> > The offsets usually seem pretty small, under a microsecond. So I don't think
> > it has anything to do with tscs starting out of sync. Especially because the
> > delta-based calculation has the exact purpose of handling that case.
> >
>
> So you think they're drifting out of sync from an initially synced
> state? If so, what would bound the drift?

I think the delta calculation introduces errors.

Marcelo can probably confirm it, but he has a Nehalem with an apparently
very good tsc source. Even this machine warps.

It stops warping if we write the pvclock data structure only once and forget it
(which updates tsc_timestamp only once), according to him.

Obviously, we can't do that everywhere.