From: Venkatesh Pallipadi on
On Mon, May 3, 2010 at 1:21 PM, Dan Magenheimer
<dan.magenheimer(a)oracle.com> wrote:
>
> In a patch posted late last year by Venki:
>
> http://lkml.org/lkml/2009/12/17/360
>
> it was noted that some systems that specify the "Invariant TSC"
> bit in CPUID (on recent processors) are sadly not guaranteed to
> have synchronized TSCs.  As a result, Ingo's check_tsc_warp() is
> executed; if the warp test passes, the kernel uses TSC
> as clocksource and, if it doesn't pass, the kernel marks
> the TSC as unstable and chooses a different clocksource.
>
> Whether the kernel deems TSC to be reliable or not is a very
> useful piece of information to userland, e.g. to certain
> enterprise apps such as the Oracle DB, some JVMs, etc.  If
> TSC IS reliable, rdtsc can be used by many of these
> enterprise applications in many situations in place of a
> gettimeofday call.  Rdtsc can be much faster even than
> a vsyscall and it is certainly much much faster when,
> for one reason or another, vsyscall is not enabled.
> This can make a huge performance difference in real
> benchmarks when timestamps are frequently taken (10%
> benchmark performance improvement was measured using
> rdtsc vs gettimeofday syscall).
>
> Running a warp test in userland is not nearly as accurate
> as the warp test run by the kernel.  So it makes sense to expose
> the results of the kernel warp test to userland, maybe
> through /sys/devices/system/clocksource/tsc_reliable
>
> Comments?

[ Sorry if this is a duplicate. I had messed up my mail client format setting ]

One option is to remove tsc from
/sys/devices/system/clocksource/clocksource*/available_clocksource
when it is detected as unstable.

That should already be happening with NOHZ or HIGHRES selected. But it
should be simple to add some code to do this always.
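
As a rough illustration of the userland side of this option, here is a
minimal Python sketch that checks whether "tsc" is still listed. It
assumes the clocksource0 instance of the wildcard path above and that the
file holds whitespace-separated names; the function names are made up for
this example.

```python
from pathlib import Path

# Assumption: clocksource0 is the instance to query (the thread uses clocksource*).
AVAILABLE = Path(
    "/sys/devices/system/clocksource/clocksource0/available_clocksource"
)


def tsc_listed(available_text):
    """Return True if 'tsc' is one of the whitespace-separated names."""
    return "tsc" in available_text.split()


def tsc_available(path=AVAILABLE):
    """Read the sysfs file; return None if it cannot be read."""
    try:
        return tsc_listed(Path(path).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("tsc available:", tsc_available())
```

A result of None means the file could not be read, which is kept distinct
from "tsc is not listed".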

Would that work?

Thanks,
Venki
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Dan Magenheimer on
> > Running a warp test in userland is not nearly as accurate
> > as the warp test run by the kernel.  So it makes sense to expose
> > the results of the kernel warp test to userland, maybe
> > through /sys/devices/system/clocksource/tsc_reliable
> >
> > Comments?
>
> [ Sorry if this is a duplicate. I had messed up my mail client format
> setting ]
>
> One option is to remove tsc from
> /sys/devices/system/clocksource/clocksource*/available_clocksource
> when it is detected as unstable.
>
> That should already be happening with NOHZ or HIGHRES selected. But,
> should be simple to add some code to do this always.
>
> Would that work?

Hi Venki --

In some offlist discussion, a similar solution was suggested:
If /sys/devices/system/clocksource/clocksource*/current_clocksource
is "tsc" AND the "Invariant TSC" CPUID bit is set, then "reliable TSC"
can be assumed.
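
A minimal Python sketch of that combined check follows. Since CPUID
cannot be executed directly from Python, it assumes that the
"nonstop_tsc" flag in /proc/cpuinfo reflects the "Invariant TSC" CPUID
bit, and it assumes clocksource0 for the sysfs path. The function names
are hypothetical.

```python
from pathlib import Path

# Assumptions: clocksource0 instance, and "nonstop_tsc" as a stand-in
# for the "Invariant TSC" CPUID bit.
CURRENT = Path(
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"
)
CPUINFO = Path("/proc/cpuinfo")


def current_is_tsc(current_text):
    """True if the current clocksource name is exactly 'tsc'."""
    return current_text.strip() == "tsc"


def has_cpu_flag(cpuinfo_text, flag):
    """True if `flag` appears in the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return flag in value.split()
    return False


def tsc_assumed_reliable(current_text, cpuinfo_text):
    """Combine both conditions from the offlist suggestion."""
    return current_is_tsc(current_text) and has_cpu_flag(
        cpuinfo_text, "nonstop_tsc"
    )


if __name__ == "__main__":
    try:
        print(tsc_assumed_reliable(CURRENT.read_text(), CPUINFO.read_text()))
    except OSError as err:
        print("could not read kernel files:", err)
```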

BUT, exposing the information explicitly from the kernel would be
more comforting than requiring userland to reverse-engineer some
combination of kernel tests that might change over time. If the
kernel determines TSC is reliable, that seems like it should be
good enough for userland.

AND it was also pointed out that userland usage of TSC is almost
useless unless some reliable reasonably-precise frequency is also
known. A possible solution to this is to expose:

/sys/devices/system/clocksource/clocksource*/clocksource_mult and
/sys/devices/system/clocksource/clocksource*/clocksource_shift

(or some other more TSC-specific name) and provide the same
mult/shift values the kernel uses for clocksource_cyc2ns().
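
To make the arithmetic concrete, here is a Python sketch of the
conversion userland would perform with such values, namely
ns = (cycles * mult) >> shift. The two sysfs files above are only a
proposal, so nothing is read from them here. The helper mult_for_khz is
hypothetical and the 2 GHz frequency and shift of 22 are example inputs.

```python
U64_MASK = (1 << 64) - 1  # emulate 64-bit unsigned wraparound


def cyc2ns(cycles, mult, shift):
    """Convert a cycle count to nanoseconds using mult/shift scaling."""
    return ((cycles * mult) & U64_MASK) >> shift


def mult_for_khz(khz, shift):
    """Hypothetical helper: derive mult from a frequency in kHz.

    One cycle lasts 1e6 / khz nanoseconds, so mult is
    (1e6 << shift) / khz, rounded to nearest.
    """
    return ((1_000_000 << shift) + khz // 2) // khz


if __name__ == "__main__":
    shift = 22                              # example value
    mult = mult_for_khz(2_000_000, shift)   # example: 2 GHz TSC
    print(cyc2ns(2_000_000_000, mult, shift))  # cycles in one second
```

Because the product is truncated to 64 bits, large cycle counts can wrap,
so in practice the conversion would be applied to a delta between two TSC
readings rather than to a raw counter value.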

By the way, excuse my ignorance, but is there ever a clocksourceN
where N is not zero?

Hope things are going well in google-land!

Thanks,
Dan