From: Andi Kleen on
Venkatesh Pallipadi <venki(a)google.com> writes:

> From: Dan Magenheimer <dan.magenheimer(a)oracle.com>
>
> Kernel information about calibrated value of tsc_khz and
> tsc_stability (result of tsc warp test) are useful bits of information
> for any app that wants to use TSC directly. Export this read_only
> information in sysfs.

Is this really a good idea? It will encourage the applications
to use RDTSC directly, but there are all kinds of constraints on
that. Even the kernel has a hard time with them, how likely
is it that applications will get all that right?

It would be better to fix them to use the vsyscalls instead.
Or if they can't use the vsyscalls for some reason today fix them.

This way if anything changes again in TSC the kernel could
shield the applications.

-Andi

--
ak(a)linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jaswinder Singh Rajput on
Hello,

On Sat, May 15, 2010 at 7:10 AM, Venkatesh Pallipadi <venki(a)google.com> wrote:
> From: Dan Magenheimer <dan.magenheimer(a)oracle.com>
>
> Kernel information about calibrated value of tsc_khz and
> tsc_stability (result of tsc warp test) are useful bits of information
> for any app that wants to use TSC directly. Export this read_only
> information in sysfs.
>
> Signed-off-by: Venkatesh Pallipadi <venki(a)google.com>
> Signed-off-by: Dan Magenheimer <dan.magenheimer(a)oracle.com>
> ---
> �arch/x86/kernel/tsc.c | � 76 +++++++++++++++++++++++++++++++++++++++++++++++++
> �1 files changed, 76 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 9faf91a..24dd484 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -10,6 +10,7 @@
> �#include <linux/clocksource.h>
> �#include <linux/percpu.h>
> �#include <linux/timex.h>
> +#include <linux/sysdev.h>
>
> �#include <asm/hpet.h>
> �#include <asm/timer.h>
> @@ -857,6 +858,81 @@ static void __init init_tsc_clocksource(void)
> � � � �clocksource_register(&clocksource_tsc);
> �}
>
> +#ifdef CONFIG_SYSFS
> +/*
> + * Export TSC related info to user land. This reflects kernel usage of TSC
> + * as hints to userspace users of TSC. The read_only info provided here:
> + * - tsc_stable: 1 implies system has TSC that always counts at a constant
> + * � rate, sync across CPUs and has passed the kernel warp test.
> + * - tsc_khz: TSC frequency in khz.
> + * - tsc_mult and tsc_shift: multiplier and shift to optimally convert
> + * � TSC delta to ns; ns = ((u64) delta * mult) >> shift
> + */
> +
> +#define define_show_var_function(_name, _var) \
> +static ssize_t show_##_name( \
> + � � � struct sys_device *dev, struct sysdev_attribute *attr, char *buf) \
> +{ \
> + � � � return sprintf(buf, "%u\n", (unsigned int) _var);\
> +}
> +
> +define_show_var_function(tsc_stable, !tsc_unstable);
> +define_show_var_function(tsc_khz, tsc_khz);
> +define_show_var_function(tsc_mult, clocksource_tsc.mult);
> +define_show_var_function(tsc_shift, clocksource_tsc.shift);
> +
> +static SYSDEV_ATTR(tsc_stable, 0444, show_tsc_stable, NULL);
> +static SYSDEV_ATTR(tsc_khz, 0444, show_tsc_khz, NULL);
> +static SYSDEV_ATTR(tsc_mult, 0444, show_tsc_mult, NULL);
> +static SYSDEV_ATTR(tsc_shift, 0444, show_tsc_shift, NULL);
> +
> +static struct sysdev_attribute *tsc_attrs[] = {
> + � � � &attr_tsc_stable,
> + � � � &attr_tsc_khz,
> + � � � &attr_tsc_mult,
> + � � � &attr_tsc_shift,
> +};
> +
> +static struct sysdev_class tsc_sysclass = {
> + � � � .name = "tsc",
> +};
> +
> +static struct sys_device device_tsc = {
> + � � � .id = 0,
> + � � � .cls = &tsc_sysclass,
> +};
> +
> +static int __init init_tsc_sysfs(void)
> +{
> + � � � int err, i = 0;
> +
> + � � � err = sysdev_class_register(&tsc_sysclass);
> + � � � if (err)
> + � � � � � � � return err;
> +
> + � � � err = sysdev_register(&device_tsc);
> + � � � if (err)
> + � � � � � � � goto fail;

fail will call sysdev_unregister(&device_tsc) which is not
appropriate, please fix this goto.

Thanks,
--
Jaswinder Singh.

> +
> + � � � for (i = 0; i < ARRAY_SIZE(tsc_attrs); i++) {
> + � � � � � � � err = sysdev_create_file(&device_tsc, tsc_attrs[i]);
> + � � � � � � � if (err)
> + � � � � � � � � � � � goto fail;
> + � � � }
> +
> + � � � return 0;
> +
> +fail:
> + � � � while (--i >= 0)
> + � � � � � � � sysdev_remove_file(&device_tsc, tsc_attrs[i]);
> +
> + � � � sysdev_unregister(&device_tsc);
> + � � � sysdev_class_unregister(&tsc_sysclass);
> + � � � return err;
> +}
> +device_initcall(init_tsc_sysfs);
> +#endif
> +
> �#ifdef CONFIG_X86_64
> �/*
> �* calibrate_cpu is used on systems with fixed rate TSCs to determine
> --
> 1.7.0.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo(a)vger.kernel.org
> More majordomo info at �http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at �http://www.tux.org/lkml/
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Dan Magenheimer on
> From: Andi Kleen [mailto:andi(a)firstfloor.org]
>
> > Kernel information about calibrated value of tsc_khz and
> > tsc_stability (result of tsc warp test) are useful bits of
> information
> > for any app that wants to use TSC directly. Export this read_only
> > information in sysfs.
>
> Is this really a good idea? It will encourage the applications
> to use RDTSC directly, but there are all kinds of constraints on

Indeed, that is what it is intended to do.

> that. Even the kernel has a hard time with them, how likely
> is it that applications will get all that right?

That's the point of exposing the tsc_reliable kernel data.
If the processor has Invariant TSC and the system has
successfully passed Ingo's warp test and, as a result
the kernel is using TSC as a clocksource, why not enable
userland apps that need to obtain timestamp data
tens or hundreds of thousands of times per second to
also use the TSC directly?

> It would be better to fix them to use the vsyscalls instead.
> Or if they can't use the vsyscalls for some reason today fix them.

The problem is from an app point-of-view there is no vsyscall.
There are two syscalls: gettimeofday and clock_gettime. Sometimes,
if it gets lucky, they turn out to be very fast and sometimes
it doesn't get lucky and they are VERY slow (resulting in a performance
hit of 10% or more), depending on a number of factors completely
out of the control of the app and even undetectable to the app.

Note also that even vsyscall with TSC as the clocksource will
still be significantly slower than rdtsc, especially in the
common case where a timestamp is directly stored and the
delta between two timestamps is later evaluated; in the
vsyscall case, each timestamp is a function call and a convert
to nsec but in the TSC case, each timestamp is a single
instruction.

> This way if anything changes again in TSC the kernel could
> shield the applications.

If tsc_reliable is 1, the system and the kernel are guaranteeing
to the app that nothing will change in the TSC. In an Invariant
TSC system that has passed Ingo's warp test (to eliminate the
possibility of a fixed interprocessor TSC gap due to a broken BIOS
in a multi-node NUMA system), if anything changes in the clock
signal that drives the TSC, the system is badly broken and far
worse things -- like inter-processor cache incoherency -- may happen.

Is it finally possible to get past the horrible SMP TSC problems
of the past and allow apps, under the right conditions, to be able
to use rdtsc again? This patch argues "yes".
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Venkatesh Pallipadi on
On Sat, May 15, 2010 at 5:35 AM, Jaswinder Singh Rajput
<jaswinderlinux(a)gmail.com> wrote:
> Hello,
>> +
>> +static int __init init_tsc_sysfs(void)
>> +{
>> + � � � int err, i = 0;
>> +
>> + � � � err = sysdev_class_register(&tsc_sysclass);
>> + � � � if (err)
>> + � � � � � � � return err;
>> +
>> + � � � err = sysdev_register(&device_tsc);
>> + � � � if (err)
>> + � � � � � � � goto fail;
>
> fail will call sysdev_unregister(&device_tsc) which is not
> appropriate, please fix this goto.
>

Thanks for catching this. Will fix this in the patch refresh.

Thanks,
Venki
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Venkatesh Pallipadi on
On Sat, May 15, 2010 at 6:29 AM, Dan Magenheimer
<dan.magenheimer(a)oracle.com> wrote:
>> From: Andi Kleen [mailto:andi(a)firstfloor.org]
>>
>> > Kernel information about calibrated value of tsc_khz and
>> > tsc_stability (result of tsc warp test) are useful bits of
>> information
>> > for any app that wants to use TSC directly. Export this read_only
>> > information in sysfs.
>>
>> Is this really a good idea? �It will encourage the applications
>> to use RDTSC directly, but there are all kinds of constraints on
>
> Indeed, that is what it is intended to do.
>
>> that. Even the kernel has a hard time with them, how likely
>> is it that applications will get all that right?
>
> That's the point of exposing the tsc_reliable kernel data.
> If the processor has Invariant TSC and the system has
> successfully passed Ingo's warp test and, as a result
> the kernel is using TSC as a clocksource, why not enable
> userland apps that need to obtain timestamp data
> tens or hundreds of thousands of times per second to
> also use the TSC directly?
>

I am a little concerned about applications getting this right, with
rdtsc and related barriers etc. May be that calls for a userspace
library.

Despite of that, it is useful information to human user to know
whether tsc is stable and what TSC freq is. It will be very hard for
userspace to do the TSC calibration. And to know whether TSC warp test
passed or whether TSC is marked unstable by the kernel, there is no
one place to get these answers today. Users have to look at
clocksource, dmesg, etc.

Thanks,
Venki
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/