From: Dan Magenheimer on
> Sorry, this is sort of mixing points
> Again, this is mixing the discussion.

Oops, sorry, missed that. :-}

> Maybe something like a tsc_long_calibration=1 option would allow for
> this?

Sounds good to me. If it's non-obvious what value to choose for
the new calibration, maybe specifying it in MS (per MAX_QUICK_PIT_MS
in arch/x86/kernel/tsc.c) would be nice.

> However, I really do like the idea of pulling the stamped value from
> the MSR and if its close to what we quickly calibrated, use that.

On a quick sample of two machines looking at the TSC calibration
done by Xen (which exposes the equivalent of tsc_khz), it appears
that the stamped value is different from the calibration by about
1000ppm. YMMV.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: H. Peter Anvin on
On 05/24/2010 05:01 PM, Dan Magenheimer wrote:
> On a quick sample of two machines looking at the TSC calibration
> done by Xen (which exposes the equivalent of tsc_khz), it appears
> that the stamped value is different from the calibration by about
> 1000ppm. YMMV.

So pretty seriously different. Less than I would have expected, but not
out of the ballpark.

-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Brian Bloniarz on
I still sorta think the clearest thing is to keep the kernel's
quick, dynamic calibration at boot, and expose the tsc_khz it
decides on. People who don't care about clocksync get their
quick boot, NTP knows enough to correct its drift estimate
when it starts, and users don't need to learn all these
details to get stable clocksync (*).

If we're being strict, something like NTP needs to know exactly
what's driving gettimeofday(). If the clocksource changes, it
needs to know that so it could correct or trash its drift
estimate. If there's one-time calibration, it needs to know
what the result was. If there's continuous calibration, it
either needs to be notified, or have the ability to disable
it. Right? So I think exporting tsc_khz in some form is a
step in the right direction.

So what's wrong with just adding a
/sys/devices/system/clocksource/clocksource0/tsc_khz?
Maybe Thomas Gleixner's suggestion of a vget_tsc_raw()
would also suffice, I'm not sure I understand the details
enough.

Any of the other fixes people have discussed (tsc_khz=
bootopt, tsc_long_calibration=1) would be enough to make
me happy though :)

(*) Though they still need to learn enough to coax the
kernel into giving them a fast gettimeofday(). That's a
price you gotta pay if you care enough :)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Brian Bloniarz on
On 05/24/2010 09:33 PM, Brian Bloniarz wrote:
> So what's wrong with just adding a
> /sys/devices/system/clocksource/clocksource0/tsc_khz?

As an RFC:

Add clocksource.sys_register & sys_unregister so the
current clocksource can add supplemental information to
/sys/devices/system/clocksource/clocksource0/

Export tsc_khz when current_clocksource==tsc so that
daemons like NTP can account for the variability of
calibration results.

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 9faf91a..9c99965 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -76,6 +76,11 @@ unsigned long long
sched_clock(void) __attribute__((alias("native_sched_clock")));
#endif

+int sysfs_tsc_register(struct sys_device *clocksource_dev,
+ struct clocksource *cs);
+void sysfs_tsc_unregister(struct sys_device *clocksource_dev,
+ struct clocksource *cs);
+
int check_tsc_unstable(void)
{
return tsc_unstable;
@@ -757,6 +762,8 @@ static struct clocksource clocksource_tsc = {
#ifdef CONFIG_X86_64
.vread = vread_tsc,
#endif
+ .sys_register = sysfs_tsc_register,
+ .sys_unregister = sysfs_tsc_unregister,
};

void mark_tsc_unstable(char *reason)
@@ -967,3 +974,22 @@ void __init tsc_init(void)
init_tsc_clocksource();
}

+static ssize_t show_tsc_khz(
+ struct sys_device *dev, struct sysdev_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%u\n", tsc_khz);
+}
+
+static SYSDEV_ATTR(tsc_khz, 0444, show_tsc_khz, NULL);
+
+int sysfs_tsc_register(struct sys_device *clocksource_dev,
+ struct clocksource *cs)
+{
+ return sysdev_create_file(clocksource_dev, &attr_tsc_khz);
+}
+
+void sysfs_tsc_unregister(struct sys_device *clocksource_dev,
+ struct clocksource *cs)
+{
+ sysdev_remove_file(clocksource_dev, &attr_tsc_khz);
+}
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 5ea3c60..d9f6f13 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -15,6 +15,7 @@
#include <linux/cache.h>
#include <linux/timer.h>
#include <linux/init.h>
+#include <linux/sysdev.h>
#include <asm/div64.h>
#include <asm/io.h>

@@ -156,6 +157,8 @@ extern u64 timecounter_cyc2time(struct timecounter *tc,
* @vread: vsyscall based read
* @suspend: suspend function for the clocksource, if necessary
* @resume: resume function for the clocksource, if necessary
+ * @sys_register: optional, register additional sysfs attributes
+ * @sys_unregister: optional, unregister sysfs attributes
*/
struct clocksource {
/*
@@ -194,6 +197,10 @@ struct clocksource {
struct list_head wd_list;
cycle_t wd_last;
#endif
+ int (*sys_register)(struct sys_device *clocksource_dev,
+ struct clocksource *cs);
+ void (*sys_unregister)(struct sys_device *clocksource_dev,
+ struct clocksource *cs);
};

/*
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index f08e99c..d8b69a5 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -41,6 +41,8 @@ void timecounter_init(struct timecounter *tc,
}
EXPORT_SYMBOL_GPL(timecounter_init);

+void sysfs_alter_clocksource(struct clocksource *old, struct clocksource *new);
+
/**
* timecounter_read_delta - get nanoseconds since last call of this function
* @tc: Pointer to time counter
@@ -572,6 +574,7 @@ static void clocksource_select(void)
}
if (curr_clocksource != best) {
printk(KERN_INFO "Switching to clocksource %s\n", best->name);
+ sysfs_alter_clocksource(curr_clocksource, best);
curr_clocksource = best;
timekeeping_notify(curr_clocksource);
}
@@ -834,6 +837,8 @@ static struct sys_device device_clocksource = {
.cls = &clocksource_sysclass,
};

+static int sysfs_active = 0;
+
static int __init init_clocksource_sysfs(void)
{
int error = sysdev_class_register(&clocksource_sysclass);
@@ -848,10 +853,34 @@ static int __init init_clocksource_sysfs(void)
error = sysdev_create_file(
&device_clocksource,
&attr_available_clocksource);
+
+ if (!error)
+ {
+ mutex_lock(&clocksource_mutex);
+ if(curr_clocksource->sys_register)
+ error = curr_clocksource->sys_register(
+ &device_clocksource, curr_clocksource);
+ mutex_unlock(&clocksource_mutex);
+ }
+
+ if (!error)
+ sysfs_active = 1;
return error;
}

device_initcall(init_clocksource_sysfs);
+
+void sysfs_alter_clocksource(struct clocksource *old,
+ struct clocksource *new)
+{
+ if(!sysfs_active)
+ return;
+ if(old->sys_unregister)
+ old->sys_unregister(&device_clocksource, old);
+ if(new->sys_register)
+ new->sys_register(&device_clocksource, new);
+}
+
#endif /* CONFIG_SYSFS */

/**
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: john stultz on
On Tue, 2010-05-25 at 20:16 -0400, Brian Bloniarz wrote:
> On 05/24/2010 09:33 PM, Brian Bloniarz wrote:
> > So what's wrong with just adding a
> > /sys/devices/system/clocksource/clocksource0/tsc_khz?
>
> As an RFC:
>
> Add clocksource.sys_register & sys_unregister so the
> current clocksource can add supplemental information to
> /sys/devices/system/clocksource/clocksource0/
>
> Export tsc_khz when current_clocksource==tsc so that
> daemons like NTP can account for the variability of
> calibration results.

I think this is a bad idea, as it creates an ABI that is arch AND
machine specific, which will cause portability problems in applications
that expect the interface to be there.

thanks
-john


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/