From: Casper H.S. Dik on
Chris Friesen <cbf123(a)mail.usask.ca> writes:

>On 03/09/2010 11:58 AM, Scott Lurndal wrote:

>> So put a very high res timer in the northbridge and have it respond
>> to some address above top of high memory.

>You mean like mapping /dev/hpet on modern x86 systems running linux? :)

>I seem to remember an architecture (maybe Sparc?) that distributed a
>fast-but-not-insanely-fast clock pulse to all cpus. Like 1MHz or
>something similar. This was fast enough to be useful but not so fast
>that clock skew becomes significant. This then incremented a counter in
>each cpu which could be read in a single instruction.

10MHz, IIRC; the %stick register. (There's also %tick, which counts at
the CPU clock frequency.)

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
From: Scott Lurndal on
Chris Friesen <cbf123(a)mail.usask.ca> writes:
>On 03/09/2010 11:58 AM, Scott Lurndal wrote:
>
>> So put a very high res timer in the northbridge and have it respond
>> to some address above top of high memory.
>
>You mean like mapping /dev/hpet on modern x86 systems running linux? :)

Not really. The HPET still relies on interrupts (High Precision _EVENT_ Timer).

>
>I seem to remember an architecture (maybe Sparc?) that distributed a
>fast-but-not-insanely-fast clock pulse to all cpus. Like 1MHz or
>something similar. This was fast enough to be useful but not so fast
>that clock skew becomes significant. This then incremented a counter in
>each cpu which could be read in a single instruction.
>

And we're looped back to rdtsc :-)

scott
From: Chris Friesen on
On 03/09/2010 07:32 PM, Scott Lurndal wrote:
> Chris Friesen <cbf123(a)mail.usask.ca> writes:
>> On 03/09/2010 11:58 AM, Scott Lurndal wrote:
>>
>>> So put a very high res timer in the northbridge and have it respond
>>> to some address above top of high memory.
>>
>> You mean like mapping /dev/hpet on modern x86 systems running linux? :)
>
> Not really. The HPET still relies on interrupts (High Precision _EVENT_ Timer).

I believe you can read the HPET to get a 64-bit timestamp. It's slower
than rdtsc though.

>> I seem to remember an architecture (maybe Sparc?) that distributed a
>> fast-but-not-insanely-fast clock pulse to all cpus. Like 1MHz or
>> something similar. This was fast enough to be useful but not so fast
>> that clock skew becomes significant. This then incremented a counter in
>> each cpu which could be read in a single instruction.
>>
>
> And we're looped back to rdtsc :-)

Until relatively recently (especially on AMD cpus) rdtsc varied with cpu
frequency and sleep states, and was not necessarily synchronized across
multiple cores.

It's now possible to determine whether rdtsc is reliable...on linux an
easy way is to look at /proc/cpuinfo. Ideally you want to see
"constant_tsc" and "nonstop_tsc".

Chris
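
The /proc/cpuinfo check described above can be sketched in C. This is only an illustration, not anything posted in the thread: the helper names has_flag() and tsc_looks_reliable() are made up here, and the sketch assumes the Linux/x86 layout in which each processor entry has a line starting with "flags" followed by space-separated feature names. It inspects only the first such line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if `flag` appears in `line` as a whole
 * whitespace-delimited word, 0 otherwise. */
int has_flag(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;

    assert(n > 0);
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';
        if (starts && ends)
            return 1;
        p += n;  /* partial match (e.g. a longer flag name); keep looking */
    }
    return 0;
}

/* Hypothetical helper: return 1 if the first "flags" line in `path` lists
 * both constant_tsc and nonstop_tsc, 0 if it does not, and -1 if the file
 * cannot be opened or contains no "flags" line. */
int tsc_looks_reliable(const char *path)
{
    char line[8192];
    FILE *f = fopen(path, "r");
    int result = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            result = has_flag(line, "constant_tsc") &&
                     has_flag(line, "nonstop_tsc");
            break;
        }
    }
    fclose(f);
    return result;
}
```

Calling tsc_looks_reliable("/proc/cpuinfo") on a Linux/x86 box gives a quick yes/no answer. Matching whole words matters because a plain substring search would also accept longer flag names that merely contain the one being looked for.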
From: William Ahern on
Chris Friesen <cbf123(a)mail.usask.ca> wrote:
> On 03/09/2010 07:32 PM, Scott Lurndal wrote:
<snip>
> > And we're looped back to rdtsc :-)

> Until relatively recently (especially on AMD cpus) rdtsc varied with cpu
> frequency and sleep states, and was not necessarily synchronized across
> multiple cores.

> It's now possible to determine whether rdtsc is reliable...on linux an
> easy way is to look at /proc/cpuinfo. Ideally you want to see
> "constant_tsc" and "nonstop_tsc".

On Linux/x86_64, at least, the kernel already uses HPET+rdtsc tricks, and it
uses some special hacks for gettimeofday and similar so that a regular
system call isn't necessary. You can tell whether it's enabled by

cat /proc/sys/kernel/vsyscall64

It should read 1 or 2. If 0 then it's falling back to a regular syscall.
I can do 2^26 calls to gettimeofday in 3.8 seconds:

william(a)proxy0:/tmp$ time ./bench

real 0m3.800s
user 0m3.800s
sys 0m0.000s

If I disable vsyscall64 then it runs only 4x slower, which is a testament to
how fast system calls are in general on Linux/x86.
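
The source of ./bench was not posted, so the following is only a guess at what such a benchmark might look like. The function name bench_gettimeofday() is invented for this sketch, and it measures elapsed wall-clock time with gettimeofday() itself rather than with the shell's time builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical benchmark helper: call gettimeofday() `calls` times and
 * return the elapsed wall-clock time in seconds, taken as the difference
 * between a timestamp read before the loop and the last one read in it. */
double bench_gettimeofday(unsigned long calls)
{
    struct timeval start, now;
    unsigned long i;

    assert(calls > 0);
    gettimeofday(&start, NULL);
    now = start;
    for (i = 0; i < calls; i++)
        gettimeofday(&now, NULL);

    return (double)(now.tv_sec - start.tv_sec) +
           (double)(now.tv_usec - start.tv_usec) / 1e6;
}
```

Calling bench_gettimeofday(1UL << 26) from a small main() and printing the result would correspond to the 2^26 calls mentioned above. The numbers will differ by machine and by whether the vsyscall path is in use.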

The vsyscall mechanism (now called vdso, I think) is probably also
implemented on other architectures.
From: William Ahern on
Scott Lurndal <scott(a)slp53.sl.home> wrote:
<snip>
> SVR4/Unixware had a reserved read-only page in the application virtual address space
> that could be mapped into the application (silently, the first time
> gettimeofday() was called). This page had the current TOD at a fixed
> location (and was updated out of the kernel timer routines); this turned
> gettimeofday() into a simple memory reference. IIRC they did this to
> improve Oracle performance.

This is pretty much how it works in Linux (x86, ppc, and s390).

For example, see do_realtime() starting at line 46 in
http://tomoyo.sourceforge.jp/cgi-bin/lxr/source/arch/x86/vdso/vclock_gettime.c