faster gettimeofday? [Unix Programming]


From: Casper H.S. Dik on 9 Mar 2010 16:03

Chris Friesen <cbf123(a)mail.usask.ca> writes:

>On 03/09/2010 11:58 AM, Scott Lurndal wrote:

>> So put a very high res timer in the northbridge and have it respond
>> to some address above top of high memory.

>You mean like mapping /dev/hpet on modern x86 systems running linux? :)

>I seem to remember an architecture (maybe Sparc?) that distributed a
>fast-but-not-insanely-fast clock pulse to all cpus. Like 1MHz or
>something similar. This was fast enough to be useful but not so fast
>that clock skew becomes significant. This then incremented a counter in
>each cpu which could be read in a single instruction.

10MHz, IIRC; the %stick register. (There's also %tick, which counts at
the CPU clock frequency.)

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

From: Scott Lurndal on 9 Mar 2010 20:32

Chris Friesen <cbf123(a)mail.usask.ca> writes:
>On 03/09/2010 11:58 AM, Scott Lurndal wrote:
>
>> So put a very high res timer in the northbridge and have it respond
>> to some address above top of high memory.
>
>You mean like mapping /dev/hpet on modern x86 systems running linux? :)

Not really. The HPET still relies on interrupts (high perf _EVENT_ timer).

>
>I seem to remember an architecture (maybe Sparc?) that distributed a
>fast-but-not-insanely-fast clock pulse to all cpus. Like 1MHz or
>something similar. This was fast enough to be useful but not so fast
>that clock skew becomes significant. This then incremented a counter in
>each cpu which could be read in a single instruction.
>

And we're looped back to rdtsc :-)

scott

From: Chris Friesen on 10 Mar 2010 11:08

On 03/09/2010 07:32 PM, Scott Lurndal wrote:
> Chris Friesen <cbf123(a)mail.usask.ca> writes:
>> On 03/09/2010 11:58 AM, Scott Lurndal wrote:
>>
>>> So put a very high res timer in the northbridge and have it respond
>>> to some address above top of high memory.
>>
>> You mean like mapping /dev/hpet on modern x86 systems running linux? :)
>
> Not really. The HPET still relies on interrupts (high perf _EVENT_ timer).

I believe you can read the HPET to get a 64-bit timestamp. It's slower
than rdtsc though.

>> I seem to remember an architecture (maybe Sparc?) that distributed a
>> fast-but-not-insanely-fast clock pulse to all cpus. Like 1MHz or
>> something similar. This was fast enough to be useful but not so fast
>> that clock skew becomes significant. This then incremented a counter in
>> each cpu which could be read in a single instruction.
>>
>
> And we're looped back to rdtsc :-)

Until relatively recently (especially on AMD cpus) rdtsc varied with cpu
frequency and sleep states, and was not necessarily synchronized across
multiple cores.

It's now possible to determine whether rdtsc is reliable...on linux an
easy way is to look at /proc/cpuinfo. Ideally you want to see
"constant_tsc" and "nonstop_tsc".

Chris

From: William Ahern on 10 Mar 2010 14:10

Chris Friesen <cbf123(a)mail.usask.ca> wrote:
> On 03/09/2010 07:32 PM, Scott Lurndal wrote:
<snip>
> > And we're looped back to rdtsc :-)

> Until relatively recently (especially on AMD cpus) rdtsc varied with cpu
> frequency and sleep states, and was not necessarily synchronized across
> multiple cores.

> It's now possible to determine whether rdtsc is reliable...on linux an
> easy way is to look at /proc/cpuinfo. Ideally you want to see
> "constant_tsc" and "nonstop_tsc".

On Linux/x86_64, at least, the kernel already uses HPET+rdtsc tricks, and it
uses some special hacks for gettimeofday and similar so that a regular
system call isn't necessary. You can tell whether it's enabled with:

cat /proc/sys/kernel/vsyscall64

It should read 1 or 2. If it reads 0, it's falling back to a regular syscall.
I can do 2^26 calls to gettimeofday in 3.8 seconds:

william(a)proxy0:/tmp$ time ./bench

real 0m3.800s
user 0m3.800s
sys 0m0.000s

If I disable vsyscall64 then it runs only 4x slower, which is a testament to
how fast system calls are in general on Linux/x86.

I believe the vsyscall mechanism (now called vdso, I think) is also
implemented on other architectures.

From: William Ahern on 10 Mar 2010 15:36

Scott Lurndal <scott(a)slp53.sl.home> wrote:
<snip>
> SVR4/Unixware had a reserved read-only page in the application virtual address space
> that could be mapped into the application (silently, the first time
> gettimeofday() was called). This page had the current TOD at a fixed
> location (and was updated out of the kernel timer routines); this turned
> gettimeofday() into a simple memory reference. IIRC they did this to
> improve Oracle performance.

This is pretty much how it works in Linux (x86, ppc, and s390).

For example, see do_realtime() starting at line 46 in
http://tomoyo.sourceforge.jp/cgi-bin/lxr/source/arch/x86/vdso/vclock_gettime.c
