From: Scott Lurndal on
Moi <root(a)invalid.address.org> writes:
>On Sun, 07 Mar 2010 21:52:46 -0800, David Schwartz wrote:

>A few years ago, I did some experiments with rdtsc, and it seemed that its cost
>varies. On my home machine (a dual core x86) the rdtsc instruction itself took
>an additional 200-300 ticks. The machine at work (a single core AMD Athlon)
>used only about 20 ticks.
>
>The rdtsc instruction seems to drain and refill the instruction pipelines;

No, rdtsc is _not_ a serializing instruction.

However, an OS can prevent an application from using it by setting the TSD
flag in CR4. In this case, a user-mode RDTSC traps (#GP) and the OS can
emulate the effect of the instruction, which may take a considerable time.
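
For what it's worth, Linux exposes the per-process TSD state through
prctl(PR_GET_TSC), so a program can check whether RDTSC is permitted before
relying on it. A minimal sketch, assuming Linux on x86; the function name
rdtsc_allowed is mine, and note that Linux delivers SIGSEGV rather than
emulating the instruction when the state is PR_TSC_SIGSEGV:

```c
#include <assert.h>
#if defined(__linux__)
#include <sys/prctl.h>
#endif

/* Hypothetical helper: returns 1 if user-mode RDTSC is allowed in this
 * process, 0 if it would fault (TSD set for this task), and -1 if the
 * state cannot be queried (not Linux, or not an x86 kernel). */
int rdtsc_allowed(void)
{
#if defined(__linux__) && defined(PR_GET_TSC)
    int state = 0;

    /* The kernel writes PR_TSC_ENABLE or PR_TSC_SIGSEGV into 'state'. */
    if (prctl(PR_GET_TSC, &state) != 0)
        return -1;

    assert(state == PR_TSC_ENABLE || state == PR_TSC_SIGSEGV);
    return state == PR_TSC_ENABLE;
#else
    return -1;
#endif
}
```

A result of 0 would suggest that any timing taken with RDTSC in that process
is measuring the trap path (or will simply crash), not the instruction.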

RDTSC has no effect on caches.

Intel lists between 31 and 100 clocks (depending on processor family) (doc 248966-018)
AMD lists 45 core clocks + 16 northbridge clocks (doc 40546 rev 3.07)

These are both quite expensive compared to the highly optimized integer
arithmetic instructions which average one or two clocks.
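
If you want to see the cost on your own box, something along these lines
should do. It is only a sketch: it assumes GCC or Clang (for the __rdtsc
intrinsic in x86intrin.h), the names read_counter and rdtsc_min_overhead are
made up for illustration, and the non-x86 fallback is there only so the code
builds elsewhere (it then reports nanoseconds, not TSC ticks).

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the time stamp counter via the compiler intrinsic. */
static inline uint64_t read_counter(void) { return __rdtsc(); }
#else
/* Fallback so the sketch still builds off x86; units are nanoseconds. */
static inline uint64_t read_counter(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
#endif

/* Smallest difference seen between two back-to-back counter reads over
 * n trials. Taking the minimum discards trials disturbed by interrupts,
 * migrations between cores, or cache misses. */
uint64_t rdtsc_min_overhead(int n)
{
    assert(n > 0);
    uint64_t best = UINT64_MAX;

    for (int i = 0; i < n; i++) {
        uint64_t t0 = read_counter();
        uint64_t t1 = read_counter();
        uint64_t delta = t1 - t0;
        if (delta < best)
            best = delta;
    }
    return best;
}
```

Calling rdtsc_min_overhead(1000000) and printing the result gives a rough
per-read cost in TSC ticks. Since rdtsc is not serializing, the two reads can
overlap with surrounding instructions, so treat the number as approximate;
under a hypervisor or with TSD set, expect it to be much larger.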

The TSC values are synchronized at boot time on multi-core systems.

scott