From: Richard Maine on
JB <foo(a)bar.invalid> wrote:

> Here in Fortran-happy-happy-land, the solution in the vast majority of
> cases is to use the standard timing intrinsics DATE_AND_TIME,
> SYSTEM_CLOCK, and CPU_TIME.

Indeed. I would say that one should have very specific reasons to use
anything other than those. I won't deny that those reasons might exist
in some cases, but they really do need to be specific. Something like "I
read a post on comp.lang.fortran from someone who says that he usually
uses method x (be it RDTSC or anything else)" doesn't even come close to
being specific enough.

Answering in advance the obvious question about what would be specific
enough, I'd say that if one has to ask, then one doesn't have a specific
enough reason.
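For reference, the usual pattern is nothing more exotic than this
(a minimal sketch; the loop body is stand-in work, and SYSTEM_CLOCK
COUNT wrap-around is ignored for brevity):

program timing_demo
  implicit none
  integer :: count0, count1, count_rate, i
  real :: cpu0, cpu1, x

  call system_clock(count0, count_rate)
  call cpu_time(cpu0)

  x = 0.0
  do i = 1, 10000000          ! stand-in for the real work
     x = x + sin(real(i))
  end do

  call cpu_time(cpu1)
  call system_clock(count1)

  print *, 'wall clock (s):', real(count1 - count0) / real(count_rate)
  print *, 'cpu time   (s):', cpu1 - cpu0
  print *, x                  ! keeps the compiler from eliding the loop
end program timing_demo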

--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain
From: Arjan on
> Here in Fortran-happy-happy-land, the solution in the vast majority of
> cases is to use the standard timing intrinsics DATE_AND_TIME,
> SYSTEM_CLOCK, and CPU_TIME.

Sounds like I have an answer!
Thanks!

A.
From: glen herrmannsfeldt on
JB <foo(a)bar.invalid> wrote:
> On 2010-03-09, glen herrmannsfeldt <gah(a)ugcs.caltech.edu> wrote:
(snip, in response to a question about performance timing)

>> For IA32, I usually use a routine that returns the value of
>> the time stamp counter, as given by the RDTSC instruction.

> Here, let me formulate a corollary to Godwin's law: "As an online
> programming discussion about timing grows longer, the probability of
> someone suggesting use of RDTSC approaches 1".

> The wikipedia page contains reasons why it should not be used except
> in very specific circumstances:

> http://en.wikipedia.org/wiki/Rdtsc

I completely agree that one needs to be careful with its use. Even
so, I have never had any problems: I have never seen a negative
increment, even on multiprocessor systems. With variable-clock-rate
processors, one has to know what is important.

When trying to find computation bottlenecks, I usually consider
clock cycles, not elapsed time, to be the important factor
(especially on a processor whose clock speed may vary).
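To give the flavor, the Fortran side of such a routine can be a
simple C binding (read_tsc is a made-up name here for illustration;
the C side might be a one-liner such as
   unsigned long long read_tsc(void) { return __builtin_ia32_rdtsc(); }
compiled and linked alongside):

module tsc
  use iso_c_binding, only: c_int64_t
  implicit none
  interface
     ! Returns the raw time stamp counter via the C helper.
     function read_tsc() bind(c, name='read_tsc')
       import :: c_int64_t
       integer(c_int64_t) :: read_tsc
     end function read_tsc
  end interface
end module tsc

Then t0 = read_tsc(), do the work, t1 = read_tsc(), and t1 - t0 is
the cycle count, subject to all the caveats above.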

> Here in Fortran-happy-happy-land, the solution in the vast majority of
> cases is to use the standard timing intrinsics DATE_AND_TIME,
> SYSTEM_CLOCK, and CPU_TIME.

If you read the standard, those routines come with pretty much the
same disclaimers as RDTSC gets on the Wikipedia page. Also, they are
often low resolution even as processors get faster. You might find
that the CPU_TIME or DATE_AND_TIME values don't update at all through
a fairly long computation. If you average over enough calls to a
routine, you can get a reasonable value even from a low-resolution
clock, but it isn't easy.
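A sketch of the averaging approach (NREP and the dummy routine are
only for illustration):

program average_timing
  implicit none
  integer, parameter :: nrep = 10000000
  integer :: i
  real :: t0, t1, s

  s = 0.0
  call cpu_time(t0)
  do i = 1, nrep
     call work(s)
  end do
  call cpu_time(t1)
  print *, 'seconds per call:', (t1 - t0) / real(nrep)
  print *, s                ! keeps the optimizer honest

contains

  subroutine work(s)        ! stand-in for the routine under test
    real, intent(inout) :: s
    s = s + 1.0e-7
  end subroutine work

end program average_timing

Even then, the loop and call overhead get folded into the average,
which is part of why it isn't easy.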

I have even used RDTSC in Java, through JNI calling what looks
(to Java) like a C function, with useful results.

-- glen
From: James Van Buskirk on
"JB" <foo(a)bar.invalid> wrote in message
news:slrnhpdean.3eg.foo(a)kosh.hut.fi...

> Here, let me formulate a corollary to Godwin's law: "As an online
> programming discussion about timing grows longer, the probability of
> someone suggesting use of RDTSC approaches 1".

> The wikipedia page contains reasons why it should not be used except
> in very specific circumstances:

Yeah, don't ever use RDTSC, because then you would have a chance to
measure performance and possibly enhance it, rather than just blather
about performance, which is much more in vogue nowadays.

Fortran just doesn't provide primitives which can split out the
time taken by one subroutine in the context of running with
everything else in the program fighting it for cache, TLB entries,
and BTB entries.

It is not at all unusual for lots of pieces of a program to be
performing suboptimally, and if you fiddle with one of the pieces,
the improvement (or lack of it) can get lost in the noise. You can
try to write a benchmark that invokes only the subroutine you are
working on, but doing that correctly is trickier than filtering out
the noise inherent in RDTSC. At any rate, I have seen otherwise
respected programmers write total garbage benchmarks that don't
measure performance correctly because they use the cache or BTB
differently than the subroutine would in practice.

RDTSC can "measure" glitches like interrupts and processor
switchover, so it is the user's responsibility to detect these events
and filter them out in order to see what effect the adjustments have
had.
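One way to do that filtering (a sketch; read_tsc stands for whatever
RDTSC wrapper you link in, as elsewhere in this thread) is to take
the minimum over many trials, on the theory that interrupts and
migrations only ever inflate a measurement:

program min_filter
  use iso_c_binding, only: c_int64_t
  implicit none
  interface
     function read_tsc() bind(c, name='read_tsc')
       import :: c_int64_t
       integer(c_int64_t) :: read_tsc
     end function read_tsc
  end interface
  integer, parameter :: ntrial = 1000
  integer(c_int64_t) :: t0, t1, best
  integer :: k
  real :: s

  best = huge(best)
  s = 0.0
  do k = 1, ntrial
     t0 = read_tsc()
     call kernel(s)              ! the code under test
     t1 = read_tsc()
     best = min(best, t1 - t0)
  end do
  print *, 'best trial (cycles):', best
  print *, s                     ! keeps the optimizer honest

contains

  subroutine kernel(s)           ! stand-in for the routine under test
    real, intent(inout) :: s
    integer :: i
    do i = 1, 10000
       s = s + 1.0e-7
    end do
  end subroutine kernel

end program min_filter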

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
From: Phred Phungus on
JB wrote:
> On 2010-03-09, glen herrmannsfeldt <gah(a)ugcs.caltech.edu> wrote:
>> Arjan <arjan.van.dijk(a)rivm.nl> wrote:
>>
>>> Until now I have monitored the performance of my application by
>>> measuring the real time spent by my program and subtracting the
>>> value from the previous iteration from the latest reading. This
>>> gives me the number of seconds per iteration of my process. I have
>>> only 1 CPU, so the available time is distributed over all
>>> processes. My current application uses a lot of CPU and produces
>>> only a tiny bit of output, so I/O time is not restrictive. How can
>>> I measure the net CPU time spent by my program per iteration of my
>>> calculation, i.e. corrected for the fraction of CPU assigned to
>>> the process?
>> For IA32, I usually use a routine that returns the value of
>> the time stamp counter, as given by the RDTSC instruction.
>
> Here, let me formulate a corollary to Godwin's law: "As an online
> programming discussion about timing grows longer, the probability of
> someone suggesting use of RDTSC approaches 1".

I don't want to tarnish your thesis, but did I even mention the
Fuehrer?

Glen's posts have none of the triteness that your law suggests.

So, you're welcome.
--
fred