From: Måns Rullgård on
lacos(a)ludens.elte.hu (Ersek, Laszlo) writes:

> In article <yw1xbpey7opo.fsf(a)unicorn.mansr.com>, Måns Rullgård <mans(a)mansr.com> writes:
>> lacos(a)ludens.elte.hu (Ersek, Laszlo) writes:
>>
>>> In article <4B951582.2040300(a)mail.usask.ca>, Chris Friesen <cbf123(a)mail.usask.ca> writes:
>>>> On 03/08/2010 07:55 AM, Sumer Cip wrote:
>>>>> Interestingly, I tried 'rdtsc' just to see what happens, and it gives
>>>>> the same results. It is a simple three-line ASM call. Is there a way
>>>>> to trade accuracy for speed? For example, if one-second precision is
>>>>> enough, is there a faster function? I am pushing on this because 95%
>>>>> of the profiler's runtime overhead comes from this call.
>>>>
>>>> Have you looked at the assembly to make sure that your instrumentation
>>>> is giving you what you expect? "rdtsc" should be pretty fast.
>>>>
>>>> If it's still not fast enough, then you can't do exact profiling and you
>>>> need to go to statistical methods. A simple version is to program the
>>>> RTC to interrupt you at some interval (preferably not a multiple or
>>>> divisor of the system clock) and then use the value of the instruction
>>>> pointer when you were interrupted to bump some statistics.
>>>
>>> Fantastic! I was trying to suggest timer_create() [0] with SIGEV_SIGNAL
>>> and timer_settime() [1], but couldn't tell what the program should do in
>>> the signal handler. However, your suggestion should be implementable on
>>> platforms with Realtime Signals Extension [2] by setting SA_SIGINFO,
>>> because then the handler will be entered [3] as
>>>
>>> [...]
>>>
>>> I guess a histogram could be made of the collected IP values, and IP
>>> values could be translated to source code locations via addr2line.
>>
>> If you have a Linux system, oprofile does all this and more without
>> any instrumentation required, and with very low overhead.
>>
>> Solaris probably has something similar, though I don't know the name.
>
> Thanks! However,
>
> In article
> <e785d1bf-4317-4b0d-9c1a-d2f973eb45b0(a)u9g2000yqb.googlegroups.com>,
> Sumer Cip <sumerc(a)gmail.com> writes:
>
>> I am developing a profiler
>
> Thus the OP might need to (re)implement what oprofile does. (In which
> case your advice translates to "dear OP, please look at the source of
> oprofile", of course.)

Or maybe he'll find out he doesn't need to develop yet another profiler.

> (... The website in your sig (mansr.com) makes my browser's connect()
> return with -1/ECONNREFUSED.)

My sig contains an email address, not a website.

--
Måns Rullgård
mans(a)mansr.com
From: Ersek, Laszlo on
In article <yw1x3a0a7hwi.fsf(a)unicorn.mansr.com>,
Måns Rullgård <mans(a)mansr.com> writes:

> My sig contains an email address, not a website.

My mistake, sorry.

lacos
From: Moi on
On Sun, 07 Mar 2010 21:52:46 -0800, David Schwartz wrote:

> On Mar 7, 7:30 pm, Sumer Cip <sum...(a)gmail.com> wrote:
>
>> I am developing a profiler, and in my tests it seems most of the time
>> spent in the profiler is in the gettimeofday() function. I have also
>> tried clock_gettime(), and it gives the same results. Is there any way
>> to optimize the piece of code below further? Maybe there is another
>> syscall I am missing?
>
> On x86 systems with one TSC or known synchronized TSCs, you can use
> 'rdtsc'. I think pretty much you just need to accept that profiling will
> be invasive.

A few years ago I did some experiments with rdtsc, and its cost seemed to
vary. On my home machine (a dual-core x86) the rdtsc instruction itself took
an additional 200-300 ticks. The machine at work (a single-core AMD Athlon)
used only about 20 ticks.

The rdtsc instruction seems to drain and refill the instruction pipelines;
on the multicore machine it may also force a drain and sync to the other core
(and maybe even flush or drop the cache). The attempt to keep the
CPU clocks in sync may or may not be an operation imposed by the kernel.

IIRC there was also some interaction with prepending a cpuid instruction to the
rdtsc.

The conclusion: YMMV.
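
To see what rdtsc costs on a particular machine, a rough sketch like the one below may help. It assumes GCC or Clang on x86 (the __rdtsc() intrinsic from <x86intrin.h>); min_rdtsc_gap is a made-up name. The figure includes pipeline effects, so treat it as a rough estimate, not an exact instruction cost.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>   /* __rdtsc() on GCC and Clang (assumption) */
#define HAVE_RDTSC 1
#else
#define HAVE_RDTSC 0
#endif

/* Hypothetical helper: smallest observed gap, in TSC ticks, between two
 * back-to-back rdtsc reads. Returns 0 on non-x86 builds or when no valid
 * sample was taken. */
static uint64_t min_rdtsc_gap(int iterations)
{
#if HAVE_RDTSC
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < iterations; i++) {
        uint64_t t0 = __rdtsc();
        uint64_t t1 = __rdtsc();
        /* Ignore samples where the counter appears to run backwards,
         * e.g. after migration between cores with unsynchronized TSCs. */
        if (t1 >= t0 && t1 - t0 < best)
            best = t1 - t0;
    }
    return best == UINT64_MAX ? 0 : best;
#else
    (void)iterations;
    return 0;
#endif
}
```

Printing min_rdtsc_gap(1000000) on each machine of interest should show whether the numbers differ as much as they did for me.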

AvK
From: Sumer Cip on
On 8 March, 21:06, Måns Rullgård <m...(a)mansr.com> wrote:
> la...(a)ludens.elte.hu (Ersek, Laszlo) writes:
> > In article <yw1xbpey7opo....(a)unicorn.mansr.com>, Måns Rullgård <m...(a)mansr.com> writes:
> >> la...(a)ludens.elte.hu (Ersek, Laszlo) writes:
>
> >>> In article <4B951582.2040...(a)mail.usask.ca>, Chris Friesen <cbf...(a)mail.usask.ca> writes:
> >>>> On 03/08/2010 07:55 AM, Sumer Cip wrote:
> >>>>> Interestingly, I tried 'rdtsc' just to see what happens, and it gives
> >>>>> the same results. It is a simple three-line ASM call. Is there a way
> >>>>> to trade accuracy for speed? For example, if one-second precision is
> >>>>> enough, is there a faster function? I am pushing on this because 95%
> >>>>> of the profiler's runtime overhead comes from this call.
>
> >>>> Have you looked at the assembly to make sure that your instrumentation
> >>>> is giving you what you expect?  "rdtsc" should be pretty fast.
>
> >>>> If it's still not fast enough, then you can't do exact profiling and you
> >>>> need to go to statistical methods.  A simple version is to program the
> >>>> RTC to interrupt you at some interval (preferably not a multiple or
> >>>> divisor of the system clock) and then use the value of the instruction
> >>>> pointer when you were interrupted to bump some statistics.
>
> >>> Fantastic! I was trying to suggest timer_create() [0] with SIGEV_SIGNAL
> >>> and timer_settime() [1], but couldn't tell what the program should do in
> >>> the signal handler. However, your suggestion should be implementable on
> >>> platforms with Realtime Signals Extension [2] by setting SA_SIGINFO,
> >>> because then the handler will be entered [3] as
>
> >>> [...]
>
> >>> I guess a histogram could be made of the collected IP values, and IP
> >>> values could be translated to source code locations via addr2line.
>
> >> If you have a Linux system, oprofile does all this and more without
> >> any instrumentation required, and with very low overhead.
>
> >> Solaris probably has something similar, though I don't know the name.
>
> > Thanks! However,
>
> > In article
> > <e785d1bf-4317-4b0d-9c1a-d2f973eb4...(a)u9g2000yqb.googlegroups.com>,
> > Sumer Cip <sum...(a)gmail.com> writes:
>
> >> I am developing a profiler
>
> > Thus the OP might need to (re)implement what oprofile does. (In which
> > case your advice translates to "dear OP, please look at the source of
> > oprofile", of course.)
>
> Or maybe he'll find out he doesn't need to develop yet another profiler.
>
> > (... The website in your sig (mansr.com) makes my browser's connect()
> > return with -1/ECONNREFUSED.)
>
> My sig contains an email address, not a website.
>
> --
> Måns Rullgård
> m...(a)mansr.com

The profiler is not a Unix specific C profiler, it is a multithreaded
Python profiler written in C.
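
For reference, the statistical approach quoted above (periodic signal, record the interrupted instruction pointer, build a histogram) might look roughly like this in C. This is a sketch under assumptions, not what oprofile does. It uses setitimer(ITIMER_PROF) and SIGPROF instead of timer_create(). It reads the instruction pointer only where a glibc-style REG_RIP is available on Linux/x86-64; elsewhere every sample lands in bucket 0. The names sampler_start, sampler_stop, histogram and NBUCKETS are invented for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* asks glibc to expose REG_RIP */
#endif
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>
#if defined(__linux__)
#include <ucontext.h>
#endif

#define NBUCKETS 4096        /* arbitrary histogram size (assumption) */

static volatile sig_atomic_t total_samples;
static volatile uint32_t histogram[NBUCKETS];

/* SA_SIGINFO handler: the third argument carries the interrupted context. */
static void on_sigprof(int sig, siginfo_t *info, void *ctx)
{
    uintptr_t ip = 0;
    (void)sig;
    (void)info;
#if defined(__linux__) && defined(__x86_64__) && defined(REG_RIP)
    ip = (uintptr_t)((ucontext_t *)ctx)->uc_mcontext.gregs[REG_RIP];
#else
    (void)ctx;               /* no portable way to get the IP here */
#endif
    histogram[(ip >> 4) % NBUCKETS]++;   /* 16-byte granularity buckets */
    total_samples++;
}

/* Start sampling every interval_usec microseconds of process CPU time.
 * interval_usec must be below 1000000. Returns 0 on success, -1 on error. */
static int sampler_start(long interval_usec)
{
    struct sigaction sa;
    struct itimerval it;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigprof;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    it.it_interval.tv_sec = 0;
    it.it_interval.tv_usec = interval_usec;
    it.it_value = it.it_interval;
    return setitimer(ITIMER_PROF, &it, NULL);
}

/* Disarm the timer; the histogram can then be read and fed to addr2line. */
static int sampler_stop(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_PROF, &off, NULL);
}
```

The handler only touches preallocated memory, so the per-sample cost stays small, and the sampling interval is the knob for trading accuracy against overhead.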
From: Casper H.S. Dik on
Rick Jones <rick.jones2(a)hp.com> writes:

>I like gethrtime() - works great for things like netperf time
>histograms.

>That Solaris gettimeofday() would be a wrapper around gethrtime() is
>interesting - the Solaris manpage for gethrtime() talks about how it
>is "not correlated in any way to the time of day" which naturally, is
>fine for delta time measurements, but it would seem that the "fairly
>thin wrapper" would have to do all those things a "normal"
>gettimeofday() call would so the result would indeed represent the
>time of day.

In Solaris, gettimeofday() is implemented as a "fast-trap" and not as an
ordinary system-trap. Any such trap could know sufficient information
to map gethrtime() to the time of day.

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.