Loss of efficiency of operator overloading [Fortran]

From: Hifi-Comp on 27 May 2010 07:43

Thanks a lot for all your responses.

I used a meaningless loop to get a running time significant enough for
comparison. The whole purpose is to assess the speed of - and * after
operator overloading.

Clearly, it is very compiler dependent. I added 'ftot = ftot - f' into
each loop as you suggested. Compiled using CVF 6.6 with full
optimization, the original code runs for 0.25 secs while the
overloaded one runs for 4.86 secs.
Compiled using gcc 4.5.0 with -O3 (-c to compile *.f90 and -o to
link *.o), the original code runs for 0.210 secs while the
overloaded one runs for 4.969 secs. It seems I cannot even get close
to what Steve or Mike obtained. I am using a Dell XPS laptop (Intel(R)
Core(TM) 2 Duo CPU T9300 @ 2.5GHz, 772 MHz, 3.49GB of RAM) under
WinXP. The time is calculated by

CALL SYSTEM_CLOCK(start,rate)

before the piece of code for timing

and

CALL SYSTEM_CLOCK(finish)

after the piece of code for timing

The time is calculated using

sec=REAL(finish-start)/REAL(rate)
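
The timing procedure described in this post can be sketched as one small program. This is only an illustration: the program name `time_loop` and the use of 64-bit integers for the clock variables are assumptions (the post does not say which integer kind was used), and the loop body is the plain REAL(8) version with the suggested `ftot = ftot - f` accumulation.

```fortran
program time_loop   ! hypothetical name, for illustration only
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer(int64) :: start, finish, rate   ! assumed kind; the post does not say
  integer        :: i
  real(real64)   :: x, y, z, f, ftot, sec

  x = 1.0d0; y = 2.0d0; z = 0.3d0
  ftot = 0.0d0

  call system_clock(start, rate)    ! current count and counts per second
  do i = 1, 50000000
     f = x - y*z
     ftot = ftot - f                ! makes the final result depend on every pass
  end do
  call system_clock(finish)

  sec = real(finish - start, real64) / real(rate, real64)
  print *, 'ftot =', ftot, '  elapsed seconds =', sec
end program time_loop
```

Printing `ftot` after the loop matters: if the accumulated value is never used, an optimizing compiler is free to discard the loop entirely.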

From: mecej4 on 27 May 2010 08:14

Hifi-Comp wrote:

> I am writing a code using the operator overloading feature of F90/95. The
> basic math is to replace every real(8) with two real(8) contained in a
> type named DUAL_NUM, and to overload the corresponding calculations
> for the newly defined data. Based on the coding, the computation
> should cost no more than three times as much as for real(8).
> However, my simple test shows that computing with DUAL_NUM is almost
> nine times more expensive. I hope some of you knowledgeable Fortran
> experts can help me figure out the loss of efficiency and how I can
> make the code more efficient. Thanks a lot!
>
> TYPE,PUBLIC:: DUAL_NUM
> REAL(8)::x_ad_
> REAL(8)::xp_ad_
> END TYPE DUAL_NUM
>
> PUBLIC OPERATOR (+)
> INTERFACE OPERATOR (+)
> MODULE PROCEDURE ADD_DD ! dual+ dual, ELEMENTAL
> END INTERFACE
>
> PUBLIC OPERATOR (*)
> INTERFACE OPERATOR (*)
> MODULE PROCEDURE MULT_DD ! dual*dual, ELEMENTAL
> END INTERFACE
>
> ELEMENTAL FUNCTION ADD_DD(u,v) RESULT(res)
> TYPE (DUAL_NUM), INTENT(IN)::u,v
> TYPE (DUAL_NUM)::res
> res%x_ad_ = u%x_ad_+v%x_ad_
> res%xp_ad_ = u%xp_ad_+v%xp_ad_
> END FUNCTION ADD_DD
>
> ELEMENTAL FUNCTION MULT_DD(u,v) RESULT(res)
> TYPE (DUAL_NUM), INTENT(IN)::u,v
> TYPE (DUAL_NUM)::res
> res%x_ad_ = u%x_ad_*v%x_ad_
> res%xp_ad_= u%xp_ad_*v%x_ad_ + u%x_ad_*v%xp_ad_
> END FUNCTION MULT_DD
>
> The segment of the original code:
> REAL(8):: x, y, z,f
> x=1.0d0;y=2.0d0;z=0.3d0
>
> !**********************************
> DO i=1,50000000
> f=x-y*z
> ENDDO
> !**********************************
>
> The do loop runs for 0.516 seconds.
>
> The corresponding overloaded code:
> TYPE(DUAL_NUM):: x,y,z,f
>
> x=DUAL_NUM(1.0d0,1.0D0);
> y=DUAL_NUM(2.0d0,1.0D0);
> z=DUAL_NUM(0.3d0,0.0D0)
>
> !**********************************
> DO i=1,50000000
> f=X-y*z
> ENDDO
> !*********************************
> The do loop runs for 4.513 seconds.
>
> Supposedly, for DUAL_NUM, the operations needed for minus are twice
> those needed for REAL, and the operations needed for times are three
> times those needed for REAL. That is, the time needed for computation
> should be no more than three times that for real. However,
> the overall time is almost nine times more. What else takes more time?

You have no provision for carries and overflows in your multiplication. And,
you have not yet reached the fun part: division. Once you implement
division, you will appreciate why doing multiple-precision floating point
arithmetic in software is undertaken only if unavoidable.

-- mecej4
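
The quoted fragments leave out the module wrapper and the `-` operator that `f=x-y*z` relies on. A minimal compilable sketch is given here for reference. The module name `dual_mod`, the function `SUB_DD`, and the kind parameter `dp` are hypothetical additions, not taken from the original post; `dp = kind(1.0d0)` is assumed to match REAL(8) on the compilers mentioned in the thread.

```fortran
module dual_mod   ! hypothetical module name
  implicit none
  private
  integer, parameter :: dp = kind(1.0d0)   ! assumed equivalent to REAL(8)

  type, public :: dual_num
     real(dp) :: x_ad_
     real(dp) :: xp_ad_
  end type dual_num

  public :: operator(+), operator(-), operator(*)

  interface operator(+)
     module procedure add_dd
  end interface
  interface operator(-)
     module procedure sub_dd    ! assumed: not shown in the post
  end interface
  interface operator(*)
     module procedure mult_dd
  end interface

contains

  elemental function add_dd(u, v) result(res)
    type(dual_num), intent(in) :: u, v
    type(dual_num) :: res
    res%x_ad_  = u%x_ad_  + v%x_ad_
    res%xp_ad_ = u%xp_ad_ + v%xp_ad_
  end function add_dd

  elemental function sub_dd(u, v) result(res)
    type(dual_num), intent(in) :: u, v
    type(dual_num) :: res
    res%x_ad_  = u%x_ad_  - v%x_ad_
    res%xp_ad_ = u%xp_ad_ - v%xp_ad_
  end function sub_dd

  elemental function mult_dd(u, v) result(res)
    type(dual_num), intent(in) :: u, v
    type(dual_num) :: res
    res%x_ad_  = u%x_ad_*v%x_ad_
    res%xp_ad_ = u%xp_ad_*v%x_ad_ + u%x_ad_*v%xp_ad_   ! three multiplies, one add
  end function mult_dd

end module dual_mod

program dual_demo
  use dual_mod
  implicit none
  type(dual_num) :: x, y, z, f

  x = dual_num(1.0d0, 1.0d0)
  y = dual_num(2.0d0, 1.0d0)
  z = dual_num(0.3d0, 0.0d0)

  f = x - y*z
  print *, f%x_ad_, f%xp_ad_   ! approximately 0.4 and 0.7
end program dual_demo
```

Each component is an ordinary REAL(8), so the operation counts in the question (two subtractions for minus, three multiplications plus one addition for times) can be read directly off `sub_dd` and `mult_dd`.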

From: steve on 27 May 2010 10:29

On May 27, 4:43 am, Hifi-Comp <wenbinyu.hea...(a)gmail.com> wrote:
> Thanks a lot for all your responses.
>
> I used a meaningless loop to get a running time significant enough for
> comparison. The whole purpose is to assess the speed of - and * after
> operator overloading.
>
> Clearly, it is very compiler dependent. I added 'ftot = ftot - f' into
> each loop as you suggested. Compiled using CVF 6.6 with full
> optimization, the original code runs for 0.25 secs while the
> overloaded one runs for 4.86 secs.
> Compiled using gcc 4.5.0 with -O3 (-c to compile *.f90 and -o to
> link *.o), the original code runs for 0.210 secs while the
> overloaded one runs for 4.969 secs. It seems I cannot even get close
> to what Steve or Mike obtained. I am using a Dell XPS laptop (Intel(R)
> Core(TM) 2 Duo CPU T9...(a)2.5GHz, 772 MHz, 3.49GB of RAM) under
> WinXP. The time is calculated by

Well, I was using gfortran 4.6.0, which has the -fwhole-program
option. This option allows gfortran to analyze the whole program
for optimization opportunities. In the case of your code, the
subprograms in the module are in-lined into the main program.
Once in-lined, the gcc middle-end performs CSE.

> CALL SYSTEM_CLOCK(start,rate)
>
> before the piece of code for timing
>
> and
>
> CALL SYSTEM_CLOCK(finish)
>
> after the piece of code for timing
>
> The time is calculated using
>
> sec=REAL(finish-start)/REAL(rate)

It might be slightly better to use cpu_time. On
many systems 'rate' will be 128 (or so), giving
millisecond resolution. cpu_time(), at least with
gfortran, uses a higher-resolution timer.

--
steve
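
The two timers discussed in this post can be compared side by side with a short program. This is a sketch only: the program name `timers`, the summation workload, and the 64-bit clock variables are all assumptions chosen for illustration. The value printed for the tick rate depends on the compiler and on the integer kind passed to SYSTEM_CLOCK, so no particular number should be expected.

```fortran
program timers   ! hypothetical name, for illustration only
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer(int64) :: c0, c1, rate
  real(real64)   :: t0, t1, s
  integer        :: i

  ! Ask how many clock ticks SYSTEM_CLOCK reports per second.
  call system_clock(count_rate=rate)
  print *, 'system_clock ticks per second:', rate

  s = 0.0_real64
  call cpu_time(t0)        ! processor time, in seconds
  call system_clock(c0)    ! wall-clock tick count
  do i = 1, 50000000
     s = s + sqrt(real(i, real64))   ! arbitrary workload
  end do
  call system_clock(c1)
  call cpu_time(t1)

  print *, 'checksum         :', s
  print *, 'cpu_time     (s) :', t1 - t0
  print *, 'system_clock (s) :', real(c1 - c0, real64) / real(rate, real64)
end program timers
```

Note that the two intrinsics measure different things: CPU_TIME reports processor time, while SYSTEM_CLOCK reports elapsed wall-clock time, so the two figures need not agree on a loaded machine.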

From: Uno on 27 May 2010 17:53

Hifi-Comp wrote:
> Thanks a lot for all your responses.
>
> I used a meaningless loop to get a running time significant enough for
> comparison. The whole purpose is to assess the speed of - and * after
> operator overloading.
>
> Clearly, it is very compiler dependent. I added 'ftot = ftot - f' into
> each loop as you suggested. Compiled using CVF 6.6 with full
> optimization, the original code runs for 0.25 secs while the
> overloaded one runs for 4.86 secs.
> Compiled using gcc 4.5.0 with -O3 (-c to compile *.f90 and -o to
> link *.o), the original code runs for 0.210 secs while the
> overloaded one runs for 4.969 secs. It seems I cannot even get close
> to what Steve or Mike obtained. I am using a Dell XPS laptop (Intel(R)
> Core(TM) 2 Duo CPU T9300 @ 2.5GHz, 772 MHz, 3.49GB of RAM) under
> WinXP. The time is calculated by
>
> CALL SYSTEM_CLOCK(start,rate)
>
> before the piece of code for timing
>
> and
>
> CALL SYSTEM_CLOCK(finish)
>
> after the piece of code for timing
>
> The time is calculated using
>
> sec=REAL(finish-start)/REAL(rate)
>
>

Can you make a source listing after you're done with edits?
--
Uno

From: Kay Diederichs on 28 May 2010 06:06

mecej4 schrieb:
> Hifi-Comp wrote:
>
>> I am writing a code using the operator overloading feature of F90/95. The
>> basic math is to replace every real(8) with two real(8) contained in a
>> type named DUAL_NUM, and to overload the corresponding calculations
>> for the newly defined data. Based on the coding, the computation
>> should cost no more than three times as much as for real(8).
>> However, my simple test shows that computing with DUAL_NUM is almost
>> nine times more expensive. I hope some of you knowledgeable Fortran
>> experts can help me figure out the loss of efficiency and how I can
>> make the code more efficient. Thanks a lot!
>>
>> TYPE,PUBLIC:: DUAL_NUM
>> REAL(8)::x_ad_
>> REAL(8)::xp_ad_
>> END TYPE DUAL_NUM
>>
>> PUBLIC OPERATOR (+)
>> INTERFACE OPERATOR (+)
>> MODULE PROCEDURE ADD_DD ! dual+ dual, ELEMENTAL
>> END INTERFACE
>>
>> PUBLIC OPERATOR (*)
>> INTERFACE OPERATOR (*)
>> MODULE PROCEDURE MULT_DD ! dual*dual, ELEMENTAL
>> END INTERFACE
>>
>> ELEMENTAL FUNCTION ADD_DD(u,v) RESULT(res)
>> TYPE (DUAL_NUM), INTENT(IN)::u,v
>> TYPE (DUAL_NUM)::res
>> res%x_ad_ = u%x_ad_+v%x_ad_
>> res%xp_ad_ = u%xp_ad_+v%xp_ad_
>> END FUNCTION ADD_DD
>>
>> ELEMENTAL FUNCTION MULT_DD(u,v) RESULT(res)
>> TYPE (DUAL_NUM), INTENT(IN)::u,v
>> TYPE (DUAL_NUM)::res
>> res%x_ad_ = u%x_ad_*v%x_ad_
>> res%xp_ad_= u%xp_ad_*v%x_ad_ + u%x_ad_*v%xp_ad_
>> END FUNCTION MULT_DD
>>
>> The segment of the original code:
>> REAL(8):: x, y, z,f
>> x=1.0d0;y=2.0d0;z=0.3d0
>>
>> !**********************************
>> DO i=1,50000000
>> f=x-y*z
>> ENDDO
>> !**********************************
>>
>> The do loop runs for 0.516 seconds.
>>
>> The corresponding overloaded code:
>> TYPE(DUAL_NUM):: x,y,z,f
>>
>> x=DUAL_NUM(1.0d0,1.0D0);
>> y=DUAL_NUM(2.0d0,1.0D0);
>> z=DUAL_NUM(0.3d0,0.0D0)
>>
>> !**********************************
>> DO i=1,50000000
>> f=X-y*z
>> ENDDO
>> !*********************************
>> The do loop runs for 4.513 seconds.
>>
>> Supposedly, for DUAL_NUM, the operations needed for minus are twice
>> those needed for REAL, and the operations needed for times are three
>> times those needed for REAL. That is, the time needed for computation
>> should be no more than three times that for real. However,
>> the overall time is almost nine times more. What else takes more time?
>
> You have no provision for carries and overflows in your multiplication. And,
> you have not yet reached the fun part: division. Once you implement
> division, you will appreciate why doing multiple-precision floating point
> arithmetic in software is undertaken only if unavoidable.
>
> -- mecej4

I understand your comment as meaning that you have identified the code
as implementing part of interval arithmetic (at least that's where I
think it's headed), and furthermore that you have looked into this more
deeply. I am quite interested in learning about existing software (e.g. a
Fortran MODULE) that allows one to convert an existing program from
normal arithmetic to interval arithmetic as simply as possible, e.g. to
pinpoint parts of the code that would benefit from higher-precision
calculations.

Another "fun part" of that, once one has + - * /, is, I guess, to
provide overloaded versions of min max abs sqrt exp log sin cos tan and
so on. But it would be extremely useful, I'd say.

Do you have any pointers?

thanks,
Kay
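
The mechanism for overloading intrinsic functions on a derived type is the same as for operators: a generic interface carrying the intrinsic's name. The sketch below applies it to the DUAL_NUM type quoted in this thread, on the assumption that the second component carries a derivative (which the product rule in MULT_DD suggests). A genuine interval type would need different function bodies, with bounds and rounding control, which are not shown. The module name `dual_funcs` and the procedure names are hypothetical.

```fortran
module dual_funcs   ! hypothetical module name
  implicit none
  private
  integer, parameter :: dp = kind(1.0d0)   ! assumed equivalent to REAL(8)

  type, public :: dual_num
     real(dp) :: x_ad_
     real(dp) :: xp_ad_    ! assumed to hold a derivative
  end type dual_num

  public :: sqrt, exp, sin

  ! Each interface extends the intrinsic generic of the same name.
  interface sqrt
     module procedure sqrt_d
  end interface
  interface exp
     module procedure exp_d
  end interface
  interface sin
     module procedure sin_d
  end interface

contains

  elemental function sqrt_d(u) result(res)
    type(dual_num), intent(in) :: u
    type(dual_num) :: res
    res%x_ad_  = sqrt(u%x_ad_)
    res%xp_ad_ = u%xp_ad_ / (2.0_dp*res%x_ad_)   ! no guard for u%x_ad_ = 0
  end function sqrt_d

  elemental function exp_d(u) result(res)
    type(dual_num), intent(in) :: u
    type(dual_num) :: res
    res%x_ad_  = exp(u%x_ad_)
    res%xp_ad_ = u%xp_ad_ * res%x_ad_
  end function exp_d

  elemental function sin_d(u) result(res)
    type(dual_num), intent(in) :: u
    type(dual_num) :: res
    res%x_ad_  = sin(u%x_ad_)
    res%xp_ad_ = u%xp_ad_ * cos(u%x_ad_)
  end function sin_d

end module dual_funcs

program funcs_demo
  use dual_funcs
  implicit none
  type(dual_num) :: u, r

  u = dual_num(4.0d0, 1.0d0)
  r = sqrt(u)                   ! resolves to sqrt_d
  print *, r%x_ad_, r%xp_ad_    ! 2.0 and 0.25
  print *, sqrt(4.0d0)          ! the intrinsic still handles plain reals
end program funcs_demo
```

Because the procedures are ELEMENTAL, the same generic names also work on arrays of `dual_num` without further code.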
