From: Tobias Burnus on
On 05/28/2010 03:04 PM, Hifi-Comp wrote:
> I get the following error message after I execute: gfortran -Ofast -
> march=native -flto -fwhole-program test.f90
>
> f951.exe: error: invalid option argument '-Ofast'

Well, that option is really new - not even one week old. Thus, better
stick to -O3 -ffast-math (assuming that your programs are -ffast-math
safe).

> f951.exe: error: unrecognized command line option "-flto"
> Fatal Error: Option -fwhole-program is not supported for Fortran
> gcc version 4.5.0 20090421 (experimental) [trunk revision 146519]

That's more than one year old, i.e. you have essentially a 4.4.0-release
compiler and thus miss all of the new 4.5 features:
http://gcc.gnu.org/gcc-4.5/changes.html

> I downloaded gfortran from http://mingw-w64.sourceforge.net/. Where
> can I download a more recent build version of gfortran for WinXP?

32bit or 64bit Windows? MinGW64 of course features the latter, but it
also has 32bit builds.

Admittedly, I do not have a real overview of Windows builds; looking at
http://gcc.gnu.org/wiki/GFortranBinaries, I think you could try:

http://sourceforge.net/projects/mingw-w64/files/

MinGW (32bit): Toolchains targetting Win32 -> Personal Builds ->
sezero_20100527 -> mingw-w32-bin_i686-mingw_20100527_sezero.zip

MinGW (64bit): Toolchains targetting Win64 -> Personal Builds ->
sezero_20100527 -> mingw-w64-bin_x86_64-mingw_20100527_sezero.zip

The file names there follow the scheme <target>-<host>, where "target"
is the system for which you want to compile (e.g. "mingw-w64" compiles
for the 64bit version of Windows) and "host" is the system on which you
want to compile (e.g. "i686-mingw" means that you compile on 32bit
Windows). On that site you will also find binaries to compile for
Windows (or WINE ;-) on Linux or on Darwin/MacOS.
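
With a recent build, the originally attempted command line should then
be accepted; for example (assuming a gfortran from the 4.5/4.6 series):

```
gfortran -O3 -ffast-math -flto -fwhole-program test.f90 -o test
```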

Tobias
From: Jim Xia on
On May 26, 10:37 pm, Hifi-Comp <wenbinyu.hea...(a)gmail.com> wrote:
> I am writing a code using the operator overloading feature of F90/95. The
> basic idea is to replace each real(8) with two real(8)s contained in a
> type named DUAL_NUM, and to overload the corresponding operations for
> the newly defined type. Based on the operation counts, the computation
> should cost no more than three times as much as computing with real(8).
> However, my simple test shows that computing with DUAL_NUM is almost
> nine times more expensive. I hope some of you knowledgeable Fortran
> experts can help me figure out the loss of efficiency and how I can
> make the code more efficient. Thanks a lot!
>

It's all dependent on the HW and also on the compiler you're using. I
tried your two versions of the program on a Power5 machine with XLF;
the two versions of the code don't show any difference. Note that no
optimization was used in my test.

Cheers,

Jim
From: mecej4 on
Kay Diederichs wrote:

> mecej4 schrieb:
>> Hifi-Comp wrote:
>>
>>> I am writing a code using the operator overloading feature of F90/95. The
>>> basic idea is to replace each real(8) with two real(8)s contained in a
>>> type named DUAL_NUM, and to overload the corresponding operations for
>>> the newly defined type. Based on the operation counts, the computation
>>> should cost no more than three times as much as computing with real(8).
>>> However, my simple test shows that computing with DUAL_NUM is almost
>>> nine times more expensive. I hope some of you knowledgeable Fortran
>>> experts can help me figure out the loss of efficiency and how I can
>>> make the code more efficient. Thanks a lot!
>>>
>>> TYPE,PUBLIC:: DUAL_NUM
>>> REAL(8)::x_ad_
>>> REAL(8)::xp_ad_
>>> END TYPE DUAL_NUM
>>>
>>> PUBLIC OPERATOR (+)
>>> INTERFACE OPERATOR (+)
>>> MODULE PROCEDURE ADD_DD ! dual+ dual, ELEMENTAL
>>> END INTERFACE
>>>
>>> PUBLIC OPERATOR (*)
>>> INTERFACE OPERATOR (*)
>>> MODULE PROCEDURE MULT_DD ! dual*dual, ELEMENTAL
>>> END INTERFACE
>>>
>>> ELEMENTAL FUNCTION ADD_DD(u,v) RESULT(res)
>>> TYPE (DUAL_NUM), INTENT(IN)::u,v
>>> TYPE (DUAL_NUM)::res
>>> res%x_ad_ = u%x_ad_+v%x_ad_
>>> res%xp_ad_ = u%xp_ad_+v%xp_ad_
>>> END FUNCTION ADD_DD
>>>
>>> ELEMENTAL FUNCTION MULT_DD(u,v) RESULT(res)
>>> TYPE (DUAL_NUM), INTENT(IN)::u,v
>>> TYPE (DUAL_NUM)::res
>>> res%x_ad_ = u%x_ad_*v%x_ad_
>>> res%xp_ad_= u%xp_ad_*v%x_ad_ + u%x_ad_*v%xp_ad_
>>> END FUNCTION MULT_DD
>>>
>>> The segment of the original code:
>>> REAL(8):: x, y, z,f
>>> x=1.0d0;y=2.0d0;z=0.3d0
>>>
>>> !**********************************
>>> DO i=1,50000000
>>> f=x-y*z
>>> ENDDO
>>> !**********************************
>>>
>>> The do loop runs for 0.516 seconds.
>>>
>>> The corresponding overloaded code:
>>> TYPE(DUAL_NUM):: x,y,z,f
>>>
>>> x=DUAL_NUM(1.0d0,1.0D0);
>>> y=DUAL_NUM(2.0d0,1.0D0);
>>> z=DUAL_NUM(0.3d0,0.0D0)
>>>
>>> !**********************************
>>> DO i=1,50000000
>>> f=X-y*z
>>> ENDDO
>>> !*********************************
>>> The do loop runs for 4.513 seconds.
>>>
>>> Supposedly, for DUAL_NUM, the operations needed for minus are twice
>>> those needed for REAL, and the operations needed for times are thrice
>>> those needed for REAL. That is, the time needed for the computation
>>> should not be more than three times that for real. However, the
>>> overall time is almost nine times more. What else takes more time?
>>
>> You have no provision for carries and overflows in your multiplication.
>> And, you have not yet reached the fun part: division. Once you implement
>> division, you will appreciate why doing multiple-precision floating point
>> arithmetic in software is undertaken only if unavoidable.
>>
>> -- mecej4
>
> I understand your comment as meaning that you have identified the code
> as doing a part of interval arithmetic (at least that's where I think
> it's headed), and that furthermore you have looked into that more
> deeply. I am quite interested in learning about existing software (e.g. a
> Fortran MODULE) that allows one to convert an existing program from
> normal arithmetic to interval arithmetic as simply as possible, e.g. to
> pinpoint parts of the code that benefit from higher-precision calculations.
>
> Another "fun part" of that, once one has + - * /, is, I guess, to
> provide overloaded functions for min, max, abs, sqrt, exp, log, sin,
> cos, tan and so on. But it would be extremely useful, I'd say.
>
> Do you have any pointers?
>
> thanks,
> Kay
Decades ago, I had a need to develop Padé approximations for functions
such as

G = \int_0^F ln(1 + x)dx/x

I wanted the resulting approximations to be in terms of rational
coefficients, e.g., for the above integral, with p = [ln(1 + F)],

G \approx p [1 + 17 p^2 /450]/[1 + p^2 /100] + p^2/4

I also needed to multiply and invert series expansions.

For these symbolic calculations, I found the 32-bit longs available on my 5
MHz 8086 PC inadequate. But the 8087 coprocessor had 64-bit integers as
one of its standard types, and I wrote software in C and assembler to
operate on rational numbers. Unfortunately, I had to write out function
calls for every arithmetic operation, but I got the job done. I obtained
twice as many terms in my approximations as I was willing to derive by
hand.

I had not heard of operator overloading or objects, and Mathematica/Maple
had not become widely available.

I did look at multiple-precision (MP) and interval arithmetic at about that
time. We did not have enough CPU power to do the former, and the latter
was not useful for any but the most trivial calculations. We were
interested in most-probable intervals, not the widest possible ones.
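
For the dual-number type quoted above, though, division is less daunting
than in the multiple-precision case: the quotient rule gives it directly.
A hypothetical DIV_DD in the same style as the quoted module (a sketch,
not tested):

```fortran
ELEMENTAL FUNCTION DIV_DD(u,v) RESULT(res)
TYPE (DUAL_NUM), INTENT(IN)::u,v
TYPE (DUAL_NUM)::res
! value part: plain division
res%x_ad_ = u%x_ad_/v%x_ad_
! derivative part, by the quotient rule: (u/v)' = (u'v - uv')/v**2
res%xp_ad_ = (u%xp_ad_*v%x_ad_ - u%x_ad_*v%xp_ad_)/(v%x_ad_*v%x_ad_)
END FUNCTION DIV_DD
```

It would be bound to OPERATOR (/) through an interface block, just like
ADD_DD and MULT_DD above.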

-- mecej4
From: Jim Xia on
On May 28, 9:33 am, Jim Xia <jim...(a)hotmail.com> wrote:
> On May 26, 10:37 pm, Hifi-Comp <wenbinyu.hea...(a)gmail.com> wrote:
>
> > I am writing a code using the operator overloading feature of F90/95. The
> > basic idea is to replace each real(8) with two real(8)s contained in a
> > type named DUAL_NUM, and to overload the corresponding operations for
> > the newly defined type. Based on the operation counts, the computation
> > should cost no more than three times as much as computing with real(8).
> > However, my simple test shows that computing with DUAL_NUM is almost
> > nine times more expensive. I hope some of you knowledgeable Fortran
> > experts can help me figure out the loss of efficiency and how I can
> > make the code more efficient. Thanks a lot!
>
> It's all dependent on the HW and also on the compiler you're using. I
> tried your two versions of the program on a Power5 machine with XLF;
> the two versions of the code don't show any difference. Note that no
> optimization was used in my test.
>


Sorry, I'll take this back. A similar slow-down is also observed on the
P5 with XLF. I made a mistake in my previous measurement.


Jim
From: Hifi-Comp on
On May 28, 9:25 am, Tobias Burnus <bur...(a)net-b.de> wrote:
> On 05/28/2010 03:04 PM, Hifi-Comp wrote:
>
> > I get the following error message after I execute: gfortran -Ofast -
> > march=native -flto -fwhole-program test.f90
>
> > f951.exe: error: invalid option argument '-Ofast'
>
> Well, that option is really new - not even one week old. Thus, better
> stick to -O3 -ffast-math (assuming that your programs are -ffast-math
> safe).
>
> > f951.exe: error: unrecognized command line option "-flto"
> > Fatal Error: Option -fwhole-program is not supported for Fortran
> > gcc version 4.5.0 20090421 (experimental) [trunk revision 146519]
>
> That's more than one year old, i.e. you have essentially a 4.4.0-release
> compiler and thus miss all of the new 4.5 features:
> http://gcc.gnu.org/gcc-4.5/changes.html
>
> > I downloaded gfortran from http://mingw-w64.sourceforge.net/. Where
> > can I download a more recent build version of gfortran for WinXP?
>
> 32bit or 64bit Windows? MinGW64 of course features the latter, but it
> also has 32bit builds.
>
> Admittedly, I do not have a real overview of Windows builds; looking at
> http://gcc.gnu.org/wiki/GFortranBinaries, I think you could try:
>
> http://sourceforge.net/projects/mingw-w64/files/
>
> MinGW (32bit):  Toolchains targetting Win32 -> Personal Builds ->
> sezero_20100527 -> mingw-w32-bin_i686-mingw_20100527_sezero.zip
>
> MinGW (64bit):  Toolchains targetting Win64 -> Personal Builds ->
> sezero_20100527 -> mingw-w64-bin_x86_64-mingw_20100527_sezero.zip
>
> The file names there follow the scheme <target>-<host>, where "target"
> is the system for which you want to compile (e.g. "mingw-w64" compiles
> for the 64bit version of Windows) and "host" is the system on which you
> want to compile (e.g. "i686-mingw" means that you compile on 32bit
> Windows). On that site you will also find binaries to compile for
> Windows (or WINE ;-) on Linux or on Darwin/MacOS.
>
> Tobias

I can get similar speed using gfortran 4.6. However, I have the
following two questions:
1. How is it possible that the DNAD run is even more efficient than the
Analysis run?
2. The significant speedup is enabled mainly by -fwhole-program.
However, it seems to be applicable only to a single file. If the real
code has many separate files and even includes some DLLs, how can we
still achieve this? For example, for my simple problem, if we put three
separate files dnad.f90 containing module dnad, cputime.f90 containing
module cputime, and test.f90 containing program test, how can the
compilation and linking be optimized?
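
For reference, the multi-file compile I am trying looks like this, with
the module files listed before the main program (assuming a
4.5-or-newer gfortran so that -flto is accepted):

```
gfortran -O3 -ffast-math -flto dnad.f90 cputime.f90 test.f90 -o test
```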