From: brady_fht on
Hello,

I need some help understanding floating point operations. I am seeing
a small difference in a program when it is compiled for 32 and 64 bit
machines. Here is the program:

--------------------------------------------------------
PROGRAM DPStudy
IMPLICIT NONE
REAL(8) :: FVF
REAL(8) :: THR
REAL(8) :: Dum1, Dum2

FVF = 3.3D0
THR = 3.0D0

Dum1 = FVF * THR * DSQRT(THR)
Dum2 = FVF * DSQRT(THR) * THR

1 FORMAT(A, ES28.22)
WRITE(*,*) ''
WRITE(*,1) 'Dum1 = ', Dum1
WRITE(*,1) 'Dum2 = ', Dum2
WRITE(*,*) ''
STOP
ENDPROGRAM DPStudy
--------------------------------------------------------

Pretty simple. I am using gfortran 4.3.3 on Ubuntu 9.04. The following
shows the different compiler options I use and the resulting output
from the above program:


--------------------------------------------------------
COMPILER:
gfortran -m32 -O0 main.f90

OUTPUT:
Dum1 = 1.7147302994931884256857E+01
Dum2 = 1.7147302994931884256857E+01
--------------------------------------------------------

--------------------------------------------------------
COMPILER:
gfortran -m64 -O0 main.f90

OUTPUT:
Dum1 = 1.7147302994931880704144E+01
Dum2 = 1.7147302994931884256857E+01
--------------------------------------------------------

--------------------------------------------------------
COMPILER:
gfortran -m32 -O1 main.f90

OUTPUT:
Dum1 = 1.7147302994931880704144E+01
Dum2 = 1.7147302994931884256857E+01
--------------------------------------------------------

--------------------------------------------------------
COMPILER:
gfortran -m64 -O1 main.f90

OUTPUT:
Dum1 = 1.7147302994931880704144E+01
Dum2 = 1.7147302994931884256857E+01
--------------------------------------------------------


So it appears the order of multiplication yields a different result,
which doesn't surprise me all that much. However, it looks like when
optimization is turned off for the 32-bit case the compiler re-orders
the multiplications. I tried going through the gfortran documentation
to understand what happens when -O1 is used, but I couldn't find a
single flag that triggers this behavior. Can anybody help me
understand what is going on here?

For what it's worth - similar behavior was observed using the Intel
Fortran Compiler 11.1 for Windows.

Thank you!
From: robin on
"brady_fht" <brady.m.adams(a)gmail.com> wrote in message
news:0ee467f7-747a-4b6b-a7cf-2c8174ea062a(a)e28g2000vbd.googlegroups.com...
| Hello,
|
| I need some help understanding floating point operations. I am seeing
| a small difference in a program when it is compiled for 32 and 64 bit
| machines. Here is the program:

In the first place you specified a number kind (viz. 8).
That is implementation dependent and hence non-portable.
Whether it means the same to both compilers, I don't know,
but I suspect that it doesn't.

In the second place, Runs #2-4 have given you
two results that agree to 16 significant digits,
which is what I would expect for 64-bit float precision.

As for the slightly differing results in run #2 --
further investigation is suggested. Have you tried,
for example, substituting SQRT for DSQRT?
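
For instance (just the two assignments changed; the generic SQRT
resolves to the double-precision routine for a REAL(8) argument):

    Dum1 = FVF * THR * SQRT(THR)
    Dum2 = FVF * SQRT(THR) * THR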

(snip of quoted program and output)


From: Gordon Sande on
On 2010-05-18 12:39:29 -0300, brady_fht <brady.m.adams(a)gmail.com> said:

> Hello,
>
> I need some help understanding floating point operations. I am seeing
> a small difference in a program when it is compiled for 32 and 64 bit
> machines.

What is a 32- or 64-bit machine? Usually it is a statement about the length
of memory addresses and has nothing to do with the arithmetic.

> Here is the program:

(snip of quoted program and compiler output)

> So it appears the order of multiplication yields a different result
> which doesn't surprise me all that much.

Without parens, what order do you think is "correct"? Can you justify
your claim? Have you tried it with explicit parens?
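
For example (a sketch; the standard requires the processor to respect
parentheses, so these groupings are fixed):

    Dum1 = (FVF * THR) * DSQRT(THR)
    Dum2 = FVF * (DSQRT(THR) * THR)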

> However, it looks like when
> optimization is turned off for the 32-bit case the compiler re-orders
> the multiplications. I tried going through the gfortran documentation
> to understand what happens when -O1 is used, but I couldn't find a
> single flag that triggers this behavior. Can anybody help me
> understand what is going on here?

Most flags controlling low-level compilation issues for the GNU compilers
are handled by the common back end and are not language specific. Look
them up in the GCC documentation.

> For what it's worth - similar behavior was observed using the Intel
> Fortran Compiler 11.1 for Windows.
>
> Thank you!


From: glen herrmannsfeldt on
brady_fht <brady.m.adams(a)gmail.com> wrote:

> I need some help understanding floating point operations. I am seeing
> a small difference in a program when it is compiled for 32 and 64 bit
> machines. Here is the program:

On most machines now in use, double precision floating point
has a 53 bit significand, for 53*log10(2.) or about 16 digits.

(snip)

> Dum1 = FVF * THR * DSQRT(THR)
> Dum2 = FVF * DSQRT(THR) * THR

(snip)

The use of constant KIND values is non-portable, but most
likely in this case gives the double precision you expect.

Many compilers optimize constant expressions, or expressions
that they can figure out are constant, by computing the value
at compile time. Often that is different than the result that
would be obtained if the expressions were not constant.

You might try again, with

READ(*,*) FVF, THR

and then typing in the appropriate values.
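
For example (assuming the compiled program is the default a.out):

    echo "3.3 3.0" | ./a.out

Reading the values at run time keeps the compiler from folding the
whole expression into a compile-time constant.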


(snip of gfortran -m32)

> --------------------------------------------------------
> COMPILER:
> gfortran -m64 -O0 main.f90

> OUTPUT:
> Dum1 = 1.7147302994931880704144E+01
> Dum2 = 1.7147302994931884256857E+01
> --------------------------------------------------------

As I said, it is possible that the compiler is doing the
calculation at compile time, more likely with -O1.

In any case, compile with the -S option and post the
resulting .s file. (I would remove the unneeded write
statements first.)
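
Something like (the assembly ends up in main.s):

    gfortran -m64 -O0 -S main.f90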

(snip of additional tests with the same result.)

> So it appears the order of multiplication yields a different result,
> which doesn't surprise me all that much. However, it looks like when
> optimization is turned off for the 32-bit case the compiler re-orders
> the multiplications. I tried going through the gfortran documentation
> to understand what happens when -O1 is used, but I couldn't find a
> single flag that triggers this behavior. Can anybody help me
> understand what is going on here?

There are a few possibilities. One, though not as likely as it
used to be, is the use of the x87 floating-point instructions, which
compute to 64 significant bits and then round the result to 53.
That is often one LSB different from multiplying the 53-bit values
directly. On machines not built by Cray, floating-point
multiplication is usually commutative, but it is not always
associative.

> For what it's worth - similar behavior was observed using the Intel
> Fortran Compiler 11.1 for Windows.

The usual rule is that you should not depend on it being better
than that. The differences you see are in the least significant
bit, or close to it, for a 53 bit significand.

-- glen
From: Tobias Burnus on
On 05/18/2010 05:39 PM, brady_fht wrote:
> I need some help understanding floating point operations. I am seeing
> a small difference in a program when it is compiled for 32 and 64 bit
> machines. Here is the program:
>
> Pretty simple. I am using gfortran 4.3.3 on Ubuntu 9.04. The following
> shows the different compiler options I use and the resulting output
> from the above program:

> gfortran -m32 -O0 main.f90
> gfortran -m64 -O0 main.f90

By default, GCC on x86 (ia32, i686, ...) uses the x87 floating-point
coprocessor (-mfpmath=387), while on x86-64 (AMD64, Intel64, EM64T,
...) SSE is used by default. While SSE uses proper double-precision
rounding, the x87 unit has some excess precision, which is used for
intermediate results. One way to force the rounding is to store the
intermediate result in memory (as opposed to keeping it in the x87
registers); in GCC/gfortran you can use -ffloat-store to force this.*
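
For example, you could try (a sketch; -msse2 assumes a CPU with SSE2
support, and -ffloat-store only forces variables, not every
intermediate, into memory):

    gfortran -m32 -O0 -mfpmath=sse -msse2 main.f90
    gfortran -m32 -O0 -ffloat-store main.f90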

When you use -O1 (or higher), the compiler plugs in the values of the
variables and calculates the result at compile time (with proper rounding).

If you want to see what the compiler does, you can use the
-fdump-tree-all option, which dumps a (simplified, C-syntax-like)
internal representation. The "original" and "optimized" dumps are
probably the most interesting.
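
For example (the exact dump-file names vary a bit between GCC
versions, but they follow a <source>.<pass>.original pattern):

    gfortran -O1 -fdump-tree-original -fdump-tree-optimized main.f90
    cat main.f90.*.original main.f90.*.optimized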

> For what it's worth - similar behavior was observed using the Intel
> Fortran Compiler 11.1 for Windows.

(Newer) Intel compilers default to SSE.

* = GCC 4.5 and 4.6 also offer -fexcess-precision=, but I think this
option is not yet supported by GCC's Fortran front end. Additionally,
GCC 4.5 and 4.6 now have a configure-time option that enables SSE by
default (--with-fpmath=sse).
For details, see http://gcc.gnu.org/gcc-4.5/changes.html

* * *

General remark: Don't look too closely at the last digits in
floating-point calculations and do not assume that changing the order
won't change the result. I would also suggest that you read the
Goldberg paper.

a) Goldberg paper:
http://www.google.de/search?q=Goldberg+What+Every+Computer+Scientist+Should+Know+About+Floating-Point+Arithmetic

b) Monniaux paper:
http://hal.archives-ouvertes.fr/hal-00128124

c) Semantics of Floating Point Math in GCC
http://gcc.gnu.org/wiki/FloatingPointMath

d) Note on the x87 coprocessor (for GCC)
http://gcc.gnu.org/wiki/x87note
http://gcc.gnu.org/wiki/Math_Optimization_Flags

e) IEEE 754:2008 "IEEE Standard for Floating-Point Arithmetic"
http://ieeexplore.ieee.org/servlet/opac?punumber=4610933
http://www.validlab.com/754R/drafts/archive/2006-10-04.pdf (last
publicly available draft I found)

Tobias