From: mpbro on
I'm setting up a build system to benchmark a variety of FFT and
compiler options. For the purposes of this post, I'm testing 3D,
single-precision, real-to-complex FFTs with FFTW
(sfftw_plan_dft_r2c_3d). I'm comparing ifort to gfortran, and
single-threaded to multi-threaded execution. I'm running an Intel
quad-core x86_64 machine with Fedora Core 7.

To make a long story short: in gfortran, any use of FFTW's thread
manipulation functions (sfftw_init_threads, sfftw_plan_with_nthreads,
sfftw_cleanup_threads) causes a segmentation fault.

First let me show the main program:

!-------------------------------------------------------------------------------
program FFTW_Test

use system_time_mod

implicit none

include 'fftw3.f'

integer :: i, j, k, n1, n2, n3, &
           nthreads, stat
integer*8 :: planf, planb
type(timer) :: t
character(len=100) :: ffttype
real, dimension(:,:,:), allocatable :: data

#ifdef FFTW
ffttype = 'FFTW'
#elif MKL
ffttype = 'MKL'
#else
ffttype = 'UNKNOWN'
#endif

n1=750
n2=750
n3=750

call system_time_init()

!-----------------------------------------------------------------------------
! 3D in-place, single-threaded, real-to-complex FFT
!-----------------------------------------------------------------------------
write(0,*) '================================================================='
write(0,*) '3D in-place, single-threaded, real-to-complex FFT with ',ffttype
write(0,*) 'Array size=',n1,n2,n3

allocate( data(2*(n1/2+1),n2,n3) )

do i=1,n1
do j=1,n2
do k=1,n3
data(i,j,k) = 1.0*(i+j+k)
end do
end do
end do

write(0,*) ' data(25:27,25,25) ',data(25,25,25),data(26,25,25),&
           data(27,25,25)

#ifdef MULTI
nthreads = 4
#else
nthreads = 1
#endif

write(0,*) 'nthreads=',nthreads

#ifdef FFTW
write(0,*) "Using FFTW multi-threading"
call sfftw_init_threads(stat)
call sfftw_plan_with_nthreads(nthreads)
#elif MKL
write(0,*) "Using MKL multi-threading"
call mkl_set_num_threads(nthreads)
#endif

call sfftw_plan_dft_r2c_3d(planf, n1, n2, n3, data, data, FFTW_ESTIMATE)
call sfftw_plan_dft_c2r_3d(planb, n1, n2, n3, data, data, FFTW_ESTIMATE)

call start_timer( t )
call sfftw_execute(planf)
call sfftw_execute(planb)
call stop_timer( t )

write(0,*) 'FFT^{-1}[FFT[data(25:27,25,25)]]', &
           data(25,25,25)/(n1*n2*n3), &
           data(26,25,25)/(n1*n2*n3), &
           data(27,25,25)/(n1*n2*n3)

write(0,*) 'elapsed time:',t%telapsed

call sfftw_destroy_plan(planf)
call sfftw_destroy_plan(planb)
#ifdef FFTW
call sfftw_cleanup_threads()
#endif
deallocate( data )

call exit(0)

end program FFTW_Test
!-------------------------------------------------------------------------------

My apologies for the mangled whitespace and proliferation of
preprocessor directives. The system_time_mod module is simply a
(compiler-dependent) wrapper around system_clock(). I can include the
source code if you are interested. I use the fpp preprocessor with
ifort and the -x f95-cpp-input preprocessor option with gfortran.
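
For readers who want to compile the test without the real module, here is a minimal, portable stand-in for system_time_mod. It is only a sketch: it keeps the names the main program uses (timer, telapsed, system_time_init, start_timer, stop_timer), but the internals (the count_start component, the clock_rate and clock_max variables) are assumptions for illustration, and it leaves out the compiler-dependent branches selected by -Difort and -Dgfortran.

```fortran
module system_time_mod
  implicit none
  private
  public :: timer, system_time_init, start_timer, stop_timer

  ! Hypothetical layout: only telapsed is referenced by the main program.
  type timer
    integer :: count_start = 0    ! clock count recorded by start_timer
    real    :: telapsed    = 0.0  ! elapsed wall-clock time in seconds
  end type timer

  integer :: clock_rate = 0       ! clock ticks per second (0 = no clock)
  integer :: clock_max  = 0       ! largest value the clock count can take

contains

  ! Query the clock resolution once, before any timing is done.
  subroutine system_time_init()
    call system_clock(count_rate=clock_rate, count_max=clock_max)
  end subroutine system_time_init

  ! Record the current clock count and reset the elapsed time.
  subroutine start_timer(t)
    type(timer), intent(inout) :: t
    call system_clock(count=t%count_start)
    t%telapsed = 0.0
  end subroutine start_timer

  ! Convert the tick difference since start_timer to seconds.
  subroutine stop_timer(t)
    type(timer), intent(inout) :: t
    integer :: count_end, ticks
    call system_clock(count=count_end)
    ticks = count_end - t%count_start
    ! Correct for a single wrap-around of the clock count.
    if (ticks < 0) ticks = (ticks + clock_max) + 1
    ! Leave telapsed at 0.0 if the processor reports no clock.
    if (clock_rate > 0) t%telapsed = real(ticks) / real(clock_rate)
  end subroutine stop_timer

end module system_time_mod
```

Because system_clock() measures wall-clock time rather than CPU time, the elapsed time reported this way stays meaningful when the FFT runs on several threads.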

Here is how I build the executable using ifort:

----------------------------
fpp -Difort ../../Src/system_time.f90 > fpp_system_time.f90
ifort -c -assume bscc -assume byterecl -fpp -mtune=pentium4 -O3 \
    -static-intel -vms -w -WB fpp_system_time.f90 -o system_time.o
fpp -DFFTW -DMULTI -I/usr/local/include FFTW_Test.f90 > fpp_FFTW_Test.f90
ifort -c -assume bscc -assume byterecl -fpp -mtune=pentium4 -O3 \
    -static-intel -vms -w -WB -I/usr/local/include fpp_FFTW_Test.f90 \
    -o FFTW_Test.o
ifort -assume bscc -assume byterecl -fpp -mtune=pentium4 -O3 \
    -static-intel -vms -w -WB -I/usr/local/include FFTW_Test.o \
    system_time.o -L/usr/local/lib -lfftw3f -lfftw3 -lfftw3f_threads \
    -lpthread -lm -o FFTW_native_multithreaded_ifort
----------------------------

And here is how I build the executable using gfortran:

----------------------------
gfortran -E -x f95-cpp-input -Dgfortran ../../Src/system_time.f90 \
    > fpp_system_time.f90
gfortran -c -O2 -static -m64 -w fpp_system_time.f90 -o system_time.o
gfortran -E -x f95-cpp-input -DFFTW -DMULTI -I/usr/local/include \
    FFTW_Test.f90 > fpp_FFTW_Test.f90
gfortran -c -O2 -static -m64 -w -I/usr/local/include fpp_FFTW_Test.f90 \
    -o FFTW_Test.o
gfortran -O2 -static -m64 -w -I/usr/local/include FFTW_Test.o \
    system_time.o -L/usr/local/lib -lfftw3f -lfftw3 -lfftw3f_threads \
    -lpthread -lm -o FFTW_native_multithreaded_gfortran
----------------------------

The ifort version works as expected in both single-threaded and
multi-threaded mode. The gfortran version works only if I comment out
the three FFTW thread-manipulation calls (sfftw_init_threads,
sfftw_plan_with_nthreads, sfftw_cleanup_threads).
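
A reduced test case that makes only the three thread calls might help show whether the crash depends on the plans and the data array at all. The following is a sketch: the program name and the check on stat are illustrative and not part of the code above. The check relies on the FFTW documentation's statement that the init call returns zero on error. It would need to be linked against the same FFTW libraries as the full test.

```fortran
program FFTW_Thread_Min

implicit none

integer :: stat

! Initialize FFTW's threading support; stat is zero on failure.
call sfftw_init_threads(stat)
if (stat == 0) then
   write(0,*) 'sfftw_init_threads reported an error'
   stop 1
end if

! Request four threads for all subsequently created plans.
call sfftw_plan_with_nthreads(4)

! Release the resources allocated by sfftw_init_threads.
call sfftw_cleanup_threads()

write(0,*) 'thread calls completed, stat=',stat

end program FFTW_Thread_Min
```

If this small program also fails under gfortran, the FFT plans and the allocatable array can be ruled out as the cause.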

I would appreciate any insights. Please let me know whether I have
provided enough information to diagnose the problem.

Regards,
Morgan