From: Tom on
Hi,
I have a code that is supposed to work both as a parallel algorithm and
as a single-CPU job. One routine collects data on a grid, stored in
arrays of the form a(-1:nx,-1:ny,-1:nz,nb) or v(4,-1:nx,-1:ny,-1:nz,nb),
into one big array of the form
atot(ndim,0:nxtot-1,0:nytot-1,0:nztot-1,nbtot) in a different routine on
the root process for write-out. This works fine in the one-CPU version,
where a and atot (or v and vtot) have the same number of elements
(depending on the value of ndim passed to the subroutine), but in
parallel runs the program crashes in a totally erratic way, and I
suspect that some memory corruption is going on. The only thing left I
can think of is that I am passing the arrays in a bad way, even though
the compiler doesn't complain about array mismatches. Here's what I do:
In the calling routine:
real :: t(-1:nx,-1:ny,-1:nz,nb), v(4,-1:nx,-1:ny,-1:nz,nb)
call f_bindump(t,nx,ny,nz,nb,1,-1) ! ndim=1, nsel=-1
call f_bindump(v,nx,ny,nz,nb,4,-1) ! ndim=4, nsel=-1
call f_bindump(v2,nx,ny,nz,nb,9,1) ! ndim=9, nsel=1

In routine f_bindump (I know that ndim, nx, ny, nxtot, etc. are all
correct):
real, intent(in) :: a(ndim,-1:nx,-1:ny,-1:nz,nb)
real, allocatable :: atot(:,:,:,:,:)
allocate(atot(ndim,0:nxtot-1,0:nytot-1,0:nztot-1,nbtot))
if (nsel == -1) then
   ! pass the part of the array without the boundaries in the 2nd-4th dim
   npn = ndim*nx*ny*nz*nb
   call ggather(a(1:ndim,0:nx-1,0:ny-1,0:nz-1,1:nb), atot, npn)
else
   ! pass slice nsel of the array without the boundaries in the 2nd-4th dim
   npn = nx*ny*nz*nb
   call ggather(reshape(a(nsel,0:nx-1,0:ny-1,0:nz-1,1:nb), (/nx,ny,nz,nb/)), &
                atot, npn)
end if

In routine ggather of the parallel version:
real, intent(in) :: buf
real, intent(out) :: buftot
call MPI_GATHER(buf, 4*n, MPI_BYTE, buftot, 4*n, MPI_BYTE, &
                0, MPI_COMM_WORLD, ierr)

In routine ggather of the single-CPU version (dummy copying routine,
works ok):
real, intent(in) :: buf(n)
real, intent(out) :: buftot(n)
buftot=buf

Can anybody see something here that may give rise to memory corruption?
I have been running the program with all kinds of debugging switches
and with different ways of passing the array a in the call to ggather,
all to no avail. It crashes on writing files, but completely
unpredictably: sometimes in a different routine, sometimes not at all.
So far it has always worked whenever I comment out the call to the
first calling routine, which makes me believe that the root of all evil
lies in this set of routines.
Thanks,
Tom
From: glen herrmannsfeldt on
Tom <flurboglarf(a)mailinator.com> wrote:

> I have a code that is supposed to work both as a parallel algorithm and
> as a single-CPU job. One routine collects data on a grid, stored in
> arrays of the form a(-1:nx,-1:ny,-1:nz,nb) or v(4,-1:nx,-1:ny,-1:nz,nb),
> into one big array of the form
> atot(ndim,0:nxtot-1,0:nytot-1,0:nztot-1,nbtot) in a different routine on
> the root process for write-out. This works fine in the one-CPU version,
> where a and atot (or v and vtot) have the same number of elements
> (depending on the value of ndim passed to the subroutine), but in
> parallel runs the program crashes in a totally erratic way, and I
> suspect that some memory corruption is going on. The only thing left I
> can think of is that I am passing the arrays in a bad way, even though
> the compiler doesn't complain about array mismatches.

Without reading all the details. (I did read the previous post.)

The thing you have to watch out for is aliasing and copy-in/copy-out,
especially in the case of multiple processors running on the same data.

If on the parallel runs two different processors write to the same
data without the appropriate interlocks, you will get the wrong answer.
Sometimes the wrong data will cause the program to crash; otherwise it
will just give random results.
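
Here is a minimal sketch of the copy-in/copy-out mechanism (a made-up
example, not your routines; the names are invented). The actual
argument a(1,:) is a strided section, so the compiler builds a
contiguous temporary, passes that, and copies it back on return. For a
blocking call that is harmless, but anything that keeps using the
address of the dummy after the call returns (a non-blocking MPI call,
for instance) ends up pointing at a temporary that no longer exists.

program copy_demo
   implicit none
   real :: a(4,10)
   a = 0.0
   ! a(1,:) is not contiguous, so the call below typically goes through
   ! a contiguous temporary (copy-in on entry, copy-out on return)
   call bump(a(1,:), 10)
   print *, a(1,1), a(2,1)   ! prints 1.0 and 0.0
end program copy_demo

subroutine bump(buf, n)
   implicit none
   integer, intent(in) :: n
   real, intent(inout) :: buf(n)   ! explicit-shape dummy: contiguous storage
   buf = buf + 1.0
end subroutine bump

A blocking MPI_GATHER should be safe with such a temporary; non-blocking
calls and shared writes are where this bites.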

-- glen
From: robin on
"Tom" <flurboglarf(a)mailinator.com> wrote in message
news:b7e0f10a-a779-425c-a5fd-c3f7af56310b(a)a21g2000yqc.googlegroups.com...
| Hi,
| I have a code that is supposed to work both as a parallel algorithm and
| as a single-CPU job. One routine collects data on a grid, stored in
| arrays of the form a(-1:nx,-1:ny,-1:nz,nb) or v(4,-1:nx,-1:ny,-1:nz,nb),
| into one big array of the form
| atot(ndim,0:nxtot-1,0:nytot-1,0:nztot-1,nbtot) in a different routine on
| the root process for write-out. This works fine in the one-CPU version,
| where a and atot (or v and vtot) have the same number of elements
| (depending on the value of ndim passed to the subroutine), but in
| parallel runs the program crashes in a totally erratic way, and I
| suspect that some memory corruption is going on. The only thing left I
| can think of is that I am passing the arrays in a bad way, even though
| the compiler doesn't complain about array mismatches.

You need explicit interfaces for each of the subroutines.
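
For example, put ggather into a module so that every caller sees its
interface. This is only a sketch; I am guessing the dummy declarations
from your post, and the 4*n/MPI_BYTE counts assume 4-byte default reals:

module gather_mod
contains
   subroutine ggather(buf, buftot, n)
      implicit none
      include 'mpif.h'
      integer, intent(in) :: n
      real, intent(in)  :: buf(n)
      real, intent(out) :: buftot(*)   ! only the root needs the full size
      integer :: ierr
      call MPI_GATHER(buf, 4*n, MPI_BYTE, buftot, 4*n, MPI_BYTE, &
                      0, MPI_COMM_WORLD, ierr)
   end subroutine ggather
end module gather_mod

Once the callers say "use gather_mod", a call whose arguments cannot be
associated with these dummies is rejected at compile time instead of
being accepted silently.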


From: Tom on
On Dec 3, 8:42 pm, glen herrmannsfeldt <g...(a)ugcs.caltech.edu> wrote:
> The thing you have to watch out for is aliasing and copy-in/copy-out,
> especially in the case of multiple processors running on the same data.
> If on the parallel runs two different processors write to the same
> data without the appropriate interlocks, you will get the wrong answer.
Ok, but I don't see where that would happen here. The a arrays on the
individual nodes don't overlap, and I would expect that MPI_GATHER
takes care of the data not being written to the same address. The size
of atot is exactly an integer multiple of the size of the transferred
a. Isn't it the purpose of MPI_GATHER to avoid precisely the trap of
writing to the same data?
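
That is at least how I read MPI_GATHER. A stripped-down test like the
following (made up for this post, not my actual code) should gather n
reals from every rank into disjoint, rank-ordered chunks of buftot on
the root:

program gather_check
   implicit none
   include 'mpif.h'
   integer, parameter :: n = 4
   integer :: rank, nprocs, ierr
   real :: buf(n)
   real, allocatable :: buftot(:)
   call MPI_INIT(ierr)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
   buf = real(rank)               ! every rank sends its own rank number
   allocate(buftot(n*nprocs))     ! strictly only needed on the root
   call MPI_GATHER(buf, n, MPI_REAL, buftot, n, MPI_REAL, &
                   0, MPI_COMM_WORLD, ierr)
   if (rank == 0) print *, buftot ! n copies of 0., then of 1., 2., ...
   call MPI_FINALIZE(ierr)
end program gather_check

If that runs cleanly on the same number of processes, the gather itself
is presumably not what corrupts memory.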
Tom
From: Tom on
On Dec 3, 9:40 pm, "robin" <robi...(a)bigpond.com> wrote:
> You need explicit interfaces for each of the subroutines.
Why? I don't think so: I don't have optional arguments or anything like
that there, and, as I said, the subroutine works well in single-CPU mode.
Thomas