From: Jacek Dziedzic on
Paul Pluzhnikov writes:
> Jacek Dziedzic <jacek.dziedzic__no--spam__(a)gmail.com> writes:
>
>> Pretty weird, this fortran, huh? :)
>
> There is absolutely nothing special about FORTRAN; it's just
> another user-level program, and it obeys the rules of the game
> just like any other user-level program, whether written in C, C++,
> hand-coded assembly, or java.

What is special about it is that the Fortran RTL (run-time library)
tries to catch the signals that would kill the program, whereas the
C and C++ runtimes, as far as I have seen, do not. Or so I believe.

cheers,
- J.
From: Jacek Dziedzic on
fjblurt(a)yahoo.com wrote:
> I am curious about this, but I don't know any Fortran. Can you
> reproduce the problem with a minimal Fortran program and post it here,
> along with instructions on compiling, and the compiler/runtime
> versions you're using?

OK, I've written a minimal Fortran program that segfaults.
Compiled with g77, it behaves as one would expect: it dies
with a "Segmentation fault", which can be trapped in gdb.
Compiled with ifort (Intel's Fortran compiler), it also crashes
with a SIGSEGV, but the signal is intercepted by the RTL.
gdb still traps it successfully, so the program can be debugged.
Must be what Paul Pluzhnikov suggested -- when I run it in
an MPI environment, there are child processes involved, and maybe
that's why the debugger cannot catch the signal.

A transcript of a session (txt) is here:
http://tiny.pl/2pjn

thanks,
- J.
From: Jacek Dziedzic on
Ron Ford wrote:
> Is the fortran source something you can post?

OK, I've written a minimal Fortran program that segfaults.
Compiled with g77, it behaves as one would expect: it dies
with a "Segmentation fault", which can be trapped in gdb.
Compiled with ifort (Intel's Fortran compiler), it also crashes
with a SIGSEGV, but the signal is intercepted by the RTL.
gdb still traps it successfully, so the program can be debugged.
Must be what Paul Pluzhnikov suggested -- when I run it in
an MPI environment, there are child processes involved, and maybe
that's why the debugger cannot catch the signal.

A transcript of a session (txt) is here:
http://tiny.pl/2pjn

thanks,
- J.
From: Paul Pluzhnikov on
Jacek Dziedzic <jacek.dziedzic__no--spam__(a)gmail.com> writes:

> Must be what Paul Pluzhnikov suggested -- when I run it in
> an MPI environment, there are child processes involved, maybe
> that's why then the debugger cannot catch the signal.

It's not a "may be"; it is.

The debugger can't catch a signal in a process that it is not
debugging (one of the MPI "slave" processes).

What you need to do is arrange to attach to the crashing process
before the crash. Since you know that it is "rank 0 in job 1", you
can probably arrange for the particular process that is executing
that piece of work to sleep(600), and attach gdb to it (I do not
know enough about MPI to tell you exactly how to achieve that).
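
A minimal sketch of the "sleep, then attach" idea, in C. The names wait_for_debugger and debugger_attached are hypothetical, and the block deliberately leaves MPI out; it only shows the waiting part. Polling a flag instead of a single sleep(600) is a design choice of this sketch: it lets you release the process from gdb as soon as you have attached.

```c
#include <stdio.h>
#include <unistd.h>

/* Set to nonzero from the debugger to let the process continue,
   e.g. in gdb: set var debugger_attached = 1 */
volatile int debugger_attached = 0;

/* Print the PID, then poll once per second until the flag is set or
   max_seconds have elapsed. Returns the number of seconds waited. */
unsigned wait_for_debugger(unsigned max_seconds)
{
    unsigned waited = 0;

    fprintf(stderr, "PID %ld waiting for debugger\n", (long)getpid());
    while (!debugger_attached && waited < max_seconds) {
        sleep(1);
        waited++;
    }
    return waited;
}
```

In an MPI program one would presumably call wait_for_debugger(600) only in the process of interest, for example after checking the rank obtained from MPI_Comm_rank. Then, on the node where that process runs, attach with "gdb -p <pid>", set the flag, and "continue" until the crash. A Fortran program would need an equivalent loop written in Fortran, or a call into a C routine like this one.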

Also note that the TotalView debugger has specific hooks for MPI and
(AFAIU) automatically attaches to all the "slave" jobs out of
the box. Perhaps TotalView is a better tool for your particular
problem.

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.