From: fjblurt on
On Jul 24, 10:08 am, Jacek Dziedzic <jacek.dziedzic__no--spa...(a)gmail.com> wrote:
> fjbl...(a)yahoo.com writes:
> >
>
> > What system are you on?
>
> uname -a says:
> Linux [hostname] 2.6.24-etchnhalf.1-amd64 #1 SMP Mon Jul 21 10:36:02 UTC 2008 x86_64 GNU/Linux
>
> > On most systems I've tried (I just tried
> > FreeBSD), a debugger gets to see signals like SIGSEGV before the
> > program itself does.
>
> This is also the case on this system, except for Fortran
> programs.
>
> > What happens when you run gdb on a program like this?
> > [...]
>
> It traps correctly:
> Program received signal SIGSEGV, Segmentation fault.
> 0x00000000004004f7 in main () at 1.c:10
> 10 *(volatile int *)0 = 42;
>
> Pretty weird, this fortran, huh? :)
>
> - J.

Huh.

I am curious about this, but I don't know any Fortran. Can you
reproduce the problem with a minimal Fortran program and post it here,
along with instructions on compiling, and the compiler/runtime
versions you're using?
From: Paul Pluzhnikov on
Jacek Dziedzic <jacek.dziedzic__no--spam__(a)gmail.com> writes:

>> If you run the program under gdb from the start, it should stop with
>> a "program received SIGSEGV" message *before* the FORTRAN runtime has
>> had any chance to catch/handle the signal.
>
> Yep, I know it should. It does, in fact, do that for
> my C and C++ programs.

Then it is likely that your FORTRAN program is *not* getting a
SIGSEGV, and that you've misinterpreted what you observed and misled
everybody.

[It is generally not possible for a program to receive SIGSEGV and
for a debugger attached to it not to notice.]

Perhaps there are child processes involved?
Try the gdb 'catch fork' command to find out.

Run your program under gdb, record your entire interaction
(e.g. using script(1)) including all input and output, and post
it here.

Or run it under 'strace -ff -o junk.trace ./a.out', and examine
the trace this generates.

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
From: Paul Pluzhnikov on
Jacek Dziedzic <jacek.dziedzic__no--spam__(a)gmail.com> writes:

> Pretty weird, this fortran, huh? :)

There is absolutely nothing special about FORTRAN; a FORTRAN program
is just another user-level program, and it obeys the rules of the
game just like any other user-level program, whether written in C,
C++, hand-coded assembly, or Java.

Cheers,
From: Ron Ford on
On Thu, 24 Jul 2008 19:08:31 +0200, Jacek Dziedzic posted:

> fjblurt(a)yahoo.com writes:
> >
>> What system are you on?
>
> uname -a says:
> Linux [hostname] 2.6.24-etchnhalf.1-amd64 #1 SMP Mon Jul 21 10:36:02 UTC 2008 x86_64 GNU/Linux
>
>> On most systems I've tried (I just tried
>> FreeBSD), a debugger gets to see signals like SIGSEGV before the
>> program itself does.
>
> This is also the case on this system, except for Fortran
> programs.
>
>> What happens when you run gdb on a program like this?
>> [...]
>
> It traps correctly:
> Program received signal SIGSEGV, Segmentation fault.
> 0x00000000004004f7 in main () at 1.c:10
> 10 *(volatile int *)0 = 42;
>
> Pretty weird, this fortran, huh? :)
>
> - J.

Is the Fortran source something you can post?
--
Wealth - any income that is at least one hundred dollars more a year than
the income of one's wife's sister's husband.
H. L. Mencken
From: Jacek Dziedzic on
Paul Pluzhnikov writes:
> Jacek Dziedzic <jacek.dziedzic__no--spam__(a)gmail.com> writes:
>
>>> If you run the program under gdb from the start, it should stop with
>>> a "program received SIGSEGV" message *before* the FORTRAN runtime has
>>> had any chance to catch/handle the signal.
>> Yep, I know it should. It does, in fact, do that for
>> my C and C++ programs.
>
> Then it is likely that your FORTRAN program is *not* getting a
> SIGSEGV, and that you've misinterpreted what you observed and misled
> everybody.

I don't think I misled anyone: when the program crashes,
the Fortran RTL says:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
a.out              0000000000450605  Unknown            Unknown     Unknown
a.out              00000000004117BB  Unknown            Unknown     Unknown
a.out              0000000000406D37  Unknown            Unknown     Unknown
a.out              00000000004063A2  Unknown            Unknown     Unknown
libc.so.6          00002B7BE8C234CA  Unknown            Unknown     Unknown
a.out              00000000004062EA  Unknown            Unknown     Unknown
rank 0 in job 1 galera_44926 caused collective abort of all ranks
exit status of rank 0: killed by signal 9

> [It is generally not possible for a program to receive SIGSEGV and
> for a debugger attached to it not to notice.]

OK, so how can we explain the above?

> Perhaps there are child processes involved?

Yes, there are: the program runs under MPI, so an MPI
starter script launches copies of the code on two cores.
So in fact, under gdb I run the python executable
("file python"), passing it the name of the python script
that starts the MPI jobs. There must be a lot of forking
going on here.

> Try the gdb 'catch fork' command to find out.
>
> Run your program under gdb, record your entire interaction
> (e.g. using script(1)) including all input and output, and post
> it here.
>
> Or run it under 'strace -ff -o junk.trace ./a.out', and examine
> the trace this generates.

I will, tomorrow.

cheers,
- J.