From: relaxmike on
Hi all,

I work on a Monte-Carlo simulation tool which is based on the
Fortran extension "system". At the core of the simulation, there is
a loop which calls an external program that simulates one
particular randomized set of parameters:

do i=1,100000
sysres = system("externaltool.exe")
enddo

The problem is that the simulation fails after a random number of
iterations, sometimes 32,000, sometimes fewer. An error message that
varies from run to run appears in the console, for example:
forrtl: severe (9): permission to access file denied, unit 32, file ....
forrtl: severe (29): file not found, unit 7, file ...
etc...

I cannot post the full source code here because it is quite
complicated.
The problem appears on Windows XP with Intel Fortran 8.1.
Strangely, it does not appear on Windows with either gfortran or g95,
and I have no explanation for that.
The cause may be a bug in our source code, in the compilers, or
in the OS.
I suspect that with Intel the "system" extension calls the native
Windows API directly, whereas gfortran and g95 go through a GNU
interface layer on top of the native system: that would explain the
difference in behaviour.
On Linux, with either gfortran or g95, the problem does not
appear either.

I tried several workarounds, including computing the Fortran logical
unit numbers dynamically, but the problem seems to lie elsewhere.
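For reference, one common pre-Fortran-2008 way to pick a free logical unit at run time (not necessarily the one tried here) is to scan a range of unit numbers with INQUIRE. This is only a sketch: the module and function names are hypothetical, and the search range 10 to 999 is an arbitrary assumption. Compilers that support Fortran 2008 offer OPEN(NEWUNIT=...) for the same purpose.

```fortran
! Hypothetical helper: scan for a logical unit that exists and is not connected.
! The range 10..999 is an assumption (it skips the usual preconnected units 5 and 6).
module free_unit
  implicit none
  private
  public :: find_free_unit
contains
  ! Return the first unit in [10, 999] that exists and is not connected,
  ! or -1 if none is available.
  integer function find_free_unit() result(lun)
    logical :: opened, exists
    integer :: u
    lun = -1
    do u = 10, 999
      inquire(unit=u, exist=exists, opened=opened)
      if (exists .and. .not. opened) then
        lun = u
        return
      end if
    end do
  end function find_free_unit
end module free_unit
```

Each file would then be opened with `open(unit=find_free_unit(), ...)` style logic instead of a hard-coded number such as 32 or 7.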

To investigate further, I tried to reproduce the problem with a smaller
program. Unfortunately, I was not able to reproduce the bug, but I did
observe some strange behaviour: the performance of the "system"
extension is very different between Linux and Windows.

Here is the sample source code.
The external tool is replaced by the following program:

program toto
implicit none
write(33,*) 'toto'
end program toto

which is compiled in optimized mode (called Release in Intel Fortran).
The Monte-Carlo simulation tool is replaced by the following program:

program appel
  use ifport   ! Intel-specific module providing SYSTEM; remove for g95
  implicit none
  integer i , sysres
  integer :: timestart, timestop, count,countrate,countmax
  real :: timer_elapse
  call system_clock(timestart)
  do i=1,100
     write(32,*) i
     sysres = system("toto.exe")
     write(32,*) "result:", sysres
  enddo
  call system_clock(timestop)
  call system_clock(count,countrate,countmax)
  ! elapsed wall-clock time in seconds
  timer_elapse = real(timestop - timestart)/countrate
  write(32,*) "elapsed : ", timer_elapse
end program appel

Notice that in Intel Fortran, the system extension is provided as a
function.
The following table lists the elapsed times, in seconds, measured on my
"average" Pentium D dual-core PC for several numbers of iterations:

PC: Dell Optiplex GX520
OS: Windows
Compiler: IVF8
10 iterations : 0.531
100 iterations : 4.890
1000 iterations : 51.187

I compiled the same source code with g95 using the following
commands:

g95 -O2 -c toto.f90
g95 -O2 -o toto.exe toto.o
g95 -O2 -c appel.f90
g95 -O2 -o appel.exe appel.o

Of course, one must remove the "use ifport" line.
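As a hedged aside: with a compiler that supports Fortran 2008, the standard EXECUTE_COMMAND_LINE intrinsic can replace the non-standard SYSTEM extension, so no compiler-specific `use` line is needed, and its CMDSTAT/EXITSTAT arguments report when a launch fails. It may not be available in the compilers used here (IVF 8.1, g95), so this is a sketch under that assumption; the module and routine names are hypothetical.

```fortran
! Sketch assuming a Fortran 2008 compiler (EXECUTE_COMMAND_LINE, INT64).
module launch_timing
  use iso_fortran_env, only: int64
  implicit none
  private
  public :: time_launches
contains
  ! Launch COMMAND n times, waiting for each run to finish.
  ! ELAPSED is the wall-clock time in seconds; NFAIL counts runs that
  ! could not be started (cmdstat /= 0) or exited with a non-zero status.
  subroutine time_launches(command, n, elapsed, nfail)
    character(len=*), intent(in) :: command
    integer, intent(in) :: n
    real, intent(out) :: elapsed
    integer, intent(out) :: nfail
    integer :: i, exitstat, cmdstat
    integer(int64) :: t0, t1, rate   ! 64-bit counter: finer resolution, later wrap-around
    character(len=256) :: cmdmsg

    nfail = 0
    call system_clock(t0, rate)
    do i = 1, n
      exitstat = 0
      cmdstat = 0
      cmdmsg = ''
      call execute_command_line(command, wait=.true., exitstat=exitstat, &
                                cmdstat=cmdstat, cmdmsg=cmdmsg)
      if (cmdstat /= 0 .or. exitstat /= 0) then
        nfail = nfail + 1
        ! Report the failing iteration so the failure point can be located.
        write(*,'(a,i0,a,i0,a,i0,2a)') 'run ', i, ': cmdstat=', cmdstat, &
             ' exitstat=', exitstat, ' ', trim(cmdmsg)
      end if
    end do
    call system_clock(t1)
    elapsed = real(t1 - t0) / real(rate)
  end subroutine time_launches
end module launch_timing
```

A driver equivalent to appel would call `call time_launches('toto.exe', 1000, elapsed, nfail)` (or `'./toto.exe'` on Linux) and print `elapsed` and `nfail`. Reporting each failed run individually may help pinpoint the iteration at which launching starts to fail.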

PC: Dell Optiplex GX520
OS: Windows
Compiler: g95
10 iterations : 0.6406
100 iterations : 4.9844
1000 iterations : 51.2034

I then ran the same tests under Linux.
The only change needed is in the call to the executable, so that it is
found in the current directory:
sysres = system("./toto.exe")

PC: Dell Precision 380
OS: Linux
Compiler: g95
10 iterations : 0.025 (= Windows g95 time / 25)
100 iterations : 0.3027 (= Windows g95 time / 16)
1000 iterations : 2.5327 (= Windows g95 time / 20)

With the same compiler, Linux is approximately 20 times faster than
Windows!
Of course, the machines are different, but I think that the main
difference comes from the OS.

Any help will be appreciated.

Regards,

Michaël
From: Arjen Markus on
On 3 jul, 12:10, relaxmike <michael.bau...(a)gmail.com> wrote:

I can reproduce this problem with the programs you posted.
Starting the program fails after 67000+ iterations. Odd,
but that of course does not solve the problem.

Regards,

Arjen
From: relaxmike on
Your machine must be faster than mine: I did not have the
time to wait that long...
Anyway, even if that does not solve the problem,
I feel less alone, which is the only positive point so far,
so thank you!
From: Michel Olagnon on
relaxmike wrote:
> do i=1,100000
> sysres = system("externaltool.exe")
> enddo


In my experience, this sort of call in a loop may allocate resources
(buffers, tables of child process numbers, or whatever)
that may be kept (or simply never reclaimed) until the
program terminates, so it is usually better to avoid spawning
too many processes from a single program.

From: Arjen Markus on
On 3 jul, 15:38, relaxmike <michael.bau...(a)gmail.com> wrote:
> Your machine must be faster than mine : I had not the
> time to wait for so long...

Not sure :). I let it run and went on with other things
- it is a dual core machine.

> Anyway, even if that does not solve the problem,
> I feel less alone, which is the only positive point so far,
> so thank you !

I have a feeling it has to do with memory exhaustion -
each call to system() taking away a bit of memory.
Perhaps an alternative set-up would work:

Start the toto program once (in the background) and let it run
a loop, reading the input as the MC loop provides it.
Something along these lines:

MC:
  do i = 1,many
     remove file "done" (open/close with status='delete',
        as you know :))
     write input
     write file "ready"
     wait for a new file "done"
  enddo

Computational program:

  do
     wait for file "ready"
     remove file "ready"
     do computation
     report result
     write file "done"
  enddo
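A minimal Fortran sketch of the helper routines such a file handshake needs, assuming a Fortran 2008 compiler (NEWUNIT=, INT64). The module and routine names and the timeout argument are hypothetical. Standard Fortran has no sleep routine, so wait_for_file simply polls in a busy loop; a real version would pause between polls with a compiler-specific routine.

```fortran
! Hypothetical helpers for a file-based handshake between two programs.
module file_handshake
  use iso_fortran_env, only: int64
  implicit none
  private
  public :: touch_file, delete_file, wait_for_file
contains
  ! Create (or truncate) an empty flag file.
  subroutine touch_file(name)
    character(len=*), intent(in) :: name
    integer :: u
    open(newunit=u, file=name, status='replace', action='write')
    close(u)
  end subroutine touch_file

  ! Delete a file if it exists, via open/close with status='delete'.
  subroutine delete_file(name)
    character(len=*), intent(in) :: name
    integer :: u, ios
    logical :: there
    inquire(file=name, exist=there)
    if (.not. there) return
    open(newunit=u, file=name, status='old', iostat=ios)
    if (ios == 0) close(u, status='delete')
  end subroutine delete_file

  ! Poll until NAME exists or TIMEOUT seconds have passed.
  ! Returns .true. if the file appeared. Busy-waits (no sleep between polls).
  logical function wait_for_file(name, timeout) result(found)
    character(len=*), intent(in) :: name
    real, intent(in) :: timeout
    integer(int64) :: t0, t, rate
    call system_clock(t0, rate)
    do
      inquire(file=name, exist=found)
      if (found) return
      call system_clock(t)
      if (real(t - t0) / real(rate) > timeout) return
    end do
  end function wait_for_file
end module file_handshake
```

On the MC side, each iteration would call `delete_file('done')`, write the input, call `touch_file('ready')` and then `wait_for_file('done', timeout)`; the computational program mirrors this with `wait_for_file('ready', ...)`, `delete_file('ready')`, the computation, and `touch_file('done')`. As long as the input file is closed before `touch_file('ready')` is called, the worker should never read a half-written input.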

Regards,

Arjen