From: relaxmike on 3 Jul 2008 06:10

Hi all,

I work on a Monte-Carlo simulation tool which is based on the Fortran extension "system". At the core of the simulation, there is a loop which calls an external program that simulates one particular randomized set of parameters:

    do i=1,100000
      sysres = system("externaltool.exe")
    enddo

The problem is that the simulation fails after a random number of iterations, sometimes 32 000, sometimes fewer. A message appears in the console, which depends on the run, for example:

    forrtl: severe (9): permission to access file denied, unit 32, file ....
    forrtl: severe (29): file not found, unit 7, file ...
    etc...

I cannot post the full source code here because it is quite complicated. The problem appears on Windows XP, with Intel Fortran 8.1. Strangely, it does not appear on Windows with either gfortran or g95, and I have no explanation for that fact. It may be a bug in our source code, a bug in the compilers, or a bug in the OS. I suspect that with Intel the "system" extension is based directly on the native Windows API, while gfortran and g95 use a GNU Windows interface to the native system: that would explain the difference in behaviour. On Linux, with either gfortran or g95, the problem does not appear.

I tried several methods, including computing the Fortran logical units dynamically, but the problem seems to be elsewhere.

To investigate further, I tried to reproduce the problem with a smaller set of source code. Unfortunately, I was not able to reproduce the bug, but I did observe some strange behaviour. It appears that the performance of the "system" extension is really different between Linux and Windows.

This is the sample source code. The external tool is replaced by the following source code:

    program toto
      implicit none
      write(33,*) 'toto'
    end program toto

which is compiled in optimized mode (called Release in Intel Fortran). The Monte-Carlo simulation tool is replaced by the following source code:

    program appel
      use ifport
      implicit none
      integer i , sysres
      integer :: timestart, timestop, count,countrate,countmax
      real :: timer_elapse
      call system_clock(timestart)
      do i=1,100
        write(32,*) i
        sysres = system("toto.exe")
        write(32,*) "result:", sysres
      enddo
      call system_clock(timestop)
      call system_clock(count,countrate,countmax)
      timer_elapse = real(timestop - timestart)/countrate
      write(32,*) "elapsed : ", timer_elapse
    end program appel

Notice that in Intel Fortran, the system extension is provided as a function. The following table contains the elapsed times (in seconds) measured on my "average" Pentium D dual core PC, for several values of the number of iterations:

    PC: Dell Optiplex GX520
    OS: Windows
    Compiler: IVF8
      10 iterations :  0.531
     100 iterations :  4.890
    1000 iterations : 51.187

I compiled the same source code with g95 and the following set of commands:

    g95 -O2 -c toto.f90
    g95 -O2 -o toto.exe toto.o
    g95 -O2 -c appel.f90
    g95 -O2 -o appel.exe appel.o

Of course, one must remove the "use ifport" line.

    PC: Dell Optiplex GX520
    OS: Windows
    Compiler: g95
      10 iterations :  0.6406
     100 iterations :  4.9844
    1000 iterations : 51.2034

I then ran the same tests under Linux. One only has to change the call to the executable so that it can be found in the current directory:

    sysres = system("./toto.exe")

    PC: Dell Precision 380
    OS: Linux
    Compiler: g95
      10 iterations : 0.025  (=time win32 / 25)
     100 iterations : 0.3027 (=time win32 / 16)
    1000 iterations : 2.5327 (=time win32 / 20)

With the same compiler, the difference in performance between Linux and Windows is approximately 20x! Of course, the machine is different, but I think that the main difference comes from the OS.

Any help will be appreciated.

Regards,

Michaël
From: Arjen Markus on 3 Jul 2008 09:07

I can reproduce this problem with the programs you posted. Starting the program fails after 67000+ iterations.

Odd, but that of course does not solve the problem.

Regards,

Arjen
From: relaxmike on 3 Jul 2008 09:38 Your machine must be faster than mine : I had not the time to wait for so long... Anyway, even if that does not solve the problem, I feel less alone, which is the only positive point so far, so thank you !
From: Michel Olagnon on 3 Jul 2008 09:43

relaxmike wrote:
> At the core of the simulation, there is a loop which calls an
> external program which simulates one particular randomized set
> of parameters :
>
> do i=1,100000
>   sysres = system("externaltool.exe")
> enddo
>
> The bug may be a bug in our source code, a bug in the compilers,
> or a bug in the OS.

In my experience, this sort of call in a loop may allocate resources (buffers, tables of child process numbers, or whatever) that are kept (or simply never garbage collected) until the program terminates, so it is usually better to avoid spawning too many processes from a single program.
From: Arjen Markus on 3 Jul 2008 09:45
On 3 jul, 15:38, relaxmike <michael.bau...(a)gmail.com> wrote:
> Your machine must be faster than mine : I had not the
> time to wait for so long...

Not sure :). I let it run and went on with other things - it is a dual core machine.

> Anyway, even if that does not solve the problem,
> I feel less alone, which is the only positive point so far,
> so thank you !

I have a feeling it has to do with memory exhaustion - each call to system() taking away a bit of memory.

Perhaps an alternative set-up might work: start the toto program once (in the background) and let that run a loop, reading the input as it is provided by the MC loop. Something along these lines:

MC:

    do i = 1,many
        remove file "done" (open/close with status='delete', as you know :))
        write input
        write file "ready"
        check for a new file "done"
    enddo

Computational program:

    do
        wait for file "ready"
        remove file "ready"
        do computation
        report result
        write file "done"
    enddo

Regards,

Arjen
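
A rough sketch of the "ready"/"done" file handshake outlined above, in standard Fortran. Everything named here is an assumption made for illustration: the module `handshake`, the helpers `touch`, `remove_file` and `wait_for`, the unit numbers 91 and 92, the file names `input.txt` and `result.txt`, and the placeholder "computation" (doubling one number). To keep the example self-contained, one program plays both roles for a single iteration; in a real set-up the driver half and the worker half would live in two separate executables, each running its own loop.

```fortran
module handshake
  implicit none
  private
  public :: touch, remove_file, wait_for
  integer, parameter :: marker_unit = 91   ! arbitrary unit number (assumption)
contains

  ! Create (or overwrite) an empty marker file.
  subroutine touch(name)
    character(len=*), intent(in) :: name
    open(unit=marker_unit, file=name, status='replace', action='write')
    close(marker_unit)
  end subroutine touch

  ! Delete a marker file if it exists; do nothing otherwise.
  subroutine remove_file(name)
    character(len=*), intent(in) :: name
    logical :: there
    inquire(file=name, exist=there)
    if (there) then
      open(unit=marker_unit, file=name, status='old')
      close(marker_unit, status='delete')
    end if
  end subroutine remove_file

  ! Poll until the marker file appears, giving up after max_polls checks.
  function wait_for(name, max_polls) result(found)
    character(len=*), intent(in) :: name
    integer, intent(in) :: max_polls
    logical :: found
    integer :: k
    found = .false.
    do k = 1, max_polls
      inquire(file=name, exist=found)
      if (found) exit
      ! A real program would pause here; sleeping is a compiler
      ! extension, so it is left out of this sketch.
    end do
  end function wait_for

end module handshake

program handshake_demo
  use handshake
  implicit none
  integer, parameter :: data_unit = 92     ! arbitrary unit number (assumption)
  real :: x, y

  ! Driver side (body of the MC loop): clear "done", write input, signal "ready".
  call remove_file('done')
  open(unit=data_unit, file='input.txt', status='replace', action='write')
  write(data_unit, *) 1.5
  close(data_unit)
  call touch('ready')

  ! Worker side (body of the computational loop).
  if (wait_for('ready', 1000)) then
    call remove_file('ready')
    open(unit=data_unit, file='input.txt', status='old', action='read')
    read(data_unit, *) x
    close(data_unit)
    open(unit=data_unit, file='result.txt', status='replace', action='write')
    write(data_unit, *) 2.0 * x            ! placeholder computation
    close(data_unit)
    call touch('done')
  end if

  ! Driver side again: wait for "done", then read the result.
  if (wait_for('done', 1000)) then
    open(unit=data_unit, file='result.txt', status='old', action='read')
    read(data_unit, *) y
    close(data_unit)
    print *, 'result:', y
  else
    print *, 'timed out waiting for the worker'
  end if
end program handshake_demo
```

The marker files are written only after the data files are closed, so the other side never sees "ready" or "done" before the corresponding data is complete. The poll limit in `wait_for` is there so that a crashed worker shows up as a timeout instead of an endless wait.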