From: deltaquattro on 17 Jun 2008 11:41

Hi,

I was wondering whether global array operations, introduced in f90, can have a negative impact on performance. Compare:

    do i=1, ntheta
       r(0,i) = rhub
       r(nr+1,i) = rmax
       dt(0,i) = 0.0
       dft(0,i) = 0.0
       dfr(0,i) = 0.0
       dt(nr+1,i) = 0.0
       dft(nr+1,i) = 0.0
       dfr(nr+1,i) = 0.0
    end do

with:

    r(0,:) = rhub
    r(nr+1,:) = rmax
    dt(0,:) = 0.0
    dft(0,:) = 0.0
    dfr(0,:) = 0.0
    dt(nr+1,:) = 0.0
    dft(nr+1,:) = 0.0
    dfr(nr+1,:) = 0.0

I found the execution time of the latter to be higher than that of the former, as if many DO loops were executed instead of just one. Why use global array operations then? Isn't it better to stick to plain old DO loops? Thanks,

regards,

deltaquattro
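For anyone who wants to reproduce the comparison, here is a minimal sketch that wraps the two variants from the question in subroutines. The module and subroutine names are hypothetical, and the array bounds `(0:nr+1, 1:ntheta)` and the default `real` kind are assumptions, since the post does not show the declarations.

```fortran
! Hypothetical module: the two boundary-initialization variants from the question.
! Assumes arrays declared with bounds (0:nr+1, 1:ntheta) and default real kind.
module boundary_init
  implicit none
  private
  public :: init_fused_loop, init_array_sections
contains

  ! One DO loop: all eight boundary stores for column i happen in one iteration.
  subroutine init_fused_loop(r, dt, dft, dfr, nr, ntheta, rhub, rmax)
    integer, intent(in) :: nr, ntheta
    real, intent(inout) :: r(0:nr+1, ntheta), dt(0:nr+1, ntheta)
    real, intent(inout) :: dft(0:nr+1, ntheta), dfr(0:nr+1, ntheta)
    real, intent(in) :: rhub, rmax
    integer :: i
    do i = 1, ntheta
       r(0, i) = rhub
       r(nr+1, i) = rmax
       dt(0, i) = 0.0
       dft(0, i) = 0.0
       dfr(0, i) = 0.0
       dt(nr+1, i) = 0.0
       dft(nr+1, i) = 0.0
       dfr(nr+1, i) = 0.0
    end do
  end subroutine init_fused_loop

  ! Eight array-section assignments. By definition each one is a complete
  ! operation over all ntheta columns; merging them into a single loop is
  ! an optimization the compiler may or may not perform.
  subroutine init_array_sections(r, dt, dft, dfr, nr, ntheta, rhub, rmax)
    integer, intent(in) :: nr, ntheta
    real, intent(inout) :: r(0:nr+1, ntheta), dt(0:nr+1, ntheta)
    real, intent(inout) :: dft(0:nr+1, ntheta), dfr(0:nr+1, ntheta)
    real, intent(in) :: rhub, rmax
    r(0, :) = rhub
    r(nr+1, :) = rmax
    dt(0, :) = 0.0
    dft(0, :) = 0.0
    dfr(0, :) = 0.0
    dt(nr+1, :) = 0.0
    dft(nr+1, :) = 0.0
    dfr(nr+1, :) = 0.0
  end subroutine init_array_sections

end module boundary_init
```

Both subroutines leave the arrays in the same state; only the order of the memory stores differs. Which one runs faster depends on the compiler and its optimisation options.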
From: Dennis Wassel on 17 Jun 2008 13:21

On 17 Jun., 17:41, deltaquattro <deltaquat...(a)gmail.com> wrote:
> Hi,
>
> I was wondering whether global array operations, introduced in f90,
> can have a negative impact on performance.
>
> [snip]
>
> I found the execution time of the latter to be higher than that of the
> former, as if many DO loops were executed instead of just one. Why use
> global array operations then? Isn't it better to stick to plain old DO
> loops? Thanks,
>
> regards,
>
> deltaquattro

This is quite a strange observation and raises some questions:

1) What optimisation options did you use?

2) Which compiler did you use? The gcc 4.0 and 4.1 Fortran compilers, for instance, are still pretty much in their infancy, so one would expect bugs and strange behaviour there. Use 4.2 or 4.3 instead, if you use gfortran.

3) How did you measure execution time? I find that accurate timing on a computer is a nontrivial task. The 'time' command on my machine shows up to 200% variance. I can only assume you used some clever and appropriately precise way of measuring.

I'm not a compiler specialist, but AFAIK array operations should not usually be slower than explicit loop constructs. Why? When using array operations like, say, x = MATMUL(A,b) instead of two nested DO loops, the compiler has more information at hand about what you want to do, which allows it to use more aggressive optimisation methods to generate code, or to generate calls to (more or less optimised) runtime libraries; the latter is done by all compilers I know. Additionally, the gfortran compiler has the '-fexternal-blas' option, which tells the compiler to automagically generate calls to an optimised vendor BLAS for certain array operations instead of using the runtime library. I've never tried this, but using a tuned ATLAS library will surely speed things up nicely.

A second benefit of array operations is their conciseness.
Take the MATMUL example again: a single call as opposed to two nested DO loops. Or think of copying part of an array into another array, anything really! IMHO, a lot of scientific code completely disregards maintainability for the sake of the highest possible degree of code optimisation. Using array operations makes your code more concise, more readable and therefore easier to maintain in the long run! It *should* also improve, or at least not hurt, performance.

Cheers,
Dennis
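A sketch of the MATMUL comparison Dennis describes, assuming default `real` arrays; `matvec_demo`, `matvec_intrinsic` and `matvec_loops` are hypothetical names. The loop version puts the row index `i` innermost because Fortran stores arrays in column-major order, so `a(i,j)` and `a(i+1,j)` are adjacent in memory.

```fortran
! Hypothetical module contrasting the intrinsic and explicit-loop forms of x = A*b.
module matvec_demo
  implicit none
  private
  public :: matvec_intrinsic, matvec_loops
contains

  ! Array form: one intrinsic call. The compiler may map it to its runtime
  ! library, or (gfortran with -fexternal-blas, for large enough operands)
  ! to an external BLAS routine.
  function matvec_intrinsic(a, b) result(x)
    real, intent(in) :: a(:,:), b(:)
    real :: x(size(a, 1))
    x = matmul(a, b)
  end function matvec_intrinsic

  ! Explicit form: two nested DO loops. The inner loop runs over the first
  ! (row) index, which is the contiguous one in column-major storage.
  function matvec_loops(a, b) result(x)
    real, intent(in) :: a(:,:), b(:)
    real :: x(size(a, 1))
    integer :: i, j
    do i = 1, size(a, 1)
       x(i) = 0.0
    end do
    do j = 1, size(a, 2)
       do i = 1, size(a, 1)
          x(i) = x(i) + a(i, j) * b(j)
       end do
    end do
  end function matvec_loops

end module matvec_demo
```

For `a = reshape([1., 2., 3., 4., 5., 6.], [2, 3])` and `b = [1., 1., 1.]`, both functions should return `[9., 12.]`.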
From: James Van Buskirk on 17 Jun 2008 13:55

"deltaquattro" <deltaquattro(a)gmail.com> wrote in message news:9c706700-2861-4d17-a3b8-7e2291fa0b5f(a)2g2000hsn.googlegroups.com...

> do i=1, ntheta
>    r(0,i) = rhub
>    r(nr+1,i) = rmax
>    dt(0,i) = 0.0
>    dft(0,i) = 0.0
>    dfr(0,i) = 0.0
>    dt(nr+1,i) = 0.0
>    dft(nr+1,i) = 0.0
>    dfr(nr+1,i) = 0.0
> end do

> r(0,:) = rhub
> r(nr+1,:) = rmax
> dt(0,:) = 0.0
> dft(0,:) = 0.0
> dfr(0,:) = 0.0
> dt(nr+1,:) = 0.0
> dft(nr+1,:) = 0.0
> dfr(nr+1,:) = 0.0

> I found the execution time of the latter to be higher than that of the
> former, as if many DO loops were executed instead of just one. Why use
> global array operations then? Isn't it better to stick to plain old DO
> loops? Thanks,

Normally an initialization loop like this one would be faster as separate loops than as one fused loop, because it's faster to access memory consecutively than to jump around as implied by the fused loop. However, in this case the loops appear to be setting boundary values, so they are traversing rows rather than columns of the arrays. As a consequence, the code jumps around in memory no matter what the compiler does, and loop fusion can win out because it implies less loop overhead, which otherwise would be of negligible importance compared to memory access considerations (assuming that the data set is too large to fit in cache).

One thing to investigate is whether r(i,j), dt(i,j), dft(i,j), and dfr(i,j) always get accessed together. If so, you could group them as a derived type, and the above loop could go 4X as fast as the structure-of-arrays code listed above.

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
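One way to read James's derived-type suggestion is an array of structures, sketched below under the assumption that `r`, `dt`, `dft` and `dfr` are default `real` and always used together; `boundary_aos`, `cell_t` and `set_boundaries` are hypothetical names. Setting one boundary point then touches four reals that are typically adjacent in memory, rather than one element in each of four separate arrays. The 4X figure is James's estimate; the actual gain depends on the compiler, the cache, and how the rest of the code accesses these fields.

```fortran
! Hypothetical array-of-structures layout: the four fields of one grid point
! are stored together instead of in four separate arrays.
module boundary_aos
  implicit none
  private
  public :: cell_t, set_boundaries

  type :: cell_t
     real :: r = 0.0
     real :: dt = 0.0
     real :: dft = 0.0
     real :: dfr = 0.0
  end type cell_t

contains

  ! Sets the inner (index 0) and outer (index nr+1) boundary rows.
  ! The dummy keeps a lower bound of 0 in the first dimension, so
  ! ubound(c, 1) plays the role of nr+1 from the original code.
  subroutine set_boundaries(c, rhub, rmax)
    type(cell_t), intent(inout) :: c(0:, :)
    real, intent(in) :: rhub, rmax
    integer :: i, nb
    nb = ubound(c, 1)
    do i = 1, size(c, 2)
       c(0, i) = cell_t(rhub, 0.0, 0.0, 0.0)
       c(nb, i) = cell_t(rmax, 0.0, 0.0, 0.0)
    end do
  end subroutine set_boundaries

end module boundary_aos
```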
From: Richard Maine on 17 Jun 2008 14:31

deltaquattro <deltaquattro(a)gmail.com> wrote:

> I was wondering whether global array operations, introduced in f90,
> can have a negative impact on performance. Compare:
> [elided initialization with DO loops and whole array operations]
> I found the execution time of the latter to be higher than that of the
> former, as if many DO loops were executed instead of just one. Why use
> global array operations then? Isn't it better to stick to plain old DO
> loops? Thanks,

Note that the usual terminology is something more like "whole array operations" or even just an unmodified "array operations" instead of "global".

The main reasons are clarity and conciseness. If it doesn't help clarity and conciseness, don't do it. That is, no doubt, an oversimplification; there are exceptions, etc. But it's a good first approximation. Every once in a while they might also get you faster execution, but if that is your primary reason for using them, and you don't have specific knowledge of exactly why to expect faster execution in your particular case, then your efforts are probably misplaced.

Execution time is actually not the sole measure of code "goodness". In many cases, it isn't even particularly high on the list of important things. Sometimes it isn't on the list at all. Other times it is at the very top of the list. All generalizations are false, your mileage may vary, etc.

In that regard, is the execution time of an initialization such as this actually significant in your code? While possible, that would be unusual, and might suggest that the choice of algorithms is less than ideal. There can be efficient algorithms like that, but they are rare. As Dennis says, it can be tricky to even measure execution times precisely enough to time initializations like this. I'm supposing that perhaps you are just using this as an example of more "interesting" cases.
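Since a single boundary initialization can finish in less time than one clock tick, a common approach is to repeat the work many times and divide. The sketch below assumes a Fortran 2008 compiler (it uses `iso_fortran_env`, an abstract interface, and an internal procedure passed as an actual argument); `timing_sketch`, `seconds_per_call` and `work_iface` are hypothetical names. With gfortran, 64-bit integer arguments to `system_clock` usually give a finer count rate than default integers.

```fortran
! Hypothetical timing helper: average wall-clock seconds per call of a
! user-supplied subroutine, measured over nrep repetitions.
module timing_sketch
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: seconds_per_call, work_iface

  abstract interface
     subroutine work_iface()
     end subroutine work_iface
  end interface

contains

  function seconds_per_call(work, nrep) result(t)
    procedure(work_iface) :: work
    integer, intent(in) :: nrep
    real(real64) :: t
    integer(int64) :: t0, t1, rate
    integer :: k
    call system_clock(t0, rate)
    if (rate <= 0) then
       t = -1.0_real64          ! no clock available on this system
       return
    end if
    do k = 1, nrep
       call work()
    end do
    call system_clock(t1)
    t = real(t1 - t0, real64) / (real(rate, real64) * real(nrep, real64))
  end function seconds_per_call

end module timing_sketch
```

Calling the work through a procedure argument makes it harder for the optimiser to collapse the repetitions into one, but an aggressive compiler can still inline it, so the results are worth sanity-checking.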
In answer to Dennis, by the way, it is *VERY* common for whole array operations to be slower than DO loops. No, it is not at all strange. It is much closer to the usual state of affairs. There are a whole host of reasons.

1. Compilers have had over 5 decades to develop techniques for optimizing loops. Progress has been made in that time. There has only been about a decade or two (some work preceded the f90 standard; other compilers didn't really start until later) of significant work on optimizing array expressions. Things have improved and are still improving, but it just is not at the level of experience of DO-loop optimization.

2. Array temporaries are often a big deal in whole-array expressions. A naive (aka straightforward) application of the rules very often involves such array temporaries, which are expensive in time. The compiler has to do a fair amount of work to figure out whether they can be elided. See point 1. That's probably not the case for your example, but it is a common one.

3. Your example illustrates the problem of "loop fusion". The naive (again aka straightforward) application of the rules for your code example *DOES* imply a separate loop for each array operation (complete with all loop overhead). That's how the operations are defined. It is an optimization for the compiler to recognize when it can usefully fuse these multiple loops. See point 1.

--
Richard Maine                    | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle           |  -- Mark Twain
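Richard's point 2 is easiest to see with overlapping sections. The array assignment `a(2:n) = a(1:n-1)` is defined as if the whole right-hand side were evaluated before any element is stored, so a straightforward implementation copies it into a temporary first. The sketch below (module and subroutine names are hypothetical) contrasts it with two hand-written loops: the forward loop is not equivalent, while the backward loop reproduces the array semantics without a temporary. Proving that the backward order is safe is the kind of analysis a compiler must do before it can drop the temporary.

```fortran
! Hypothetical module showing why overlapping array assignments need temporaries.
module temporaries_demo
  implicit none
  private
  public :: shift_array_syntax, shift_forward_loop, shift_backward_loop
contains

  ! Array syntax: RHS fully evaluated (conceptually) before assignment,
  ! so every element moves up by one place.
  subroutine shift_array_syntax(a)
    real, intent(inout) :: a(:)
    integer :: n
    n = size(a)
    a(2:n) = a(1:n-1)
  end subroutine shift_array_syntax

  ! Forward loop: NOT equivalent; a(1) is smeared through the whole array.
  subroutine shift_forward_loop(a)
    real, intent(inout) :: a(:)
    integer :: i
    do i = 2, size(a)
       a(i) = a(i-1)
    end do
  end subroutine shift_forward_loop

  ! Backward loop: same result as the array syntax, with no temporary.
  subroutine shift_backward_loop(a)
    real, intent(inout) :: a(:)
    integer :: i
    do i = size(a), 2, -1
       a(i) = a(i-1)
    end do
  end subroutine shift_backward_loop

end module temporaries_demo
```

For `a = [1., 2., 3., 4.]`, the array form and the backward loop give `[1., 1., 2., 3.]`, while the forward loop gives `[1., 1., 1., 1.]`.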
From: Dennis Wassel on 17 Jun 2008 16:05
On 17 Jun., 20:31, nos...(a)see.signature (Richard Maine) wrote:
>
> [snipety-snip]
>
> In answer to Dennis, by the way, it is *VERY* common for whole array
> operations to be slower than DO loops. No, it is not at all strange. It
> is much closer to the usual state of affairs. There are a whole host of
> reasons.

Beats me. After all, when using array operations you have additional information either readily available, or you are able to extract it fairly easily, which is not always the case in DO loops. Think of aliasing, strides, contiguous memory locations, etc. But then again, "See point 1". I actually find this hard to believe, but given that I'm rather new to the delightful post-77 Fortran world and haven't really done any serious benchmarking on array operations vs DO loops, I'll gladly trust your judgement on this. Thanks for enlightening us!

> 1. Compilers have had over 5 decades to develop techniques for
> optimizing loops. Progress has been made in that time. There has only
> been about a decade or two (some work preceded the f90 standard; other
> compilers didn't really start until later) of significant work on
> optimizing array expressions. Things have improved and are still
> improving, but it just is not at the level of experience of DO-loop
> optimization.

OK, here's my newfound corner of gcc development that I feel like working on, as soon as I have more time on my hands than right now. After all, despite my earlier ramblings about conciseness and maintainability, performance DOES matter in many cases that are relevant to me :)

> 2. Array temporaries are often a big deal in whole-array expressions. A
> naive (aka straightforward) application of the rules very often involves
> such array temporaries, which are expensive in time. The compiler has to
> do a fair amount of work to figure out whether they can be elided. See
> point 1. That's probably not the case for your example, but it is a
> common one.
The Intel compiler (10.1, maybe earlier versions as well) throws a warning at runtime if it finds itself needing to create an array temporary; I found myself changing pieces of my code due to those warnings. Gonna have a look whether the gfortran guys already have a feature request about this...

> 3. Your example illustrates the problem of "loop fusion". The naive
> (again aka straightforward) application of the rules for your code
> example *DOES* imply a separate loop for each array operation (complete
> with all loop overhead). That's how the operations are defined. It is an
> optimization for the compiler to recognize when it can usefully fuse
> these multiple loops. See point 1.
>
> --
> Richard Maine                    | Good judgement comes from experience;
> email: last name at domain . net | experience comes from bad judgement.
> domain: summertriangle           |  -- Mark Twain
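A common source of the temporaries Dennis mentions is passing a non-contiguous section, such as the row `a(1,:)`, to an explicit-shape dummy argument: the callee expects contiguous storage, so the compiler typically inserts a copy-in/copy-out temporary. An assumed-shape dummy receives a descriptor that carries the stride and usually needs no copy. The names below are hypothetical. Current gfortran releases can report such temporaries with `-Warray-temporaries` at compile time and `-fcheck=array-temps` at run time; Intel Fortran's run-time check is `-check arg_temp_created`.

```fortran
! Hypothetical module contrasting explicit-shape and assumed-shape dummies.
module arg_temp_demo
  implicit none
  private
  public :: scale_explicit, scale_assumed
contains

  ! Explicit-shape dummy: x must occupy n contiguous elements, so a strided
  ! actual argument such as a(1,:) is typically copied in and out.
  subroutine scale_explicit(x, n, factor)
    integer, intent(in) :: n
    real, intent(inout) :: x(n)
    real, intent(in) :: factor
    x = factor * x
  end subroutine scale_explicit

  ! Assumed-shape dummy: the descriptor carries the stride, so the same
  ! strided section can usually be passed without a copy.
  subroutine scale_assumed(x, factor)
    real, intent(inout) :: x(:)
    real, intent(in) :: factor
    x = factor * x
  end subroutine scale_assumed

end module arg_temp_demo
```

Both calls scale the row in place; the difference is only whether a hidden copy is made along the way.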