|
Prev: module or interface
Next: C preprocessor
From: Infinity77 on 5 May 2008 12:54 Hi All, I have an unformatted file, and I open it for reading in big endian mode. The file has a repetitive structure (is a time step report for a simulation): at the beginning of every time step, there is this pseudo-declaration: 'PARAMS' keyword number kind where "keyword" is an 8 character string, "number" is an integer which tells me how much data this time step contains, and "kind" is a 4 character string which tells me the data type (i.e., 'CHAR', 'REAL' and so on). After this declaration, the data for the current time step follows. As an example: 'PARAMS ' 4200 'REAL' 0.5372 3.7393 0.3273 .... (4200 real numbers) With the file I am working on, this 4200 stays constant for every time step in the simulation. The numerical arrays are divided into blocks of up to 1000 items each: so, in my previous example, I suppose I can not read the whole 4200 real numbers in one shot, I have to loop 4 times and read 1000 items every time, plus read the remaining 200 as last step. I have written the following routine to read and store the data from this file. Is there anything I could do to speed up the reading/ storing of the data? Thank you very much for your suggestions and comments. I apologize for the long post. Andrea. ! Begin Code subroutine ReadSMSPEC(fileName, miniSteps, dimens, matrix) ! PARAMETERS: ! fileName: the file containing the data ! miniSteps: the number of time steps in the simulation ! dimens: the number of data in each time step ! matrix: a dimens x miniStep 2D array of real data character*1000, intent(in) :: fileName integer, intent(in) :: miniSteps, dimens real(4), intent(out) :: matrix(dimens, miniSteps) integer keywordNumber, internalCount, readBuffer, matrixIndex character*8 keywordName character*4 keyworType logical feof ! Open the file open(unit=1, file=fileName, form='UNFORMATTED', convert='BIG_ENDIAN') feof = .false. ! The numerical arrays are divided into blocks of up to 1000 items each readBuffer = int(dimens/1000) do while (.not.feof) ! Read the keyword, number and kind read(1, end=40) keywordName, keywordNumber, keywordType ! I look for the 'PARAMS' keyword if (keywordName == 'PARAMS') then ! A new time step has been found ! Increase the index in the matrix matrixIndex = matrixIndex + 1 internalCount = 0 ! Loop by reading a block of 1000 items ! at a time do while (internalCount <= readBuffer) ! Read a block of 1000 items until ! you reach the end of this time step if (internalCount < readBuffer) then ! We are still far from the last block, ! Read 1000 items of data read(1, end=40) matrix(1000*internalCount +1:1000*(internalCount+1), matrixIndex) else ! We reached the last block of data: ! Read the remaining data, which can be ! less than 1000 items. read(1, end=40) matrix(1000*internalCount+1:dimens, matrixIndex) endif internalCount = internalCount + 1 enddo endif enddo 40 continue close(1) return end subroutine readSMSPEC ! End Code
From: glen herrmannsfeldt on 5 May 2008 14:16 Infinity77 wrote: > I have an unformatted file, and I open it for reading in big > endian mode. The file has a repetitive structure (is a time step > report for a simulation): at the beginning of every time step, there > is this pseudo-declaration: > 'PARAMS' > keyword number kind > where "keyword" is an 8 character string, "number" is an integer which > tells me how much data this time step contains, and "kind" is a 4 > character string which tells me the data type (i.e., 'CHAR', 'REAL' > and so on). After this declaration, the data for the current time step > follows. As an example: When describing an UNFORMATTED file, be sure to indicate the record boundaries. Looking at the code, it seems that the keyword number kind indicator is a separate record, but that isn't obvious as described here. > 'PARAMS ' 4200 'REAL' > 0.5372 3.7393 0.3273 .... (4200 real numbers) > With the file I am working on, this 4200 stays constant for every time > step in the simulation. > The numerical arrays are divided into blocks of up to 1000 items each: > so, in my previous example, I suppose I can not read the whole 4200 > real numbers in one shot, I have to loop 4 times and read 1000 items > every time, plus read the remaining 200 as last step. I do like the idea of breaking it up into blocks. Most people don't do that which may require large input buffers, or cause other complications in reading the file. > I have written the following routine to read and store the data from > this file. Is there anything I could do to speed up the reading/ > storing of the data? (snip) > ! Read the keyword, number and kind > read(1, end=40) keywordName, keywordNumber, keywordType (snip) > read(1, end=40) matrix(1000*internalCount > +1:1000*(internalCount+1), matrixIndex) You ask about speeding it up, do you have reason to believe that it is slower than it should be? I/O is limited by how fast the data comes off the disk. Unformatted should be much faster than formatted, which is good. The READ could also be: read(1, end=40) (matrix(i,matrixIndex),i=1000*internalCount+1: @ 1000*(internalCount+1)) It might be that one is faster than the other for a particular compiler and library. I probably would have used numeric values instead of character for keywordname and keyword type. A little less descriptive, but a little easier to process. I would have done the loops differently, using ordinary DO instead of DO WHILE, but that isn't likely to change the speed much. -- glen
From: glen herrmannsfeldt on 6 May 2008 17:39 Janne Blomqvist wrote: > On 2008-05-05, glen herrmannsfeldt <gah(a)ugcs.caltech.edu> wrote: >>The READ could also be: >> read(1, end=40) (matrix(i,matrixIndex),i=1000*internalCount+1: >> @ 1000*(internalCount+1)) >>It might be that one is faster than the other for a particular >>compiler and library. I said one might be faster, but not which one. > At least for gfortran the opposite is true. The implied do loop above > is handled by the frontend, IIRC converting it to (roughly) the > equivalent do loop like > do i = 1000*internalCount+1, 1000*(internalCount+1) > read (1, end=40) matrix(i,matrixIndex) > end do Well, it can't be quite like that because of the record boundaries, but one call to the I/O routine per array element I do understand. > So you have the overhead of a read statement, which is considerable, > for each element in the array. OTOH by reading array slices like the > original code, the runtime library gets a pointer to the array > descriptor (a modified descriptor describing the slice, not the > original one) and has the opportunity to read multiple elements at a > time. The usual Fortran I/O routines have one subroutine call to start, supplying unit number and possibly format information, then one call for each I/O list element (or possibly array element), then one to finish the operation. I remember when the OS/360 Fortran library was changed to do implied DO as one call instead of one per element. It was a big deal because it made newly compiled programs incompatible with old versions of the library. (The general rule is that libraries are back compatible but not forward compatible. Most of the time, though, older versions will still work.) > I think there's a PR about this in the gcc bugzilla somewhere. But > it's not really easy to fix, it would probably require either the > frontend to convert the implied do loop into the equivalent array > descriptor, or to create some kind of iterator structure and pass that > to the library, or to have some runtime interpreter for implied do > loops. All three approaches would require a fair amount of work, I > think. > Ah, here it is: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35339 > Now, of course, other compilers might do it better. For the simple cases of one array inside an implied DO it should be pretty easy, and I would expect compilers to be able to do that. For more complicated ones, READ(1) ((X(I,J),Y(J,I),I=1,N),Z(J),J=1,M) maybe I would not be surprised if it was done with one subroutine call per element. Even so, other than the subroutine call overhead it shouldn't be so slow. -- glen
From: Janne Blomqvist on 6 May 2008 04:34 On 2008-05-05, Infinity77 <andrea.gavana(a)gmail.com> wrote: > I have an unformatted file, and I open it for reading in big > endian mode. If you're not going to read/write the file on multiple platforms with different native endianness, get rid of the CONVERT= stuff. Converting endianness on the fly has much more overhead than just streaming the data directly into application memory. > The file has a repetitive structure (is a time step > report for a simulation): at the beginning of every time step, there > is this pseudo-declaration: > > 'PARAMS' > keyword number kind > > where "keyword" is an 8 character string, "number" is an integer which > tells me how much data this time step contains, and "kind" is a 4 > character string which tells me the data type (i.e., 'CHAR', 'REAL' > and so on). After this declaration, the data for the current time step > follows. As an example: > > 'PARAMS ' 4200 'REAL' > 0.5372 3.7393 0.3273 .... (4200 real numbers) > > With the file I am working on, this 4200 stays constant for every time > step in the simulation. > The numerical arrays are divided into blocks of up to 1000 items each: > so, in my previous example, I suppose I can not read the whole 4200 > real numbers in one shot, I have to loop 4 times and read 1000 items > every time, plus read the remaining 200 as last step. I'm a bit confused here. In your code you don't actually use the 'keywordNumber' variable for anything, rather you just seem to assume that the input variable 'dimens' is equal to that? And yes, with a single READ statement you can only read one record at a time. If you control the output of the file as well, then it could be slightly more efficient to have the first record contain the dimensions of the matrix, and then the second record the entire matrix, which could then be read with a single statement. Of course, this breaks down if the data set grows so large that you have to resort to, say, processing one time step at a time rather than reading the entire file into memory and then do all the processing. In any case, with tricks like avoiding endian conversion and reducing the number of read statements you can reduce the cpu overhead, but unless you have a very slow cpu or a very fast disk array, you're probably limited by disk speed anyway. -- Janne Blomqvist
From: Janne Blomqvist on 6 May 2008 04:48 On 2008-05-05, glen herrmannsfeldt <gah(a)ugcs.caltech.edu> wrote: > The READ could also be: > > read(1, end=40) (matrix(i,matrixIndex),i=1000*internalCount+1: > @ 1000*(internalCount+1)) > > It might be that one is faster than the other for a particular > compiler and library. At least for gfortran the opposite is true. The implied do loop above is handled by the frontend, IIRC converting it to (roughly) the equivalent do loop like do i = 1000*internalCount+1, 1000*(internalCount+1) read (1, end=40) matrix(i,matrixIndex) end do So you have the overhead of a read statement, which is considerable, for each element in the array. OTOH by reading array slices like the original code, the runtime library gets a pointer to the array descriptor (a modified descriptor describing the slice, not the original one) and has the opportunity to read multiple elements at a time. I think there's a PR about this in the gcc bugzilla somewhere. But it's not really easy to fix, it would probably require either the frontend to convert the implied do loop into the equivalent array descriptor, or to create some kind of iterator structure and pass that to the library, or to have some runtime interpreter for implied do loops. All three approaches would require a fair amount of work, I think. Ah, here it is: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35339 Now, of course, other compilers might do it better. -- Janne Blomqvist
|
Pages: 1 Prev: module or interface Next: C preprocessor |