From: Infinity77 on
Hi All,

I have an unformatted file, and I open it for reading in big
endian mode. The file has a repetitive structure (is a time step
report for a simulation): at the beginning of every time step, there
is this pseudo-declaration:

'PARAMS'
keyword number kind

where "keyword" is an 8 character string, "number" is an integer which
tells me how much data this time step contains, and "kind" is a 4
character string which tells me the data type (i.e., 'CHAR', 'REAL'
and so on). After this declaration, the data for the current time step
follows. As an example:

'PARAMS ' 4200 'REAL'
0.5372 3.7393 0.3273 .... (4200 real numbers)

With the file I am working on, this 4200 stays constant for every time
step in the simulation.
The numerical arrays are divided into blocks of up to 1000 items each:
so, in my previous example, I suppose I can not read the whole 4200
real numbers in one shot, I have to loop 4 times and read 1000 items
every time, plus read the remaining 200 as last step.

I have written the following routine to read and store the data from
this file. Is there anything I could do to speed up the reading/
storing of the data?

Thank you very much for your suggestions and comments. I apologize for
the long post.

Andrea.

! Begin Code

subroutine ReadSMSPEC(fileName, miniSteps, dimens, matrix)

! PARAMETERS:
! fileName: the file containing the data
! miniSteps: the number of time steps in the simulation
! dimens: the number of data in each time step
! matrix: a dimens x miniStep 2D array of real data

character*1000, intent(in) :: fileName
integer, intent(in) :: miniSteps, dimens
real(4), intent(out) :: matrix(dimens, miniSteps)

integer keywordNumber, internalCount, readBuffer, matrixIndex

character*8 keywordName
character*4 keyworType

logical feof

! Open the file
open(unit=1, file=fileName, form='UNFORMATTED', convert='BIG_ENDIAN')

feof = .false.

! The numerical arrays are divided into blocks of up to 1000 items
each
readBuffer = int(dimens/1000)

do while (.not.feof)

! Read the keyword, number and kind
read(1, end=40) keywordName, keywordNumber, keywordType

! I look for the 'PARAMS' keyword
if (keywordName == 'PARAMS') then

! A new time step has been found
! Increase the index in the matrix
matrixIndex = matrixIndex + 1

internalCount = 0

! Loop by reading a block of 1000 items
! at a time

do while (internalCount <= readBuffer)

! Read a block of 1000 items until
! you reach the end of this time step

if (internalCount < readBuffer) then

! We are still far from the last block,
! Read 1000 items of data

read(1, end=40) matrix(1000*internalCount
+1:1000*(internalCount+1), matrixIndex)

else

! We reached the last block of data:
! Read the remaining data, which can be
! less than 1000 items.

read(1, end=40) matrix(1000*internalCount+1:dimens, matrixIndex)

endif

internalCount = internalCount + 1

enddo

endif

enddo

40 continue

close(1)
return

end subroutine readSMSPEC

! End Code
From: glen herrmannsfeldt on
Infinity77 wrote:

> I have an unformatted file, and I open it for reading in big
> endian mode. The file has a repetitive structure (is a time step
> report for a simulation): at the beginning of every time step, there
> is this pseudo-declaration:

> 'PARAMS'
> keyword number kind

> where "keyword" is an 8 character string, "number" is an integer which
> tells me how much data this time step contains, and "kind" is a 4
> character string which tells me the data type (i.e., 'CHAR', 'REAL'
> and so on). After this declaration, the data for the current time step
> follows. As an example:

When describing an UNFORMATTED file, be sure to indicate the
record boundaries. Looking at the code, it seems that the
keyword number kind indicator is a separate record, but that isn't
obvious as described here.

> 'PARAMS ' 4200 'REAL'
> 0.5372 3.7393 0.3273 .... (4200 real numbers)

> With the file I am working on, this 4200 stays constant for every time
> step in the simulation.

> The numerical arrays are divided into blocks of up to 1000 items each:
> so, in my previous example, I suppose I can not read the whole 4200
> real numbers in one shot, I have to loop 4 times and read 1000 items
> every time, plus read the remaining 200 as last step.

I do like the idea of breaking it up into blocks. Most people
don't do that which may require large input buffers, or cause other
complications in reading the file.

> I have written the following routine to read and store the data from
> this file. Is there anything I could do to speed up the reading/
> storing of the data?
(snip)

> ! Read the keyword, number and kind
> read(1, end=40) keywordName, keywordNumber, keywordType

(snip)
> read(1, end=40) matrix(1000*internalCount
> +1:1000*(internalCount+1), matrixIndex)

You ask about speeding it up, do you have reason to believe
that it is slower than it should be? I/O is limited by how fast
the data comes off the disk. Unformatted should be much faster than
formatted, which is good. The READ could also be:

read(1, end=40) (matrix(i,matrixIndex),i=1000*internalCount+1:
@ 1000*(internalCount+1))

It might be that one is faster than the other for a particular
compiler and library.

I probably would have used numeric values instead of character
for keywordname and keyword type. A little less descriptive,
but a little easier to process.

I would have done the loops differently, using ordinary DO instead
of DO WHILE, but that isn't likely to change the speed much.

-- glen

From: glen herrmannsfeldt on
Janne Blomqvist wrote:

> On 2008-05-05, glen herrmannsfeldt <gah(a)ugcs.caltech.edu> wrote:
>>The READ could also be:

>> read(1, end=40) (matrix(i,matrixIndex),i=1000*internalCount+1:
>> @ 1000*(internalCount+1))

>>It might be that one is faster than the other for a particular
>>compiler and library.

I said one might be faster, but not which one.

> At least for gfortran the opposite is true. The implied do loop above
> is handled by the frontend, IIRC converting it to (roughly) the
> equivalent do loop like

> do i = 1000*internalCount+1, 1000*(internalCount+1)
> read (1, end=40) matrix(i,matrixIndex)
> end do

Well, it can't be quite like that because of the record
boundaries, but one call to the I/O routine per array element
I do understand.

> So you have the overhead of a read statement, which is considerable,
> for each element in the array. OTOH by reading array slices like the
> original code, the runtime library gets a pointer to the array
> descriptor (a modified descriptor describing the slice, not the
> original one) and has the opportunity to read multiple elements at a
> time.

The usual Fortran I/O routines have one subroutine call to start,
supplying unit number and possibly format information, then one
call for each I/O list element (or possibly array element), then
one to finish the operation.

I remember when the OS/360 Fortran library was changed to do
implied DO as one call instead of one per element. It was
a big deal because it made newly compiled programs incompatible
with old versions of the library. (The general rule is that
libraries are back compatible but not forward compatible.
Most of the time, though, older versions will still work.)

> I think there's a PR about this in the gcc bugzilla somewhere. But
> it's not really easy to fix, it would probably require either the
> frontend to convert the implied do loop into the equivalent array
> descriptor, or to create some kind of iterator structure and pass that
> to the library, or to have some runtime interpreter for implied do
> loops. All three approaches would require a fair amount of work, I
> think.

> Ah, here it is:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35339

> Now, of course, other compilers might do it better.

For the simple cases of one array inside an implied DO
it should be pretty easy, and I would expect compilers
to be able to do that. For more complicated ones,

READ(1) ((X(I,J),Y(J,I),I=1,N),Z(J),J=1,M)

maybe I would not be surprised if it was done with
one subroutine call per element. Even so, other than
the subroutine call overhead it shouldn't be so slow.

-- glen

From: Janne Blomqvist on
On 2008-05-05, Infinity77 <andrea.gavana(a)gmail.com> wrote:
> I have an unformatted file, and I open it for reading in big
> endian mode.

If you're not going to read/write the file on multiple platforms with
different native endianness, get rid of the CONVERT= stuff. Converting
endianness on the fly has much more overhead than just streaming the
data directly into application memory.

> The file has a repetitive structure (is a time step
> report for a simulation): at the beginning of every time step, there
> is this pseudo-declaration:
>
> 'PARAMS'
> keyword number kind
>
> where "keyword" is an 8 character string, "number" is an integer which
> tells me how much data this time step contains, and "kind" is a 4
> character string which tells me the data type (i.e., 'CHAR', 'REAL'
> and so on). After this declaration, the data for the current time step
> follows. As an example:
>
> 'PARAMS ' 4200 'REAL'
> 0.5372 3.7393 0.3273 .... (4200 real numbers)
>
> With the file I am working on, this 4200 stays constant for every time
> step in the simulation.
> The numerical arrays are divided into blocks of up to 1000 items each:
> so, in my previous example, I suppose I can not read the whole 4200
> real numbers in one shot, I have to loop 4 times and read 1000 items
> every time, plus read the remaining 200 as last step.

I'm a bit confused here. In your code you don't actually use the
'keywordNumber' variable for anything, rather you just seem to assume
that the input variable 'dimens' is equal to that?

And yes, with a single READ statement you can only read one record at
a time.

If you control the output of the file as well, then it could be
slightly more efficient to have the first record contain the
dimensions of the matrix, and then the second record the entire
matrix, which could then be read with a single statement. Of course,
this breaks down if the data set grows so large that you have to
resort to, say, processing one time step at a time rather than reading
the entire file into memory and then do all the processing.

In any case, with tricks like avoiding endian conversion and reducing
the number of read statements you can reduce the cpu overhead, but
unless you have a very slow cpu or a very fast disk array, you're
probably limited by disk speed anyway.

--
Janne Blomqvist
From: Janne Blomqvist on
On 2008-05-05, glen herrmannsfeldt <gah(a)ugcs.caltech.edu> wrote:
> The READ could also be:
>
> read(1, end=40) (matrix(i,matrixIndex),i=1000*internalCount+1:
> @ 1000*(internalCount+1))
>
> It might be that one is faster than the other for a particular
> compiler and library.

At least for gfortran the opposite is true. The implied do loop above
is handled by the frontend, IIRC converting it to (roughly) the
equivalent do loop like

do i = 1000*internalCount+1, 1000*(internalCount+1)
read (1, end=40) matrix(i,matrixIndex)
end do

So you have the overhead of a read statement, which is considerable,
for each element in the array. OTOH by reading array slices like the
original code, the runtime library gets a pointer to the array
descriptor (a modified descriptor describing the slice, not the
original one) and has the opportunity to read multiple elements at a
time.

I think there's a PR about this in the gcc bugzilla somewhere. But
it's not really easy to fix, it would probably require either the
frontend to convert the implied do loop into the equivalent array
descriptor, or to create some kind of iterator structure and pass that
to the library, or to have some runtime interpreter for implied do
loops. All three approaches would require a fair amount of work, I
think.

Ah, here it is:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35339

Now, of course, other compilers might do it better.


--
Janne Blomqvist
 | 
Pages: 1
Prev: module or interface
Next: C preprocessor