From: John Paine on
Hi all,

I'm struggling with a problem in trying to perform direct access read/write
to a file using Intel 11.1.

What I want to do is:

1. Open a data file
2. Write a header block to the file defining the data in the file (this
block is of known size, but unknown content when I open the file)
3. Write multiple blocks of binary data of various types sequentially to the
file and accumulate statistics about the data such as range of values, size
of data blocks etc
4. Once all of the data blocks have been written, I then want to rewrite the
header block with the updated statistics and data block sizes etc
5. Close the file

In CVF, I used to do this using the dfwin interface (I'm not working
cross-platform) to the Windows routines lcreat, lwrite, llseek and lclose
and all worked just fine. Since I am now migrating my code to the Intel
compiler, I'm cleaning up my code base and would like to eliminate the
Windows routines and use standard Fortran IO.

Simplifying the problem to the barest bones to illustrate the task
schematically, the code looks like this:

      implicit none

      integer*4 i,j
      integer*4 numy,numx(40)

      real*4 zmin,zmax
      real*4 zval(40)

c 1: open the file

      open(10,file='test.dat',form='binary')

c 2: initialise and write out the header

      numy=40
      zmin=1e32
      zmax=-1e32
      do j=1,numy
         numx(j)=0
      end do

      write(10)numy,zmin,zmax
      write(10)(numx(j),j=1,numy)

c 3: loop over the data

      do j=1,numy

c do stuff to create numx(j) data values in array zval
c NB numx(j) is determined as part of the calculation

         numx(j)=j
         do i=1,numx(j)
            zval(i)=i
            zmin=min(zmin,zval(i))
            zmax=max(zmax,zval(i))
         end do

c write out the data

         write(10)(zval(i),i=1,numx(j))

      end do

c 4: rewrite the header

      rewind(10)
      write(10)numy,zmin,zmax
      write(10)(numx(j),j=1,numy)

c 5: close the file

      close(10)

c reopen the file and read the data

      open(10,file='test.dat',form='binary')

c this step works ok and the header is read correctly

      read(10)numy,zmin,zmax
      read(10)(numx(j),j=1,numy)

c this loop fails with an "end-of-file during read" error

      do j=1,numy
         read(10)(zval(i),i=1,numx(j))
      end do
      close(10)

      end


This all works fine, except that step 4 truncates the file after the write
so all of the data records are lost. If I omit step 4, the data in the file
is exactly what I want, but the data counts and statistics in the header
record are not valid. Note, this example is greatly simplified, so answers
along the lines of "calculate the statistics before writing the data" while
perfectly valid, will not solve my difficulty. In the real case, the data
written out in step 3 is created in the loop and is expensive to calculate
and there is a lot of it. What I really want is to be able to rewrite the
header record which will not truncate the data file. Separating the header
and data is also an option, but my preference is to keep the two together in
the one file to ensure that the data can be read by other applications
further down the processing stream with no danger of the header file being
misplaced thus rendering the data file useless.

The behavior of the 'binary' write truncating the file is documented by
Intel. They also include a mention of "direct access" which suggests that
direct access be used. This is a possibility as I could open the file in
step 1 with access='direct',recl=1 specified. But I then need to keep track
of where I am in the file in order to specify the record number to write
out. This would be OK for the above case as I know the position of the
header and can count the bytes as they are written out. But really I'd
prefer not to keep track of this myself (unless I really have to) as it
would be so much simpler if I could somehow stop the truncation of the file
when rewriting the header. The open statement will allow the inclusion of
the Carriagecontrol='none' clause, but this does not make any difference.

Thanks in advance for any suggestions.

John



From: glen herrmannsfeldt on
John Paine <johnpaine1(a)optusnet.com.au> wrote:

> I'm struggling with a problem in trying to perform direct
> access read/write to a file using Intel 11.1.

(snip)
> 4. Once all of the data blocks have been written, I then want
> to rewrite the header block with the updated statistics and
> data block sizes etc
(snip)

> open(10,file='test.dat',form='binary')
(snip)

> write(10)numy,zmin,zmax
> write(10)(numx(j),j=1,numy)

Your subject says "Direct Access" but your sample code doesn't.

You need the ACCESS='DIRECT' and RECL= options on the OPEN,
and the REC= on the READ/WRITE statements. RECL specifies
the record length that all records will have.

If your data doesn't naturally have a fixed length then you
have to do some work to divide it up into fixed length units.
(Possibly wasting the end of some of the records.)
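
As a rough illustration of the direct access pattern described above, here is a minimal sketch. It assumes every block, including the header, fits in one fixed-length record of four reals; the program name, file name, and the sizes are hypothetical and not from the original code. INQUIRE(IOLENGTH=) is used so that RECL is expressed in whatever units the compiler expects.

```fortran
program direct_sketch
  implicit none
  integer, parameter :: nvals = 4      ! hypothetical fixed block size
  integer :: lrec, j
  real :: block(nvals), hdr(nvals)

  ! Ask the compiler for the record length in its own RECL units
  inquire(iolength=lrec) block

  open(10, file='direct_sketch.dat', access='direct', &
       form='unformatted', recl=lrec, status='replace')

  hdr = 0.0
  write(10, rec=1) hdr                 ! placeholder header in record 1

  do j = 1, 3
     block = real(j)
     write(10, rec=j+1) block          ! data blocks in records 2 to 4
  end do

  hdr = [3.0, 1.0, 3.0, 0.0]           ! block count, min, max, unused
  write(10, rec=1) hdr                 ! rewrite only record 1

  ! The data records are still there after the header rewrite
  read(10, rec=3) block
  if (any(block /= 2.0)) then
     print *, 'data record was lost'
     stop
  end if
  print *, 'record 3 intact: ', block
  close(10)
end program direct_sketch
```

Because each WRITE names its record with REC=, rewriting record 1 does not disturb the later records. The cost is that variable-length blocks have to be padded or split to fit the fixed record length.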

Note that in the case of statements like:

write(10)numy,zmin,zmax
write(10)(numx(j),j=1,numy)

Two records will be used, while it can be written:

write(10)numy,zmin,zmax,(numx(j),j=1,numy)

and use only one record. It can then be read with:


read(10)numy,zmin,zmax,(numx(j),j=1,numy)

(But note that these are not direct access I/O statements
without the REC= option.)

-- glen

From: Louis Krupp on
John Paine wrote:
> Hi all,
>
> I'm struggling with a problem in trying to perform direct access
> read/write to a file using Intel 11.1.
>
> What I want to do is:
>
> 1. Open a data file
> 2. Write a header block to the file defining the data in the file (this
> block is of known size, but unknown content when I open the file)
> 3. Write multiple blocks of binary data of various types sequentially to
> the file and accumulate statistics about the data such as range of
> values, size of data blocks etc
> 4. Once all of the data blocks have been written, I then want to rewrite
> the header block with the updated statistics and data block sizes etc
> 5. Close the file
<snip>

You may be overlooking the easy way to do this:

1. Write your multiple blocks of binary data to a scratch file,
accumulating statistics as you go.

2. Rewind the scratch file.

3. Open the output data file and write the header with everything you need.

4. Read records from the scratch file and write them to the output file.

5. Close the output file. Remove the scratch file.
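
The five steps above might be sketched roughly as follows. This is only an illustration built on the variable names from the original post (numy, numx, zval, zmin, zmax); the unit numbers, the output file name, and the reduced problem size are assumptions, and standard sequential unformatted I/O is used in place of form='binary'.

```fortran
program scratch_sketch
  implicit none
  integer, parameter :: numy = 5       ! reduced size, for illustration only
  integer :: i, j, numx(numy)
  real :: zmin, zmax, zval(numy)

  zmin = huge(zmin)
  zmax = -huge(zmax)

  ! 1: write the data blocks to a scratch file, accumulating statistics
  open(11, status='scratch', form='unformatted')
  do j = 1, numy
     numx(j) = j                       ! stand-in for the real calculation
     do i = 1, numx(j)
        zval(i) = real(i)
     end do
     zmin = min(zmin, minval(zval(1:numx(j))))
     zmax = max(zmax, maxval(zval(1:numx(j))))
     write(11) zval(1:numx(j))
  end do

  ! 2: rewind the scratch file
  rewind(11)

  ! 3: open the output file and write the now-complete header
  open(10, file='scratch_sketch.dat', form='unformatted', status='replace')
  write(10) numy, zmin, zmax, numx

  ! 4: copy the records from the scratch file to the output file
  do j = 1, numy
     read(11) zval(1:numx(j))
     write(10) zval(1:numx(j))
  end do

  ! 5: close both files; a scratch file is deleted when it is closed
  close(10)
  close(11)
  print *, 'zmin, zmax = ', zmin, zmax
end program scratch_sketch
```

The header is written exactly once, after all statistics are known, so nothing ever has to be rewritten in place.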

Unless your files are really, really big (for some definition of "big"),
this is likely to be fast enough *and* easier to code. Plus, it will
let the header size vary, which might be convenient at some point.

A couple more thoughts:

If your header size and/or format change, make sure that new versions of
the program that read the file can read old versions or at least fail
gracefully. Including a version number in the first few bytes of the
header is probably a good way to do this. You'll probably also want to
make sure that old versions of the program know which version(s) of the
file they can read.
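
A version check of this kind could look something like the sketch below. The constants file_version and oldest_readable, the file name, and the choice of stream access are all assumptions made for illustration; the point is only that the version is the first item in the file and is checked before anything else is read.

```fortran
program version_sketch
  implicit none
  integer, parameter :: file_version = 2     ! hypothetical current format
  integer, parameter :: oldest_readable = 1  ! hypothetical oldest supported
  integer :: version, numy, ios

  ! Writer: the version number goes in the first bytes of the header
  open(10, file='version_sketch.dat', access='stream', &
       form='unformatted', status='replace')
  numy = 40
  write(10) file_version, numy
  close(10)

  ! Reader: check the version before trusting the rest of the header
  open(10, file='version_sketch.dat', access='stream', &
       form='unformatted', status='old')
  read(10, iostat=ios) version
  if (ios /= 0) then
     print *, 'cannot read the version field'
     stop
  end if
  if (version < oldest_readable .or. version > file_version) then
     print *, 'unsupported file version: ', version
     stop
  end if
  read(10) numy
  print *, 'version = ', version, ' numy = ', numy
  close(10)
end program version_sketch
```

A reader built this way fails with a clear message on a file that is too new or too old, rather than misinterpreting the bytes that follow.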

I'm not sure I'd recommend this practice, but as you may know, PDF uses
a cross-reference and a trailer record at the end of the file. Programs
that read PDF files start by reading a bunch of bytes at the end of the
file and scanning for a magic marker string.

Louis

From: James Van Buskirk on
"John Paine" <johnpaine1(a)optusnet.com.au> wrote in message
news:4b6f518f$0$32133$afc38c87(a)news.optusnet.com.au...

> I'm struggling with a problem in trying to perform direct access
> read/write to a file using Intel 11.1.

C:\gfortran\clf\streamtest>type streamtest.f90
program streamtest
   implicit none
   type head
      integer i
      real x
      character(3) c
   end type head
   type(head) header
   integer array(10)
   integer i

   array = [(i,i=1,size(array))]
   header = head(0,0,'0')
   open(10,file='streamtest.dat',access='stream')
   write(10) header, array
   header = head(2,3.14,'CAT')
   write(10,POS=1) header
   close(10)
   open(10,file='streamtest.dat',access='stream')
   header = head(0,0,'0')
   array = 0
   read(10) header, array
   write(*,*) 'header = ', header
   write(*,*) 'array = ', array
end program streamtest

C:\gfortran\clf\streamtest>gfortran streamtest.f90 -ostreamtest

C:\gfortran\clf\streamtest>streamtest
header = 2 3.1400001 CAT
array = 1 2 3 4 5
6 7 8 9 10

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end


From: Richard Maine on
John Paine <johnpaine1(a)optusnet.com.au> wrote:

> What I want to do is:
.....
> 4. Once all of the data blocks have been written, I then want to rewrite the
> header block with the updated statistics and data block sizes etc
....
> In CVF, I used to do this using the dfwin interface
...
> Since I am now migrating my code to the Intel
> compiler, I'm cleaning up my code base and would like to eliminate the
> Windows routines and use standard Fortran IO.
[example using form='binary']

Well, form='binary' is not standard Fortran, so if that is the objective,
this would not achieve it even if it did act like you wanted.

> This all works fine, except that step 4 truncates the file after the write
> so all of the data records are lost....
> The behavior of the 'binary' write truncating the file is documented by
> Intel. They also include a mention of "direct access" which suggests that
> direct access be used.

Yes, you could certainly do it with direct access, but there are lots of
complications - more than you mentioned. For a start,

> step 1 with access='direct',recl=1 specified.

The use of recl=1 with direct access is a hack recognized by some
compilers, but it is not standard. Direct access is standard, but the
common special-case interpretation of recl=1 is not. There are other
complications as well.

> Thanks in advance for any suggestions.

I'd recommend against going with direct access. It certainly can be
done; I've done that kind of thing in the past. But the many gotchas of
direct access are a large part of why I was a big pusher for stream
access in f2003.

I recommend using stream access. James gave some presumably fine example
code (I didn't check in detail, but I suspect it is fine). I just
thought I'd supply some English to supplement his Fortran. :-)

The form='binary' and access='direct',recl=1 are both nonstandard
variants of stream access. They date from before stream was
standardized. Now that stream is standardized, I recommend using it
instead of those nonstandard variants. As an additional "side" benefit,
the standard requires that it act like you want (no truncation), at
least as long as you stick to unformatted stream. (Formatted is a
different story).
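
Applied to the original example, the stream approach might look roughly like the sketch below. It reuses the variable names from the original post, but the program name, the file name, and the variable nread are assumptions, and the header is written as a single item list so that the rewrite covers exactly the same bytes as the placeholder.

```fortran
program stream_header
  implicit none
  integer, parameter :: numy = 40
  integer :: i, j, nread, numx(numy)
  real :: zmin, zmax, zval(numy)

  open(10, file='stream_header.dat', access='stream', &
       form='unformatted', status='replace')

  ! Placeholder header: known size, final contents not yet known
  numx = 0
  zmin = huge(zmin)
  zmax = -huge(zmax)
  write(10) numy, zmin, zmax, numx

  ! Write the data blocks sequentially, accumulating statistics
  do j = 1, numy
     numx(j) = j                       ! stand-in for the real calculation
     do i = 1, numx(j)
        zval(i) = real(i)
        zmin = min(zmin, zval(i))
        zmax = max(zmax, zval(i))
     end do
     write(10) zval(1:numx(j))
  end do

  ! Rewrite the header in place; POS=1 is the first byte of the file
  write(10, pos=1) numy, zmin, zmax, numx
  close(10)

  ! Read everything back to confirm the data survived the rewrite
  open(10, file='stream_header.dat', access='stream', &
       form='unformatted', status='old')
  read(10) nread, zmin, zmax, numx
  do j = 1, nread
     read(10) zval(1:numx(j))
  end do
  close(10)
  print *, 'blocks = ', nread, ' zmin = ', zmin, ' zmax = ', zmax
end program stream_header
```

No byte counting is needed for the header rewrite because the header always starts at POS=1. If a later position ever has to be remembered, INQUIRE(unit, POS=...) returns the current position on a stream unit.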

The approach that Louis mentioned (using a temporary intermediate file)
can also work well; that's your choice to make.

--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain