unformatted files (again) [Fortran]


From: Arjen Markus on 9 Aug 2010 06:21

On 6 aug, 19:28, n...(a)gosset.csi.cam.ac.uk (Nick Maclaren) wrote:
> In article <4C5C40E1.2...(a)nospom.com>, Dave Allured <nos...(a)nospom.com> wrote:
>
> >Gideon wrote:
>
> >> So here's my question: is there an easy, robust, way to discover what
> >> size the header of a fortran unformatted file is on a given
> >> architecture/OS?
>
> >This is a tricky question because the internal structure of fortran
> >unformatted sequential files was never standardized. The record length
> >integers were never intended to be seen by normal users, putting the
> >whole topic outside fortran standards.
>
> That's understating the issue :-)
>
> What record-length integers? Some systems didn't have them, and
> that includes some types of file under Unices :-) Magnetic tapes
> of types that allow variable-length blocks, run-time systems that
> allow the direct use of sockets and so on.
>
> >For the compilers and unix and linux platforms within my experience, I
> >can count on the following structure of each unformatted record:
>
> > [length] [data block] [length]
>
> >Where [length] is a 4 or 8 byte integer, the byte count of the data
> >block; and [data block] is the user data from a single unformatted write
> >statement. The leading and trailing length integers for each record are
> >identical. I believe the original purpose of the trailing length was to
> >support reverse reading such as the backspace statement.
>
> That is correct, and that is the usual format. HOWEVER, I have also
> seen the following:
>
> 1) As above, but with 2 byte integers.
>
> 2) With only a preceding length (4 byte, if I recall).
>
> 3) With a header before the first record.
>
> 4) With the [length] field actually being a [junk,length] field.
>
> My guess is that all of those are now dead and buried, though.
>
> Regards,
> Nick Maclaren.
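
For the usual [length] [data block] [length] layout quoted above, a minimal Python sketch of a reader could look like the following. The names read_records and make_record are made up for illustration, and the marker width and byte order are assumptions the caller has to supply, since the file does not announce them. None of the variant formats listed above are handled.

```python
import io
import struct

def make_record(payload, marker_size=4, byteorder="<"):
    """Wrap payload as [length][data][length] (hypothetical helper for testing)."""
    marker = struct.pack(byteorder + {4: "i", 8: "q"}[marker_size], len(payload))
    return marker + payload + marker

def read_records(stream, marker_size=4, byteorder="<"):
    """Yield the data block of each [length][data][length] record.

    marker_size (4 or 8) and byteorder ("<" or ">") are assumptions,
    not something the file format itself tells us.
    """
    fmt = byteorder + {4: "i", 8: "q"}[marker_size]
    while True:
        head = stream.read(marker_size)
        if not head:
            return  # clean end of file
        if len(head) != marker_size:
            raise ValueError("truncated leading length")
        (n,) = struct.unpack(fmt, head)
        if n < 0:
            raise ValueError("negative length: wrong layout assumed?")
        data = stream.read(n)
        tail = stream.read(marker_size)
        if len(data) != n or len(tail) != marker_size:
            raise ValueError("truncated record")
        if struct.unpack(fmt, tail)[0] != n:
            raise ValueError("leading and trailing lengths differ")
        yield data

# Two records: three doubles, then one double.
blob = make_record(struct.pack("<3d", 1.0, 2.0, 3.0)) + make_record(struct.pack("<d", 4.0))
for rec in read_records(io.BytesIO(blob)):
    print(struct.unpack("<%dd" % (len(rec) // 8), rec))
```

The check that the trailing length equals the leading one is the main safeguard: if the assumed width or byte order is wrong, it usually fails on the first record.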

The old (but apparently not quite dead and buried) Microsoft Fortran
compilers (PowerStation or otherwise) used yet another format:
- The file starts with a capital K byte and ends with a byte
  representing "e - accent circonflexe" (IIRC, and I cannot easily
  type it; e^ is the closest I get).
- Records are built up of segments. A segment either starts and ends
  with a u umlaut byte (u" or ASCII 129, if my memory serves me), with
  128 bytes of actual data in between, or it starts with a smaller
  code (< 129), in which case that byte gives the actual number of
  bytes of data.
- If the segment starts with u umlaut, the next segment is part of
  the same record.

(It was rather tedious to analyse these files byte by byte if you
needed to read them via a program compiled by another compiler.)
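
As a rough illustration of the segment scheme described above, here is a Python sketch that splits such a file into records. It follows the recollection in this post rather than any vendor documentation. The function name split_segmented is made up, the trailer byte is dropped without checking its value, and it is an assumption that short segments are also closed by a repeat of their length byte (the description above only says so for the 129 case).

```python
CONTINUE = 129  # "u umlaut" tag: 128 data bytes follow and the record continues

def split_segmented(blob):
    """Split a K-prefixed, segmented file into logical records (sketch)."""
    if len(blob) < 2 or blob[0:1] != b"K":
        raise ValueError("missing leading 'K' byte")
    body = blob[1:-1]  # drop the K byte and the single trailer byte
    records, current, pos = [], bytearray(), 0
    while pos < len(body):
        tag = body[pos]
        if tag > CONTINUE:
            raise ValueError("unexpected segment tag %d" % tag)
        size = 128 if tag == CONTINUE else tag
        end = pos + 1 + size
        # Assumption: every segment is closed by a copy of its tag byte.
        if end >= len(body) or body[end] != tag:
            raise ValueError("segment at offset %d is not closed" % pos)
        current += body[pos + 1:end]
        pos = end + 1
        if tag != CONTINUE:  # a short segment ends the logical record
            records.append(bytes(current))
            current = bytearray()
    if current:
        raise ValueError("file ends inside a continued record")
    return records

def seg(tag, data):
    """Build one segment for the example (hypothetical helper)."""
    return bytes([tag]) + data + bytes([tag])

# One 130-byte record split over two segments, then a 3-byte record.
sample = b"K" + seg(129, b"a" * 128) + seg(2, b"bc") + seg(3, b"xyz") + b"\x00"
print([len(r) for r in split_segmented(sample)])  # [130, 3]
```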

Regards,

Arjen

From: Dave Allured on 10 Aug 2010 11:29

Richard Maine wrote:
>
> dpb <none(a)non.net> wrote:
>
> > Gideon wrote:
>
> > > ... I should
> > > have prefaced this by saying that I'm mostly writing arrays of double
> > > precision numbers and then reading them into MATLAB.
> >
> > I would recommend switching to "stream" files instead of Fortran
> > unformatted for the purpose.
>
> Good point. I was originally thinking he was talking about handling
> existing files from unknown sources, in which case, that doesn't help a
> lot, as you say.
>
> For new files meant to be interoperable with non-Fortran environments,
> I'd definitely go with stream. Heck, interoperability was the "excuse" I
> used to propose stream for the f2003 standard after new proposals were
> supposed to be out of order. We already had an approved task of working
> on interoperability and this seemed to fit under that umbrella. I was a
> little worried that this "excuse" might not fly, so I tried to do a very
> minimal version, hoping that simplicity would help. Somewhat to my
> surprise, the only complaints about my proposal were that it didn't go
> far enough (so the messier formatted stream got added - I'm still not
> sure I like that addition, but that's what a clear majority wanted).
>
> Even for existing unformatted sequential files, if you are trying to
> read them using a Fortran compiler that might have different internal
> structure conventions, stream is the way to go to "safely" read such a
> file of unknown form in order to try to deduce what the form must be.

Agreed that stream is optimal for this purpose.
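
The same "read the raw bytes and deduce the form" idea can be sketched outside Fortran too. The Python below stands in for stream access: it reads the file as plain bytes and tests which [length] [data] [length] layouts are consistent with the whole file. The candidate list (4- or 8-byte markers, either byte order) and the names walks_cleanly and guess_layout are assumptions for illustration; other layouts mentioned in this thread are not covered.

```python
import struct

# Assumed candidates: (marker size in bytes, struct byte-order prefix).
CANDIDATES = [(4, "<"), (4, ">"), (8, "<"), (8, ">")]

def walks_cleanly(blob, marker_size, byteorder):
    """Return the record count if blob parses end to end, else 0."""
    fmt = byteorder + {4: "i", 8: "q"}[marker_size]
    pos, count = 0, 0
    while pos < len(blob):
        if pos + marker_size > len(blob):
            return 0
        (n,) = struct.unpack_from(fmt, blob, pos)
        tail = pos + marker_size + n
        if n < 0 or tail + marker_size > len(blob):
            return 0
        if struct.unpack_from(fmt, blob, tail)[0] != n:
            return 0  # trailing length does not repeat the leading one
        pos = tail + marker_size
        count += 1
    return count

def guess_layout(blob):
    """List every candidate layout that is consistent with the whole file."""
    return [(m, b) for m, b in CANDIDATES if walks_cleanly(blob, m, b)]

def record(payload):
    """Build a record with 4-byte little-endian markers (example data only)."""
    marker = struct.pack("<i", len(payload))
    return marker + payload + marker

blob = record(struct.pack("<3d", 0.1, 0.2, 0.3)) + record(struct.pack("<d", 0.4))
print(guess_layout(blob))  # [(4, '<')]
```

A match is evidence rather than proof: a short file, or data that happens to begin with zero bytes, can satisfy more than one candidate, so it is worth checking the decoded values as well.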

For new work or when otherwise feasible, I prefer Netcdf format for
exchange of regular array data with complete independence from platform
issues. All questions of internal format go away, including endian.
This is a standard C-based library with an F90 interface. The downside
is only that you have to install the library and learn the programming
interface. There is also a free Netcdf interface for Matlab.

This has little to do with Gideon's original question, except that I
suspect that Netcdf would fit nicely into his data flow and solve
several issues.

--Dave

From: Terence on 10 Aug 2010 20:21

On Aug 11, 10:14 am, Terence <tbwri...(a)cantv.net> wrote:
I think I should have added:

Note: the tape reels, from day 1, had a tape label record at
beginning and end.
Although I do not remember much of the format, this label must have been
capable of defining what writing format was being used, and if record
length count fields were present, what technique was used (2 or 4 or
perhaps later 8 byte counts, byte direction, and more).

From: glen herrmannsfeldt on 10 Aug 2010 20:54

Terence <tbwright(a)cantv.net> wrote:
(snip)

> This is why I NEVER use sequential unformatted files in Fortran
> programs.
> By using 'FORM=BINARY' or other non-standard equivalents I can read
> what I expect.

UNFORMATTED works well for Fortran programs writing files to
be read by the same program or other Fortran programs.

> In the "old" days, those record size tags on unformatted records, as
> prefix and suffix, were quite useful for the use of tape drives, where
> it made sense to write programs that could write or read forwards in
> the normal way, or else read records backwards (into reverse memory
> directions) to avoid the lost time in tape reel re-winds.

"Read Backwards" was available on most reel style tape drives,
but not on many current drives. I don't know that it was
ever available in Fortran.

> I seem to remember these tags in unformatted records were 16 bit
> fields on the Fortran IV mainframe compilers.

RECFM=VBS uses four byte block and record descriptors with 16 bit
lengths, and some flag bits. S/360 channels use 16 bit lengths,
but OS/360 uses signed arithmetic so only up to 32767 LRECL
is allowed. There is no unsigned load halfword in S/360.

-- glen

From: Richard Maine on 10 Aug 2010 23:58

glen herrmannsfeldt <gah(a)ugcs.caltech.edu> wrote:

> UNFORMATTED works well for Fortran programs writing files to
> be read by the same program or other Fortran programs.

It works well for C interop and all kinds of other things as well.
"Unformatted" is not a synonym for "sequential unformatted", which I am
(almost) sure is what you mean. There has long (well, since f77) also
been direct access unformatted, and there is now also stream
unformatted.

Your description above applies only to sequential unformatted. Note that
Terence did correctly qualify it that way in his post.

I would say that stream unformatted is the preferred choice for
interoperability in modern Fortran. Taking the unqualified "unformatted"
to imply sequential unformatted is increasingly likely to confuse and
mislead people into thinking that what you say also applies to stream
unformatted. That is, after all, what the normal interpretation of the
English would mean.

--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain
