From: glen herrmannsfeldt on
Richard Maine <nospam(a)see.signature> wrote:
> onateag <gaetano.esposito(a)gmail.com> wrote:
(snip)

> I can only debug what I see. If I'm just assured that everything has
> been checked and is fine, even though I don't see any of it, that
> doesn't leave me much to go on. When someone assures me that the parts
> they didn't show me are all fine, that often tends to make me more
> suspicious of those parts, rather than less so. I don't see anything
> else in what was posted that I can help with.

> From the first post

>> $ diff tplink_working_x tplink_x
>> 5c5
>> < 0000100 0002 0000 0014 0000 523c 0006 0000 0000
>> ---
>> > 0000100 0002 0000 0014 0000 0014 0000 0000 0000

> I am slightly puzzled in that I can't quite match the single octal dump
> line shown with the test program. It looks like it might plausibly have
> the end of the second record and the beginning of a third. The 2 0 14 0
> could plausibly be the 2 at the end of the second record, followed by a
> trailing record size (14 hex = 20 dec, which would be right). But it
> doesn't look aligned right, unless the first record is longer than the
> read suggests... which might be possible; those are funny looking values
> in the first record, maybe Hollerith? The first record ought to have
> taken 32 bytes (3*8 for the data, and 2*4 for the record header and
> trailers, assuming the most common 32-bit structures). Then the second
> should have taken 28 bytes (5*4 data plus 2*4 header/trailer). But this
> is showing what loks like the end of the second record after 72 bytes,
> which seems 12 bytes too far in, if I got all the arithmetic straight.

Just the same as I thought. I thought about trying to figure that
part out, but then decided I wanted to see it.

> As Glenn says, maybe a full hex dump of the first bit of the file might
> help more; the one line in isolation isn't enough. I'm still not sure
> that would tell me enough, but it might help some.

> If the above is the end of the second record, then it looks like one of
> the files has a longish 3rd record (6523c hex = 414268 decimal bytes),
> while the other has a 3rd record with only 20 bytes of data. That would
> reasonably well match your observed ability to read 2 8-byte values from
> it, but fail reading a third. Why the file would be that way, I have no
> data to see.

Even more, X'6523C' is the right length if the variables starting
with I to N are INTEGER, and the rest DOUBLE PRECISION, as the
IMPLICIT statement indicates. That wouldn't explain why the
rest of the files compare equal, though. If an extra 20 byte
record were added, the rest of the file would be different.

I wanted to see at least if there was an X'14' where it should
be to end the mystery record.

Otherwise, if somehow the length of the next record was overwritten
with X'14', but the record itself wasn't. Strange things can happen
when two programs write to the same file at (almost) the same time.

-- glen
From: Steve Lionel on
On 3/3/2010 10:31 PM, Gaetano Esposito wrote:
> For a reason that is behind my comprehension, the file created in the
> WRITE statement, cannot be read by the READ statement.
>
> I investigated this problem for a long time comparing a working
> version of the unformatted file (which I had kept from other
> computation) to the ones which now are not working (by "working" I
> mean it is readable by the READ statement).
> It turns out that the two files are **almost** (of course) identical.
> If the unix command "cmp" is unreadable for me, if I dump the binary
> files content using "od -x> linking_file" for both (working and not-
> working) versions and "diff" them, I get just one different line:
> $ diff tplink_working_x tplink_x
> 5c5
> < 0000100 0002 0000 0014 0000 523c 0006 0000 0000
> ---
>> > 0000100 0002 0000 0014 0000 0014 0000 0000 0000

I can tell from the error message that you are probably using Intel
Fortran. What I can't tell is how you have opened the file on each end.
The error you're getting implies that on the READ, the file is opened
for sequential unformatted access and is looking for the record length
(32 bits) at the beginning of the record. Whatever it is reading is not
a correct record length and the variable list for the READ wants more
data than it looks as is being supplied.

The od output you show looks strange to me - it doesn't resemble what
I'd expect for a sequential unformatted file. It could be that you
created the file with FORM='BINARY', an extension that is similar to
F2003 ACCESS='STREAM', but are not reading it the same way.

If you would provide the additional information requested by Richard and
Glen, perhaps we might figure out something.

--
Steve Lionel
Developer Products Division
Intel Corporation
Nashua, NH

For email address, replace "invalid" with "com"

User communities for Intel Software Development Products
http://software.intel.com/en-us/forums/
Intel Software Development Products Support
http://software.intel.com/sites/support/
My Fortran blog
http://www.intel.com/software/drfortran
From: onateag on
On Mar 4, 3:17 pm, glen herrmannsfeldt <g...(a)ugcs.caltech.edu> wrote:
> Gaetano Esposito <gaetano.espos...(a)gmail.com> wrote:
> > The problem I am going to detail used to occur with "random" (i.e.
> > multiple) combination of compilers and machines/architectures. Because
> > of this uncertainty, I switched a while ago to static compilation on
> > one machine with the program running on others. At first I was
> > satisfied with this solution, but now the problem is back here, and I
> > ran out of ideas...
>
> (snip)
>
> > I investigated this problem for a long time comparing a working
> > version of the unformatted file (which I had kept from other
> > computation) to the ones which now are not working (by "working" I
> > mean it is readable by the READ statement).
> > It turns out that the two files are **almost** (of course) identical.
> > If the unix command "cmp" is unreadable for me, if I dump the binary
> > files content using "od -x > linking_file" for both (working and not-
> > working) versions and "diff" them, I get just one different line:
> > $ diff tplink_working_x tplink_x
> > 5c5
> > < 0000100 0002 0000 0014 0000 523c 0006 0000 0000
> > ---
> >> 0000100 0002 0000 0014 0000 0014 0000 0000 0000
>
> Post the first 20 lines of od -x for each file.
>
> Also, post all statements between the second and third
> READ statement in the real program.  
>
> -- glen

Here is the first 20 lines of od -x for both working and not working
linking files:

$ od -x tplink_working | head -20
0000000 0024 0000 2e31 2039 2020 2020 2020 2020
0000020 2020 2020 4f44 4255 454c 2020 2020 2020
0000040 2020 2020 0000 0000 0024 0000 0014 0000
0000060 01be 0000 a1a0 0003 0004 0000 006f 0000
0000100 0002 0000 0014 0000 523c 0006 0000 0000
0000120 ec04 412e 0000 2000 f958 4043 0000 0000
0000140 1000 4061 70a4 0a3d a3d7 400a 0000 0000
0000160 0000 0000 0000 0000 0000 0000 0000 0000
0000200 0000 0000 0000 0000 0000 2000 036e 403c
0000220 b852 851e 61eb 4058 872b d916 f7ce 400c
0000240 0000 0000 0000 0000 5c29 c28f 28f5 3ffc
0000260 0000 0000 0000 4010 0001 0000 0000 2000
0000300 20a5 3ff0 0000 0000 2000 4062 6666 6666
0000320 6666 4000 0000 0000 0000 0000 0000 0000
0000340 0000 0000 0000 0000 0000 0000 0000 0000
0000360 0000 6000 ffb1 402f 0000 0000 0000 4054
0000400 0000 0000 0000 4006 0000 0000 0000 0000
0000420 0000 0000 0000 0000 0000 0000 0000 0000
0000440 0000 0000 0000 0200 01e3 4031 0000 0000
0000460 0000 4054 0000 0000 0000 4006 0000 0000

$ od -x tplink | head -20
0000000 0024 0000 2e31 2039 2020 2020 2020 2020
0000020 2020 2020 4f44 4255 454c 2020 2020 2020
0000040 2020 2020 0000 0000 0024 0000 0014 0000
0000060 01be 0000 a1a0 0003 0004 0000 006f 0000
0000100 0002 0000 0014 0000 0014 0000 0000 0000
0000120 ec04 412e 0000 2000 f958 4043 0000 0000
0000140 1000 4061 70a4 0a3d a3d7 400a 0000 0000
0000160 0000 0000 0000 0000 0000 0000 0000 0000
0000200 0000 0000 0000 0000 0000 2000 036e 403c
0000220 b852 851e 61eb 4058 872b d916 f7ce 400c
0000240 0000 0000 0000 0000 5c29 c28f 28f5 3ffc
0000260 0000 0000 0000 4010 0001 0000 0000 2000
0000300 20a5 3ff0 0000 0000 2000 4062 6666 6666
0000320 6666 4000 0000 0000 0000 0000 0000 0000
0000340 0000 0000 0000 0000 0000 0000 0000 0000
0000360 0000 6000 ffb1 402f 0000 0000 0000 4054
0000400 0000 0000 0000 4006 0000 0000 0000 0000
0000420 0000 0000 0000 0000 0000 0000 0000 0000
0000440 0000 0000 0000 0200 01e3 4031 0000 0000
0000460 0000 4054 0000 0000 0000 4006 0000 0000

I am going to copy all the subroutines involved in the read:

MCINIT is the core one:

SUBROUTINE MCINIT (LINKMC, LOUT, LENIMC, LENRMC, IMCWRK, RMCWRK)
IMPLICIT DOUBLE PRECISION (A-H, O-Z), INTEGER (I-N)
DIMENSION IMCWRK(*), RMCWRK(*)
CHARACTER*16 VERS, PREC
LOGICAL IOK, ROK, KERR
COMMON /MCCONS/ VERS, PREC, KERR, LENI, LENR
C
COMMON /MCMCMC/ RU, PATMOS, SMALL, NKK, NO, NLITE, INLIN,
IKTDIF,
1 IPVT, NWT, NEPS, NSIG, NDIP, NPOL, NZROT, NLAM,
2 NETA, NDIF, NTDIF, NXX, NVIS, NXI, NCP,
3 NCROT, NCINT, NPARK, NBIND, NEOK, NSGM,
4 NAST, NBST, NCST, NXL, NR, NWRK, K3

SMALL = 1.0E-20
RU = 8.314E+07
PATMOS= 1.01325E+06

CALL MCLEN (LINKMC, LOUT, LI, LR)
IOK = (LENIMC .GE. LI)
ROK = (LENRMC .GE. LR)
C
IF (.NOT.IOK .OR. .NOT.ROK) THEN
IF (.NOT. IOK) WRITE (LOUT, 300) LI
IF (.NOT. ROK) WRITE (LOUT, 350) LR
STOP
ENDIF
C

REWIND LINKMC
READ (LINKMC, ERR=999) VERS, PREC, KERR
READ (LINKMC, ERR=999) LI, LR, NO, NKK, NLITE
C
NK = NO*NKK
NK2 = NO*NKK*NKK
K2 = NKK*NKK
K3 = 3*NKK
K32 = K3*K3
NKT = NO*NKK*NLITE
NWT = 1
NEPS = NWT + NKK
NSIG = NEPS + NKK
NDIP = NSIG + NKK
NPOL = NDIP + NKK
NZROT= NPOL + NKK
C
NLAM = NZROT + NKK
NETA = NLAM + NK
NDIF = NETA + NK
NTDIF= NDIF + NK2
C
NXX = NTDIF + NO*NKK*NLITE
NVIS = NXX + NKK
NXI = NVIS + NKK
NCP = NXI + NKK
NCROT= NCP + NKK
NCINT= NCROT + NKK
NPARK= NCINT + NKK
C
NBIND= NPARK + NKK
NEOK = NBIND + K2
NSGM = NEOK + K2
NAST = NSGM + K2
NBST = NAST + K2
NCST = NBST + K2
C
NXL = NCST + K2
C
NR = NXL + K32
NWRK = NR + K3
NTOT = NWRK + K3 - 1

INLIN = 1
IKTDIF= INLIN + NKK
IPVT = IKTDIF + NLITE
ITOT = IPVT + K3 - 1
READ (LINKMC, ERR=999) PATMOS, (RMCWRK(NWT+N-1),
1 RMCWRK(NEPS+N-1), RMCWRK(NSIG+N-1),
2 RMCWRK(NDIP+N-1), RMCWRK(NPOL+N-1), RMCWRK(NZROT+N-1),
3 IMCWRK(INLIN+N-1), N=1,NKK),
4 (RMCWRK(NLAM+N-1), N=1,NK), (RMCWRK(NETA+N-1), N=1,NK),
5 (RMCWRK(NDIF+N-1), N=1,NK2),
6 (IMCWRK(IKTDIF+N-1), N=1,NLITE), (RMCWRK(NTDIF+N-1), N=1,NKT)
C
C SET EPS/K AND SIG FOR ALL I,J PAIRS
C
CALL MCEPSG (NKK, RMCWRK(NEPS), RMCWRK(NSIG), RMCWRK(NDIP),
1 RMCWRK(NPOL), RMCWRK(NEOK), RMCWRK(NSGM) )
C
300 FORMAT (10X,'IMCWRK MUST BE DIMENSIONED AT LEAST ', I5)
350 FORMAT (10X,'RMCWRK MUST BE DIMENSIONED AT LEAST ', I5)
RETURN
999 WRITE (LOUT, *) ' Error reading Transport linking file...'
STOP
END


MCINIT is called by CTSRTR:
SUBROUTINE CTSTRT (LOUT, LINKCK, LINKMC, ICKWRK, RCKWRK,
1 IMCWRK, RMCWRK, KNAME, NSP, NR)


IMPLICIT DOUBLE PRECISION (A-H,O-Z)

include "vecsize"

DIMENSION ICKWRK(LENIWK), RCKWRK(LENRWK)
CHARACTER CCKWRK(LENCWK)*16
DIMENSION IMCWRK(LENIMC), RMCWRK(LENRMC)
CHARACTER KNAME(NSPMAX)*10
LOGICAL KERR

C Initialize the chemical kinetics package
CALL CKINIT (LENIWK, LENRWK, LENCWK, LINKCK, LOUT, ICKWRK,
1 RCKWRK, CCKWRK)
CALL CKINDX (ICKWRK, RCKWRK, NEL, NSP, NR, NFIT)
CALL CKSYMS (CCKWRK, LOUT, KNAME, KERR)
if (NREACMAX .lt. NR) then
write(lout,*)' Increase NREACMAX (< NREAC) :',NREACMAX, NR
stop
endif
if (NSPMAX .lt. NSP) then
write(lout,*) ' Increase NSPMAX (< NSP) :',NSPMAX, NSP
stop
endif

C Initialize the transport package
C
CALL MCINIT (LINKMC, LOUT, LENIMC, LENRMC, IMCWRK, RMCWRK)

RETURN
END

I had to include the file "vecsize" so I wouldn't use more memory than
I needed (quick and dirty way to avoid all the hassles of dynamic
allocations)

Here it is vecsize file:
PARAMETER (LENIWK=12891, LENRWK=11486, LENCWK=116,
1 LENIMC=446, LENRMC=2337984,
2 NSPMAX=111, NVAMAX=116, NP1MAX=1092,
3 NREACMAX=784,NSIM=20)

Finally here is the subroutine called in MCINIT that reads just the
first records from the LINKMC file: MCLEN:

SUBROUTINE MCLEN (LINKMC, LOUT, LI, LR)
C
C*****precision > double
IMPLICIT DOUBLE PRECISION (A-H, O-Z), INTEGER (I-N)
PARAMETER (NLIST = 3)
LOGICAL KERR, IERR, VOK, POK
CHARACTER*16 LIST(NLIST), VERS, PREC, V, P
COMMON /MCCONS/ VERS, PREC, KERR, LENI, LENR
DATA LIST/'1.7','1.8','1.9'/
C
VERS = ' '
PREC = ' '
LENI = 0
LENR = 0
LI = LENI
LR = LENR
KERR = .FALSE.
IERR = KERR
C
REWIND (LINKMC)
READ (LINKMC, ERR=999) VERS, PREC, KERR
C
VOK = .FALSE.
DO 5 N = 1, NLIST
IF (VERS .EQ. LIST(N)) VOK = .TRUE.
5 CONTINUE
C
POK = .FALSE.
C*****precision > double
IF (INDEX(PREC, 'DOUB') .GT. 0) POK = .TRUE.
C
IF (KERR .OR. (.NOT.POK) .OR. (.NOT.VOK)) THEN
IF (KERR) THEN
WRITE (LOUT,'(/A,/A)')
1 ' There is an error in the transport linking file...',
2 ' Check TRANFIT output for error conditions.'
ENDIF
IF (.NOT. VOK) THEN
WRITE (LOUT,'(/A,A)')
1 ' Transport Linking File is incompatible with Transport',
2 ' Library Version 1.7'
ENDIF
IF (.NOT. POK) THEN
WRITE (LOUT, '(/A,A)')
1 ' Precision of Transport Linking File does not agree
with',
2 ' precision of Transport Library'
ENDIF
STOP
ENDIF
C
READ (LINKMC, ERR=999) LENIMC, LENRMC, NO, NKK, NLITE
REWIND (LINKMC)
LENI = LENIMC
LENR = LENRMC
LI = LENI
LR = LENR
RETURN
999 CONTINUE
C
WRITE (LOUT, 50)
50 FORMAT
1 (' Error reading Multi-component Transport linking file.')
STOP
END

Hopefully these are helpful information.
From: onateag on
On Mar 4, 5:14 pm, Steve Lionel <steve.lio...(a)intel.invalid> wrote:
> On 3/3/2010 10:31 PM, Gaetano Esposito wrote:
>
>
>
> > For a reason that is behind my comprehension, the file created in the
> > WRITE statement, cannot be read by the READ statement.
>
> > I investigated this problem for a long time comparing a working
> > version of the unformatted file (which I had kept from other
> > computation) to the ones which now are not working (by "working" I
> > mean it is readable by the READ statement).
> > It turns out that the two files are **almost** (of course) identical.
> > If the unix command "cmp" is unreadable for me, if I dump the binary
> > files content using "od -x>  linking_file" for both (working and not-
> > working) versions and "diff" them, I get just one different line:
> > $ diff tplink_working_x tplink_x
> > 5c5
> > <  0000100 0002 0000 0014 0000 523c 0006 0000 0000
> > ---
> >> >  0000100 0002 0000 0014 0000 0014 0000 0000 0000
>
> I can tell from the error message that you are probably using Intel
> Fortran.  What I can't tell is how you have opened the file on each end..
>   The error you're getting implies that on the READ, the file is opened
> for sequential unformatted access and is looking for the record length
> (32 bits) at the beginning of the record.  Whatever it is reading is not
> a correct record length and the variable list for the READ wants more
> data than it looks as is being supplied.
>
> The od output you show looks strange to me - it doesn't resemble what
> I'd expect for a sequential unformatted file.  It could be that you
> created the file with FORM='BINARY', an extension that is similar to
> F2003 ACCESS='STREAM', but are not reading it the same way.
>
> If you would provide the additional information requested by Richard and
> Glen, perhaps we might figure out something.
>
> --
> Steve Lionel
> Developer Products Division
> Intel Corporation
> Nashua, NH
>
> For email address, replace "invalid" with "com"
>
> User communities for Intel Software Development Products
>    http://software.intel.com/en-us/forums/
> Intel Software Development Products Support
>    http://software.intel.com/sites/support/
> My Fortran blog
>    http://www.intel.com/software/drfortran

Hi Steve,
yes, I am using the Intel compiler.

I am going to report how I open the files in WRITE and READ mode:

write:
OPEN (LINKTP, FORM='UNFORMATTED', FILE='tplink')

read:
OPEN (LINKMC,STATUS='OLD',FORM='UNFORMATTED',FILE='tplink')

From: onateag on
> I can only debug what I see. If I'm just assured that everything has
> been checked and is fine, even though I don't see any of it, that
> doesn't leave me much to go on. When someone assures me that the parts
> they didn't show me are all fine, that often tends to make me more
> suspicious of those parts, rather than less so. I don't see anything
> else in what was posted that I can help with.

Richard,

I can only agree with you. I tried to report what "I thought" could
have been the core of the problem, but the assumption that I would
have been able to discern useful from useless info was not right (we
all know what happens if we "assume"). If in any way I have irritated
you and any other of you guys, I apologize for it. You Richard, Glen
and Steve always leave priceless knowledge everywhere you write, and I
greatly appreciate it.