From: robin on
"Paul van Delst" <paul.vandelst(a)noaa.gov> wrote in message news:hthajv$oqj$1(a)news.boulder.noaa.gov...
| Hello,
|
| Can some kind, patient person explain exactly what the use of static memory entails? For
| example, if one uses the "-fstatic" option for a compiler.
|
| A colleague is having an issue with some code that runs fine on one machine (#1: redhat 4,
| 32 bit, g95 v4.0), but bombs on another (#2: redhat 3, 32 bit, g95 v4.1). And by bombs I
| mean it runs to completion, but the results are full of NaN's.
|
| One of the debug suggestions I made was to *remove* the "-fstatic" switch from the
| compile. When she did that the runs bombed on both machines. She asked me why that would
| happen and I have to admit I was at a bit of a loss to explain it. My own experience is
| built around usage, not theory.

If a program erratically delivers NaN results,
the likely cause is one or more variables not being initialized.

That means that the program is in error, and needs to be corrected.

| What I did bumble on about (gleaned from various google searches) was:
|
| <quote>
| Variables that are stored in static memory are allocated when a program is first run.
| Thus, the variable remains in memory for the duration of the program and, typically, its
| value is retained between calls to procedures.
|
| Also, static variables tend to be initialized (i.e. set to zero) when the program is started.

No.
What a particular compiler might do with storage locations
reserved for variables is not relevant.
You cannot rely on variables being initialized to anything.

| Static variables behave similarly to SAVE'd variables (but I'm sure there are some
| differences... not sure what they are)
|
| Thus, if the code depends on variables being static, basically it is not very robust code
| (as you are discovering) since the behaviour of assuming initialised variables is simply
| dangerous.
| </quote>
|
| Can someone grade my response above with some additional information? I don't think static
| and SAVE are the same, but an understanding of the differences eludes me.
|
| Any info appreciated.
|
| cheers,
|
| paulv


From: robin on
"Paul van Delst" <paul.vandelst(a)noaa.gov> wrote in message news:hthajv$oqj$1(a)news.boulder.noaa.gov...

I should have added that your colleague should also look for
subscript errors, as reading from outside an array will pick up
rubbish, which could look like a NaN.


From: JB on
On 2010-05-25, Gordon Sande <Gordon.Sande(a)gmail.com> wrote:
> probably an uninitialized variable. There are good tools for dealing
> with those. They tend to be developed away from the bazaar of open source.

Bazaar or not, one common open source tool that tends to be good at
finding use of uninitialized memory (and other memory errors) is
valgrind. However, I would guess that it's C-centric to the point of
treating SAVE'd variables without explicit initialization as
initialized to 0, assuming the compiler sets them to 0 (which AFAIK
most unix compilers do in order to take advantage of the .bss section
in the object file).


--
JB


From: Gordon Sande on
On 2010-05-26 08:32:02 -0300, JB <foo(a)bar.invalid> said:

> On 2010-05-25, Gordon Sande <Gordon.Sande(a)gmail.com> wrote:
>> probably an uninitialized variable. There are good tools for dealing
>> with those. They tend to be developed away from the bazaar of open source.
>
> Bazaar or not, one common open source tool that tends to be good at
> finding use of uninitialized memory (and other memory errors) is
> valgrind. However, I would guess that it's C-centric to the point of
> treating SAVE'd variables without explicit initialization as
> initialized to 0, assuming the compiler sets them to 0 (which AFAIK
> most unix compilers do in order to take advantage of the .bss section
> in the object file).

There are various "definitions" of undefined. Many are so watered down
that it takes a skilled lawyer to show why they are not fraudulent. If
you don't notice the distinctions you can be fooled into treating them as
all the same.

The useful definition is that the variable has never been assigned a value
by the user after the variable has come into existence. The descriptions
of Valgrind I have seen do not include such a capability. If it does have
such a capability I would like to see a reference to it. Undefined variable
checking requires either hardware assistance (parity checking is quick, easy
and effective when possible) or much checking of all accessed values by the
running program (which means that the object code is bulky and slowed in the
several implementations I have seen) as a result of the compiler inserting
the extra checking. Valgrind does not seem to benefit from either mode. It
may be useful for some errors that the standard C runtime support assumes
are not present (because it must assume the programmer is correct).

I am aware of three systems that do undefined variable checking for current
Fortran. These are Lahey/Fujitsu, NAG and Salford/Silverfrost. The classic
example was WatFor which was parity based on IBM 7040 and software for
IBM 360. Salford was software based for F77. Both WatFor and Salford are
university based for fast turn around student debugging.

Another area where the "definitions" are slippery is execution profiling.
Some systems give exact line by line counts of the execution history
as a result of extensive instrumentation. Others give a sampling of the
location counter, with the association to the source done by looking at
the load map of the program. Same definition but such
greatly different capabilities that one wonders how the same name can be
justified. Valgrind offers sampling. Marketing!


From: Paul van Delst on
John Paine wrote:
> It seems to me that what you are really looking for is a simple way
> to find where the program is going wrong. This is currently masked by
> the floating-point exception handler generating a NaN and the program
> continuing on to the bitter end without generating anything useful. I
> don't know what the switch is for the g95 compiler, but for the Intel
> compiler you set the exception handler to "Underflow gives 0.0; Abort
> on other IEEE exceptions" (/fpe:0). This means that the program crashes
> as soon as it encounters the operation that causes the NaN result. That
> should then help you identify where it happens and more importantly why
> it is happening.

Oh, yes, I agree. I suggested all the usual initial approaches to debugging via the compiler:
- turn all the checking/warning switches on,
- determine if there is a switch to generate signaling NaN's so the program stops
  when one is generated,
- determine if there is a switch to initialise reals to zero/NaN/Inf and run tests
  for all the cases,
- use an -fimplicit-none switch if available,
- turn off the -fstatic switch.
Then run the code through gdb. Once you've got that working on a particular system, run
the code through valgrind.

The removal of the -fstatic in the compilation stage was just the particular step that
gave the first definitive results (i.e. everything broke). I was then asked what -fstatic
does and that's what precipitated my question. Given the frequency with which the term is
bandied about when discussing Fortran code, I was surprised when I found it difficult to
find unequivocal explanations of what the use of static variables in Fortran code actually
meant. (I realise the answers will be compiler and platform dependent)


> On a more subjective note, I have experienced similar problems where the
> same program behaves differently on different (but very similar)
> machines. Quite intriguing, but horrible to debug. My all-time favourite
> was the machine where the program ran fine, but only if the network
> driver was not loaded. Another favourite was the one where two instances
> of the program had to be started, the first one would not run correctly
> but could be left in the background while the second one ran just fine.
> Couldn't solve either of those, but both went away when the computers
> were retired and replaced with new ones.

Yep, I've had similar happen too. Several times. Either the machine changed (in some
fashion), or the compiler did. Drives me nuts.

cheers,

paulv