How to deal with "missing points" in arrays [Fortran]


From: glen herrmannsfeldt on 10 Jun 2010 20:35

aruzinsky <aruzinsky(a)general-cathexis.com> wrote:
(snip)

> What is the size of your Fortran's LOGICAL data type? I only use C++.
> Visual C++ has the data type bool, but sizeof(bool) = 1 byte, so
> I wrote my own array class (both 1D and 2D) of 1 bit per element. To
> my surprise, access is very fast and, now, I never hesitate to use it.
> Maybe, you can do the same in Fortran.

It is likely that Fortran has a byte-sized logical, though in
a structure that also holds a REAL*8, alignment will pad it to
either a four- or eight-byte boundary on most machines. REAL*8 on a
four-byte boundary is slow on many machines, though that doesn't
stop compilers from doing it.

-- glen

From: William Clodius on 10 Jun 2010 22:38

glen herrmannsfeldt <gah(a)ugcs.caltech.edu> wrote:

> William Clodius <wclodius(a)lost-alamos.pet> wrote:
> (snip, someone wrote)
>
> >> so this means that I wouldn't be able to test for equivalence against
> >> 9.9999e30 or 1.0E30? Alas, I forgot one shouldn't test for equivalence
> >> with floating point numbers...
>
> > Formally you cannot test for equality against them. In principle you
> > could define a parameter of the desired REAL kind, in effect giving it
> > the closest value to 1.0E30, and test against that. But as Richard
> > noted, that is error prone.
>
> The language has no rule against equality tests for floating
> point values, but one does have to be careful.
>
> X=0.3+0.3
> if(x.ne.0.6) print *,'surprise!'
>
> Testing for the exact value assigned to a variable should
> be fairly reliable, but there are still pitfalls.
> <snip>

As Richard notes, there is also the problem that the in-band flag value
could be generated accidentally.

--
Bill Clodius
los the lost and net the pet to email

From: Dave Allured on 11 Jun 2010 10:57

deltaquattro wrote:
>
> Hi all,
>
> The discussion is getting very interesting, but after reading all the
> answers I am getting a little confused about which would be the best
> option. Let's try to recap:
>
> 1) William Clodius suggests an enhanced version of solution 1.: I
> address this in my reply to him.
>
> 2) Most people are against option 2. because it's not safe: in your
> experience in-line signaling is too error-prone, so I should refrain
> from doing this. Ok, point taken.
>
> 3) Richard Maine and others advocate testing for NaN with a separately
> compiled function, because that will not be "optimized out" by the
> compiler. So that should be portable enough even under F95 with TRs.
> Let's say I choose this solution: what I am missing here is, how do I
> fill an array element with a NaN? Will this be portable?
>
> a=0
> arr(i,j)=a/0

Option 2 has been the predominant solution in the climate industry for
many years, with many Fortran versions and other languages, especially
in archived and published data sets. Option 1 is uncommon because of
excess storage size. Option 3 has serious portability problems.

The usual approach for option 2 is to select a single flag value well
outside of the data range, called the "missing value". For best
results, the missing value is bundled as a piece of metadata in the same
file as the data array. This permits the use of unique missing values
for different arrays.

This also facilitates exact value comparisons, which is a common
practice for this purpose in climo. E.g. you normally would not compare
to "1e30", but rather to the associated missing_value attribute read
from the file. If the attribute and the data are generated
consistently, then exact value comparison is reasonable (my opinion),
and pitfalls mentioned elsewhere in this thread are circumvented.
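
The exact-comparison approach above can be sketched in a few lines. This
is only an illustration: the names missing_value, temps and valid are
made up for the example, and the flag is hard-coded as a parameter here
purely as a stand-in for a value that would normally be read from the
file's metadata.

```fortran
program missing_value_demo
  implicit none
  ! Hypothetical flag value. In real use it would be read from the
  ! data file's metadata rather than hard-coded.
  real, parameter :: missing_value = -1.0e38
  real    :: temps(6)
  logical :: valid(6)

  ! Sample data with two points flagged as missing.
  temps = [12.5, missing_value, 14.0, 13.5, missing_value, 11.0]

  ! Exact comparison against the stored flag, not a retyped literal.
  valid = temps /= missing_value

  print *, 'valid points:        ', count(valid)
  print *, 'mean of valid points:', sum(temps, mask=valid) / count(valid)
end program missing_value_demo
```

Because the flag stored in the array and the flag used in the test come
from the same variable of the same kind, the comparison does not depend
on how a decimal literal happens to be rounded.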

Yes, you do need to take care in calculations to not land on or close to
the missing value. However, most calculations, at least in climo,
either have a predictable output range, or can easily be constrained.

For general use, I personally prefer a flag value close to the extreme
negative limit of the current data type, and with a very simple decimal
representation in code. Exact binary representation is not important if
used as I described above. For example, -1.0e38 for single precision
IEEE reals. This is most likely to be far outside the useable range for
any data to be computed in single precision. Also, if there is reason
to distrust exact value comparison, you can do a one-sided test with a
nearby threshold number, e.g. less than -9.9e37 tests to be a missing
value. HTH.
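
A minimal sketch of the one-sided test, using the two numbers quoted
above. The program and variable names are hypothetical and chosen only
for illustration.

```fortran
program threshold_demo
  implicit none
  real, parameter :: missing_value = -1.0e38  ! flag written into the data
  real, parameter :: threshold     = -9.9e37  ! anything below counts as missing
  real :: x(4)

  x = [3.2, missing_value, -250.0, missing_value]

  ! One-sided test: tolerant of small differences in how the flag
  ! was rounded when it was written or read back.
  print *, 'missing count:        ', count(x < threshold)
  print *, 'minimum of valid data:', minval(x, mask=(x >= threshold))
end program threshold_demo
```

The mask argument keeps the flagged elements out of the reduction, so
the flag never contaminates the statistics.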

--Dave A.
NOAA/PSD/CIRES Climate Analysis Branch
http://www.esrl.noaa.gov/psd/psd1/

From: Woody on 14 Jun 2010 03:54

On Jun 9, 9:31 am, deltaquattro <deltaquat...(a)gmail.com> wrote:
> 3. One could initialize the "missing" values to NaN. However, I then
> have to test for the array element being a NaN, when I produce my
> output for the user. From what I remember about Fortran and NaN,
> there's (or there was) no portable way to do this...am I wrong?

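Fortran 2003 provides an elemental function ieee_is_nan(x), available
from the intrinsic module ieee_arithmetic (and in Fortran 95 compilers
that implement the IEEE TR), which returns .true. if x is an IEEE NaN.
This should work with Fortran compilers that use IEEE floating point,
which I believe most do.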
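
As a hedged sketch, assuming a compiler that supplies the ieee_arithmetic
module, filling an element with a NaN and testing for it could look like
this. The array name arr and its shape are arbitrary choices for the
example.

```fortran
program nan_demo
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan, &
                                           ieee_is_nan, ieee_support_nan
  implicit none
  real :: arr(3, 2)

  ! Bail out if this real kind has no NaN on the current platform.
  if (.not. ieee_support_nan(arr)) then
    print *, 'NaN is not supported for this real kind'
    stop
  end if

  arr = 1.0
  ! Mark one element as missing with a quiet NaN, without dividing by zero.
  arr(2, 1) = ieee_value(arr(2, 1), ieee_quiet_nan)

  ! ieee_is_nan is elemental, so it applies to the whole array at once.
  print *, 'missing elements:', count(ieee_is_nan(arr))
end program nan_demo
```

Aggressive floating-point optimization options can change how NaNs
behave, so it is worth checking this with the compiler flags actually in
use.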

From: Terence on 15 Jun 2010 08:36

On Jun 10, 2:31 am, deltaquattro <deltaquat...(a)gmail.com> wrote:
> Hi,
>
> this is really more of a "numerical computing" question, so I
> cross-post to sci.math.num.analysis too. I decided to post on
> comp.lang.fortran anyway, because it is full of computational
> scientists and there are some sides of the issue specifically
> related to the Fortran language.
>
> The problem is this: I am modifying a legacy code, and I need to
> compute some REAL values which I then store in large arrays. Sometimes
> it's impossible to compute these values: for example, think of
> interpolating a table to a given abscissa, it may happen that the
> abscissa falls outside the curve boundaries. I have code which checks
> for this possibility, and if this happens the interpolation is not
> performed. However, now I must "store" somewhere the information that
> interpolation was not possible for that array element, and inform the
> user of it. Since the values can be either positive or negative, I
> cannot use tricks like initializing the array element to a negative
> value.
>
> I'm sure this has happened to you before: which solution did you use?
> Basically, I can think of three ways:
>
> 1. For each REAL array, I declare a LOGICAL array of the same shape,
> which contains 0 for correct values and 1 for missing values. I guess
> that's the cleanest way, but I have a lot of arrays and I'd rather not
> declare an extra array for each of them. I know it's not a memory
> issue (obviously LOGICAL arrays don't occupy a lot of space, even if
> they are big in my case!), but to me it seems like I'm adding
> redundant code. It would be better to declare arrays of a derived
> type, each element containing a REAL and a LOGICAL, but this would
> force me to modify the code in all the places where the arrays are
> used, and it's quite a big code.
>
> 2. I initialize a missing value to an extremely large positive or
> negative value, like 9e99. I think that's how the problem is usually
> solved in practice, isn't it? I'm a bit worried that this is not
> entirely "clean", since such values could in theory also result from
> the interpolation. However, since reasonable values of all the
> interpolated quantities are usually in the range -100/100, when this
> happens usually it is related to errors in the interpolation table
> data. So most likely it indicates an error which must be signaled to
> the user.
>
> 3. One could initialize the "missing" values to NaN. However, I then
> have to test for the array element being a NaN, when I produce my
> output for the user. From what I remember about Fortran and NaN,
> there's (or there was) no portable way to do this...am I wrong?
>
> I would really appreciate your help on this issue, since I really
> don't know which way to choose and currently I'm stuck! Thanks in
> advance,
>
> Best Regards
>
> Sergio Rossi

I haven't checked all the comments to see if my suggestions have been
made.

1) A verbose way is to use sparse matrix processing, where you only
store those values present, in a serial form, with side-to-side links
of "next index" and optionally "previous backward index" or zero if
end of matrix dimension; and similar links for any other dimensions
present. This, and the sparse matrix processor service set, were
actually a project requirement in my MSc. This takes a lot longer to
process, but in some situations needs less memory.

2) Binary switches (which I prefer, when fast hardware bit location
routines are possible), as suggested for character switches. Again,
this is because I automatically try to use minimum space when speed is
not vital. (Remember memory sizes in the fifties and sixties?)

3) Special specific very small real non-zero values, which can be
tested for as a word of specific bits, or as the specific value. These
can represent codes for "not given", "removed as unreliable" etc. DEC
used -0.0, which they could store, for "missing value" and had a
hardware interrupt available. These near-zero values allow correct
statistics after first subtracting the "missing" count from the
overall count in statistical measure computations.
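
The binary switches of item 2 can be sketched with the standard bit
intrinsics ibset and btest, packing one flag per bit into default
integers. The helper names set_missing and is_missing, and the array
length n, are invented for this example.

```fortran
program bitmask_demo
  implicit none
  integer, parameter :: n = 100               ! number of data elements
  integer, parameter :: nbits = bit_size(0)   ! bits per default integer
  integer :: mask((n + nbits - 1) / nbits)    ! one bit per data element

  mask = 0                  ! all bits clear: nothing is missing yet
  call set_missing(7)
  call set_missing(42)

  print *, is_missing(7), is_missing(8), is_missing(42)

contains

  ! Set the bit that corresponds to element i (1-based).
  subroutine set_missing(i)
    integer, intent(in) :: i
    integer :: w
    w = (i - 1) / nbits + 1
    mask(w) = ibset(mask(w), mod(i - 1, nbits))
  end subroutine set_missing

  ! Test the bit that corresponds to element i (1-based).
  logical function is_missing(i)
    integer, intent(in) :: i
    is_missing = btest(mask((i - 1) / nbits + 1), mod(i - 1, nbits))
  end function is_missing

end program bitmask_demo
```

This should print T F T. The cost is an extra divide and bit test per
access, in exchange for one bit of storage per element instead of a
whole LOGICAL.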
