From: Arjan on
> I also put in heavy-duty checks, where possible, that are under the control
> of a DEBUG parameter in an IF statement. Leave them in, as they are very handy


In a similar way, most of my functions/subroutines have a line:

INTEGER, PARAMETER :: DebugLevel = 0

The "heaviness"/verbosity of the checks activated by the IF statements
increases with the value of DebugLevel. After some years I have really
come to thank myself for doing this...
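
For illustration, a minimal sketch of the pattern (the routine and the
checks are made up, just to show the idea; with assumed-shape arrays it
should live in a module):

SUBROUTINE interp1 (x, xtab, ytab, y)
  REAL, INTENT(IN)  :: x, xtab(:), ytab(:)
  REAL, INTENT(OUT) :: y
  INTEGER, PARAMETER :: DebugLevel = 0  ! raise to activate more checks
  INTEGER :: i, n

  n = SIZE(xtab)
  IF (DebugLevel >= 1) THEN             ! cheap scalar checks
    IF (x < xtab(1) .OR. x > xtab(n)) STOP 'interp1: x out of range'
  END IF
  IF (DebugLevel >= 2) THEN             ! O(n) check: is the table ordered?
    DO i = 1, n - 1
      IF (xtab(i) >= xtab(i+1)) STOP 'interp1: xtab not increasing'
    END DO
  END IF
  DO i = 1, n - 1                       ! find the bracketing interval
    IF (x <= xtab(i+1)) EXIT
  END DO
  y = ytab(i) + (ytab(i+1) - ytab(i)) * (x - xtab(i)) / (xtab(i+1) - xtab(i))
END SUBROUTINE interp1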

A.
From: Ron Shepard on
In article
<de9d0fd6-33f8-44e4-8e03-20428eaf7451(a)21g2000yqj.googlegroups.com>,
deltaquattro <deltaquattro(a)gmail.com> wrote:

> when writing code which performs a numerical task such as, for example,
> interpolation, would you put the checks on the input data inside the
> sub itself, or would you delegate them to other code to be called before
> the interpolation subroutine? For example, in 1D linear interpolation
> you have to look up an ordered table for the position of a value x
> before performing the interpolation. Should the check that x falls
> inside the interval spanned by the table be done inside or outside
> the interpolation sub? I'd say outside, but then when I reuse the
> code I risk forgetting the call to the checking sub...

There isn't a universal answer to this. If the checks are relatively
inexpensive (time, memory, cache, etc.), then you should add them in the
low level routine so they are always active. If they are expensive, or
in the gray area between, then you have options. You might write two
versions of the routine, one with and one without the checks. You would
call the fast one when speed is critical (e.g. inside innermost
do-loops) and where it makes sense to test the arguments outside the
low-level routine (e.g. outside of the innermost do-loops), and you
would call the safe one when speed is not critical or where nothing is
saved by testing outside of the call. Or, you might have a single
version of the routine, but do the internal tests conditionally based on
the value of an argument (or the existence of an optional argument).
This is one of the common uses for optional arguments.
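
For concreteness, a minimal sketch combining the two ideas (all names
are mine, not from any library; with assumed-shape arrays these belong
in a module). A fast unchecked kernel, plus a wrapper that checks by
default but can be told not to:

SUBROUTINE interp1_fast (x, xtab, ytab, y)
  REAL, INTENT(IN)  :: x, xtab(:), ytab(:)
  REAL, INTENT(OUT) :: y
  INTEGER :: i
  DO i = 1, SIZE(xtab) - 1              ! find the bracketing interval
    IF (x <= xtab(i+1)) EXIT
  END DO
  y = ytab(i) + (ytab(i+1) - ytab(i)) * (x - xtab(i)) / (xtab(i+1) - xtab(i))
END SUBROUTINE interp1_fast

SUBROUTINE interp1 (x, xtab, ytab, y, skip_checks)
  REAL, INTENT(IN)  :: x, xtab(:), ytab(:)
  REAL, INTENT(OUT) :: y
  LOGICAL, INTENT(IN), OPTIONAL :: skip_checks
  LOGICAL :: do_check
  do_check = .TRUE.                     ! checks on by default
  IF (PRESENT(skip_checks)) do_check = .NOT. skip_checks
  IF (do_check) THEN
    IF (SIZE(ytab) /= SIZE(xtab)) STOP 'interp1: table size mismatch'
    IF (x < xtab(1) .OR. x > xtab(SIZE(xtab))) STOP 'interp1: x out of range'
  END IF
  CALL interp1_fast (x, xtab, ytab, y)
END SUBROUTINE interp1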

The same issue applies also within the routine regarding what to do if
it detects an error. Should it return an error code, or should it abort
internally? You can write a single routine with an optional return
argument that handles both situations. If the argument is present, then
it can be set with an appropriate error code and return to the caller,
otherwise the routine can abort on the spot.
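
A sketch of that pattern (the names are illustrative):

SUBROUTINE find_interval (x, xtab, i, istat)
  REAL, INTENT(IN)  :: x, xtab(:)
  INTEGER, INTENT(OUT) :: i
  INTEGER, INTENT(OUT), OPTIONAL :: istat

  IF (PRESENT(istat)) istat = 0
  i = 0
  IF (x < xtab(1) .OR. x > xtab(SIZE(xtab))) THEN
    IF (PRESENT(istat)) THEN
      istat = 1                         ! report the error; caller decides
      RETURN
    ELSE
      STOP 'find_interval: x out of range'  ! abort on the spot
    END IF
  END IF
  DO i = 1, SIZE(xtab) - 1
    IF (x <= xtab(i+1)) EXIT
  END DO
END SUBROUTINE find_interval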

$.02 -Ron Shepard
From: glen herrmannsfeldt on
Richard Maine <nospam(a)see.signature> wrote:
> deltaquattro <deltaquattro(a)gmail.com> wrote:

>> ps one of the reasons why I'm asking this is because I've seen a lot
>> of interp subs in libraries which don't do any checks on input data,
>> while mine are filled to the brim with checks, so I started doubting
>> my coding practices :)

> Yes, one sees lots of code that fails to do basic sanity checks. One
> also sees lots of examples of resulting failures. At times it seems like
> about half of the software security flaws out there boil down to failing
> to check such things. Buffer overrun, anyone? That's more often in C
> code, than in Fortran, but the principles apply.

As a first approximation, you should test user-supplied values
thoroughly, and data supplied by other parts of your program less so.

Say, for example, a cubic-spline package where one routine computes
the spline coefficients and another routine evaluates the spline
at a specified point. The second should not be expected to do
extensive tests on the supplied coefficients, but it should test that
the user-supplied interpolation point is within range. (Unless
you allow for extrapolation, but don't do that.)

You should only worry about the time taken by the tests if they
are expected to be inside deeply nested loops (or loops otherwise
executed billions of times). Integer and logical tests are extremely
fast on modern, and even not so modern, processors. Floating point
is a little slower in many cases. Scalar tests are faster than
array tests. Again considering the cubic spline routine, which might
be evaluated many times with one set of computed coefficients,
extensive retesting of the coefficient arrays could be very slow.

There are still some tests that could be done, though. If you
test the first and last sets of coefficients, that protects against
some of the more obvious mistakes that someone might make. You
could also add a value near the beginning of the data structure
that should match another value near the end, and test that at
evaluation time. That protects against many cases of the user
passing the wrong data structure to the evaluation routine.
Put the length into the data structure instead of depending on
the user to resupply that each time.
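
One way to package that (the type, the magic number standing in for the
matching values, and the coefficient layout are all made up for
illustration):

TYPE spline_t
  INTEGER :: n = 0                      ! length stored in the structure itself
  INTEGER :: tag = 0                    ! set to a known value by the setup routine
  REAL, ALLOCATABLE :: x(:), coef(:,:)  ! knots and per-interval cubic coefficients
END TYPE spline_t

SUBROUTINE spline_eval (s, x, y)
  TYPE(spline_t), INTENT(IN) :: s
  REAL, INTENT(IN)  :: x
  REAL, INTENT(OUT) :: y
  INTEGER :: i
  REAL :: dx
  IF (s%tag /= 12345 .OR. s%n < 2) STOP 'spline_eval: not a valid spline'
  IF (x < s%x(1) .OR. x > s%x(s%n)) STOP 'spline_eval: x out of range'
  DO i = 1, s%n - 1                     ! locate the interval
    IF (x <= s%x(i+1)) EXIT
  END DO
  dx = x - s%x(i)
  y = s%coef(1,i) + dx*(s%coef(2,i) + dx*(s%coef(3,i) + dx*s%coef(4,i)))
END SUBROUTINE spline_eval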

This reminds me of the computed GOTO statement. In Fortran 66,
it was the programmer's responsibility to be sure that the selection
value was in range. Some compilers tested it and branched to the
next statement for out-of-range values, but the standard didn't
require that. I believe that was changed to require the test in
Fortran 77. It seems that sometime between 1966 and 1977 the
expected level of testing changed...

-- glen
From: deltaquattro on

On 4 Feb, 23:05, glen herrmannsfeldt <g...(a)ugcs.caltech.edu> wrote:
> [snip]

Hi guys,

thank you all for the replies. This group is a treasure trove of
Fortran wisdom, as always :) I hadn't thought of using a Debug
parameter and/or optional input/output arguments for this issue - very
handy suggestions. The distinction between checks on user-supplied
data and checks on data generated by code is also interesting; mine
falls under the second case.
So in the end the check stays where it is, in LinearInterp. Thanks
again,

Best Regards

deltaquattro
From: George White on
On Thu, 4 Feb 2010, Ron Shepard wrote:

> In article
> <de9d0fd6-33f8-44e4-8e03-20428eaf7451(a)21g2000yqj.googlegroups.com>,
> deltaquattro <deltaquattro(a)gmail.com> wrote:
>
>> [snip]
>
> There isn't a universal answer to this. If the checks are relatively
> inexpensive (time, memory, cache, etc.), then you should add them in the
> low level routine so they are always active. If they are expensive, or
> in the gray area between, then you have options. You might write two
> versions of the routine, one with and one without the checks. You would
> call the fast one when speed is critical (e.g. inside innermost
> do-loops) and where it makes sense to test the arguments outside the
> low-level routine (e.g. outside of the innermost do-loops), and you
> would call the safe one when speed is not critical or where nothing is
> saved by testing outside of the call. Or, you might have a single
> version of the routine, but do the internal tests conditionally based on
> the value of an argument (or the existence of an optional argument).
> This is one of the common uses for optional arguments.

If the chances of bad values are small, sometimes it is better to just go
ahead and compute garbage, then catch the bad value outside the routine.
Many functions are designed to return a special value in case of errors,
e.g., a function whose value should be positive can return negative values
to signal various types of errors. When using lookup tables it is often
simple to add two extra bins for the out-of-range values and adjust the
code to map out-of-range values to those bins. This is generally
much easier to handle in parallel processing than putting error handling
in low-level code -- otherwise you end up with 99 values that were computed
quickly but the program stalls waiting on the one value that triggered an
error handler. This has been a problem with some implementations of
basic math libraries, where exp(-bignum) triggers a slow handler to
decide whether to underflow and return 0 or raise an exception.
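
A sketch of the extra-bin idea, assuming uniform bins (the names are
illustrative):

INTEGER FUNCTION bin_index (v, vmin, vmax, n) RESULT (ib)
  REAL, INTENT(IN)    :: v, vmin, vmax
  INTEGER, INTENT(IN) :: n

  IF (v < vmin) THEN
    ib = 0                              ! underflow bin
  ELSE IF (v >= vmax) THEN
    ib = n + 1                          ! overflow bin
  ELSE
    ib = 1 + INT(REAL(n) * (v - vmin) / (vmax - vmin))
  END IF
END FUNCTION bin_index

Counts can then accumulate in an array dimensioned (0:n+1), and no value
ever needs a separate error path.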

> The same issue applies also within the routine regarding what to do if
> it detects an error. Should it return an error code, or should it abort
> internally? You can write a single routine with an optional return
> argument that handles both situations. If the argument is present, then
> it can be set with an appropriate error code and return to the caller,
> otherwise the routine can abort on the spot.

In some situations it is useful to keep a count of the errors so you can
provide a table. In my work (remote sensing) you have a data set with
>>10^6 records, many with missing/invalid data. You want to compute what
makes sense for each record, and keep statistics for the various data
problems so you can generate a summary table. The SLATEC xerror package
provides this capability.
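
Not xerror itself, but a minimal sketch of the counting idea (the module
and the error classes are my own, just for illustration):

MODULE error_counts
  IMPLICIT NONE
  INTEGER, PARAMETER :: NCLASS = 4      ! number of error classes tracked
  INTEGER, SAVE :: nerr(NCLASS) = 0
CONTAINS
  SUBROUTINE count_error (iclass)
    INTEGER, INTENT(IN) :: iclass
    IF (iclass >= 1 .AND. iclass <= NCLASS) nerr(iclass) = nerr(iclass) + 1
  END SUBROUTINE count_error

  SUBROUTINE error_summary ()
    INTEGER :: i
    DO i = 1, NCLASS
      IF (nerr(i) > 0) WRITE (*,'(A,I2,A,I10)') ' error class', i, ':', nerr(i)
    END DO
  END SUBROUTINE error_summary
END MODULE error_counts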


--
George White <aa056(a)chebucto.ns.ca> <gnw3(a)acm.org>
189 Parklea Dr., Head of St. Margarets Bay, Nova Scotia B3Z 2G6