From: Gordon Sande on
On 2010-02-06 11:02:25 -0400, George White <aa056(a)chebucto.ns.ca> said:

> On Thu, 4 Feb 2010, Ron Shepard wrote:
>
>> In article
>> <de9d0fd6-33f8-44e4-8e03-20428eaf7451(a)21g2000yqj.googlegroups.com>,
>> deltaquattro <deltaquattro(a)gmail.com> wrote:
>>
>>> when writing code which performs a numerical task such as, for example,
>>> interpolation, would you add the checks on the input data inside the
>>> sub itself, or would you delegate them to other code to be called before
>>> the interpolation subroutine? For example, in 1D linear interpolation
>>> you have to look up an ordered table for the position of a value x
>>> before performing interpolation. Should the check for x falling inside
>>> or outside the interval spanned by the table be done inside or outside
>>> the interpolation sub? I'd say outside, but then when I reuse
>>> the code I risk forgetting the call to the checking sub...
>>
>> There isn't a universal answer to this. If the checks are relatively
>> inexpensive (time, memory, cache, etc.), then you should add them in the
>> low level routine so they are always active. If they are expensive, or
>> in the gray area between, then you have options. You might write two
>> versions of the routine, one with and one without the checks. You would
>> call the fast one when speed is critical (e.g. inside innermost
>> do-loops) and where it makes sense to test the arguments outside the
>> low-level routine (e.g. outside of the innermost do-loops), and you
>> would call the safe one when speed is not critical or where nothing is
>> saved by testing outside of the call. Or, you might have a single
>> version of the routine, but do the internal tests conditionally based on
>> the value of an argument (or the existence of an optional argument).
>> This is one of the common uses for optional arguments.
>
> If the chances of bad values are small, sometimes it is better to just
> go ahead and compute garbage, then catch the bad value outside the
> routine. Many functions are designed to return a special value in case
> of errors, e.g., a function whose value should be positive can return
> negative values to signal various types of errors. When using lookup
> tables it is often simple to add two extra bins for the out-of-range
> values and adjust the code to map the out-of-range values to those
> bins. This is generally much easier to handle in parallel processing
> than putting error handling in low-level code -- you end up with 99
> values that were computed quickly but the program stalls waiting on
> the one value that triggered an error handler. This has been a problem
> with some implementations of basic math libraries where exp(-bignum)
> triggers a slow handler to decide whether to underflow and return 0 or
> trigger an exception.
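
(A minimal Fortran sketch of the extra-bins idea quoted above; the
function name and layout are illustrative, not from any particular code:

   ! Table lookup with two extra out-of-range bins: bin 0 catches x
   ! below the table, bin n+1 catches x above it, and in-range values
   ! map to bins 1..n, so no error branch is needed in the hot loop.
   integer function bin_index(edges, x) result(j)
      real, intent(in) :: edges(:)       ! ascending bin edges, size n+1
      real, intent(in) :: x
      integer :: n
      n = size(edges) - 1
      if (x < edges(1)) then
         j = 0                           ! underflow bin
      else if (x >= edges(n+1)) then
         j = n + 1                       ! overflow bin
      else
         j = count(edges <= x)           ! ordinary bin in 1..n
      end if
   end function bin_index

The out-of-range records can then be counted or reported after the
parallel loop has finished.)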

It is often convenient to classify both users and applications as
sophisticated or not. The advice you are offering is for sophisticated
users with sophisticated applications and is perfectly reasonable. For
the other three combinations it is not so suitable. A user who has to
ask has pretty clearly indicated that they are not a sophisticated user
with a sophisticated application. Recall the standard comment on the
pricing of yachts. The first lesson toward obtaining a driver's license
is rarely, if ever, given in a Formula 1 car on a closed track. There
are specialized schools for those who want to try their hand at high
performance vehicles, but they assume considerable prior experience.

>> The same issue applies also within the routine regarding what to do if
>> it detects an error. Should it return an error code, or should it abort
>> internally? You can write a single routine with an optional return
>> argument that handles both situations. If the argument is present, then
>> it can be set with an appropriate error code and return to the caller,
>> otherwise the routine can abort on the spot.
>
> In some situations it is useful to keep a count of the errors so you
> can provide a table. In my work (remote sensing) you have a data set
> with >10^6 records, many with missing/invalid data. You want to
> compute what makes sense for each record, and keep statistics for the
> various data problems so you can generate a summary table. The Slatec
> xerror package provides this capability.
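
(A sketch of the optional-status pattern Ron describes above; names and
the error code are illustrative, and an ascending table is assumed:

   subroutine interp1(xArr, yArr, x, y, stat)
      real,    intent(in)            :: xArr(:), yArr(:)
      real,    intent(in)            :: x
      real,    intent(out)           :: y
      integer, intent(out), optional :: stat
      if (present(stat)) stat = 0
      y = 0.0
      if (x < xArr(1) .or. x > xArr(size(xArr))) then
         if (present(stat)) then
            stat = 1                          ! report and return
            return
         else
            stop 'interp1: x outside table'   ! no code wanted: abort
         end if
      end if
      ! ... the interpolation proper would go here ...
   end subroutine interp1

Callers in a hot loop pass stat and tally the codes; casual callers
omit it and get a hard stop.)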


From: Dan Nagle on
Hello,

On 2010-02-06 10:02:25 -0500, George White <aa056(a)chebucto.ns.ca> said:

> Many functions are designed to return a special value in case of
> errors, e.g., a function whose value should be positive can return
> negative values to signal various types of errors.

The use of magic numbers should be shunned.

It is all too easy to bring an out-of-range
value into range when use of the software is generalized,
as it surely will be eventually.

This advice applies to derived types as well
as intrinsic types. Add a valid/invalid value flag
to your derived type instead of guessing future use.

Just because the current programmer is sophisticated doesn't imply
that all future programmers will be.
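
A sketch of what that flag might look like (type and component names
are made up):

   type :: interp_result
      real    :: value = 0.0
      logical :: valid = .false.   ! set .true. only on success
   end type interp_result

A function can then hand back the flag with the value, and the caller
tests result%valid instead of comparing against a magic number.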

--
Cheers!

Dan Nagle

From: Richard Maine on
Gordon Sande <g.sande(a)worldnet.att.net> wrote:

> On 2010-02-06 11:02:25 -0400, George White <aa056(a)chebucto.ns.ca> said:
>
> > If the chances of bad values are small, sometimes it is better to just
> > go ahead and compute garbage, then catch the bad value outside the
> > routine. Many functions are designed to return a special value in case
> > of errors,

> It is often convenient to classify both users and applications as
> sophisticated or not. The advice you are offering is for sophisticated
> users with sophisticated applications and is perfectly reasonable. For
> the other three combinations it is not so suitable. A user who has to
> ask has pretty clearly indicated that they are not a sophisticated
> user with a sophisticated application. Recall the standard comment on
> the pricing of yachts.

Yes. I have trouble imagining ever recommending the return of such flag
values. There might be cases where it is a good idea, but if someone is
asking, I'd say that pretty much rules them out as a good candidate. Of
course, there are also cases where people do it without asking and
shouldn't have either, but it is harder for me to do much about those.

One of the "war stories" that I'm sure I have repeated here before
relates to exactly such a poor decision, made by people who claimed to
be professionals. In processing data from the first Space Shuttle
entry, we received a tape (yes, that was the usual data transfer means
of the day) with what was alleged to be the "best estimated trajectory"
for the entry. It took only a casual glance at the data to see that it
was ludicrous. For a large portion of the entry, the values were many
orders of magnitude from plausible. It turns out that there were three
errors.

1. An error in algorithm. Atmospheric data was gathered at a handful of
points roughly along the expected flight path. One needed to interpolate
from those points to estimate the data at any particular vehicle
location. Interpolating between sparse and irregularly positioned points
on the surface of the earth can be non-trivial. They chose an algorithm
that might have sounded reasonable, but had a serious flaw. In
particular, they intended to interpolate between the two points that
most closely bracketed the current estimated vehicle location. They
selected the two points by taking the two that were closest to the
current estimated location. Unfortunately, that algorithm is only
reasonable when the spacing is at least somewhat uniform.

There was one data station in Hawaii and another at Vandenberg (on the
Pacific coast of California). After that, the stations were a *LOT*
more closely spaced. (This was for a west-coast landing.) I think the
next one was in Bakersfield, which is maybe a hundred miles from
Vandenberg. Slightly over halfway between Hawaii and Vandenberg, the
two closest stations became Vandenberg and Bakersfield. Oops.

2. The use of flag values in tables. The Bakersfield data was not
expected to be used until after the Shuttle had passed Vandenberg. No
data was gathered for Bakersfield at the higher altitudes that the
Shuttle was at over the Pacific. Those table locations were filled with
999. flag values. (I don't recall what machines were used, but they
certainly didn't have IEEE floating point or anything else that had
hardware flag values like NaNs.)

3. The failure to do any plausibility checking at all, inside the
interpolation routine, outside of it, or in the final product. It turns
out that, for example, 999 is not a very reasonable value for
atmospheric density in slugs per cubic foot. The standard value at sea
level is something like 0.002378, if my memory serves me. (It has been
a long time since I needed that number; I'm surprised it popped into my
mind at all; I wouldn't advise using it without checking.) Not only did
they not check the value from the table, they evidently never even
glanced at the final results on their output tape. Using 999 for
density results in pretty "impressive" dynamic pressure values, for an
example that I recall pointing out. When I complained about the
implausible values, they explained that no, they didn't check for
plausibility because they were "computer people" rather than aero
engineers. They just computed the numbers, and it was up to us to tell
them whether the numbers looked reasonable. The concept that they might
check their own work was foreign to them.

I resolved never again to use any data from the group in question. I'm
sure they thought of themselves as professionals; people like that give
a bad name to the profession.

--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain
From: deltaquattro on
On 6 Feb, 16:02, George White <aa...(a)chebucto.ns.ca> wrote:
> On Thu, 4 Feb 2010, Ron Shepard wrote:

[..]

>
> If the chances of bad values are small, sometimes it is better to just go
> ahead and compute garbage, then catch the bad value outside the routine.
> Many functions are designed to return a special value in case of errors,
> e.g., a function whose value should be positive can return negative values
> to signal various types of errors.  When using lookup tables it is often
> simple to add two extra bins for the out-of-range values and adjust the
> code to map the out-of-range values to those bins.  This is generally
> much easier to handle in parallel processing than putting error handling
> in low-level code -- you end up with 99 values that were computed quickly
> but the program stalls waiting on the one value that triggered an
> error handler.  This has been a problem with some implementations of
> basic math libraries where exp(-bignum) triggers a slow handler to
> decide whether to underflow and return 0 or trigger an exception.

Hi, George,

great! The idea of using two extra bins for the out-of-range values
fits perfectly into my current code structure. My Locate function
searches the lookup table for the index of the array element xArr(i)
(if any) whose value is less than x (the interpolation point). So I can
assign i=0 if x < min(xArr), and i=n if x > max(xArr). This way I have
been able to keep Locate as a function, without having to redefine it
as a subroutine just to return the error code.
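
For an ascending xArr, the caller can then handle the two extra bins
along these lines (just a sketch; clamping at the table ends is one
choice, flagging or extrapolating would work too):

   real function lin_interp(xArr, yArr, x)
      real, intent(in) :: xArr(:), yArr(:)
      real, intent(in) :: x
      integer :: j, n
      real    :: t
      n = size(xArr)
      j = locate(xArr, x)
      if (j == 0) then
         lin_interp = yArr(1)        ! below the table: clamp
      else if (j == n) then
         lin_interp = yArr(n)        ! above the table: clamp
      else
         t = (x - xArr(j)) / (xArr(j+1) - xArr(j))
         lin_interp = yArr(j) + t*(yArr(j+1) - yArr(j))
      end if
   end function lin_interp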

>
> > The same issue applies also within the routine regarding what to do if
> > it detects an error.  Should it return an error code, or should it abort
> > internally?  You can write a single routine with an optional return
> > argument that handles both situations.  If the argument is present, then
> > it can be set with an appropriate error code and return to the caller,
> > otherwise the routine can abort on the spot.
>
> In some situations it is useful to keep a count of the errors so you
> can provide a table.  In my work (remote sensing) you have a data set
> with >10^6 records, many with missing/invalid data.  You want to
> compute what makes sense for each record, and keep statistics for the
> various data problems so you can generate a summary table.  The Slatec
> xerror package provides this capability.
>

Interesting, I will look into that. Thanks,

Best Regards,

deltaquattro

> --
> George White <aa...(a)chebucto.ns.ca> <g...(a)acm.org>
> 189 Parklea Dr., Head of St. Margarets Bay, Nova Scotia  B3Z 2G6

From: deltaquattro on
On 6 Feb, 17:47, Dan Nagle <danna...(a)verizon.net> wrote:
> Hello,
>
> On 2010-02-06 10:02:25 -0500, George White <aa...(a)chebucto.ns.ca> said:
>
> > Many functions are designed to return a special value in case of
> > errors, e.g., a function whose value should be positive can return
> > negative values to signal various types of errors.
>
> The use of magic numbers should be shunned.
>
> It is all too easy to bring an out-of-range
> value into range when use of the software is generalized,
> as it surely will be eventually.
>
> This advice applies to derived types as well
> as intrinsic types.  Add a valid/invalid value flag
> to your derived type instead of guessing future use.
>
> Just because the current programmer is sophisticated doesn't imply
> that all future programmers will be.
>
> --
> Cheers!
>
> Dan Nagle

Hi, Dan,

I understand your concern, which Gordon and Richard also expressed. As
a general rule you're surely right. However, in the specific case of my
Locate function called by LinearInterp, as modified using George's
idea, I think it's safe to rely on out-of-range values:

MODULE LOCATE_M
IMPLICIT NONE
CONTAINS

   INTEGER FUNCTION locate(xArr,x)
   IMPLICIT NONE
   REAL, DIMENSION(:), INTENT(IN) :: xArr
   REAL, INTENT(IN) :: x
   ! Given a sorted array xArr(1:n) and a value x, locate is the index j
   ! such that x lies in [xArr(j),xArr(j+1)] if xArr is ascending,
   ! otherwise in [xArr(j+1),xArr(j)].
   ! If x < xArr(1) (x > xArr(1)) for ascending (descending) xArr, then j = 0
   ! If x > xArr(n) (x < xArr(n)) for ascending (descending) xArr, then j = n
   INTEGER jl,jm,ju,n
   LOGICAL ascend

   ! Initialization
   n = size(xArr)
   jl = 0
   ju = n+1
   ascend = (xArr(n) >= xArr(1))

   if (x == xArr(1)) then
      locate = 1
   else if (x == xArr(n)) then
      locate = n-1
   else
      ! Bisection search
      do while (ju-jl > 1)
         jm = (ju+jl)/2
         if (ascend .eqv. (x >= xArr(jm))) then
            jl = jm
         else
            ju = jm
         endif
      end do
      locate = jl
   endif

   END FUNCTION locate

END MODULE LOCATE_M


I think it's safe to add the two out-of-range bins as 0 and n (with n
being the length of xArr). The reason is that, because of the way xArr
is declared, I know that the indices of xArr go from 1 to n, and that
jl is forced to be between 1 and n-1 if x is not outside
[min(xArr),max(xArr)]. So I can use 0 and n as out-of-range values.
Clearly, in general it's highly risky, or even downright impossible, to
rely on out-of-range values to return an error flag from a function.
How could I do that without having to rewrite all my functions as
subroutines in order to allow for two output parameters? What is the
solution you choose in your codes? Do you define the function result as
a derived type?
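
Would the derived-type route look roughly like this? (All names here
are just my guess; locate2 would live in the same module as locate.)

   type :: locate_t
      integer :: j  = 0
      logical :: ok = .false.
   end type locate_t

   type(locate_t) function locate2(xArr, x)
      real, intent(in) :: xArr(:)
      real, intent(in) :: x
      locate2%j  = locate(xArr, x)   ! reuse the existing binary search
      locate2%ok = (locate2%j > 0 .and. locate2%j < size(xArr))
   end function locate2

Thanks,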

Best Regards

deltaquattro