From: Norbert Juffa on
"Steve Pope" <spope33(a)speedymail.org> wrote in message news:i2cg29$hq6$1(a)blue.rahul.net...
> Norbert Juffa <juffa(a)earthlink.net> wrote:
>
>>Below I demonstrate how this could be done for log2() in 8.24 fixed
>>point. Our input x is of the form 2^e*m, 1 <= m < 2, m = 1 + f. Using
>>a "count leading zeros" operation, we quickly establish e and f. Given
>>f, 0 <= f < 1, we compute the index i of the closest, smaller, table
>>entry in a table of values of log2(x) on [1,2). The difference between
>>the actual argument and the function argument corresponding to i shall
>>be d. Using the function value from the table, f0, as well as the next
>>two table entries, f1 and f2, we compute the coefficients a, b of a
>>parabola that fits through all three function values.
>
> The above is a valuable technique, but times have changed and in many
> DSP scenarios these days one can just plug the mantissa into
> a (larger) LUT and be done with it. This applies to log(), square(),
> and similar functions.
>
> Steve

For a table lookup method to work efficiently on most embedded processors,
there must be a high likelihood that the table elements are present in the
L1 cache. Often there will be tables for division, sqrt(), exp(), and log(),
all of which would want to fit into L1 at the same time, leaving room for
other data as well. The L1 may be as small as 8KB or even 4KB, so keeping
lookup tables small seems very much desirable.

To achieve accuracy to full 8.24 precision one needs a table of 258*4 bytes
(~1KB) each for both exp() and log() with quadratic interpolation. Table
storage will increase significantly if linear interpolation is used, even
if an economical scheme such as bipartite tables is used. Even for hardware
implementations the use of quadratic interpolation may well be preferred to
keep ROM sizes reasonable.

I would be interested in learning in which scenarios requiring relatively high
accuracy (such as here) quadratic interpolation is dispensable, and linear
interpolation or a straight lookup is feasible.

-- Norbert
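The scheme described above is easy to model. The following is a floating-point
sketch, not Norbert's actual code: a real 8.24 implementation would do the index
extraction and the polynomial evaluation entirely in integer arithmetic, but the
table layout (256 segments plus two guard entries, matching the 258-entry figure)
and the parabola through three adjacent entries are exactly as described. The
helper name log2_quad is an invention for this sketch.

```python
import math

N = 256                                 # table granularity: 256 segments on [1, 2)
H = 1.0 / N                             # spacing between table arguments
TAB = [math.log2(1.0 + i * H) for i in range(N + 2)]   # 258 entries: 2 guards for f1, f2

def log2_quad(x):
    """Approximate log2(x), x > 0, by quadratic interpolation in TAB."""
    m, e = math.frexp(x)                # x = m * 2^e with 0.5 <= m < 1
    m, e = 2.0 * m, e - 1               # renormalize so 1 <= m < 2
    f = m - 1.0                         # fraction, 0 <= f < 1
    i = int(f * N)                      # index of the closest smaller table entry
    d = f - i * H                       # distance to that entry's argument
    f0, f1, f2 = TAB[i], TAB[i + 1], TAB[i + 2]
    # coefficients b, a of the parabola through (0, f0), (H, f1), (2H, f2)
    b = (-3.0 * f0 + 4.0 * f1 - f2) / (2.0 * H)
    a = (f0 - 2.0 * f1 + f2) / (2.0 * H * H)
    return e + f0 + d * b + d * d * a
```

With N = 256 the worst-case interpolation error is roughly 0.19 * H^3, i.e. on
the order of 1e-8, comfortably inside the 2^-24 quantum of an 8.24 result.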


From: Clay on
On Jul 23, 2:42 pm, "Norbert Juffa" <ju...(a)earthlink.net> wrote:
> "Steve Pope" <spop...(a)speedymail.org> wrote in message news:i2cg29$hq6$1(a)blue.rahul.net...
> > Norbert Juffa <ju...(a)earthlink.net> wrote:
>
> >>Below I demonstrate how this could be done for log2() in 8.24 fixed
> >>point. Our input x is of the form 2^e*m, 1 <= m < 2, m = 1 + f. Using
> >>a "count leading zeros" operation, we quickly establish e and f. Given
> >>f, 0 <= f < 1, we compute the index i of the closest, smaller, table
> >>entry in a table of values of log2(x) on [1,2). The difference between
> >>the actual argument and the function argument corresponding to i shall
> >>be d. Using the function value from the table, f0, as well as the next
> >>two table entries, f1 and f2, we compute the coefficients a, b of a
> >>parabola that fits through all three function values.
>
> > The above is a valuable technique, but times have changed and in many
> > DSP scenarios these days one can just plug the mantissa into
> > a (larger) LUT and be done with it.  This applies to log(), square(),
> > and similar functions.
>
> > Steve
>
> For a table lookup method to work efficiently on most embedded processors,
> there must be a high likelihood that the table elements are present in the
> L1 cache. Often there will be tables for division, sqrt(), exp(), and log(),
> all of which would want to fit into L1 at the same time, leaving room for
> other data as well. The L1 may be as small as 8KB or even 4KB, so keeping
> lookup tables small seems very much desirable.
>
> To achieve accuracy to full 8.24 precision one needs a table of 258*4 bytes
> (~1KB) each for both exp() and log() with quadratic interpolation. Table
> storage will increase significantly if linear interpolation is used, even
> if an economical scheme such as bipartite tables is used. Even for hardware
> implementations the use of quadratic interpolation may well be preferred to
> keep ROM sizes reasonable.
>
> I would be interested in learning in which scenarios requiring relatively high
> accuracy (such as here) quadratic interpolation is dispensable, and linear
> interpolation or a straight lookup is feasible.
>
> -- Norbert

Years ago I made a light meter that showed relative intensity in terms
of F-stops and density. Both of these are logarithmic quantities. I
used a Motorola 68HC05 micro, and I went for accuracy, since it just
didn't matter whether the computation took a microsecond or 10 ms. The
idea was to force all of the error into the sensor and not have the
processor's math add to the final error. Simplicity and tiny code were
the desirable things. My four floating-point arithmetic routines took
only 512 bytes! When I added the input, output, and log functions, I
was still under 1K bytes!

Clay
From: robert bristow-johnson on
On Jul 23, 2:42 pm, "Norbert Juffa" <ju...(a)earthlink.net> wrote:
> "Steve Pope" <spop...(a)speedymail.org> wrote in message news:i2cg29$hq6$1(a)blue.rahul.net...
> > Norbert Juffa <ju...(a)earthlink.net> wrote:
>
> >>Below I demonstrate how this could be done for log2() in 8.24 fixed
> >>point. Our input x is of the form 2^e*m, 1 <= m < 2, m = 1 + f. Using
> >>a "count leading zeros" operation, we quickly establish e and f. Given
> >>f, 0 <= f < 1, we compute the index i of the closest, smaller, table
> >>entry in a table of values of log2(x) on [1,2). The difference between
> >>the actual argument and the function argument corresponding to i shall
> >>be d. Using the function value from the table, f0, as well as the next
> >>two table entries, f1 and f2, we compute the coefficients a, b of a
> >>parabola that fits through all three function values.
>
> > The above is a valuable technique, but times have changed and in many
> > DSP scenarios these days one can just plug the mantissa into
> > a (larger) LUT and be done with it.  This applies to log(), square(),
> > and similar functions.
>
....
>
> For a table lookup method to work efficiently on most embedded processors,
> there must be a high likelihood that the table elements are present in the
> L1 cache. Often there will be tables for division, sqrt(), exp(), and log(),
> all of which would want to fit into L1 at the same time, leaving room for
> other data as well. The L1 may be as small as 8KB or even 4KB, so keeping
> lookup tables small seems very much desirable.
>
> To achieve accuracy to full 8.24 precision one needs a table of 258*4 bytes
> (~1KB) each for both exp() and log() with quadratic interpolation. Table
> storage will increase significantly if linear interpolation is used, even
> if an economical scheme such as bipartite tables is used.

i had to look up "bipartite table" to get an idea about what you're
talking about. would this paper:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.36.6368&rep=rep1&type=pdf

be a good initial description of the technique? it's a little bit
intriguing; it's based on the notion that the first derivative of a
function changes much less over a segment of its domain than the
"zeroth derivative" of the function does.

but i don't see quadratic interpolation in this. how is quadratic
interpolation done? do you continue to divide the input word into
three parts, use two tables, and have a quadratic function rather than
a linear one (referring to Eq. 6 in the paper i cited above) for the
least significant portion of the sum?
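One way to answer the "how does it work" question concretely: here is a small
Python model of the plain bipartite scheme as that paper describes it. The field
widths, the log2 target, and the helper names are illustrative choices only. The
12-bit fraction is split into three 4-bit fields x0, x1, x2; the a0 table holds
the function sampled at (x0, x1) with x2 at its midpoint, the a1 table holds
slope-times-offset terms indexed by (x0, x2), and the result is just the sum of
the two lookups (this corresponds to Eq. 6 in the paper).

```python
import math

SCALE = 4096                            # 12-bit fraction: f/4096 on [0, 1), arg = 1 + f/4096
LN2 = math.log(2.0)

# a0 table, indexed (x0, x1): log2 sampled with the low field x2 at its midpoint 7.5
A0 = [[math.log2(1.0 + ((x0 << 8) + (x1 << 4) + 7.5) / SCALE)
       for x1 in range(16)] for x0 in range(16)]

# a1 table, indexed (x0, x2): derivative at the x0 segment's midpoint, times x2 offset
A1 = [[((x2 - 7.5) / SCALE) / ((1.0 + (x0 * 256 + 128) / SCALE) * LN2)
       for x2 in range(16)] for x0 in range(16)]

def log2_bipartite(f):
    """Approximate log2(1 + f/4096), integer 0 <= f < 4096, with two lookups and an add."""
    x0, x1, x2 = f >> 8, (f >> 4) & 15, f & 15
    return A0[x0][x1] + A1[x0][x2]
```

The two tables hold 2*256 = 512 entries instead of the 4096 a direct lookup at
the same input width would need, and the worst-case error over all 4096 inputs
stays below 2^-12, i.e. better than what either 8-bit table could deliver alone.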

> Even for hardware implementations the use of quadratic
> interpolation may well be preferred to keep ROM sizes reasonable.

if it's just the two tables, why stop at quadratic?

> I would be interested in learning in which scenarios requiring relatively high
> accuracy (such as here) quadratic interpolation is dispensable, and linear
> interpolation or a straight lookup is feasible.

i need more context to understand what you are seeking to answer. if
it's still just two parallel table lookups (with sign-extension and
addition following), there is no reason that the "a1" coefficient
lookup need be a linear function of "x2" as is depicted in Eq. 6 of
that paper. in fact, it need not be quadratic. furthermore, you can
toss all assumptions to the wind and make some kinda MATLAB program to
optimally determine *both* of the a1 and a0 tables in such a way to
minimize error.

BTW, for all of the fiddling around you need to do to implement this
bipartite table lookup, possible sign extension, and addition in
software, you might be able to perform a decent polynomial
approximation in as few instruction cycles. the only reason i see a
LUT as superior to, say, a 4th-order or 6th-order polynomial
approximation is sheer speed, when that is important and there is
plenty of memory to burn.

r b-j
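To make the polynomial alternative concrete, here is one classic route (the
approach used, in spirit, by the fdlibm family of math libraries; the specific
term count here is a choice for this sketch, not fdlibm's): substitute
s = (m-1)/(m+1) for the mantissa m in [1,2), so that log2(m) = (2/ln 2)*atanh(s)
with s in [0, 1/3), where the odd series converges fast enough that terms through
s^13 already land below the 2^-24 quantum discussed upthread.

```python
import math

TWO_OVER_LN2 = 2.0 / math.log(2.0)

def log2_poly(x):
    """Approximate log2(x), x > 0, via atanh series truncated at s^13."""
    m, e = math.frexp(x)                # x = m * 2^e with 0.5 <= m < 1
    m, e = 2.0 * m, e - 1               # renormalize so 1 <= m < 2
    s = (m - 1.0) / (m + 1.0)           # 0 <= s < 1/3
    z = s * s
    # atanh(s) = s + s^3/3 + s^5/5 + ..., evaluated by Horner's rule in z
    p = 1.0 / 13
    for c in (1.0 / 11, 1.0 / 9, 1.0 / 7, 1.0 / 5, 1.0 / 3, 1.0):
        p = p * z + c
    return e + TWO_OVER_LN2 * (s * p)
```

That is one divide and a handful of multiply-adds; on a machine with a pipelined
multiplier this is competitive with the bipartite lookup-and-fixup sequence,
which is exactly the trade-off in question.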

From: Steve Pope on
I do not doubt the validity of the algorithms (for interpolating
down to a point where it reduces your table lookup size).

What I doubt is the significance, these days, of the vanishing
grey area between what you'd want to just do in IEEE double-precision
floating point, and what you'd want to design to a target bit width
in hardware. I'm not sure the idea of an in-between, general-purpose
format is still there anymore.

Steve
From: robert bristow-johnson on
On Jul 24, 1:46 am, spop...(a)speedymail.org (Steve Pope) wrote:
> I do not doubt the validity of the algorithms (for interpolating
> down to a point where it reduces your table lookup size).
>
> What I doubt is the significance, these days, of the vanishing
> grey area between what you'd want to just do in IEEE double-precision
> floating point, and what you'd want to design to a target bit width
> in hardware. I'm not sure the idea of an in-between, general-purpose
> format is still there anymore.

dunno what you mean by an "in-between, general-purpose format" but,
Steve, even if you have IEEE doubles, but no std math lib, what would
you do to get exp() and log()? how would you implement the stdlib
exp() and log() or sqrt() or sin() or tan() or atan(), if you had to?

isn't that of significance to some degree? or is the source code (say
for the gnu std math lib) published somewhere? if so, i would like to
see what order of polynomial they use and the coefficients.

r b-j
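The sources are in fact published: the classic reference is Sun's freely
distributed fdlibm (available from netlib), which is the ancestor of most libm
implementations including glibc's, and which uses range reduction plus small
minimax polynomials. As a flavor of the approach, here is a sketch of exp2 via
reduction to a fractional part in [0,1) and a degree-9 polynomial; note that this
uses plain Taylor coefficients for transparency, whereas fdlibm's actual
coefficients are minimax-fitted and tighter.

```python
import math

LN2 = math.log(2.0)

# Taylor coefficients of exp(t) = sum t^k / k!, k = 0..9, highest order first
COEFF = [1.0 / math.factorial(k) for k in range(9, -1, -1)]

def exp2_poly(x):
    """Approximate 2**x by splitting x into integer and fractional parts."""
    n = math.floor(x)                   # 2^x = 2^n * 2^r with r in [0, 1)
    r = x - n
    t = r * LN2                         # 2^r = exp(r * ln 2), t in [0, ln 2)
    p = 0.0
    for c in COEFF:                     # Horner evaluation of the Taylor sum
        p = p * t + c
    return math.ldexp(p, n)             # scale by 2^n exactly
```

The degree-9 truncation keeps the relative error below 2^-24 over the whole
reduced range; the exact 2^n scaling via ldexp means accuracy does not degrade
for large or negative arguments.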