From: D Yuniskis on
Hi Paul,

Paul Keinanen wrote:

[attributions elided]

>>> If the sqrt argument is in floating point format, the exponent is
>>> divided by 2 and the normalized could go through a table lookup. A
>>> single step normalization (increment exponent by one, shift right
>>> mantissa by one bit) may be required.
>> I think you can cheat even more *knowing* what subsequent operations
>> *will* be. E.g., postpone the normalization until after the sums
>> have been computed, etc. Small optimizations but they might
>> save a bit of time and precision.
>
> Adding/subtracting is easy to do in integer/fixed point arithmetic,
> but very nasty to do in floating point format due to denormalization

Agreed. Hence my suggestion to leave results "denormalized" at times. ;-)

> and normalization stages. Also integer/fixed to float conversion is
> nasty due to normalization.
>
> Doing sqrt((Xs-Xe)^2+(Ys-Ye)^2) might be easiest to implement by doing
> the differences in integer, take abs of both (to reduce table size)
> and use a table for squaring and using integer arithmetic for the sum
> and then convert to float for sqrt, especially if the result is needed
> in floating point representation.

Of course, "integer" is really "fixed point" as X(i) and Y(i) are
real numbers.
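
For concreteness, a sketch of that pipeline (assuming Q16.16 coordinates
and 64-bit intermediates; the multiply here stands in for Paul's
squaring table, which you'd only bother with for small |dx|, |dy|):

#include <stdint.h>
#include <math.h>

typedef int32_t fix16;   /* Q16.16 fixed-point coordinate */

/* Integer differences, integer sum of squares, one float sqrt at the
   end.  Assumes |dx|, |dy| stay below 2^31 raw (i.e. 32768.0) so the
   64-bit sum cannot overflow. */
double seg_length(fix16 xs, fix16 ys, fix16 xe, fix16 ye)
{
    int64_t dx = (int64_t)xe - xs;          /* raw units of 2^-16 */
    int64_t dy = (int64_t)ye - ys;
    if (dx < 0) dx = -dx;                   /* abs() halves a square table */
    if (dy < 0) dy = -dy;
    uint64_t sum = (uint64_t)(dx * dx) + (uint64_t)(dy * dy);  /* 2^-32 units */
    return sqrt((double)sum) / 65536.0;     /* back to coordinate units */
}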

I don't think a naive "let's use this operator on all values"
approach will work effectively. E.g., in regions of low curvature
you can get away with bigger segment lengths (eats up a bigger
portion of the workload quicker!). And, in high curvature, you
quickly fall into the mud trying to get nice tight approximations
to sharp curves.

So, maybe use floats for the big numbers and a fixed point
scheme for the small increments. (or, two different fixed
point representations conveniently scaled :> )

> Single-cycle normalization/denormalization in hardware requires a
> barrel shifter capable of shifting multiple bits left or right. For
> normalization (to [0.5..1.0] or to [1..2]) a priority encoder is
> required to detect the highest bit set, so that the correct shift can
> be applied in the barrel shifter.

Or, do it all fixed point and just waste bits. E.g., using
serial adders and multipliers means you just need to spend
money on flip-flops.

> In order to take the sqrt from a integer/fixed point argument and
> generate floating point sqrt, group two bits at a time starting from
> the binary point (left and right), take OR of each 2 bit group and
> feed the outputs to a priority encoder and use the barrel shifter to
> normalize the result to [0.25..1.0] or [1..4]. Use the normalized
> mantissa for table lookup and use the shift count (after shifting out
> the LSB) as the exponent.
>
>>> A suitable floating point representation would be 1 byte 2's
>>> complement exponent and 16 bit mantissa (without any hidden bit
>>> tricks).
>> The problem is deciding what constitutes "suitable" from a
>> computational aspect -- as you accumulate "length", you can't
>> sacrifice "resolution". E.g., even if N is only 10-12 bits,
>> you quickly eat up that 16 bits as you are often summing
>> very small segment lengths where you want to *carry* that
>> precision in your sum (yet are forced to discard it as
>> the accumulated length increases and consumes bits in your
>> representation).
>
> The floating point representation is very bad if you try to accumulate
> small increments to a huge sum. Sooner or later adding a small value
> does not change the sum at all.

Correct. The same is true when doing things like computing
the length of "nearly vertical/horizontal" segments (the
shorter axis gets lost when squared and added to the
squared *longer* axis)
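
(One common dodge for that particular failure -- not what Paul was
describing, just for reference -- is to factor the longer axis out
before squaring, so the shorter one survives as a ratio; C99's hypot()
does much the same thing with more care:)

#include <math.h>

/* Keep the short axis as a ratio instead of letting it vanish next to
   the squared long axis. */
double seg_len(double dx, double dy)
{
    double a = fabs(dx), b = fabs(dy);
    if (a < b) { double t = a; a = b; b = t; }   /* a = longer axis */
    if (a == 0.0) return 0.0;
    double r = b / a;                            /* r in [0,1] */
    return a * sqrt(1.0 + r * r);
}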

> If you really have to use such an incremental system, at least accumulate
> (say 100..1000) samples into a temporary sum, before adding to a final
> sum.

You can't even (naively) do that! I.e., you have to look at the
magnitude of the segment length in order to decide where it will
least disturb your error (i.e., if one of those "samples" ends
up being large, then accumulating it with the 99 other "small
ones" can dwarf their contributions)
From: Boudewijn Dijkstra on
On Wed, 16 Jun 2010 07:44:10 +0200, D Yuniskis
<not.going.to.be(a)seen.com> wrote:
> Tim Wescott wrote:
>> On 06/14/2010 06:23 PM, D Yuniskis wrote:
>>
>>> I'm looking for a reasonably efficient (time + space)
>>> technique at approximating Bezier curve lengths from
>>> folks who may have already "been there"...
>>>
>>> I was thinking of subdividing the curve per De Casteljau.
>>> Then, computing the lengths of the individual segments
>>> obtained therefrom. Summing these as a lower limit
>>> on the curve length with the length of the control
>>> polygon acting as an upper limit.
>>>
>>> But, that's still a lot of number crunching! If I work
>>> in a quasi fixed point representation of the space, it
>>> still leaves me with lots of sqrt() operations (which
>>> are hard to reduce to table lookups in all but a trivial
>>> space! :< )
>>>
>>> So, obviously, I need a "trick". Something to throw away
>>> that gives me a "good enough" answer -- the sort of thing
>>> the "ideal" group wouldn't consider settling for ;-)
>> Since Bezier curves are polynomial, can't you find an exact solution
>> for each segment and solve it? Or is that what you find too expensive?
>
> You can walk the curve (varying t over [0..1]) but that
> gives you more detail than you need -- and at a very high
> cost.

It depends. If you can get rid of the sqrt, then it might be worth it to
go for a grid approach (pretending to be drawing the curve), and then add
up all the single-pixel straight and diagonal (times 0.5*sqrt(2)) lengths.
But, if you're not careful, you might end up zigzagging, adding
significant amounts of error.
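
A crude sketch of what I mean (assuming a hypothetical bezier_point()
evaluator, the curve already scaled to grid units, and t oversampled
finely enough that consecutive samples land in the same or an adjacent
cell):

#include <math.h>
#include <stdlib.h>

/* Hypothetical: fills (*x,*y) with the curve point at parameter t. */
extern void bezier_point(double t, double *x, double *y);

/* Walk the curve on a unit grid; straight cell-to-cell moves add 1,
   diagonal ones add sqrt(2). */
double grid_length(int samples)
{
    double x, y, len = 0.0;
    bezier_point(0.0, &x, &y);
    long gx = lround(x), gy = lround(y);
    for (int i = 1; i <= samples; i++) {
        bezier_point((double)i / samples, &x, &y);
        long nx = lround(x), ny = lround(y);
        long sx = labs(nx - gx), sy = labs(ny - gy);
        if (sx || sy)
            len += (sx && sy) ? 1.4142135623730951 : 1.0;
        gx = nx;
        gy = ny;
    }
    return len;
}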



--
Made with Opera's revolutionary e-mail program:
http://www.opera.com/mail/
(remove the obvious prefix to reply by mail)
From: D Yuniskis on
Hi Boudewijn,

Boudewijn Dijkstra wrote:
> On Wed, 16 Jun 2010 07:44:10 +0200, D Yuniskis
> <not.going.to.be(a)seen.com> wrote:
>> Tim Wescott wrote:
>>> On 06/14/2010 06:23 PM, D Yuniskis wrote:
>>>
>>>> I'm looking for a reasonably efficient (time + space)
>>>> technique at approximating Bezier curve lengths from
>>>> folks who may have already "been there"...
>>>>
>>>> I was thinking of subdividing the curve per De Casteljau.
>>>> Then, computing the lengths of the individual segments
>>>> obtained therefrom. Summing these as a lower limit
>>>> on the curve length with the length of the control
>>>> polygon acting as an upper limit.
>>>>
>>>> But, that's still a lot of number crunching! If I work
>>>> in a quasi fixed point representation of the space, it
>>>> still leaves me with lots of sqrt() operations (which
>>>> are hard to reduce to table lookups in all but a trivial
>>>> space! :< )
>>>>
>>>> So, obviously, I need a "trick". Something to throw away
>>>> that gives me a "good enough" answer -- the sort of thing
>>>> the "ideal" group wouldn't consider settling for ;-)
>>> Since Bezier curves are polynomial, can't you find an exact solution
>>> for each segment and solve it? Or is that what you find too expensive?
>>
>> You can walk the curve (varying t over [0..1]) but that
>> gives you more detail than you need -- and at a very high
>> cost.
>
> It depends. If you can get rid of the sqrt, then it might be worth it
> to go for a grid approach (pretending to be drawing the curve), and then
> add up all the single-pixel straight and diagonal (times 0.5*sqrt(2))
> lengths. But, if you're not careful, you might end up zigzagging,
> adding significant amounts of error.

I was thinking that for very small (dx,dy) you could probably
do a table lookup or other approximation. But, even that
suffers from not being able to put a lid on the error.
E.g., what if I have my control/end-points:
(0.0001,0.0001)
(0.1025,0.3077)
(0.7442,0.0330)
(0.0040,0.9908)

The "pixel" analogy doesn't work for me. There's nothing
magical about a displacement of "1" (e.g., pixel). It could
just as easily be "10" or "0.1" (don't think in terms of
displays)
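
To be clear about the kind of lookup I meant: normalize by the longer
component and tabulate sqrt(1 + r^2) on the ratio, so the table itself
is scale-free (sizes and rounding here are arbitrary); the scale
question above is exactly what the normalization has to absorb:

#include <math.h>

#define RATIO_STEPS 256                   /* arbitrary table size */
static float ratio_tab[RATIO_STEPS + 1];  /* sqrt(1 + r^2), r in [0,1] */

void init_ratio_tab(void)
{
    for (int i = 0; i <= RATIO_STEPS; i++) {
        double r = (double)i / RATIO_STEPS;
        ratio_tab[i] = (float)sqrt(1.0 + r * r);
    }
}

/* Length of one small displacement: one divide, one lookup. */
float small_len(float dx, float dy)
{
    float a = fabsf(dx), b = fabsf(dy);
    if (a < b) { float t = a; a = b; b = t; }  /* a = longer component */
    if (a == 0.0f) return 0.0f;
    int i = (int)(b / a * RATIO_STEPS + 0.5f);
    return a * ratio_tab[i];
}
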
From: Paul Keinanen on
On Sat, 19 Jun 2010 10:20:31 -0700, D Yuniskis
<not.going.to.be(a)seen.com> wrote:


>> Adding/subtracting is easy to do in integer/fixed point arithmetic,
>> but very nasty to do in floating point format due to denormalization
>
>Agreed. Hence my suggestion to leave results "denormalized" at times. ;-)

Addition in floating point representation must be done with equal
exponents, so the smaller value needs to be denormalized to have the
same exponent as the larger.

In decimal notation 9.97E03+5.0E01 = 9.97E03+0.05E03 = 10.02E03 =
1.002E04. Of course, you could postpone the overflow normalization, if
other additions/subtractions will immediately follow and if the
arithmetic unit has sufficient headroom bits.
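
In the 1-byte-exponent, 16-bit-mantissa format suggested earlier, the
same dance looks roughly like this (a sketch: non-negative values only,
no hidden bit, truncation instead of rounding, exponent overflow not
handled):

#include <stdint.h>

/* Toy float: value = mant * 2^exp, plain 16-bit fraction, no hidden bit. */
typedef struct { int8_t exp; uint16_t mant; } sfloat;

sfloat sfloat_add(sfloat a, sfloat b)
{
    if (a.exp < b.exp) { sfloat t = a; a = b; b = t; } /* a has larger exp */
    int shift = a.exp - b.exp;
    uint32_t mb  = (shift < 16) ? ((uint32_t)b.mant >> shift) : 0;
    uint32_t sum = (uint32_t)a.mant + mb;              /* headroom bits */
    sfloat r = { a.exp, 0 };
    if (sum > 0xFFFF) {       /* overflow: renormalize by one position */
        sum >>= 1;
        r.exp = (int8_t)(a.exp + 1);
    }
    r.mant = (uint16_t)sum;
    return r;
}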

>
>> and normalization stages. Also integer/fixed to float conversion is
>> nasty due to normalization.
>>
>> Doing sqrt((Xs-Xe)^2+(Ys-Ye)^2) might be easiest to implement by doing
>> the differences in integer, take abs of both (to reduce table size)
>> and use a table for squaring and using integer arithmetic for the sum
>> and then convert to float for sqrt, especially if the result is needed
>> in floating point representation.
>
>Of course, "integer" is really "fixed point" as X(i) and Y(i) are
>real numbers.

Fixed point add/sub is the same as integer; for multiplication, you
just have to shift the result right by the number of fractional bits,
etc.
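
E.g., for a 16.16 format the multiply is just (sketch):

#include <stdint.h>

/* Q16.16 multiply: the 64-bit product has 32 fractional bits, so shift
   16 of them back out.  (Rounding and saturation omitted.) */
static inline int32_t fix16_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 16);
}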

>
>I don't think a naive "let's use this operator on all values"
>approach will work effectively. E.g., in regions of low curvature
>you can get away with bigger segment lengths (eats up a bigger
>portion of the workload quicker!). And, in high curvature, you
>quickly fall into the mud trying to get nice tight approximations
>to sharp curves.
>
>So, maybe use floats for the big numbers and a fixed point
>scheme for the small increments. (or, two different fixed
>point representations conveniently scaled :> )

That works, as long as you avoid too frequent float/fixed or fixed/float
conversions in a single expression, since (de)normalization is
expensive with traditional instruction sets.

>> Single-cycle normalization/denormalization in hardware requires a
>> barrel shifter capable of shifting multiple bits left or right. For
>> normalization (to [0.5..1.0] or to [1..2]) a priority encoder is
>> required to detect the highest bit set, so that the correct shift can
>> be applied in the barrel shifter.
>
>Or, do it all fixed point and just waste bits. E.g., using
>serial adders and multipliers means you just need to spend
>money on flip-flops.

Addition, subtraction, multiplication and denormalization can be done
with right-shifting shift registers; normalization (and division),
however, would also require left-shifting shift registers.

_Bit_ serial computing was used a lot in the tube era (and in the
HP-35 pocket scientific calculator), but I haven't seen it used more
recently.

These days, a huge number of bit serial processors might make sense if
each serial processor is controlling a single pixel or block of pixels
and some of these processors could be completely shut down to reduce
power consumption when no update is required. Otherwise, for equal
total throughput, serial and parallel systems would require about
the same number of arithmetic elements, and the more complex control
logic of the serial processors would have to be duplicated across more
of them.

From: D Yuniskis on
Hi Peter,

Peter Dickerson wrote:

[snip]

>>> Control points
>>> (0.000000,0.000000)
>>> (0.000000,1.000000)
>>> (1.000000,1.000000)
>>> (1.000000,0.000000)
>>>
>>> sqrts length Romberg
>>> 1 1.000000
>>> 2 1.802776 2.070368
>>> 4 1.950719 2.000034
>>> 8 1.987716 2.000048
>>> 16 1.996931 2.000003
>>> 32 1.999233 2.000000
>>> 64 1.999808 2.000000
>>> 128 1.999952 2.000000
>>> 256 1.999988 2.000000
>>> 512 1.999997 2.000000
>>> 1024 1.999999 2.000000
>> I'm not sure we're on the same page, here. :-( (see below)
>> I think you are playing solely with "t" (or whatever you are
>> using as the control variable) and subdividing *that* (?)
>
> Probably not. Yes, I'm using the control parameter, which is commonly
> called t.
>
>> [did you run this in Maple/Mathematica/etc.? If so, can you
>> post your code?]
>
> It's etc. I thought the premise was a resource-challenged system. So I wrote
> it in C - well, it's actually C++ but in C style. Pasted at the end, but only
> one line for the Romberg bit.

OK.
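
(If the one-liner is what I think it is -- Richardson extrapolation on
the two chord sums, which does reproduce your table, e.g.
(4*1.802776 - 1.000000)/3 = 2.070368 -- then it amounts to:)

/* Combine the chord-length sums for n and 2n segments, assuming the
   plain sum's error falls off as h^2. */
double romberg_step(double len_n, double len_2n)
{
    return len_2n + (len_2n - len_n) / 3.0;   /* == (4*len_2n - len_n)/3 */
}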

>>> so to get to 1 part per million the straightforward ('naive') method uses
>>> 1024 square roots for the result, and (assuming no better way to start than
>>> from a single line segment) 1023 sqrts used getting there. The Romberg
>>> method gets to a slightly better result using 48 square roots (32+16) and
>>> wastes 15. That seems like a worthwhile acceleration to me. The
>>> convergence power of h is 2 for the naive method and 4 for the Romberg.
>>>
>>>>>> Unless you can constrain the shapes of the curves, there is
>>>>>> nothing you can do a priori (in your algorithm) to guarantee
>>>>>> that an approach, choice of t, etc. will work.
>>>>> it isn't an algorithm per se, it's an optimization to avoid finer step
>>>>> sizes.
>>>>>> But, this computation may just end up telling you, "I can't
>>>>>> get to that point with a linear approximation as there is
>>>>>> too much curvature here". So, you just *wasted* a second.
>>>>>> Watching the algorithm creep into regions of increasing
>>>>>> curvature was *painful* (as each successive estimate got
>>>>>> "less good")
>>>>> in this case the previous and current result are used to improve the
>>>>> estimate in the hope of going to higher resolution.
>>>> Yes, as does the algorithm I proposed. If the points you end up
>>>> with don't look to be a good fit, you subdivide and try again
>>>> (recurse). Using the first control polygon and the subsequent
>>>> subdivided points as upper and lower limits on the "actual"
>>>> value lets you come up with a numerical factor that bounds
>>>> your error: i.e., the length is in [a,b] - which you can
>>>> examine at run time and use to decide if you need to further
>>>> subdivide. Watching how a and b vary with each refinement
>>>> lets you see when you are just spinning your wheels.
>>> The control points are hardly much of an upper bound in the case of a
>>> small number of control points. In my example, above, the upper bound is
>>> 3.
>> No, you update the upper -- and lower! -- bounds as you subdivide
>> the curve.
>>
>> E.g., the first iteration of "my" algorithm has the length
>> of the segments in the control polygon as 3 (= 1+1+1)
>> and the length of the *chord* as 1 (obvious). So, my initial
>> estimate for length is 2 ((3+1)/2).
>>
>> Next, I subdivide the curve creating two curves:
>>
>> (0,0) (0,0.5) (0.25,0.75) (0.5,0.75)
>>
>> and
>>
>> (0.5,0.75) (0.75,0.75) (1,0.5) (1,0)
>>
>> (in this example, they are mirror images of each other)
>>
>> [apologies if I screw this up... I'm doing it in my head
>> as I type :-/ ]
>
> This is the bit I'm missing. How do you get the control points for the two
> halves from the control points of the whole? I haven't thought about this so

De Casteljau's algorithm.

Given a Start point (S), departing control point (D), arriving
control point (A), and End point (E), [i.e., these are P0 - P3],
the control points for the two half-curves are:
M = (D+A)/2

L0 = S
R3 = E

L1 = (S+D)/2
R2 = (A+E)/2

L2 = (L1+M)/2
R1 = (M+R2)/2

L3 = R0 = (L2+R1)/2

(it's easier to visualize if you draw it out)

Note that L3 (=R0) is a point on the original curve
that you have just "created" at the expense of just
a few adds and shifts. L0 and R3 are the original
endpoints (S & E).

> perhaps it's trivial, in which case it's clearly a good way to go. Even so,
> you can still use the Romberg approach on the results of two subdivisions.
> It's all about getting a better estimate based on two other estimates and
> knowledge of how the algorithm converges (and an assumption that it does
> converge in a regular manner, but we're integrating a curve that is
> continuous and differentiable).
>
>> The sum of the control polygon segments is ~1.103
>> (1/2 + sqrt(2)/4 + 1/4) and the length of the chord
>> is ~0.901 (sqrt(0.5^2 + 0.75^2)). The estimate for
>> the length of *this* curve (which is just the "left
>> half" of the original curve) would the be ~1.002
>> ((1.103 + 0.901)/2). Since the left and right halves
>> of the original curve are mirror images, the length
>> of the *right* curve will be the same leaving the
>> estimate for the total curve to be ~2.004
>>
>> [sorry, going any further than that without a calculator
>> and pen/paper is too hard for my little brain! :> ]
>>
>> Note that many of the calculations performed in the
>> first curve's assessment can be salvaged in the
>> assessment of the second curve. (I haven't sat down and
>> counted how many sqrts are *required* at each subdivision
>> of the curve and how many can be "inferred").
>>
>> The problem with all of these is coming up with criteria
>> that you can automagically use to stop the recursion.
>> E.g., I would simply use something like
>> |estimate[i+1]| < |estimate[i]| - epsilon
>> But, this assumes the process converges monotonically...
>
> Well the line segment summation is monotonic and bounded above, so that
> process must converge.

Yes.
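
Roughly the skeleton I have in mind, for reference (pt and split_half()
as in the earlier sketch; eps and the depth cap are arbitrary):

#include <math.h>

/* pt and split_half() declared as in the subdivision sketch above. */

static double dist(pt a, pt b)
{
    return hypot(a.x - b.x, a.y - b.y);
}

/* Chord = lower bound, control polygon = upper bound; the estimate is
   their midpoint.  Recurse until the bounds agree to eps, or give up. */
double bez_length(const pt c[4], double eps, int depth)
{
    double chord = dist(c[0], c[3]);
    double poly  = dist(c[0], c[1]) + dist(c[1], c[2]) + dist(c[2], c[3]);

    if (poly - chord <= eps || depth == 0)
        return 0.5 * (poly + chord);

    pt l[4], r[4];
    split_half(c, l, r);
    return bez_length(l, 0.5 * eps, depth - 1)
         + bez_length(r, 0.5 * eps, depth - 1);
}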

> The straight line segment between two points is
> always the shortest, so breaking it into two segments to better follow the
> curve must be longer (or equal).
>
> the code... sorry this is going to have the formatting destroyed...
> note that I am making no attempt to optimize the Bezier point calculations
> since that wasn't the point I was making. Beziers can be calculated using
> just additions once the parameters are converted to the relevant form. Nor do
> I use integers. I simply fill out an array with the points beforehand to
> separate that from the integration. The Romberg bit is just one line.

OK, I'll try to take a look through it tomorrow. Thanks!
--don