From: Didi on
Paul E. Bennett wrote:
> ...
> <quote>
> A classical Forth puzzle is: What does the following do?
>
> : ???? -1 TUCK DO 2 + DUP +LOOP 2/ ;
>
> It is a standard method of calculating square root by summing
> the odd integers. It works for all unsigned integers from 0 to
> FFFFFFFF (or FFFF).
> </quote>
>
> You would have realised that the definition ???? is actually a sqrt by
> the method stated. Even in the more spread out style used by many of us
> these days this would only have been 5 lines at most.
>

Are you claiming that your unreadable 5 line algorithm which uses
summing somehow (I am not able to read that line and from what I
see I can say it is not worth learning how to read it) is in any way
superior to my 12 line example which finishes in a guaranteed
16 times through the loop for any 32 bit integer - and can be
read by any programmer? I might agree it is if the purpose is
to waste CPU and programmer time.

Dimiter

------------------------------------------------------
Dimiter Popoff Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/

Original message: http://groups.google.com/group/comp.arch.embedded/msg/e1532ed305256729?dmode=source


From: CBFalconer on
"Paul E. Bennett" wrote:
>
.... snip ...
>
> If you had focused on the first part;
>
> <quote>
> A classical Forth puzzle is: What does the following do?
>
> : ???? -1 TUCK DO 2 + DUP +LOOP 2/ ;
>
> It is a standard method of calculating square root by summing
> the odd integers. It works for all unsigned integers from 0 to
> FFFFFFFF (or FFFF).
> </quote>
>
> You would have realised that the definition ???? is actually a
> sqrt by the method stated. Even in the more spread out style used
> by many of us these days this would only have been 5 lines at most.

Which points out that the mathematics is more important than the
coding. Three or four lines of C can easily produce the same
effect.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.


** Posted from http://www.teranews.com **
From: James Waldby on
On Sat, 03 May 2008 20:22:47 +0000, Christopher Henrich wrote:
> ..."David T. Ashley" <dta(a)e3ft...> wrote:
>
>> What is the best integer square root algorithm for a processor with MUL
>> and DIV (integer multiplication and division) built-in?
>>
>> For one without MUL and DIV (or where the instructions operate on
>> shorter operands than would be required to form the square in
>> question)?
[...]
> Here's an algorithm for finding r = sqrt(x^2 + y^2).
>
> Assume x > y > 0. Let
> x_1 = (4 x^2 + 3)x / (4 x^2 + y^2)
> y_1 = y^3/(4 x^2 + y^2).
>
> Then x_1^2 + y_1^2 = x^2 + y^2. If the transformation x,y) \mapsto
> (x_1, y_1) is iterated, then the point converges *cubically* to (r, 0).
>
> I read this in a magazine many years ago.
>
> Adapting it to your problem, I would first find r = sqrt(x^2 + y^2),
> then s = sqrt(r^2 + z^2).

It probably would be a little better to find r = sqrt(x^2 + z^2),
then s = sqrt(r^2 + y^2), if x > y > z. On a processor with fast
multiply and divide relative to comparison and swap operations it
would make no difference, but on small cpu's the extra ops to put
x,y,z in order would cost much less on average than the potentially
avoided iteration steps.

> I would use scaled integer arithmetic, that is, represent x etc. by
> j_x/D etc. where j_x is an integer and I choose D to be some convenient
> common denominator like 65536.
From: 42Bastian Schick on
On Thu, 1 May 2008 13:34:52 -0700 (PDT), Didi <dp(a)tgi-sci.com> wrote:

>Here is it excerpted from a post I have made many years ago,
>calculates a 16 bit squre root of a 32 bit number.
> The example is in CPU32 (or CF) assembly.
>
>*
>* calc. the square root of d1.l; return it in d1.l
>* destroys d2,d3,d4
>*
>sqrtd1
> moveq #15,d2 counter
> clr.l d3 colect result here
> set loop,*
> bset d2,d3 try this
> move.w d3,d4 extra copy result so far
> mulu.w d4,d4 square
> cmp.l d1,d4 higher?
> bls keepbit branch not
> bclr d2,d3 else bit back down
>keepbit
> dbra d2,loop process all bits
> move.l d3,d1 update in d1
> rts

I guess this will benefit from the ff1 (Coldfire ISA A+ AFAIK) or
cntlzw (PowerPC) opcodes, which return the position of the first 1
bit, and could be used to setup the loop counter (ff1 / 2).




--
42Bastian
Do not email to bastian42(a)yahoo.com, it's a spam-only account :-)
Use <same-name>@monlynx.de instead !
From: Didi on
42Bastian Schick wrote:
> On Thu, 1 May 2008 13:34:52 -0700 (PDT), Didi <dp(a)tgi-sci.com> wrote:
>
> >Here is it excerpted from a post I have made many years ago,
> >calculates a 16 bit squre root of a 32 bit number.
> > The example is in CPU32 (or CF) assembly.
> >
> >*
> >* calc. the square root of d1.l; return it in d1.l
> >* destroys d2,d3,d4
> >*
> >sqrtd1
> > moveq #15,d2 counter
> > clr.l d3 colect result here
> > set loop,*
> > bset d2,d3 try this
> > move.w d3,d4 extra copy result so far
> > mulu.w d4,d4 square
> > cmp.l d1,d4 higher?
> > bls keepbit branch not
> > bclr d2,d3 else bit back down
> >keepbit
> > dbra d2,loop process all bits
> > move.l d3,d1 update in d1
> > rts
>
> I guess this will benefit from the ff1 (Coldfire ISA A+ AFAIK) or
> cntlzw (PowerPC) opcodes, which return the position of the first 1
> bit, and could be used to setup the loop counter (ff1 / 2).
>

It will indeed - it is really good to have the ff1 opcode back (as in
the 020,
there was a bfff1 there).
I am a frequent cntlzw user on the PPC, but have not used it on this
algorithm yet (never thought of it, I did it first for the CPU32 - or
was it the
HC11 - where there was no such opcode and have not changed it since).
I may do so next time I dig in that vicinity.

Dimiter

------------------------------------------------------
Dimiter Popoff Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/

Original message: http://groups.google.com/group/comp.arch.embedded/msg/7805bc4cf65eecab?dmode=source