|
From: Didi on 4 May 2008 10:28 Paul E. Bennett wrote: > ... > <quote> > A classical Forth puzzle is: What does the following do? > > : ???? -1 TUCK DO 2 + DUP +LOOP 2/ ; > > It is a standard method of calculating square root by summing > the odd integers. It works for all unsigned integers from 0 to > FFFFFFFF (or FFFF). > </quote> > > You would have realised that the definition ???? is actually a sqrt by > the method stated. Even in the more spread out style used by many of us > these days this would only have been 5 lines at most. > Are you claiming that your unreadable 5 line algorithm which uses summing somehow (I am not able to read that line and from what I see I can say it is not worth learning how to read it) is in any way superior to my 12 line example which finishes in a guaranteed 16 times through the loop for any 32 bit integer - and can be read by any programmer? I might agree it is if the purpose is to waste CPU and programmer time. Dimiter ------------------------------------------------------ Dimiter Popoff Transgalactic Instruments http://www.tgi-sci.com ------------------------------------------------------ http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/ Original message: http://groups.google.com/group/comp.arch.embedded/msg/e1532ed305256729?dmode=source
From: CBFalconer on 4 May 2008 10:33 "Paul E. Bennett" wrote: > .... snip ... > > If you had focused on the first part; > > <quote> > A classical Forth puzzle is: What does the following do? > > : ???? -1 TUCK DO 2 + DUP +LOOP 2/ ; > > It is a standard method of calculating square root by summing > the odd integers. It works for all unsigned integers from 0 to > FFFFFFFF (or FFFF). > </quote> > > You would have realised that the definition ???? is actually a > sqrt by the method stated. Even in the more spread out style used > by many of us these days this would only have been 5 lines at most. Which points out that the mathematics is more important than the coding. Three or four lines of C can easily produce the same effect. -- [mail]: Chuck F (cbfalconer at maineline dot net) [page]: <http://cbfalconer.home.att.net> Try the download section. ** Posted from http://www.teranews.com **
From: James Waldby on 4 May 2008 14:26 On Sat, 03 May 2008 20:22:47 +0000, Christopher Henrich wrote: > ..."David T. Ashley" <dta(a)e3ft...> wrote: > >> What is the best integer square root algorithm for a processor with MUL >> and DIV (integer multiplication and division) built-in? >> >> For one without MUL and DIV (or where the instructions operate on >> shorter operands than would be required to form the square in >> question)? [...] > Here's an algorithm for finding r = sqrt(x^2 + y^2). > > Assume x > y > 0. Let > x_1 = (4 x^2 + 3)x / (4 x^2 + y^2) > y_1 = y^3/(4 x^2 + y^2). > > Then x_1^2 + y_1^2 = x^2 + y^2. If the transformation x,y) \mapsto > (x_1, y_1) is iterated, then the point converges *cubically* to (r, 0). > > I read this in a magazine many years ago. > > Adapting it to your problem, I would first find r = sqrt(x^2 + y^2), > then s = sqrt(r^2 + z^2). It probably would be a little better to find r = sqrt(x^2 + z^2), then s = sqrt(r^2 + y^2), if x > y > z. On a processor with fast multiply and divide relative to comparison and swap operations it would make no difference, but on small cpu's the extra ops to put x,y,z in order would cost much less on average than the potentially avoided iteration steps. > I would use scaled integer arithmetic, that is, represent x etc. by > j_x/D etc. where j_x is an integer and I choose D to be some convenient > common denominator like 65536.
From: 42Bastian Schick on 5 May 2008 02:18 On Thu, 1 May 2008 13:34:52 -0700 (PDT), Didi <dp(a)tgi-sci.com> wrote: >Here is it excerpted from a post I have made many years ago, >calculates a 16 bit squre root of a 32 bit number. > The example is in CPU32 (or CF) assembly. > >* >* calc. the square root of d1.l; return it in d1.l >* destroys d2,d3,d4 >* >sqrtd1 > moveq #15,d2 counter > clr.l d3 colect result here > set loop,* > bset d2,d3 try this > move.w d3,d4 extra copy result so far > mulu.w d4,d4 square > cmp.l d1,d4 higher? > bls keepbit branch not > bclr d2,d3 else bit back down >keepbit > dbra d2,loop process all bits > move.l d3,d1 update in d1 > rts I guess this will benefit from the ff1 (Coldfire ISA A+ AFAIK) or cntlzw (PowerPC) opcodes, which return the position of the first 1 bit, and could be used to setup the loop counter (ff1 / 2). -- 42Bastian Do not email to bastian42(a)yahoo.com, it's a spam-only account :-) Use <same-name>@monlynx.de instead !
From: Didi on 5 May 2008 11:30 42Bastian Schick wrote: > On Thu, 1 May 2008 13:34:52 -0700 (PDT), Didi <dp(a)tgi-sci.com> wrote: > > >Here is it excerpted from a post I have made many years ago, > >calculates a 16 bit squre root of a 32 bit number. > > The example is in CPU32 (or CF) assembly. > > > >* > >* calc. the square root of d1.l; return it in d1.l > >* destroys d2,d3,d4 > >* > >sqrtd1 > > moveq #15,d2 counter > > clr.l d3 colect result here > > set loop,* > > bset d2,d3 try this > > move.w d3,d4 extra copy result so far > > mulu.w d4,d4 square > > cmp.l d1,d4 higher? > > bls keepbit branch not > > bclr d2,d3 else bit back down > >keepbit > > dbra d2,loop process all bits > > move.l d3,d1 update in d1 > > rts > > I guess this will benefit from the ff1 (Coldfire ISA A+ AFAIK) or > cntlzw (PowerPC) opcodes, which return the position of the first 1 > bit, and could be used to setup the loop counter (ff1 / 2). > It will indeed - it is really good to have the ff1 opcode back (as in the 020, there was a bfff1 there). I am a frequent cntlzw user on the PPC, but have not used it on this algorithm yet (never thought of it, I did it first for the CPU32 - or was it the HC11 - where there was no such opcode and have not changed it since). I may do so next time I dig in that vicinity. Dimiter ------------------------------------------------------ Dimiter Popoff Transgalactic Instruments http://www.tgi-sci.com ------------------------------------------------------ http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/ Original message: http://groups.google.com/group/comp.arch.embedded/msg/7805bc4cf65eecab?dmode=source
|
Next
|
Last
Pages: 1 2 Prev: Choosing Bi-port Ethernet Chip Next: Loss of synchronisation: Are the chips reliable? |