|
Prev: static typing
Next: Java is going to have closures.
From: verec on 12 Aug 2006 14:38 Hi. I have been challenged to "port" to CL the following (for now quite inaccurate) approximation of pi/2. Depending on the speed outcome, CL may get considered for the project that will be havy on mumeric computations. C++ code #include <iostream> #include <cmath> #include <ctime> static double piOver2() { double sum = std::exp(-std::pow(-10.0, 2.0)) ; sum += std::exp(-std::pow(10.0, 2)) ; for (double x = -9.99 ; x < 10.0 ; x += 0.01) { double v = std::exp(-std::pow(x, 2.0)) ; sum += v + v ; } return sum * 0.005 ; } 10,000 iterations take about 3s. I then went through to java (replacing "std::" with "Math.") and this takes about 9s. I finally coded the Lisp version (defun pi/2 () (do ((sum (+ (exp (- (expt -10.0 2.0))) (exp (- (expt 10.0 2.0))))) (x -9.99 (+ x 0.01))) ((>= x 10.0) (* sum 0.005)) (incf sum (* 2 (exp (- (expt x 2.0))))))) But I just can't tell how long 10,000 iterations would take because I lost patience after 2 minutes. So I started inserting a few (declare) and (the float) here and there: (defun pi/2-opt () (declare (optimize (safety 0) (debug 0) (speed 3))) (declare (float sum)) (declare (float x)) (do ((sum (+ (exp (- (expt -10.0 2.0))) (exp (- (expt 10.0 2.0))))) (x -9.99 (+ x 0.01))) ((>= x 10.0) (* sum 0.005)) (incf sum (the float (* 2 (exp (- (expt x 2.0)))))))) CL-USER 54 > (time (dotimes (i 10000) (pi/2-opt))) Timing the evaluation of (DOTIMES (I 10000) (PI/2-OPT)) user time = 97.631 system time = 2.084 Elapsed time = 0:02:05 Allocation = 4877356064 bytes 0 Page faults Calls to %EVAL 120050 NIL At nearly 30 times slower, I just don't stand a chance. I can probably push the high-level, DSL and other Lisp features at around 10x or 5x the speed penalty, but not at 30x. What did I misunderstand about the use of (declare (optimize ... and (the float ( ... or anything else that might affect the speed so adversely? Just in case this isn't clear, the point is not about getting pi/2 but getting numeric speed for computations of complexity similar to that used in this approximation of pi/2. Many thanks -- JFB
From: John Thingstad on 12 Aug 2006 15:12 On Sat, 12 Aug 2006 20:38:26 +0200, verec <verec(a)mac.com> wrote: I get: Error: The value #C(1.1094963563118383E-43 2.7065338742349504E-57) does not satisfy the type specifier FLOAT. Perhaps that gives you a hint. > > (defun pi/2-opt () > (declare (optimize (safety 0) (debug 0) (speed 3))) > (declare (float sum)) > (declare (float x)) > (do ((sum > (+ (exp (- (expt -10.0 2.0))) > (exp (- (expt 10.0 2.0))))) > (x -9.99 (+ x 0.01))) > ((>= x 10.0) (* sum 0.005)) > (incf sum (the float (* 2 (exp (- (expt x 2.0)))))))) > > CL-USER 54 > (time (dotimes (i 10000) (pi/2-opt))) > Timing the evaluation of (DOTIMES (I 10000) (PI/2-OPT)) > > user time = 97.631 > system time = 2.084 > Elapsed time = 0:02:05 > Allocation = 4877356064 bytes > 0 Page faults > Calls to %EVAL 120050 > NIL > > At nearly 30 times slower, I just don't stand a chance. > I can probably push the high-level, DSL and other Lisp features > at around 10x or 5x the speed penalty, but not at 30x. > > What did I misunderstand about the use of (declare (optimize ... > and (the float ( ... or anything else that might affect the > speed so adversely? > > Just in case this isn't clear, the point is not about getting > pi/2 but getting numeric speed for computations of complexity > similar to that used in this approximation of pi/2. > > Many thanks > -- > JFB > -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
From: verec on 12 Aug 2006 15:18 On 2006-08-12 20:12:06 +0100, "John Thingstad" <john.thingstad(a)chello.no> said: > On Sat, 12 Aug 2006 20:38:26 +0200, verec <verec(a)mac.com> wrote: > > I get: Error: The value #C(1.1094963563118383E-43 2.7065338742349504E-57 ) > does not satisfy the type specifier FLOAT. > > Perhaps that gives you a hint. Hmmm... CL-USER 55 > (pi/2-opt) #C(1.7724538 -7.747651E-8) CL-USER 56 > Are type-casts supposed to be implementation dependent? I thought (erroneously?) that (the float XYZ) would just drop the imaginary part, precisely because that's the purpose of a cast? Or should I use (coerce ... in addition? FYI, I'm using LispWorks 4.4.6 PE on a G4. Many thanks -- JFB >> (defun pi/2-opt () >> (declare (optimize (safety 0) (debug 0) (speed 3))) >> (declare (float sum)) >> (declare (float x)) >> (do ((sum >> (+ (exp (- (expt -10.0 2.0))) >> (exp (- (expt 10.0 2.0))))) >> (x -9.99 (+ x 0.01))) >> ((>= x 10.0) (* sum 0.005)) >> (incf sum (the float (* 2 (exp (- (expt x 2.0)))))))) >> >> CL-USER 54 > (time (dotimes (i 10000) (pi/2-opt))) >> Timing the evaluation of (DOTIMES (I 10000) (PI/2-OPT)) >> >> user time = 97.631 >> system time = 2.084 >> Elapsed time = 0:02:05 >> Allocation = 4877356064 bytes >> 0 Page faults >> Calls to %EVAL 120050 >> NIL >> >> At nearly 30 times slower, I just don't stand a chance. >> I can probably push the high-level, DSL and other Lisp features >> at around 10x or 5x the speed penalty, but not at 30x. >> >> What did I misunderstand about the use of (declare (optimize ... >> and (the float ( ... or anything else that might affect the >> speed so adversely? >> >> Just in case this isn't clear, the point is not about getting >> pi/2 but getting numeric speed for computations of complexity >> similar to that used in this approximation of pi/2.
From: pkhuong on 12 Aug 2006 15:33 verec wrote: > Hi. > > I have been challenged to "port" to CL the following (for now > quite inaccurate) approximation of pi/2. Depending on the > speed outcome, CL may get considered for the project that > will be havy on mumeric computations. > > C++ code > #include <iostream> > #include <cmath> > #include <ctime> > > static double piOver2() { > double sum = std::exp(-std::pow(-10.0, 2.0)) ; > sum += std::exp(-std::pow(10.0, 2)) ; > > for (double x = -9.99 ; x < 10.0 ; x += 0.01) { > double v = std::exp(-std::pow(x, 2.0)) ; > sum += v + v ; > } > > return sum * 0.005 ; > } > > 10,000 iterations take about 3s. > > I then went through to java (replacing "std::" with "Math.") > and this takes about 9s. > > I finally coded the Lisp version > > (defun pi/2 () > (do ((sum > (+ (exp (- (expt -10.0 2.0))) > (exp (- (expt 10.0 2.0))))) > (x -9.99 (+ x 0.01))) > ((>= x 10.0) (* sum 0.005)) > (incf sum (* 2 (exp (- (expt x 2.0))))))) > > But I just can't tell how long 10,000 iterations would > take because I lost patience after 2 minutes. > > So I started inserting a few (declare) and (the float) > here and there: > > (defun pi/2-opt () > (declare (optimize (safety 0) (debug 0) (speed 3))) > (declare (float sum)) > (declare (float x)) > (do ((sum > (+ (exp (- (expt -10.0 2.0))) > (exp (- (expt 10.0 2.0))))) > (x -9.99 (+ x 0.01))) > ((>= x 10.0) (* sum 0.005)) > (incf sum (the float (* 2 (exp (- (expt x 2.0)))))))) > > CL-USER 54 > (time (dotimes (i 10000) (pi/2-opt))) > Timing the evaluation of (DOTIMES (I 10000) (PI/2-OPT)) > > user time = 97.631 > system time = 2.084 > Elapsed time = 0:02:05 > Allocation = 4877356064 bytes > 0 Page faults > Calls to %EVAL 120050 > NIL > > At nearly 30 times slower, I just don't stand a chance. > I can probably push the high-level, DSL and other Lisp features > at around 10x or 5x the speed penalty, but not at 30x. > > What did I misunderstand about the use of (declare (optimize ... > and (the float ( ... or anything else that might affect the > speed so adversely? 1. Note that the call to TIME also tells you that your function was interpreted, not compiled. 2. I pasted your code in SBCL's REPL. Here's what I got: ; caught WARNING: ; These variables are undefined: ; SUM X The declares must be in the scope of the declared variables, like so: (read the standard to know where to put your declarations) (defun pi/2-opt () (declare (optimize (safety 0) (debug 0) (speed 3))) (do ((sum (+ (exp (- (expt -10.0 2.0))) (exp (- (expt 10.0 2.0))))) (x -9.99 (+ x 0.01))) ((>= x 10.0) (* sum 0.005)) (declare (type single-float x sum)) (incf sum (the float (* 2 (exp (- (expt x 2.0)))))))) 10000 iterations take ~2.9 seconds on a P-M 1.6 GHz. 3. I don't know which compiler you intend to use, but python is a bit moronic when it comes to side-effected variables. Here's a side-effect-free version that gives the same answer (but as doubles, not single floats). In this case, 10000 iterations also take ~2.9 seconds, but I prefer this version :) (defun pi/2 () (declare (optimize (speed 3) (safety 0))) (labels ((inner (x acc) (declare (type double-float x acc)) (if (>= x 10d0) acc (inner (+ 0.01 x) (+ acc (* 2 (exp (- (expt x 2))))))))) (* 0.005d0 (inner -9.99d0 (+ (exp (- (expt -10.0d0 2))) (exp (- (expt 10.0d0 2)))))))) 4. It is sometimes useful to compile different versions and then examine their assembly with DISASSEMBLE. For heavily numeric code, one doesn't have to understand the innards of the CL implementation, only normal x86 + x87. Good Luck, Paul Khuong
From: verec on 12 Aug 2006 16:07
On 2006-08-12 20:33:51 +0100, pkhuong(a)gmail.com said: >> (defun pi/2-opt () >> (declare (optimize (safety 0) (debug 0) (speed 3))) >> (declare (float sum)) >> (declare (float x)) >> (do ((sum >> (+ (exp (- (expt -10.0 2.0))) >> (exp (- (expt 10.0 2.0))))) >> (x -9.99 (+ x 0.01))) >> ((>= x 10.0) (* sum 0.005)) >> (incf sum (the float (* 2 (exp (- (expt x 2.0)))))))) > 1. Note that the call to TIME also tells you that your function was > interpreted, not compiled. CL-USER 58 > (compile 'pi/2-opt) ;;;*** Warning in PI/2-OPT: The definition of PI/2-OPT is already compiled. PI/2-OPT ((PI/2-OPT #<STYLE-WARNING 100B2BDF>)) NIL It already was compiled :-( > 2. I pasted your code in SBCL's REPL. Here's what I got: > ; caught WARNING: > ; These variables are undefined: > ; SUM X > The declares must be in the scope of the declared variables, like so: > > (read the standard to know where to put your declarations) > > (defun pi/2-opt () > (declare (optimize (safety 0) (debug 0) (speed 3))) > (do ((sum > (+ (exp (- (expt -10.0 2.0))) > (exp (- (expt 10.0 2.0))))) > (x -9.99 (+ x 0.01))) > ((>= x 10.0) (* sum 0.005)) > (declare (type single-float x sum)) > (incf sum (the float (* 2 (exp (- (expt x 2.0)))))))) Excellent. Thanks. I tried. CL-USER 59 > (pi/2-opt) 1.3412517E-31 Ooops. definitely incorrect. So I rewrote the declare part as: (declare (type float x sum)) CL-USER 60 > (pi/2-opt) #C(1.7724538 -7.747651E-8) OK. Back in the game :-) CL-USER 61 > (time (dotimes (i 10000) (pi/2-opt))) Timing the evaluation of (DOTIMES (I 10000) (PI/2-OPT)) user time = 99.182 "-( > > 10000 iterations take ~2.9 seconds on a P-M 1.6 GHz. > > 3. I don't know which compiler you intend to use, but python is a bit > moronic when it comes to side-effected variables. Here's a > side-effect-free version that gives the same answer (but as doubles, > not single floats). In this case, 10000 iterations also take ~2.9 > seconds, but I prefer this version :) > > (defun pi/2 () > (declare (optimize (speed 3) (safety 0))) > (labels ((inner (x acc) > (declare (type double-float x acc)) > (if (>= x 10d0) > acc > (inner (+ 0.01 x) > (+ acc (* 2 (exp (- (expt x 2))))))))) > (* 0.005d0 (inner -9.99d0 > (+ (exp (- (expt -10.0d0 2))) > (exp (- (expt 10.0d0 2)))))))) CL-USER 63 > (pi/2-PK) 1.7724538905229454D0 Great. Correct too. CL-USER 64 > (time (dotimes (i 10000) (pi/2-PK))) Timing the evaluation of (DOTIMES (I 10000) (PI/2-PK)) user time = 43.500 Your version definitevly halves the time required! That's execellent, but nowhere near the speed you quote for your compiler/OS combo. > 4. It is sometimes useful to compile different versions and then > examine their assembly with DISASSEMBLE. For heavily numeric code, one > doesn't have to understand the innards of the CL implementation, only > normal x86 + x87. I had tried to (disassemble 'pi/2-opt) but nearly fainted at the sight of 104 lines of assembly :-( > Good Luck, Thank you very much! The lesson here for me is that I should first try another compiler, as I just can't see why a copy and paste of your version would run 15 times slower on my setup. (LW 4.4.6 on a G4 @1.5GHz ... the fact that the disassembly shows 104 lines is probably a good hint that no optimization is applied at all in the PE of LW?) > Paul Khuong -- JFB |