From: verec on
Hi.

I have been challenged to "port" to CL the following (for now
quite inaccurate) approximation of pi/2. Depending on the
speed outcome, CL may get considered for the project that
will be havy on mumeric computations.

C++ code
#include <iostream>
#include <cmath>
#include <ctime>

static double piOver2() {
double sum = std::exp(-std::pow(-10.0, 2.0)) ;
sum += std::exp(-std::pow(10.0, 2)) ;

for (double x = -9.99 ; x < 10.0 ; x += 0.01) {
double v = std::exp(-std::pow(x, 2.0)) ;
sum += v + v ;
}

return sum * 0.005 ;
}

10,000 iterations take about 3s.

I then went through to java (replacing "std::" with "Math.")
and this takes about 9s.

I finally coded the Lisp version

(defun pi/2 ()
(do ((sum
(+ (exp (- (expt -10.0 2.0)))
(exp (- (expt 10.0 2.0)))))
(x -9.99 (+ x 0.01)))
((>= x 10.0) (* sum 0.005))
(incf sum (* 2 (exp (- (expt x 2.0)))))))

But I just can't tell how long 10,000 iterations would
take because I lost patience after 2 minutes.

So I started inserting a few (declare) and (the float)
here and there:

(defun pi/2-opt ()
(declare (optimize (safety 0) (debug 0) (speed 3)))
(declare (float sum))
(declare (float x))
(do ((sum
(+ (exp (- (expt -10.0 2.0)))
(exp (- (expt 10.0 2.0)))))
(x -9.99 (+ x 0.01)))
((>= x 10.0) (* sum 0.005))
(incf sum (the float (* 2 (exp (- (expt x 2.0))))))))

CL-USER 54 > (time (dotimes (i 10000) (pi/2-opt)))
Timing the evaluation of (DOTIMES (I 10000) (PI/2-OPT))

user time = 97.631
system time = 2.084
Elapsed time = 0:02:05
Allocation = 4877356064 bytes
0 Page faults
Calls to %EVAL 120050
NIL

At nearly 30 times slower, I just don't stand a chance.
I can probably push the high-level, DSL and other Lisp features
at around 10x or 5x the speed penalty, but not at 30x.

What did I misunderstand about the use of (declare (optimize ...
and (the float ( ... or anything else that might affect the
speed so adversely?

Just in case this isn't clear, the point is not about getting
pi/2 but getting numeric speed for computations of complexity
similar to that used in this approximation of pi/2.

Many thanks
--
JFB

From: John Thingstad on
On Sat, 12 Aug 2006 20:38:26 +0200, verec <verec(a)mac.com> wrote:

I get: Error: The value #C(1.1094963563118383E-43 2.7065338742349504E-57)
does not satisfy the type specifier FLOAT.

Perhaps that gives you a hint.

>
> (defun pi/2-opt ()
> (declare (optimize (safety 0) (debug 0) (speed 3)))
> (declare (float sum))
> (declare (float x))
> (do ((sum
> (+ (exp (- (expt -10.0 2.0)))
> (exp (- (expt 10.0 2.0)))))
> (x -9.99 (+ x 0.01)))
> ((>= x 10.0) (* sum 0.005))
> (incf sum (the float (* 2 (exp (- (expt x 2.0))))))))
>
> CL-USER 54 > (time (dotimes (i 10000) (pi/2-opt)))
> Timing the evaluation of (DOTIMES (I 10000) (PI/2-OPT))
>
> user time = 97.631
> system time = 2.084
> Elapsed time = 0:02:05
> Allocation = 4877356064 bytes
> 0 Page faults
> Calls to %EVAL 120050
> NIL
>
> At nearly 30 times slower, I just don't stand a chance.
> I can probably push the high-level, DSL and other Lisp features
> at around 10x or 5x the speed penalty, but not at 30x.
>
> What did I misunderstand about the use of (declare (optimize ...
> and (the float ( ... or anything else that might affect the
> speed so adversely?
>
> Just in case this isn't clear, the point is not about getting
> pi/2 but getting numeric speed for computations of complexity
> similar to that used in this approximation of pi/2.
>
> Many thanks
> --
> JFB
>


--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
From: verec on
On 2006-08-12 20:12:06 +0100, "John Thingstad" <john.thingstad(a)chello.no> said:

> On Sat, 12 Aug 2006 20:38:26 +0200, verec <verec(a)mac.com> wrote:
>
> I get: Error: The value #C(1.1094963563118383E-43 2.7065338742349504E-57 )
> does not satisfy the type specifier FLOAT.
>
> Perhaps that gives you a hint.

Hmmm...

CL-USER 55 > (pi/2-opt)
#C(1.7724538 -7.747651E-8)

CL-USER 56 >

Are type-casts supposed to be implementation dependent?

I thought (erroneously?) that (the float XYZ) would
just drop the imaginary part, precisely because that's
the purpose of a cast? Or should I use (coerce ... in
addition?

FYI, I'm using LispWorks 4.4.6 PE on a G4.

Many thanks
--
JFB

>> (defun pi/2-opt ()
>> (declare (optimize (safety 0) (debug 0) (speed 3)))
>> (declare (float sum))
>> (declare (float x))
>> (do ((sum
>> (+ (exp (- (expt -10.0 2.0)))
>> (exp (- (expt 10.0 2.0)))))
>> (x -9.99 (+ x 0.01)))
>> ((>= x 10.0) (* sum 0.005))
>> (incf sum (the float (* 2 (exp (- (expt x 2.0))))))))
>>
>> CL-USER 54 > (time (dotimes (i 10000) (pi/2-opt)))
>> Timing the evaluation of (DOTIMES (I 10000) (PI/2-OPT))
>>
>> user time = 97.631
>> system time = 2.084
>> Elapsed time = 0:02:05
>> Allocation = 4877356064 bytes
>> 0 Page faults
>> Calls to %EVAL 120050
>> NIL
>>
>> At nearly 30 times slower, I just don't stand a chance.
>> I can probably push the high-level, DSL and other Lisp features
>> at around 10x or 5x the speed penalty, but not at 30x.
>>
>> What did I misunderstand about the use of (declare (optimize ...
>> and (the float ( ... or anything else that might affect the
>> speed so adversely?
>>
>> Just in case this isn't clear, the point is not about getting
>> pi/2 but getting numeric speed for computations of complexity
>> similar to that used in this approximation of pi/2.


From: pkhuong on
verec wrote:
> Hi.
>
> I have been challenged to "port" to CL the following (for now
> quite inaccurate) approximation of pi/2. Depending on the
> speed outcome, CL may get considered for the project that
> will be havy on mumeric computations.
>
> C++ code
> #include <iostream>
> #include <cmath>
> #include <ctime>
>
> static double piOver2() {
> double sum = std::exp(-std::pow(-10.0, 2.0)) ;
> sum += std::exp(-std::pow(10.0, 2)) ;
>
> for (double x = -9.99 ; x < 10.0 ; x += 0.01) {
> double v = std::exp(-std::pow(x, 2.0)) ;
> sum += v + v ;
> }
>
> return sum * 0.005 ;
> }
>
> 10,000 iterations take about 3s.
>
> I then went through to java (replacing "std::" with "Math.")
> and this takes about 9s.
>
> I finally coded the Lisp version
>
> (defun pi/2 ()
> (do ((sum
> (+ (exp (- (expt -10.0 2.0)))
> (exp (- (expt 10.0 2.0)))))
> (x -9.99 (+ x 0.01)))
> ((>= x 10.0) (* sum 0.005))
> (incf sum (* 2 (exp (- (expt x 2.0)))))))
>
> But I just can't tell how long 10,000 iterations would
> take because I lost patience after 2 minutes.
>
> So I started inserting a few (declare) and (the float)
> here and there:
>
> (defun pi/2-opt ()
> (declare (optimize (safety 0) (debug 0) (speed 3)))
> (declare (float sum))
> (declare (float x))
> (do ((sum
> (+ (exp (- (expt -10.0 2.0)))
> (exp (- (expt 10.0 2.0)))))
> (x -9.99 (+ x 0.01)))
> ((>= x 10.0) (* sum 0.005))
> (incf sum (the float (* 2 (exp (- (expt x 2.0))))))))
>
> CL-USER 54 > (time (dotimes (i 10000) (pi/2-opt)))
> Timing the evaluation of (DOTIMES (I 10000) (PI/2-OPT))
>
> user time = 97.631
> system time = 2.084
> Elapsed time = 0:02:05
> Allocation = 4877356064 bytes
> 0 Page faults
> Calls to %EVAL 120050
> NIL
>
> At nearly 30 times slower, I just don't stand a chance.
> I can probably push the high-level, DSL and other Lisp features
> at around 10x or 5x the speed penalty, but not at 30x.
>
> What did I misunderstand about the use of (declare (optimize ...
> and (the float ( ... or anything else that might affect the
> speed so adversely?

1. Note that the call to TIME also tells you that your function was
interpreted, not compiled.

2. I pasted your code in SBCL's REPL. Here's what I got:
; caught WARNING:
; These variables are undefined:
; SUM X
The declares must be in the scope of the declared variables, like so:

(read the standard to know where to put your declarations)

(defun pi/2-opt ()
(declare (optimize (safety 0) (debug 0) (speed 3)))
(do ((sum
(+ (exp (- (expt -10.0 2.0)))
(exp (- (expt 10.0 2.0)))))
(x -9.99 (+ x 0.01)))
((>= x 10.0) (* sum 0.005))
(declare (type single-float x sum))
(incf sum (the float (* 2 (exp (- (expt x 2.0))))))))

10000 iterations take ~2.9 seconds on a P-M 1.6 GHz.

3. I don't know which compiler you intend to use, but python is a bit
moronic when it comes to side-effected variables. Here's a
side-effect-free version that gives the same answer (but as doubles,
not single floats). In this case, 10000 iterations also take ~2.9
seconds, but I prefer this version :)

(defun pi/2 ()
(declare (optimize (speed 3) (safety 0)))
(labels ((inner (x acc)
(declare (type double-float x acc))
(if (>= x 10d0)
acc
(inner (+ 0.01 x)
(+ acc (* 2 (exp (- (expt x 2)))))))))
(* 0.005d0 (inner -9.99d0
(+ (exp (- (expt -10.0d0 2)))
(exp (- (expt 10.0d0 2))))))))

4. It is sometimes useful to compile different versions and then
examine their assembly with DISASSEMBLE. For heavily numeric code, one
doesn't have to understand the innards of the CL implementation, only
normal x86 + x87.

Good Luck,
Paul Khuong

From: verec on
On 2006-08-12 20:33:51 +0100, pkhuong(a)gmail.com said:

>> (defun pi/2-opt ()
>> (declare (optimize (safety 0) (debug 0) (speed 3)))
>> (declare (float sum))
>> (declare (float x))
>> (do ((sum
>> (+ (exp (- (expt -10.0 2.0)))
>> (exp (- (expt 10.0 2.0)))))
>> (x -9.99 (+ x 0.01)))
>> ((>= x 10.0) (* sum 0.005))
>> (incf sum (the float (* 2 (exp (- (expt x 2.0))))))))
> 1. Note that the call to TIME also tells you that your function was
> interpreted, not compiled.

CL-USER 58 > (compile 'pi/2-opt)
;;;*** Warning in PI/2-OPT: The definition of PI/2-OPT is already compiled.
PI/2-OPT
((PI/2-OPT #<STYLE-WARNING 100B2BDF>))
NIL

It already was compiled :-(

> 2. I pasted your code in SBCL's REPL. Here's what I got:
> ; caught WARNING:
> ; These variables are undefined:
> ; SUM X
> The declares must be in the scope of the declared variables, like so:
>
> (read the standard to know where to put your declarations)
>
> (defun pi/2-opt ()
> (declare (optimize (safety 0) (debug 0) (speed 3)))
> (do ((sum
> (+ (exp (- (expt -10.0 2.0)))
> (exp (- (expt 10.0 2.0)))))
> (x -9.99 (+ x 0.01)))
> ((>= x 10.0) (* sum 0.005))
> (declare (type single-float x sum))
> (incf sum (the float (* 2 (exp (- (expt x 2.0))))))))

Excellent. Thanks. I tried.

CL-USER 59 > (pi/2-opt)
1.3412517E-31

Ooops. definitely incorrect. So I rewrote the declare part
as:

(declare (type float x sum))

CL-USER 60 > (pi/2-opt)
#C(1.7724538 -7.747651E-8)

OK. Back in the game :-)

CL-USER 61 > (time (dotimes (i 10000) (pi/2-opt)))
Timing the evaluation of (DOTIMES (I 10000) (PI/2-OPT))

user time = 99.182

"-(

>
> 10000 iterations take ~2.9 seconds on a P-M 1.6 GHz.
>
> 3. I don't know which compiler you intend to use, but python is a bit
> moronic when it comes to side-effected variables. Here's a
> side-effect-free version that gives the same answer (but as doubles,
> not single floats). In this case, 10000 iterations also take ~2.9
> seconds, but I prefer this version :)
>
> (defun pi/2 ()
> (declare (optimize (speed 3) (safety 0)))
> (labels ((inner (x acc)
> (declare (type double-float x acc))
> (if (>= x 10d0)
> acc
> (inner (+ 0.01 x)
> (+ acc (* 2 (exp (- (expt x 2)))))))))
> (* 0.005d0 (inner -9.99d0
> (+ (exp (- (expt -10.0d0 2)))
> (exp (- (expt 10.0d0 2))))))))

CL-USER 63 > (pi/2-PK)
1.7724538905229454D0

Great. Correct too.

CL-USER 64 > (time (dotimes (i 10000) (pi/2-PK)))
Timing the evaluation of (DOTIMES (I 10000) (PI/2-PK))

user time = 43.500

Your version definitevly halves the time required!
That's execellent, but nowhere near the speed you
quote for your compiler/OS combo.

> 4. It is sometimes useful to compile different versions and then
> examine their assembly with DISASSEMBLE. For heavily numeric code, one
> doesn't have to understand the innards of the CL implementation, only
> normal x86 + x87.

I had tried to (disassemble 'pi/2-opt) but nearly fainted at the
sight of 104 lines of assembly :-(

> Good Luck,

Thank you very much! The lesson here for me is that I should
first try another compiler, as I just can't see why a copy and
paste of your version would run 15 times slower on my setup.

(LW 4.4.6 on a G4 @1.5GHz ... the fact that the disassembly shows
104 lines is probably a good hint that no optimization is applied
at all in the PE of LW?)

> Paul Khuong
--
JFB

 |  Next  |  Last
Pages: 1 2 3 4 5 6 7 8
Prev: static typing
Next: Java is going to have closures.