From: Ron Garret on
In article
<fded6479-a152-4f50-a7da-68706f9d6f42(a)m16g2000yqc.googlegroups.com>,
Günther Thomsen <guenthert(a)gmail.com> wrote:

> On Jan 17, 8:19 am, Ron Garret <rNOSPA...(a)flownet.com> wrote:
> > In article
> > <98666562-db92-4a47-849d-50bf2c3d1...(a)b2g2000yqi.googlegroups.com>,
> >
> >
> >
> >  piscesboy <oraclmas...(a)gmail.com> wrote:
> > > On Jan 17, 9:32 am, Tamas K Papp <tkp...(a)gmail.com> wrote:
> > > > On Sun, 17 Jan 2010 13:38:57 +0200, Captain Obvious wrote:
> > > > > RU> as the key.  It conses on every hash lookup, but if that's
> > > > >  RU> acceptable...
> >
> > > > > It is pretty easy to avoid consing, in single threaded case, at least.
> >
> > > > > (let ((the-cons (cons nil nil)))
> > > > >   (defun lookup-compound-key (ht k1 k2)
> > > > >     (setf (car the-cons) k1
> > > > >             (cdr the-cons) k2)
> > > > >     (prog1
> > > > >            (gethash the-cons ht)
> > > > >            (setf (car the-cons) nil
> > > > >                    (cdr the-cons) nil))))
> >
> > > > > Thread-safe solution would probably require thread-local variables.
> >
> > > > Dude, it is still mid-January, but you are very likely to win the
> > > > "Optimization of the Year Award" in 2010: it is ugly, and results in a
> > > > much longer runtime.
> >
> > > > (let ((the-cons (cons nil nil)))
> > > >   (defun ugly-lookup (ht k1 k2)
> > > >     (setf (car the-cons) k1
> > > >           (cdr the-cons) k2)
> > > >     (prog1
> > > >         (gethash the-cons ht)
> > > >       (setf (car the-cons) nil
> > > >             (cdr the-cons) nil))))
> >
> > > > (defun lookup (ht k1 k2)
> > > >   (gethash (cons k1 k2) ht))
> >
> > > > (defparameter *ht* (make-hash-table :test #'equal))
> > > > (setf (gethash (cons 5 10) *ht*) 5)
> >
> > > > (time
> > > >  (dotimes (i 10000000)
> > > >    (lookup *ht* 5 10)))        
> >
> > > > Evaluation took:
> > > >   0.923 seconds of real time
> > > >   0.920058 seconds of total run time (0.916058 user, 0.004000 system)
> > > >   [ Run times consist of 0.020 seconds GC time, and 0.901 seconds non-GC
> > > > time. ]
> > > >   99.67% CPU
> > > >   2,330,460,770 processor cycles
> > > >   80,068,320 bytes consed
> >
> > > > (time
> > > >  (dotimes (i 10000000)
> > > >    (ugly-lookup *ht* 5 10)))
> >
> > > > Evaluation took:
> > > >   3.464 seconds of real time
> > > >   3.452216 seconds of total run time (3.452216 user, 0.000000 system)
> > > >   99.65% CPU
> > > >   8,753,522,082 processor cycles
> > > >   354,272 bytes consed
> >
> > > > I really wish that people who post "optimized" CL code would at least
> > > > benchmark it.
> >
> > > > Tamas
> >
> > > Strange. The ugly cons lookup had 354,272 bytes consed, yet took 3.464
> > > seconds to execute. But the pretty lookup had 80,068,320 bytes consed
> > > and took fewer processor cycles in 0.923 seconds.
> >
> > > The ugly cons had fewer bytes consed than the pretty cons...?
> >
> > One of the many reasons it is good to write code in a functional style
> > is that it allows some important memory-management optimizations.  A
> > good generational GC can be very efficient even for code that conses a
> > lot as long as the code is purely functional.  But as soon as you do a
> > SETF, the assumptions that underlie those optimizations break and the GC
> > has to fall back on less efficient techniques.  That is why code that
> > conses more but is written in a purely functional style can often be
> > faster than code that conses less but mutates data.
> >
> > rg
>
> True, but also GC can be much faster if there is plenty of available
> RAM. Under memory pressure GC might not fare quite as well. I'd think
> it's a safe bet that Tamas' host had much more than 80,068,320B
> available. Actually it isn't obvious whether there was any GC
> activity at all (clearly the best possible outcome ;-) .

Certainly. But it's an even safer bet that his machine had more than
354,272 bytes available.

rg
From: Tamas K Papp on
On Sun, 17 Jan 2010 23:37:26 +0200, Captain Obvious wrote:

> TKP> I really wish that people who post "optimized" CL code would at
> TKP> least benchmark it.
>
> I did not say it is faster, I said that it does not cons. :) And your
> benchmark pretty much proved it. I think this is merely an exercise
> rather than an optimization.
>
> If one really wants to optimize something, he should try it with his
> concrete application, implementation, and data. And he should probably
> try different approaches, as it is too complex to say what is faster
> theoretically.
>
> Now, if we want to compare performance, I would change ugly-lookup a bit:
>
> (let ((the-cons (cons nil nil)))
>   (defun ugly-lookup (ht k1 k2)
>     (setf (car the-cons) k1
>           (cdr the-cons) k2)
>     (gethash the-cons ht)))
>
> That is, if you do shitloads of lookups, you don't need to bother
> cleaning up the cons.
>
> Now let's compare:
>
> CL-USER> (time
>           (dotimes (i 10000000)
>             (ugly-lookup *ht* 5 10)))
> Evaluation took:
>   1.83 seconds of real time
>   1.824114 seconds of user run time
>   0.004 seconds of system run time
>   0 calls to %EVAL
>   0 page faults and
>   0 bytes consed.
> NIL
>
> CL-USER> (time
>           (dotimes (i 10000000)
>             (lookup *ht* 5 10)))
> Evaluation took:
>   1.974 seconds of real time
>   1.92012 seconds of user run time
>   0.056003 seconds of system run time
>   [Run times include 0.02 seconds GC run time.]
>   0 calls to %EVAL
>   0 page faults and
>   80,001,544 bytes consed.
>
> Now my version is faster by a whopping 7%!

Nope. If you do a shitload of lookups, you just inline the bloody
thing. For the sake of fairness (haha, see below), let's inline both
versions. I evaluated this in SBCL:

(declaim (optimize speed)
         (inline lookup ugly-lookup))

(let ((the-cons (cons nil nil)))
  (defun ugly-lookup (ht k1 k2)
    (setf (car the-cons) k1
          (cdr the-cons) k2)
    (gethash the-cons ht)))

(defun lookup (ht k1 k2)
  (gethash (cons k1 k2) ht))

(defparameter *ht* (make-hash-table :test #'equal))
(setf (gethash (cons 5 10) *ht*) 5)

(defun test1 ()
  (time
   (dotimes (i 10000000)
     (lookup *ht* 5 10))))

(defun test2 ()
  (time
   (dotimes (i 10000000)
     (ugly-lookup *ht* 5 10))))

(time (test1))
(time (test2))

I compiled everything of course. And the result is:

Evaluation took:
  0.024 seconds of real time
  0.028001 seconds of total run time (0.024001 user, 0.004000 system)
  116.67% CPU
  61,855,802 processor cycles
  29,064 bytes consed

Evaluation took:
  2.067 seconds of real time
  2.068129 seconds of total run time (2.068129 user, 0.000000 system)
  100.05% CPU
  5,222,531,767 processor cycles
  204,888 bytes consed

So lookup is 100x faster (!) than ugly-lookup (to be fair, inlining
made the ugly version about 25% slower on my machine, but that is a
small difference). And it conses very little. Why is this? SBCL
tells you:

; in: LET ((THE-CONS (CONS NIL NIL)))
;     (LET ((THE-CONS (CONS NIL NIL)))
;       (DEFUN UGLY-LOOKUP (HT K1 K2)
;         (SETF (CAR THE-CONS) K1
;               (CDR THE-CONS) K2)
;         (GETHASH THE-CONS HT)))
;
; note: lexical environment too hairy, can't inline DEFUN UGLY-LOOKUP

So there are a few lessons in here:

1) If you want speed, ask the compiler first. (declaim (optimize
speed)) gave me roughly a 50% speedup for both versions.

2) If you want more speed, inline simple stuff.

3) Note that 1) and 2) didn't ask you for any hairy, unreadable code,
just a few simple declarations. So keep your code simple. The people
who developed your compiler optimized it for the common cases. Don't
try to second-guess your compiler without benchmarking.

I have been working on a numerical library during the weekend, giving
it some final polish. I spent Saturday afternoon ripping out the
"optimized" code I put in earlier; it turned out that my "clever"
hacks were actually quite stupid: little or no speed gain, compromised
maintainability. So I am very skeptical of hacks like the one you
posted: they usually don't improve anything, rather the opposite. That
said, I am also very sympathetic, as I do stuff that is 10x more
stupid sometimes.

Cheers,

Tamas
From: Tamas K Papp on
On Sun, 17 Jan 2010 21:15:56 -0800, Ron Garret wrote:

> In article
> <fded6479-a152-4f50-a7da-68706f9d6f42(a)m16g2000yqc.googlegroups.com>,
> Günther Thomsen <guenthert(a)gmail.com> wrote:
>
>> True, but also GC can be much faster if there is plenty of available
>> RAM. Under memory pressure GC might not fare quite as well. I'd think
>> it's a safe bet that Tamas' host had much more than 80,068,320B
>> available. Actually it isn't obvious whether there was any GC activity
>> at all (clearly the best possible outcome ;-) .
>
> Certainly. But it's an even safer bet that his machine had more than
> 354,272 bytes available.

As we all know, 640k should be enough :-)

Tamas


From: Frode V. Fjeld on
One approach I've tried is something like this:

(defun double-hash (table key1 key2)
  (car (find-if (lambda (x)
                  (and (eq key1 (second x))
                       (eq key2 (third x))))
                (gethash (logior (sxhash key1) (sxhash key2))
                         table))))

I.e. to hash over some combination of each key's sxhash (taking care not
to combine them so as to generate a bignum).
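For example, a mixing function along these lines stays within fixnum
range (untested, just a sketch like the rest):

(defun mix-hashes (h1 h2)
  ;; XOR one hash with a shifted copy of the other: the result
  ;; stays a non-negative fixnum (so the GETHASH key is never a
  ;; bignum), and swapping the two keys gives a different hash.
  (logxor h1 (ash h2 -5)))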

Hm.. I guess it can even scale neatly to variable arguments:

(defun multi-hash (table &rest keys)
  (declare (dynamic-extent keys))
  (car (find keys
             (gethash (sxhash keys) table)
             :test #'equal
             :key #'cdr)))


The corresponding setters are left as an exercise to the reader. (And
the code above is untested, sketches really.)

I have no idea how well this actually performs; I'd expect it to vary
quite a bit across Lisps and with the details of how this technique
is used.

--
Frode V. Fjeld
From: Robert Uhl on
"Frode V. Fjeld" <frode(a)netfonds.no> writes:
>
> I.e. to hash over some combination of each key's sxhash (taking care not
> to combine them so as to generate a bignum).

Note that SXHASH doesn't care about anything more than the top level of
a list:

cl-user> (sxhash '(1 2 3 (4)))
224526978
cl-user> (sxhash '(1 2 3 (4 5)))
224526978

--
Robert A. Uhl
Thesaurus: ancient reptile with an excellent vocabulary.