From: Tamas K Papp on
On Thu, 15 Jul 2010 22:58:23 +0300, Captain Obvious wrote:

> But I'd say it is actually pretty fast -- library I'm using can do SVD
> of 17 seconds worth of a sound in 2 seconds on 2 GHz CPU. Faster than
> realtime.

I am just curious, which library are you using?

LLA currently uses LAPACK's divide-and-conquer algorithm for SVD by
default, which I found to be pretty fast, even for large matrices.
LAPACK's lower-level SVD functions also have other possible speedups,
if you only need some of the singular values/vectors etc. It is
really amazing how fast SVD is these days, nowadays I tend to use it
in places where I would have used other decompositions (QR, Cholesky)
before, it is so stable and robust.

Tamas
From: Captain Obvious on
??>> But I'd say it is actually pretty fast -- library I'm using can do SVD
??>> of 17 seconds worth of a sound in 2 seconds on 2 GHz CPU. Faster than
??>> realtime.

TKP> I am just curious, which library are you using?

PROPACK.
From the description:
"The software package PROPACK contains a set of functions for computing the
singular value decomposition of large and sparse or structured matrices. The
SVD routines are based on the Lanczos bidiagonalization algorithm with
partial reorthogonalization (BPRO). The Lanczos routines can also be used
directly, and form the basis of efficient algorithms for solving linear
systems of equations and linear least squares problems, in particular for
systems with multiple right-hand sides."

I was using it for large sparse matrices and it worked pretty well.

Now as for compression I have dense matrices perhaps I could switch to
another library, but I didn't bother, as I already have glue code to run
PROPACK and it works good enough.

TKP> LLA currently uses LAPACK's divide-and-conquer algorithm for SVD by
TKP> default, which I found to be pretty fast, even for large matrices.
TKP> LAPACK's lower-level SVD functions also have other possible speedups,
TKP> if you only need some of the singular values/vectors etc. It is
TKP> really amazing how fast SVD is these days, nowadays I tend to use it
TKP> in places where I would have used other decompositions (QR, Cholesky)
TKP> before, it is so stable and robust.

Can you please check how fast it will do, say, 256*2930 matrix?
I'd consider replacement to a thing I have now.

It's not like I think PROPACK is bad, just my glue code use files (!) to
communicate with it and writing to files is the most time consuming
operation.
And I guess it might be easier to swich to another library than to rewrite
glue code using FFI myself :).

From: Tamas K Papp on
On Fri, 16 Jul 2010 16:19:25 +0300, Captain Obvious wrote:

> ??>> But I'd say it is actually pretty fast -- library I'm using can do
> SVD
> ??>> of 17 seconds worth of a sound in 2 seconds on 2 GHz CPU. Faster
> than ??>> realtime.
>
> TKP> I am just curious, which library are you using?
>
> PROPACK.
> From the description:
> "The software package PROPACK contains a set of functions for computing
> the singular value decomposition of large and sparse or structured
> matrices. The SVD routines are based on the Lanczos bidiagonalization
> algorithm with partial reorthogonalization (BPRO). The Lanczos routines
> can also be used directly, and form the basis of efficient algorithms
> for solving linear systems of equations and linear least squares
> problems, in particular for systems with multiple right-hand sides."
>
> I was using it for large sparse matrices and it worked pretty well.
>
> Now as for compression I have dense matrices perhaps I could switch to
> another library, but I didn't bother, as I already have glue code to run
> PROPACK and it works good enough.
>
> TKP> LLA currently uses LAPACK's divide-and-conquer algorithm for SVD
> by TKP> default, which I found to be pretty fast, even for large
> matrices. TKP> LAPACK's lower-level SVD functions also have other
> possible speedups, TKP> if you only need some of the singular
> values/vectors etc. It is TKP> really amazing how fast SVD is these
> days, nowadays I tend to use it TKP> in places where I would have used
> other decompositions (QR, Cholesky) TKP> before, it is so stable and
> robust.
>
> Can you please check how fast it will do, say, 256*2930 matrix? I'd
> consider replacement to a thing I have now.
>
> It's not like I think PROPACK is bad, just my glue code use files (!) to
> communicate with it and writing to files is the most time consuming
> operation.
> And I guess it might be easier to swich to another library than to
> rewrite glue code using FFI myself :).

Sparse matrices are a different issue, the have special algorithms,
which are likely to be a bit faster than general ones.

I am not entirely sure what you need from the SVD (do you need both U
and V^T?), so I ran a whole lot of different benchmarks (included
below). Even if you want to stick with PROPACK for now, LLA has a set
of macros which make interfacing to FORTRAN a lot easier (see
linear-algebra.lisp for examples).

Here is the code I ran:

(in-package #:common-lisp-user)
(require :lla)

(defparameter *m* (lla:make-matrix 256 2930 :double
:initial-element (lambda ()
(random 100d0))))

(defmacro time* (form comment)
`(progn
(format t "Timing ~A (~A)" ',form ,comment)
(time ,form)))

(time* (lla:svd *m*) "singular values only")
(time* (lla:svd *m* :left :singular) "singular vectors from U")
(time* (lla:svd *m* :left :all) "all vectors from U")
(time* (lla:svd *m* :right :singular) "singular vectors from V^T")
(time* (lla:svd *m* :right :all) "all vectors from V^T")
(time* (lla:svd *m* :left :singular :right :singular) "singular vector from both U and V^T")

Here are the results (Dell Latitude laptop, Intel(R) Core(TM)2 Duo CPU
P8700 @ 2.53GHz, 3 MB cache, single thread):

Timing (SVD *M*) (singular values only)
Evaluation took:
0.648 seconds of real time
0.640000 seconds of total run time (0.640000 user, 0.000000 system)
98.77% CPU
1,637,341,188 processor cycles
18,037,008 bytes consed

Timing (SVD *M* LEFT SINGULAR) (singular vectors from U)
Evaluation took:
0.742 seconds of real time
0.740000 seconds of total run time (0.740000 user, 0.000000 system)
99.73% CPU
1,873,691,270 processor cycles
18,566,016 bytes consed

Timing (SVD *M* LEFT ALL) (all vectors from U)
Evaluation took:
0.747 seconds of real time
0.740000 seconds of total run time (0.730000 user, 0.010000 system)
99.06% CPU
1,889,345,019 processor cycles
18,560,688 bytes consed

Timing (SVD *M* RIGHT SINGULAR) (singular vectors from V^T)
Evaluation took:
1.595 seconds of real time
1.600000 seconds of total run time (1.600000 user, 0.000000 system)
100.31% CPU
4,030,383,369 processor cycles
24,028,528 bytes consed

Timing (SVD *M* RIGHT ALL) (all vectors from V^T)
Evaluation took:
8.161 seconds of real time
8.170000 seconds of total run time (8.130000 user, 0.040000 system)
100.11% CPU
20,623,094,542 processor cycles
86,801,424 bytes consed

Timing (SVD *M* LEFT SINGULAR RIGHT SINGULAR) (singular vector from both U and V^T)
Evaluation took:
1.656 seconds of real time
1.650000 seconds of total run time (1.650000 user, 0.000000 system)
99.64% CPU
4,185,266,126 processor cycles
24,551,440 bytes consed

Best,

Tamas
From: Captain Obvious on
TKP> Even if you want to stick with PROPACK for now, LLA has a set
TKP> of macros which make interfacing to FORTRAN a lot easier (see
TKP> linear-algebra.lisp for examples).

Cool, I'll check this if I return to huge sparse matrices case.

TKP> I am not entirely sure what you need from the SVD (do you need both U
TKP> and V^T?)

Yep, in this particular case both of them, and singular values too.

TKP> Timing (SVD *M* LEFT SINGULAR RIGHT SINGULAR) (singular vector from
TKP> both U and V^T) Evaluation took:
TKP> 1.656 seconds of real time
TKP> 1.650000 seconds of total run time (1.650000 user, 0.000000 system)
TKP> 99.64% CPU
TKP> 4,185,266,126 processor cycles
TKP> 24,551,440 bytes consed

I guess that's it. Approximately same time as with PROPACK.

TKP> LAPACK's lower-level SVD functions also have other possible speedups,
TKP> if you only need some of the singular values/vectors etc.

Do you mean that it is already implemented and it is easy to enable it?
From: Tamas K Papp on
On Sun, 18 Jul 2010 21:01:29 +0300, Captain Obvious wrote:

> TKP> LAPACK's lower-level SVD functions also have other possible
> speedups, TKP> if you only need some of the singular values/vectors
> etc.
>
> Do you mean that it is already implemented and it is easy to enable it?

I realize I have been thinking about eigenvalue/vector problems, not
SVD. The divide-and-conquer algorithm is about as fast as it gets on
dense matrices.

Tamas