From: Andrea Campera on
I know that matrix multiplication algorithms can run faster than n^3; for
example, the Strassen algorithm is n^2.81, and there exist algorithms of
order n^2.3. I have also read somewhere that those algorithms are slightly
unstable in some cases. I have also tried MATMUL (F90), but the code runs
slower. Maybe there is a smart way to solve (I+A*B)^(-1). Thanks again

Andrea

"Gordon Sande" <g.sande(a)worldnet.att.net> ha scritto nel messaggio
news:2006011109410016807-gsande(a)worldnetattnet...
> On 2006-01-11 04:14:42 -0400, "Andrea Campera"
> <andrea.campera(a)iet.unipi.it> said:
>
>> To be more precise, I have to do an operation like (A*B+I)^(-1). Up to
>> now I have used
>>
>> call multi(...)
>> call summa(...)
>> call gaussj(...)
>>
>> I also post the profile:
>>
>> Each sample counts as 0.01 seconds.
>>   %   cumulative   self              self     total
>>  time   seconds   seconds    calls  Ks/call  Ks/call  name
>
> <snip>
>
>> If I increase the dimension Nc of the matrices, gaussj and multi
>> become the most time-consuming routines. I have seen the subroutine
>> cgemm.f of BLAS for the multiplication, but what about the inversion?
>> Is there anything faster than gaussj? I use g77 (3.2 or 3.4) on
>> Pentium IV, Xeon, Opteron, or AMD64 (we have several machines).
>>
>> Andrea
>
>
> The fastest way to invert matrices is to not invert matrices. This
> apparently self-contradictory advice really means that most uses of
> inversion are better done some other way.
>
> Solving is usually what is wanted, and that can be better done by
> factoring and then solving with the factors. The question is "Why do
> you think you actually need the inverse?", other than your simple
> derivation in matrix algebra.
>
> Matrix multiply will always be n^3, and inversion and/or solving will
> also be n^3. You can fiddle the coefficient of multiply by polishing
> code, or by having someone else do it for you in a library. You can
> also polish inversion, but beware the "highly polished" codes that pay
> no attention to numerical stability. Many Gauss-Jordan matrix
> inversion programs are the trivial algebra coded naively, so you have
> to read and understand the documentation. Factoring gives a bigger
> reduction in the coefficient over inversion than you are ever going to
> see from polishing. And it may be that, with a bit of insight into
> your real problem, there are other things that can be done. There are
> examples of easy problems with awkward derivations that are turned
> into awkward, and numerically harder, problems with easy derivations.
> For more details consult a competent numerical analyst.
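
Concretely, the factor-and-solve route Gordon describes might look like
this; a minimal F90 sketch, assuming single-precision complex Nc x Nc
arrays (the names solve_iab, a, b, y are invented here) and BLAS/LAPACK
for CGEMM and CGESV:

  ! Sketch only: solve (I + A*B)*X = Y by LU factoring, never forming an
  ! explicit inverse.  On exit y holds X and m holds the LU factors.
  subroutine solve_iab(nc, a, b, y)
    implicit none
    integer :: nc, i, info
    complex :: a(nc,nc), b(nc,nc), y(nc,nc), m(nc,nc)
    integer :: ipiv(nc)
    m = (0.0, 0.0)
    do i = 1, nc
      m(i,i) = (1.0, 0.0)          ! m = identity
    end do
    ! m := 1*A*B + 1*m, i.e. I + A*B, in a single CGEMM call
    call cgemm('N', 'N', nc, nc, nc, (1.0,0.0), a, nc, b, nc, &
               (1.0,0.0), m, nc)
    ! LU-factor m and solve m*X = Y; the solution overwrites y
    call cgesv(nc, nc, m, nc, ipiv, y, nc, info)
    if (info /= 0) stop 'cgesv failed'
  end subroutine solve_iab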


From: Joost on
You might want to read this paper:

http://surj.stanford.edu/2004/pdfs/kakaradov.pdf
'Ultra-Fast Matrix Multiplication: An Empirical Analysis of Highly
Optimized Vector Algorithms'

It has the following interesting sentence:

The value of n for which Strassen's algorithm (in this implementation)
becomes better than the naive algorithm is n = 60,750.

Having said this, I have the following timings comparing multi with
cgemm on an Opteron using GOTO BLAS:

Nc=351
multi 0.28795699999999996
CGEMM 0.04499200000000003

Nc=1351
multi 35.40061800000001
CGEMM 2.246658999999994
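
For anyone who wants to reproduce this kind of comparison, here is a
minimal F90 timing sketch (the test values and the use of the F95
cpu_time intrinsic are my own choices; link against the BLAS you want
to test):

  ! Rough harness: time one CGEMM call for a given Nc.
  program time_cgemm
    implicit none
    integer, parameter :: nc = 1351
    complex, allocatable :: a(:,:), b(:,:), c(:,:)
    real :: t0, t1
    allocate (a(nc,nc), b(nc,nc), c(nc,nc))
    a = (1.0, 0.0)          ! arbitrary test data
    b = (1.0, 0.0)
    c = (0.0, 0.0)
    call cpu_time(t0)
    call cgemm('N', 'N', nc, nc, nc, (1.0,0.0), a, nc, b, nc, &
               (0.0,0.0), c, nc)
    call cpu_time(t1)
    print *, 'CGEMM ', t1 - t0
  end program time_cgemm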

Cheers,

Joost

From: David Jones on
Andrea Campera wrote:
> I know that matrix multiplication algorithms can run faster than n^3;
> for example, the Strassen algorithm is n^2.81, and there exist
> algorithms of order n^2.3. I have also read somewhere that those
> algorithms are slightly unstable in some cases. I have also tried
> MATMUL (F90), but the code runs slower. Maybe there is a smart way to
> solve (I+A*B)^(-1). Thanks again
>
The result wanted is

(I+A*B)^(-1) = I - A*( (I+B*A)^(-1) )*B

Savings arise if B*A is smaller in dimension than A*B.
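
A quick check: multiplying on the left by (I+A*B),

(I+A*B)*( I - A*( (I+B*A)^(-1) )*B )
  = I + A*B - (A + A*B*A)*( (I+B*A)^(-1) )*B
  = I + A*B - A*(I+B*A)*( (I+B*A)^(-1) )*B
  = I + A*B - A*B
  = I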


From: Andrea Campera on
"unfortunately" A and B are square matrices, hence A*B and B*A have the same
dimensions. Using cgemm or multi plus summa to do (I+A*B) with Nc=300 gives
both 0.34 seconds on a 2.4 GHz pentium IV. but I don't use goto BLAS. for
N=1000 cgemm and multi plus summa gives, respectively, 8.5 and 13.2 seconds.
May be implemented goto BLAS can save lots of time in the multiplication.

Andrea


"David Jones" <dajxxx(a)ceh.ac.uk> ha scritto nel messaggio
news:43c6268d$1(a)news.nwl.ac.uk...
> Andrea Campera wrote:
>> I know that matrix multiplication algorithms can run faster than
>> n^3; for example, the Strassen algorithm is n^2.81, and there exist
>> algorithms of order n^2.3. I have also read somewhere that those
>> algorithms are slightly unstable in some cases. I have also tried
>> MATMUL (F90), but the code runs slower. Maybe there is a smart way
>> to solve (I+A*B)^(-1). Thanks again
>>
> The result wanted is
>
> (I+A*B)^(-1) = I - A*( (I+B*A)^(-1) )*B
>
> Savings arise if B*A is smaller in dimension than A*B.


From: Joost on
> Using either cgemm or multi plus summa to form (I+A*B) with Nc=300
> takes 0.34 seconds in both cases on a 2.4 GHz Pentium IV, but I don't
> use GOTO BLAS. Maybe an optimized GOTO BLAS could save a lot of time
> in the multiplication.

:-) The last sentence makes the point...

In my opinion, BLAS is just a specification of an interface to some
basic linear algebra operations. The implementation made available on
netlib is just a reference implementation that produces correct results
in reasonable time, but it is not fast (e.g. the Fortran code for cgemm
in the reference implementation is very similar to your multi). Certain
specific implementations of BLAS do much better when it comes to the
'fast' part: in these implementations, people make sure that the
operations are ordered in such a way that the code is optimal for the
CPU.
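
As a rough illustration of the kind of reordering meant here, a
cache-blocked multiply sketch (the block size nb is a hypothetical
tuning parameter; real optimized BLAS implementations go much further,
with register blocking, packing, and prefetching):

  ! Sketch of a cache-blocked c = c + a*b for real n x n matrices.
  subroutine blocked_mm(n, nb, a, b, c)
    implicit none
    integer :: n, nb, ii, jj, kk, i, j, k
    real :: a(n,n), b(n,n), c(n,n)
    do jj = 1, n, nb
      do kk = 1, n, nb
        do ii = 1, n, nb
          ! one nb x nb block; the innermost loop runs down a column,
          ! which is contiguous in Fortran's column-major storage
          do j = jj, min(jj+nb-1, n)
            do k = kk, min(kk+nb-1, n)
              do i = ii, min(ii+nb-1, n)
                c(i,j) = c(i,j) + a(i,k)*b(k,j)
              end do
            end do
          end do
        end do
      end do
    end do
  end subroutine blocked_mm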

Joost