From: andrea campera on
Hi all,

I have a question regarding matrix multiplication. In my program I call
a subroutine that multiplies two n*n matrices thousands of times:

      SUBROUTINE multi(Nc,m1,m2,mtot)
      INTEGER i,j,l,Nc
      COMPLEX aa,m1(Nc,Nc),m2(Nc,Nc),mtot(Nc,Nc)
      do i=1,Nc
        do j=1,Nc
          aa=0.
          do l=1,Nc
            aa=aa+m1(i,l)*m2(l,j)
          enddo
          mtot(i,j)=aa
        enddo
      enddo
      return
      end

My question is: is there a faster algorithm for this simple
multiplication? And where can I find the code (Fortran 77 if possible)?
Thanks in advance for any help.

Regards

Andrea

From: Dave Seaman on
On 10 Jan 2006 15:13:16 -0800, andrea campera wrote:
> Hi all,

> I have a question regarding matrix multiplication. In my program I call
> a subroutine that multiplies two n*n matrices thousands of times:

> SUBROUTINE multi(Nc,m1,m2,mtot)
> INTEGER i,j,l,Nc
> complex aa,m1(Nc,Nc),m2(Nc,Nc),mtot(Nc,Nc)
> do i=1,Nc
> do j=1,Nc
> aa=0.
> do l=1,Nc
> aa=aa+m1(i,l)*m2(l,j)
> enddo
> mtot(i,j)=aa
> enddo
> enddo
> return
> end

> My question is: is there a faster algorithm for this simple
> multiplication? And where can I find the code (Fortran 77 if possible)?
> Thanks in advance for any help.

<http://www.netlib.org/lapack/>



--
Dave Seaman
U.S. Court of Appeals to review three issues
concerning case of Mumia Abu-Jamal.
<http://www.mumia2000.org/>
From: Gordon Sande on
On 2006-01-10 19:13:16 -0400, "andrea campera" <andreacampera(a)gmail.com> said:

> Hi all,
>
> I have a question regarding matrix multiplication. In my program I call
> a subroutine that multiplies two n*n matrices thousands of times:
>
> SUBROUTINE multi(Nc,m1,m2,mtot)
> INTEGER i,j,l,Nc
> complex aa,m1(Nc,Nc),m2(Nc,Nc),mtot(Nc,Nc)
> do i=1,Nc
> do j=1,Nc
> aa=0.
> do l=1,Nc
> aa=aa+m1(i,l)*m2(l,j)
> enddo
> mtot(i,j)=aa
> enddo
> enddo
> return
> end
>
> My question is: is there a faster algorithm for this simple
> multiplication? And where can I find the code (Fortran 77 if possible)?
> Thanks in advance for any help.
>
> Regards
>
> Andrea

You can either

1. realize that this is not worth trying to improve for your
   immediate needs. In fact, thousands of anything is not very
   interesting anymore on machines faster than a hand-operated
   abacus. Is five days of programming and debugging really worth
   saving a couple of hours on a computer that costs less than
   $2000? I am not in favor of N factorial methods (like Cramer's
   rule from algebra) when there are standard N cubed methods
   around, but I do believe that some elementary but hard cost
   benefit accounting should be done.

2. polish your code so it does not have cache problems, bad compiler
   options, or other arcane issues; those would require you to tell
   us what computer, compiler, etc. you are using. A first cure of
   this sort is to stop using interpreted Basic (or Java), but you
   are already using a Fortran compiler, given where you asked the
   question. See above on cost benefit accounting.

3. use a "fast matrix multiply" of the Strassen or Pan type. The
   trouble is that they are rather complicated, only come into their
   own for rather big problems, and have too much overhead for small
   ones. (But they really give you good bragging rights in some
   circles!) See above on cost benefit accounting. Strassen is N to
   about 2.8, and the last I noticed Pan had it down to N to about
   2.3 (IIRC).







From: Joost on
Use optimised BLAS library routines (i.e. not the netlib reference
implementation) for the multiplication. You're looking for the routine
cgemm. You can (freely) obtain GotoBLAS, ATLAS, MKL, ACML, ... and
they are F77 compatible. Depending on your Nc and computer, they will
be 2-100 times faster than what you have now.

Joost
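For reference, CGEMM computes C := alpha*op(A)*op(B) + beta*C, so the drop-in call replacing multi would be CALL CGEMM('N','N',Nc,Nc,Nc,(1.0,0.0),m1,Nc,m2,Nc,(0.0,0.0),mtot,Nc). A plain C transcription of those semantics (no-transpose case, column-major to match Fortran) makes the contract explicit; this is only a readable sketch of what the routine does, and the optimised library routine is what you would actually link:

```c
#include <complex.h>
#include <stddef.h>

/* Reference semantics of CGEMM('N','N',...): c := alpha*a*b + beta*c
 * for single-precision complex n*n matrices stored column-major,
 * element (i,j) at i + j*n, exactly as Fortran lays them out.  (The
 * real library routine skips reading c when beta is zero; this
 * sketch assumes c is initialised.) */
void cgemm_ref(size_t n, float complex alpha, const float complex *a,
               const float complex *b, float complex beta,
               float complex *c)
{
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++) {
            float complex s = 0.0f;
            for (size_t l = 0; l < n; l++)
                s += a[i + l*n] * b[l + j*n];
            c[i + j*n] = alpha * s + beta * c[i + j*n];
        }
}
```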

From: Andrea Campera on
To be more precise, I have to do an operation like (A*B+I)^-1. Up to
now I have used

call multi(...)
call summa(...)
call gaussj(...)

I also post the profile:

Each sample counts as 0.01 seconds.
  %    cumulative    self                self     total
 time    seconds    seconds     calls  Ks/call  Ks/call  name
35.98    6490.45    6490.45      3577     0.00     0.00  multi_
30.85   12055.08    5564.63       100     0.06     0.06  density_
18.67   15422.26    3367.18       695     0.00     0.00  gaussj_
 4.92   16309.68     887.42       101     0.01     0.01  tqli_
 4.22   17070.11     760.43         1     0.76     0.91  dens_
 3.65   17728.57     658.46         1     0.66    16.15  calcoladens_
 0.93   17896.45     167.88 575596800     0.00     0.00  modul_
 0.21   17934.09      37.64       101     0.00     0.01  scho1_
 0.11   17954.49      20.40         1     0.02     0.95  solvscho_
 0.09   17970.36      15.87         1     0.02     0.02  tred2_
 0.08   17984.88      14.52         1     0.01     0.01  tqli2_
 0.08   17998.66      13.78       596     0.00     0.00  diffe_
 0.07   18011.67      13.01       198     0.00     0.03  combination_
 0.07   18024.52      12.85       596     0.00     0.00  summa_
 0.04   18031.49       6.97       100     0.00     0.03  matri_
 0.03   18037.03       5.54       101     0.00     0.00  eigsr1_
 0.00   18037.11       0.08         1     0.00     0.03  traccia_
 0.00   18037.18       0.07         1     0.00    18.04  MAIN__
 0.00   18037.22       0.04         1     0.00     0.00  eigsrt_
 0.00   18037.26       0.04         1     0.00    17.09  smatrix_

If I increase the dimension Nc of the matrices, gaussj and multi become
the most time-consuming routines. I have seen the subroutine cgemm.f of
BLAS for the multiplication, but what about the inversion? Is there
anything faster than gaussj? I use g77 (3.2 or 3.4) on Pentium IV,
Xeon, Opteron, or AMD64 (we have several machines).

Andrea



"Joost" <jv244(a)cam.ac.uk> ha scritto nel messaggio
news:1136956436.318850.151600(a)g14g2000cwa.googlegroups.com...
> Use optimised BLAS library routines (i.e. not the netlib reference
> implementation) for the multiplication. You're looking for the routine
> cgemm. You can (freely) obtain GotoBLAS, ATLAS, MKL, ACML, ... and
> they are F77 compatible. Depending on your Nc and computer, they will
> be 2-100 times faster than what you have now.
>
> Joost
>