|
Prev: Mex Segmentation Violation
Next: Quite simple probleme
From: Joachim Selke on 5 May 2008 07:48 Tim Davis wrote: > I can't think of a better way ... except to go to a > mexFunction of course. > > The reason I say that is because a mexFunction could do > Atr(:,rows_j)'*B(:j) without forming the Atr(:,rows_j) > matrix as an intermediate step. Wow, that really speeds it up. Thanks! :-) Again, I did some experiments using random matrices (m = 50000, n = 10000, k = 15). The total time needed for constructing the product was around 1.3 seconds on my computer with a MEX function written in C (compared to 2.3 seconds using my previous implementation). Since this is my first MEX function (and since I did not even program in C before) it would help me a lot if you please could have a quick look at my implementation. It is straight-forward and I am quite sure that there is still room for improvement (maybe by doing some computations in parallel?). Here is the code: ------------------------------------------------------------------------- #include "mex.h" void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) { const mxArray *a, *b, *x; a = prhs[0]; b = prhs[1]; x = prhs[2]; mwSize m, n, k; m = mxGetM(x); n = mxGetN(x); k = mxGetN(a); mxArray *ab; ab = mxDuplicateArray(x); mwIndex *rows_ab, *cols_ab; rows_ab = mxGetIr(ab); cols_ab = mxGetJc(ab); double *data_a, *data_b, *data_ab; data_a = mxGetPr(a); data_b = mxGetPr(b); data_ab = mxGetPr(ab); int i, j, r; int ind_a, ind_b, ind_ab, ind_ab_next; ind_ab = 0; for (j = 0; j < n; j++) { ind_ab_next = ind_ab + cols_ab[j + 1] - cols_ab[j]; while (ind_ab < ind_ab_next) { i = rows_ab[ind_ab]; data_ab[ind_ab] = 0; for (r = 0; r < k; r++) { ind_a = r * m + i; ind_b = j * k + r; data_ab[ind_ab] += data_a[ind_a] * data_b[ind_b]; } ind_ab++; } } plhs[0] = ab; } ------------------------------------------------------------------------- Joachim
From: Tim Davis on 5 May 2008 08:14 Joachim Selke <j.selke(a)tu-bs.de> wrote in message <688agnF2rcjvjU1(a)mid.dfncis.de>... > Tim Davis wrote: > > I can't think of a better way ... except to go to a > > mexFunction of course. > > > > The reason I say that is because a mexFunction could do > > Atr(:,rows_j)'*B(:j) without forming the Atr(:,rows_j) > > matrix as an intermediate step. > > Wow, that really speeds it up. Thanks! :-) > > Again, I did some experiments using random matrices (m = 50000, > n = 10000, k = 15). The total time needed for constructing the product > was around 1.3 seconds on my computer with a MEX function written in C > (compared to 2.3 seconds using my previous implementation). > > Since this is my first MEX function (and since I did not even program in > C before) it would help me a lot if you please could have a quick look > at my implementation. It is straight-forward and I am quite sure that > there is still room for improvement (maybe by doing some computations in > parallel?). Here is the code: > > > ------------------------------------------------------------------------- > > #include "mex.h" > > void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray > *prhs[]) { > > const mxArray *a, *b, *x; > a = prhs[0]; > b = prhs[1]; > x = prhs[2]; > > mwSize m, n, k; > m = mxGetM(x); > n = mxGetN(x); > k = mxGetN(a); > > mxArray *ab; > ab = mxDuplicateArray(x); > > mwIndex *rows_ab, *cols_ab; > rows_ab = mxGetIr(ab); > cols_ab = mxGetJc(ab); > > double *data_a, *data_b, *data_ab; > data_a = mxGetPr(a); > data_b = mxGetPr(b); > data_ab = mxGetPr(ab); > > int i, j, r; > int ind_a, ind_b, ind_ab, ind_ab_next; > ind_ab = 0; > for (j = 0; j < n; j++) { > ind_ab_next = ind_ab + cols_ab[j + 1] - cols_ab[j]; > while (ind_ab < ind_ab_next) { > i = rows_ab[ind_ab]; > data_ab[ind_ab] = 0; > for (r = 0; r < k; r++) { > ind_a = r * m + i; > ind_b = j * k + r; > data_ab[ind_ab] += data_a[ind_a] * data_b[ind_b]; > } > ind_ab++; > } > } > > plhs[0] = ab; > } > > ------------------------------------------------------------------------- > > > Joachim Looks fine to me. I could suggest only one thing - mxDuplicateArray is copying the numerical values of x into the output ab, then later you overwrite them. It would be faster to create ab with mxCreateSparse, then fill the matrix with both the pattern and the values (the Jc, Ir, and data arrays). But the improvement would be minor.
From: Joachim Selke on 7 May 2008 09:02 Tim Davis wrote: > Looks fine to me. I could suggest only one thing - > mxDuplicateArray is copying the numerical values of x into > the output ab, then later you overwrite them. It would be > faster to create ab with mxCreateSparse, then fill the > matrix with both the pattern and the values (the Jc, Ir, and > data arrays). But the improvement would be minor. I tried that and it really seems to be minor. In fact, my implementation of the copying (mxCreateSparse followed by memcopy) was slightly slower than using mxDuplicateArray. But I found two additional ways to speed things up: (1) Replace the k write accesses to data_ab[ind_ab] in the inner loop by write accesses to a temporary variable z and do a single write operation to data_ab[ind_ab] after the loop has finished (2) Linearize accesses to matrix A by using A's transpose instead of A The new code only needs 0.7 s in my example scenario, computing of A' in MATLAB included (old code: 1.3 s, M code: 2.3 s). Here is the code: --------------------------------------------------------------------- #include "mex.h" void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) { const mxArray *atr, *b, *x; atr = prhs[0]; b = prhs[1]; x = prhs[2]; mwSize m, n, k; m = mxGetM(x); n = mxGetN(x); k = mxGetM(atr); mxArray *ab; ab = mxDuplicateArray(x); mwIndex *rows_ab, *cols_ab; rows_ab = mxGetIr(ab); cols_ab = mxGetJc(ab); double *data_atr, *data_b, *data_ab; data_atr = mxGetPr(atr); data_b = mxGetPr(b); data_ab = mxGetPr(ab); double z; int i, j, r; int ind_atr, ind_b, ind_ab, rows_left; ind_ab = 0; for (j = 0; j < n; j++) { rows_left = cols_ab[j + 1] - cols_ab[j]; while (rows_left != 0) { i = rows_ab[ind_ab]; z = 0; ind_atr = i * k; ind_b = j * k; for (r = 0; r < k; r++) { z += data_atr[ind_atr] * data_b[ind_b]; ind_atr++; ind_b++; } data_ab[ind_ab] = z; ind_ab++; rows_left--; } } plhs[0] = ab; } --------------------------------------------------------------------- Now I think that this is all one can do to speed it up. Am I right? :-) Joachim
From: Tim Davis on 7 May 2008 11:41 Instead of "int" you should use mwSignedIndex. It's safer, if you ever go to a 64-bit platform. I also recommend not using mwIndex at all, but mwSignedIndex in its place. The problemis that mwIndex is unsigned, so codes like this: mwIndex i, n ; .... for (i = n-1 ; i >= 0 ; i--) { do something } fail in an infinite loop, since i is always >= 0.
|
Pages: 1 Prev: Mex Segmentation Violation Next: Quite simple probleme |