|
From: Tim Prince on 6 Apr 2008 18:22 FX wrote: >> no compiler support > > I'm not sure exactly what you mean by that: > > $ cat a.c > int foo(int i) { return __builtin_popcount (i); } > $ gcc -S a.c -msse4 > $ cat a.s > .text > .globl _foo > _foo: > pushl %ebp > movl %esp, %ebp > subl $8, %esp > movl 8(%ebp), %eax > popcntl %eax, %eax > leave > ret > .subsections_via_symbols > > There clearly is a popcntl opcode used. If you don't specify -msse4, you get a > call to a function in the GCC support library (libgcc). > > > $ gcc -v > Using built-in specs. > Target: i386-apple-darwin8.10.1 > Configured with: /tmp/gfortran-20080221/ibin/../gcc/configure --prefix=/usr/local/gfortran --enable-languages=c,fortran --with-gmp=/tmp/gfortran-20080221/gfortran_libs --enable-bootstrap > Thread model: posix > gcc version 4.4.0 20080221 (experimental) [trunk revision 132519] (GCC) > OK, that's interesting. gcc 4.4 leaves other compilers in the dust as far as popcount is concerned. I take it this won't be accessible directly by gfortran.
From: Terence on 6 Apr 2008 18:42 I think popcount is just a bit count of '1's in a string of bits. The standard way in market research, which uses this a lot, is to generate a table of bit counts for all possible 8-bit words (or use a pre-calculated one). Better is use an assembler fucntion. The popcount of a bit string is just the continued sum of the table entries corresponding to the 8-bit words it comprises (if the stirg is not an 8-bit multiple in length, the last is zero entended on capture). The assembler routine is surprisingly short and the main time loss is the function housekeeping in Fortan itself, very little occuring in stack control in assembler. The same applies to a 16 or 32-bit version, which for raw speed can use one large table, or treat each register word as a set of 8-bit sub- words and use a 256-word table. I added these functions and others as object files in an additional link library used by the Fortran compiler.
From: FX on 6 Apr 2008 19:28 > I take it this won't be accessible directly by gfortran. Fortran 2008 has it. I think it will be in gfortran-4.4. -- FX
From: glen herrmannsfeldt on 6 Apr 2008 19:34 Terence wrote: > I think popcount is just a bit count of '1's in a string of bits. > The standard way in market research, which uses this a lot, is to > generate a table of bit counts for all possible 8-bit words (or use a > pre-calculated one). Better is use an assembler fucntion. It works pretty well in C, which allows one to reference data as an array of unsigned char. How fast depends on how fast the machine can do byte access. Another way often done in C depends on the properties of twos complement arithmetic as: k=0; while(j=j&(j-1)) k++; Though neither C nor Fortran guarantee twos complement, it is popular enough that people use the system dependent code. k=0 while(j.ne.0) j=iand(j,j-1) k=k+1 end while The time will be somewhat proportional to the number of one bits in j. For small numbers of one bits it will likely be faster than the byte table lookup. One could also unroll the loop. -- glen
From: Walter Spector on 6 Apr 2008 20:22
Dan wrote: > ... > For those lucky enough not to have to program in CAL, S registers are > scalar, V are vector, T are transfer, A and B are data registers, This > code was originally written for a Cray-1, so it is simpler than X and > Y assembler. The code is uncommented deliberately. Sorry about that. All it does is the same as the Fortran-90 PACK intrinsic. Just use the intrinsic. If it isn't fast enough, complain to your compiler vendor. And it won't work on a Cray-1. It uses the compressed index and hardware gather instructions. Which means mid-life X-MP, onwards. The Cray compilers were eventually able to vectorize this idiom automatically. The CAL routine was unnecessary, and probably slower than letting the compiler generate inline code. W. |