From: Tim Prince on
FX wrote:
>> no compiler support
>
> I'm not sure exactly what you mean by that:
>
> $ cat a.c
> int foo(int i) { return __builtin_popcount (i); }
> $ gcc -S a.c -msse4
> $ cat a.s
> .text
> .globl _foo
> _foo:
> pushl %ebp
> movl %esp, %ebp
> subl $8, %esp
> movl 8(%ebp), %eax
> popcntl %eax, %eax
> leave
> ret
> .subsections_via_symbols
>
> There clearly is a popcntl opcode used. If you don't specify -msse4, you get a
> call to a function in the GCC support library (libgcc).
>
>
> $ gcc -v
> Using built-in specs.
> Target: i386-apple-darwin8.10.1
> Configured with: /tmp/gfortran-20080221/ibin/../gcc/configure --prefix=/usr/local/gfortran --enable-languages=c,fortran --with-gmp=/tmp/gfortran-20080221/gfortran_libs --enable-bootstrap
> Thread model: posix
> gcc version 4.4.0 20080221 (experimental) [trunk revision 132519] (GCC)
>
OK, that's interesting. gcc 4.4 leaves other compilers in the dust as far
as popcount is concerned. I take it this won't be accessible directly by
gfortran.
From: Terence on
I think popcount is just a bit count of '1's in a string of bits.

The standard way in market research, which uses this a lot, is to
generate a table of bit counts for all possible 8-bit words (or use a
pre-calculated one). Better is use an assembler fucntion.

The popcount of a bit string is just the continued sum of the table
entries corresponding to the 8-bit words it comprises (if the stirg is
not an 8-bit multiple in length, the last is zero entended on
capture). The assembler routine is surprisingly short and the main
time loss is the function housekeeping in Fortan itself, very little
occuring in stack control in assembler.

The same applies to a 16 or 32-bit version, which for raw speed can
use one large table, or treat each register word as a set of 8-bit sub-
words and use a 256-word table.

I added these functions and others as object files in an additional
link library used by the Fortran compiler.
From: FX on
> I take it this won't be accessible directly by gfortran.

Fortran 2008 has it. I think it will be in gfortran-4.4.

--
FX
From: glen herrmannsfeldt on
Terence wrote:
> I think popcount is just a bit count of '1's in a string of bits.

> The standard way in market research, which uses this a lot, is to
> generate a table of bit counts for all possible 8-bit words (or use a
> pre-calculated one). Better is use an assembler fucntion.

It works pretty well in C, which allows one to reference data
as an array of unsigned char. How fast depends on how fast
the machine can do byte access.

Another way often done in C depends on the properties of twos
complement arithmetic as:

k=0;
while(j=j&(j-1)) k++;

Though neither C nor Fortran guarantee twos complement, it is
popular enough that people use the system dependent code.

k=0
while(j.ne.0)
j=iand(j,j-1)
k=k+1
end while

The time will be somewhat proportional to the number of one
bits in j. For small numbers of one bits it will likely
be faster than the byte table lookup. One could also
unroll the loop.

-- glen

From: Walter Spector on
Dan wrote:
> ...
> For those lucky enough not to have to program in CAL, S registers are
> scalar, V are vector, T are transfer, A and B are data registers, This
> code was originally written for a Cray-1, so it is simpler than X and
> Y assembler. The code is uncommented deliberately. Sorry about that.

All it does is the same as the Fortran-90 PACK intrinsic. Just use
the intrinsic. If it isn't fast enough, complain to your compiler
vendor.

And it won't work on a Cray-1. It uses the compressed index and hardware
gather instructions. Which means mid-life X-MP, onwards.

The Cray compilers were eventually able to vectorize this idiom
automatically. The CAL routine was unnecessary, and probably slower
than letting the compiler generate inline code.

W.