|
From: Gary Scott on 8 Apr 2008 22:23 Gerry Ford wrote: > "Terence" <tbwright(a)cantv.net> wrote in message > news:8aef8a66-8d45-4d8a-b596-7f5c924d6c21(a)p39g2000prm.googlegroups.com... > >>Am I missing something? >>Assuming input data as nwords of 16-bit words in MAT and a table of >>bitcounts in IBITC for integers 0 to 255, you calculate the '1' bit >>count of such a string as follows:- >> >> INTEGER*4 MAT(Nwords),N >> INTEGER*4 IBITC(256) >> DATA IBITC/0,1,1,2,1,2,2,3,......16/ >>C >> N=0 >> DO 1 I=1,nwords >> K1=MAT(I) >> K2=ISHR(K1,8) +1 >> K1=IAND(K1,#FF) +1 >> N=N+IBITC(K1)+IBITC(K2) >> 1 CONTINUE >> >>If this is done in assembler directly is is even faster. >> > > There's something about #FF that looks strange to me, beyond unfamiliarity > to low-level processes in fortran. I find this a good resource: > http://www.robelle.com/smugbook/ascii.txt That's a nice MS/DEC/Compaq/Intel extension for hexadecimal notation. You can read it as 16#FF as the 16 is the default implied base. (Z'FF') > > Does #FF mean 256th char in fortran? > > -- Gary Scott mailto:garylscott(a)sbcglobal dot net Fortran Library: http://www.fortranlib.com Support the Original G95 Project: http://www.g95.org -OR- Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html If you want to do the impossible, don't hire an expert because he knows it can't be done. -- Henry Ford
From: glen herrmannsfeldt on 8 Apr 2008 23:20 Gary Scott wrote: > That's a nice MS/DEC/Compaq/Intel extension for hexadecimal notation. > You can read it as 16#FF as the 16 is the default implied base. (Z'FF') When did DEC do this? The VAX/VMS compilers I knew used the IBM Z notation. (No apostrophes, only in DATA statements.) Before VAX, DEC only had octal for non-decimal constants. -- glen
From: James Van Buskirk on 9 Apr 2008 04:45 "Terence" <tbwright(a)cantv.net> wrote in message news:8aef8a66-8d45-4d8a-b596-7f5c924d6c21(a)p39g2000prm.googlegroups.com... > Am I missing something? > Assuming input data as nwords of 16-bit words in MAT and a table of > bitcounts in IBITC for integers 0 to 255, you calculate the '1' bit > count of such a string as follows:- > INTEGER*4 MAT(Nwords),N > INTEGER*4 IBITC(256) > DATA IBITC/0,1,1,2,1,2,2,3,......16/ > C > N=0 > DO 1 I=1,nwords > K1=MAT(I) > K2=ISHR(K1,8) +1 > K1=IAND(K1,#FF) +1 > N=N+IBITC(K1)+IBITC(K2) > 1 CONTINUE > > If this is done in assembler directly is is even faster. Well, the above doesn't seem very fast. Your snippet was incomplete and it's unclear whether you intended an 8-bit LUT or a 16-bit LUT. gfortran barfed on the 16-bit LUT, so what we are left with is: C:\gfortran\clf\popcnt>type big_popcnt.s .text ..globl _tm1 .def _tm1; .scl 2; .type 32; .endef _tm1: rdtsc shrq $32, %rdx orq %rdx, %rax ret ..globl _popcnt1 .def _popcnt1; .scl 2; .type 32; .endef _popcnt1: subq $88, %rsp movaps %xmm6, (%rsp) movaps %xmm7, 16(%rsp) movaps %xmm8, 32(%rsp) movaps %xmm12, 48(%rsp) movaps %xmm13, 64(%rsp) movaps %xmm14, 96(%rsp) movaps %xmm15, 112(%rsp) movq $0xaaaaaaaaaaaaaaaa, %rax movq %rax, %xmm0 movddup %xmm0, %xmm15 movq $0x3333333333333333, %rax movq %rax, %xmm0 movddup %xmm0, %xmm14 movq $0x0f0f0f0f0f0f0f0f, %rax movq %rax, %xmm0 movddup %xmm0, %xmm13 xorps %xmm12, %xmm12 xorps %xmm2, %xmm2 xorps %xmm3, %xmm3 xorps %xmm4, %xmm4 xorps %xmm5, %xmm5 xorps %xmm6, %xmm6 xorps %xmm7, %xmm7 xorps %xmm8, %xmm8 movl $32768, %edx addq %rdx, %rcx negq %rdx loop1: movaps (%rcx,%rdx), %xmm0 movaps %xmm0, %xmm3 andps %xmm2, %xmm3 xorps %xmm0, %xmm2 movaps 16(%rcx,%rdx), %xmm0 movaps %xmm0, %xmm1 andps %xmm2, %xmm1 xorps %xmm0, %xmm2 orps %xmm1, %xmm3 movaps %xmm3, %xmm5 andps %xmm4, %xmm5 xorps %xmm3, %xmm4 movaps 32(%rcx,%rdx), %xmm0 movaps %xmm0, %xmm3 andps %xmm2, %xmm3 xorps %xmm0, %xmm2 movaps 48(%rcx,%rdx), %xmm0 movaps %xmm0, %xmm1 andps %xmm2, %xmm1 xorps %xmm0, %xmm2 orps %xmm1, %xmm3 movaps %xmm3, %xmm0 andps %xmm4, %xmm0 xorps %xmm3, %xmm4 orps %xmm0, %xmm5 movaps %xmm5, %xmm7 andps %xmm6, %xmm7 xorps %xmm5, %xmm6 movaps 64(%rcx,%rdx), %xmm0 movaps %xmm0, %xmm3 andps %xmm2, %xmm3 xorps %xmm0, %xmm2 movaps 80(%rcx,%rdx), %xmm0 movaps %xmm0, %xmm1 andps %xmm2, %xmm1 xorps %xmm0, %xmm2 orps %xmm1, %xmm3 movaps %xmm3, %xmm5 andps %xmm4, %xmm5 xorps %xmm3, %xmm4 movaps 96(%rcx,%rdx), %xmm0 movaps %xmm0, %xmm3 andps %xmm2, %xmm3 xorps %xmm0, %xmm2 movaps 112(%rcx,%rdx), %xmm0 movaps %xmm0, %xmm1 andps %xmm2, %xmm1 xorps %xmm0, %xmm2 orps %xmm1, %xmm3 movaps %xmm3, %xmm0 andps %xmm4, %xmm0 xorps %xmm3, %xmm4 orps %xmm0, %xmm5 movaps %xmm5, %xmm0 andps %xmm6, %xmm0 xorps %xmm5, %xmm6 orps %xmm0, %xmm7 call wrap paddd %xmm7, %xmm8 addq $128, %rdx jnz loop1 movaps %xmm6, %xmm7 call wrap paddq %xmm8, %xmm8 paddq %xmm7, %xmm8 movaps %xmm4, %xmm7 call wrap paddq %xmm8, %xmm8 paddq %xmm7, %xmm8 movaps %xmm2, %xmm7 call wrap paddq %xmm8, %xmm8 paddq %xmm7, %xmm8 movhlps %xmm8, %xmm1 paddq %xmm8, %xmm1 movq %xmm1, %rax movaps (%rsp), %xmm6 movaps 16(%rsp), %xmm7 movaps 32(%rsp), %xmm8 movaps 48(%rsp), %xmm12 movaps 64(%rsp), %xmm13 movaps 96(%rsp), %xmm14 movaps 112(%rsp), %xmm15 addq $88, %rsp ret wrap: movaps %xmm15, %xmm0 andps %xmm7, %xmm0 psrld $1, %xmm0 psubd %xmm0, %xmm7 movaps %xmm14, %xmm0 andnps %xmm7, %xmm0 psrld $2, %xmm0 andps %xmm14, %xmm7 paddd %xmm0, %xmm7 movaps %xmm7, %xmm0 psrld $4, %xmm7 paddd %xmm0, %xmm7 andps %xmm13, %xmm7 psadbw %xmm12, %xmm7 ret C:\gfortran\clf\popcnt>type big_test.f90 program big_test use ISO_C_BINDING implicit none interface function tm1() bind(C) import C_INT64_T implicit none integer(C_INT64_T) tm1 end function tm1 end interface interface function popcnt1(x) bind(C) import C_INT64_T import C_PTR implicit none integer(C_INT64_T) popcnt1 type(C_PTR), value :: x end function popcnt1 end interface interface function popcnt2(x,n) bind(C) import C_INT64_T import C_PTR import C_INT implicit none integer(C_INT64_T) popcnt2 type(C_PTR), value :: x integer(C_INT) n end function popcnt2 end interface interface subroutine sieve(t, n) bind(C) import C_PTR import C_INT implicit none type(C_PTR), value :: t integer(C_INT) n end subroutine sieve end interface integer(C_INT8_T), allocatable, target :: x(:) integer(C_INT), parameter :: Nbytes = 32768 ! popcnt1 hardwired to this integer, parameter :: align = 16 ! Will use xmm registers type(C_PTR) x_ptr integer(C_INTPTR_T) x_start integer(C_INTPTR_T) t_start type(C_PTR) t_ptr integer i integer(C_INT64_T) t0, t1, np allocate(x(Nbytes+align-1)) x_ptr = C_LOC(x(1)) x_start = transfer(x_ptr, x_start) t_start = iand(x_start+align-1, int(not(align-1),C_INTPTR_T)) t_ptr = transfer(t_start, t_ptr) call sieve(t_ptr, Nbytes) do i = 1, 4 t0 = tm1() np = popcnt1(t_ptr) t1 = tm1() write(*,'(2(a,i0))') 'popcnt1 np = ', np, ' clocks = ', t1-t0 t0 = tm1() np = popcnt2(t_ptr, Nbytes/4) t1 = tm1() write(*,'(2(a,i0))') 'popcnt2 np = ', np, ' clocks = ', t1-t0 end do end program big_test subroutine sieve(t, n) bind(C) use ISO_C_BINDING implicit none integer(C_INT) n integer(C_INT8_T) t(0:n-1) integer i, lim, j lim = sqrt(8*n+0.5_C_DOUBLE) t = -1 t(0) = ibclr(t(0), 0) do i = 2, lim if(btest(t((i-1)/8),modulo(i-1,8))) then do j = i**2, 8*n, i t((j-1)/8) = ibclr(t((j-1)/8),modulo(j-1,8)) end do end if end do end subroutine sieve !function popcnt2(MAT, Nwords) bind(C) ! use ISO_C_BINDING ! implicit none ! integer(C_INT64_T) popcnt2 ! integer(C_INT) Nwords ! integer(C_INT32_T) MAT(Nwords) ! integer i, j ! integer(C_INT8_T), parameter :: IBITC(0:65535) = [( & ! ibits(i,0,1)+ibits(i,1,1)+ibits(i,2,1)+ibits(i,3,1)+ & ! ibits(i,4,1)+ibits(i,5,1)+ibits(i,6,1)+ibits(i,7,1)+ & ! ibits(i,8,1)+ibits(i,9,1)+ibits(i,10,1)+ibits(i,11,1)+ & ! ibits(i,12,1)+ibits(i,13,1)+ibits(i,14,1)+ibits(i,15,1), & ! i=0,65535)] ! integer(C_INT32_T) K1, K2 ! ! popcnt2 = 0 ! do i = 1, Nwords ! K1 = MAT(i) ! K2 = ishft(K1, -16) ! K1 = iand(K1, int(z'ffff', C_INT32_T)) ! popcnt2 = popcnt2+IBITC(K1)+IBITC(K2) ! end do !end function popcnt2 function popcnt2(MAT, Nwords) bind(C) use ISO_C_BINDING implicit none integer(C_INT64_T) popcnt2 integer(C_INT) Nwords integer(C_INT16_T) MAT(2*Nwords) integer i, j integer(C_INT8_T), parameter :: IBITC(0:255) = [( & ibits(i,0,1)+ibits(i,1,1)+ibits(i,2,1)+ibits(i,3,1)+ & ibits(i,4,1)+ibits(i,5,1)+ibits(i,6,1)+ibits(i,7,1), & i=0,255)] integer(C_INT16_T) K1, K2 popcnt2 = 0 do i = 1, 2*Nwords K1 = MAT(i) K2 = ishft(K1, -8) K1 = iand(K1, int(z'ff', C_INT16_T)) popcnt2 = popcnt2+IBITC(K1)+IBITC(K2) end do end function popcnt2 C:\gfortran\clf\popcnt>as big_popcnt.s C:\gfortran\clf\popcnt>c:\gfortran\win64\bin\x86_64-pc-mingw32-gfortran -O2 big_ test.f90 big_popcnt.s -obig_test C:\gfortran\clf\popcnt>big_test popcnt1 np = 23000 clocks = 9060 popcnt2 np = 23000 clocks = 82810 popcnt1 np = 23000 clocks = 9230 popcnt2 np = 23000 clocks = 82830 popcnt1 np = 23000 clocks = 9290 popcnt2 np = 23000 clocks = 82770 popcnt1 np = 23000 clocks = 9310 popcnt2 np = 23000 clocks = 82790 -- write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, & 6.0134700243160014d-154/),(/'x'/)); end
From: Gary Scott on 9 Apr 2008 08:46 glen herrmannsfeldt wrote: > Gary Scott wrote: > >> That's a nice MS/DEC/Compaq/Intel extension for hexadecimal notation. >> You can read it as 16#FF as the 16 is the default implied base. (Z'FF') > > > When did DEC do this? The VAX/VMS compilers I knew used the IBM Z > notation. (No apostrophes, only in DATA statements.) When they created DVF. I doubt if it was retrofit to VAX. > > Before VAX, DEC only had octal for non-decimal constants. > > -- glen > -- Gary Scott mailto:garylscott(a)sbcglobal dot net Fortran Library: http://www.fortranlib.com Support the Original G95 Project: http://www.g95.org -OR- Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html If you want to do the impossible, don't hire an expert because he knows it can't be done. -- Henry Ford
From: Gary Scott on 9 Apr 2008 08:49
glen herrmannsfeldt wrote: > Gary Scott wrote: > >> That's a nice MS/DEC/Compaq/Intel extension for hexadecimal notation. >> You can read it as 16#FF as the 16 is the default implied base. (Z'FF') > > > When did DEC do this? The VAX/VMS compilers I knew used the IBM Z > notation. (No apostrophes, only in DATA statements.) Also, I believe it supports bases from 2-36 (or something like that) 2#00000001 3#... 4#... .... 36#... > > Before VAX, DEC only had octal for non-decimal constants. > > -- glen > -- Gary Scott mailto:garylscott(a)sbcglobal dot net Fortran Library: http://www.fortranlib.com Support the Original G95 Project: http://www.g95.org -OR- Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html If you want to do the impossible, don't hire an expert because he knows it can't be done. -- Henry Ford |