From: Gary Scott on
Gerry Ford wrote:
> "Terence" <tbwright(a)cantv.net> wrote in message
> news:8aef8a66-8d45-4d8a-b596-7f5c924d6c21(a)p39g2000prm.googlegroups.com...
>
>>Am I missing something?
>>Assuming input data as nwords of 16-bit words in MAT and a table of
>>bitcounts in IBITC for integers 0 to 255, you calculate the '1' bit
>>count of such a string as follows:-
>>
>> INTEGER*4 MAT(Nwords),N
>> INTEGER*4 IBITC(256)
>> DATA IBITC/0,1,1,2,1,2,2,3,......16/
>>C
>> N=0
>> DO 1 I=1,nwords
>> K1=MAT(I)
>> K2=ISHR(K1,8) +1
>> K1=IAND(K1,#FF) +1
>> N=N+IBITC(K1)+IBITC(K2)
>> 1 CONTINUE
>>
>>If this is done in assembler directly is is even faster.
>>
>
> There's something about #FF that looks strange to me, beyond unfamiliarity
> to low-level processes in fortran. I find this a good resource:
> http://www.robelle.com/smugbook/ascii.txt

That's a nice MS/DEC/Compaq/Intel extension for hexadecimal notation.
You can read it as 16#FF as the 16 is the default implied base. (Z'FF')

>
> Does #FF mean 256th char in fortran?
>
>


--

Gary Scott
mailto:garylscott(a)sbcglobal dot net

Fortran Library: http://www.fortranlib.com

Support the Original G95 Project: http://www.g95.org
-OR-
Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html

If you want to do the impossible, don't hire an expert because he knows
it can't be done.

-- Henry Ford
From: glen herrmannsfeldt on
Gary Scott wrote:

> That's a nice MS/DEC/Compaq/Intel extension for hexadecimal notation.
> You can read it as 16#FF as the 16 is the default implied base. (Z'FF')

When did DEC do this? The VAX/VMS compilers I knew used the IBM Z
notation. (No apostrophes, only in DATA statements.)

Before VAX, DEC only had octal for non-decimal constants.

-- glen

From: James Van Buskirk on
"Terence" <tbwright(a)cantv.net> wrote in message
news:8aef8a66-8d45-4d8a-b596-7f5c924d6c21(a)p39g2000prm.googlegroups.com...

> Am I missing something?
> Assuming input data as nwords of 16-bit words in MAT and a table of
> bitcounts in IBITC for integers 0 to 255, you calculate the '1' bit
> count of such a string as follows:-

> INTEGER*4 MAT(Nwords),N
> INTEGER*4 IBITC(256)
> DATA IBITC/0,1,1,2,1,2,2,3,......16/
> C
> N=0
> DO 1 I=1,nwords
> K1=MAT(I)
> K2=ISHR(K1,8) +1
> K1=IAND(K1,#FF) +1
> N=N+IBITC(K1)+IBITC(K2)
> 1 CONTINUE
>
> If this is done in assembler directly is is even faster.

Well, the above doesn't seem very fast. Your snippet was
incomplete and it's unclear whether you intended an 8-bit LUT or a
16-bit LUT. gfortran barfed on the 16-bit LUT, so what we are left
with is:

C:\gfortran\clf\popcnt>type big_popcnt.s
.text
..globl _tm1
.def _tm1; .scl 2; .type 32; .endef
_tm1:
rdtsc
shrq $32, %rdx
orq %rdx, %rax
ret
..globl _popcnt1
.def _popcnt1; .scl 2; .type 32; .endef
_popcnt1:
subq $88, %rsp
movaps %xmm6, (%rsp)
movaps %xmm7, 16(%rsp)
movaps %xmm8, 32(%rsp)
movaps %xmm12, 48(%rsp)
movaps %xmm13, 64(%rsp)
movaps %xmm14, 96(%rsp)
movaps %xmm15, 112(%rsp)
movq $0xaaaaaaaaaaaaaaaa, %rax
movq %rax, %xmm0
movddup %xmm0, %xmm15
movq $0x3333333333333333, %rax
movq %rax, %xmm0
movddup %xmm0, %xmm14
movq $0x0f0f0f0f0f0f0f0f, %rax
movq %rax, %xmm0
movddup %xmm0, %xmm13
xorps %xmm12, %xmm12

xorps %xmm2, %xmm2
xorps %xmm3, %xmm3
xorps %xmm4, %xmm4
xorps %xmm5, %xmm5
xorps %xmm6, %xmm6
xorps %xmm7, %xmm7
xorps %xmm8, %xmm8

movl $32768, %edx
addq %rdx, %rcx
negq %rdx
loop1: movaps (%rcx,%rdx), %xmm0
movaps %xmm0, %xmm3
andps %xmm2, %xmm3
xorps %xmm0, %xmm2
movaps 16(%rcx,%rdx), %xmm0
movaps %xmm0, %xmm1
andps %xmm2, %xmm1
xorps %xmm0, %xmm2
orps %xmm1, %xmm3
movaps %xmm3, %xmm5
andps %xmm4, %xmm5
xorps %xmm3, %xmm4
movaps 32(%rcx,%rdx), %xmm0
movaps %xmm0, %xmm3
andps %xmm2, %xmm3
xorps %xmm0, %xmm2
movaps 48(%rcx,%rdx), %xmm0
movaps %xmm0, %xmm1
andps %xmm2, %xmm1
xorps %xmm0, %xmm2
orps %xmm1, %xmm3
movaps %xmm3, %xmm0
andps %xmm4, %xmm0
xorps %xmm3, %xmm4
orps %xmm0, %xmm5
movaps %xmm5, %xmm7
andps %xmm6, %xmm7
xorps %xmm5, %xmm6
movaps 64(%rcx,%rdx), %xmm0
movaps %xmm0, %xmm3
andps %xmm2, %xmm3
xorps %xmm0, %xmm2
movaps 80(%rcx,%rdx), %xmm0
movaps %xmm0, %xmm1
andps %xmm2, %xmm1
xorps %xmm0, %xmm2
orps %xmm1, %xmm3
movaps %xmm3, %xmm5
andps %xmm4, %xmm5
xorps %xmm3, %xmm4
movaps 96(%rcx,%rdx), %xmm0
movaps %xmm0, %xmm3
andps %xmm2, %xmm3
xorps %xmm0, %xmm2
movaps 112(%rcx,%rdx), %xmm0
movaps %xmm0, %xmm1
andps %xmm2, %xmm1
xorps %xmm0, %xmm2
orps %xmm1, %xmm3
movaps %xmm3, %xmm0
andps %xmm4, %xmm0
xorps %xmm3, %xmm4
orps %xmm0, %xmm5
movaps %xmm5, %xmm0
andps %xmm6, %xmm0
xorps %xmm5, %xmm6
orps %xmm0, %xmm7

call wrap
paddd %xmm7, %xmm8

addq $128, %rdx
jnz loop1

movaps %xmm6, %xmm7
call wrap
paddq %xmm8, %xmm8
paddq %xmm7, %xmm8
movaps %xmm4, %xmm7
call wrap
paddq %xmm8, %xmm8
paddq %xmm7, %xmm8
movaps %xmm2, %xmm7
call wrap
paddq %xmm8, %xmm8
paddq %xmm7, %xmm8
movhlps %xmm8, %xmm1
paddq %xmm8, %xmm1
movq %xmm1, %rax

movaps (%rsp), %xmm6
movaps 16(%rsp), %xmm7
movaps 32(%rsp), %xmm8
movaps 48(%rsp), %xmm12
movaps 64(%rsp), %xmm13
movaps 96(%rsp), %xmm14
movaps 112(%rsp), %xmm15
addq $88, %rsp
ret

wrap: movaps %xmm15, %xmm0
andps %xmm7, %xmm0
psrld $1, %xmm0
psubd %xmm0, %xmm7
movaps %xmm14, %xmm0
andnps %xmm7, %xmm0
psrld $2, %xmm0
andps %xmm14, %xmm7
paddd %xmm0, %xmm7
movaps %xmm7, %xmm0
psrld $4, %xmm7
paddd %xmm0, %xmm7
andps %xmm13, %xmm7
psadbw %xmm12, %xmm7
ret

C:\gfortran\clf\popcnt>type big_test.f90
program big_test
use ISO_C_BINDING
implicit none
interface
function tm1() bind(C)
import C_INT64_T
implicit none

integer(C_INT64_T) tm1
end function tm1
end interface

interface
function popcnt1(x) bind(C)
import C_INT64_T
import C_PTR
implicit none

integer(C_INT64_T) popcnt1
type(C_PTR), value :: x
end function popcnt1
end interface

interface
function popcnt2(x,n) bind(C)
import C_INT64_T
import C_PTR
import C_INT
implicit none

integer(C_INT64_T) popcnt2
type(C_PTR), value :: x
integer(C_INT) n
end function popcnt2
end interface

interface
subroutine sieve(t, n) bind(C)
import C_PTR
import C_INT
implicit none

type(C_PTR), value :: t
integer(C_INT) n
end subroutine sieve
end interface

integer(C_INT8_T), allocatable, target :: x(:)
integer(C_INT), parameter :: Nbytes = 32768 ! popcnt1 hardwired to this
integer, parameter :: align = 16 ! Will use xmm registers
type(C_PTR) x_ptr
integer(C_INTPTR_T) x_start
integer(C_INTPTR_T) t_start
type(C_PTR) t_ptr
integer i
integer(C_INT64_T) t0, t1, np

allocate(x(Nbytes+align-1))
x_ptr = C_LOC(x(1))
x_start = transfer(x_ptr, x_start)
t_start = iand(x_start+align-1, int(not(align-1),C_INTPTR_T))
t_ptr = transfer(t_start, t_ptr)
call sieve(t_ptr, Nbytes)
do i = 1, 4
t0 = tm1()
np = popcnt1(t_ptr)
t1 = tm1()
write(*,'(2(a,i0))') 'popcnt1 np = ', np, ' clocks = ', t1-t0
t0 = tm1()
np = popcnt2(t_ptr, Nbytes/4)
t1 = tm1()
write(*,'(2(a,i0))') 'popcnt2 np = ', np, ' clocks = ', t1-t0
end do

end program big_test

subroutine sieve(t, n) bind(C)
use ISO_C_BINDING
implicit none
integer(C_INT) n
integer(C_INT8_T) t(0:n-1)
integer i, lim, j

lim = sqrt(8*n+0.5_C_DOUBLE)
t = -1
t(0) = ibclr(t(0), 0)
do i = 2, lim
if(btest(t((i-1)/8),modulo(i-1,8))) then
do j = i**2, 8*n, i
t((j-1)/8) = ibclr(t((j-1)/8),modulo(j-1,8))
end do
end if
end do
end subroutine sieve

!function popcnt2(MAT, Nwords) bind(C)
! use ISO_C_BINDING
! implicit none
! integer(C_INT64_T) popcnt2
! integer(C_INT) Nwords
! integer(C_INT32_T) MAT(Nwords)
! integer i, j
! integer(C_INT8_T), parameter :: IBITC(0:65535) = [( &
! ibits(i,0,1)+ibits(i,1,1)+ibits(i,2,1)+ibits(i,3,1)+ &
! ibits(i,4,1)+ibits(i,5,1)+ibits(i,6,1)+ibits(i,7,1)+ &
! ibits(i,8,1)+ibits(i,9,1)+ibits(i,10,1)+ibits(i,11,1)+ &
! ibits(i,12,1)+ibits(i,13,1)+ibits(i,14,1)+ibits(i,15,1), &
! i=0,65535)]
! integer(C_INT32_T) K1, K2
!
! popcnt2 = 0
! do i = 1, Nwords
! K1 = MAT(i)
! K2 = ishft(K1, -16)
! K1 = iand(K1, int(z'ffff', C_INT32_T))
! popcnt2 = popcnt2+IBITC(K1)+IBITC(K2)
! end do
!end function popcnt2

function popcnt2(MAT, Nwords) bind(C)
use ISO_C_BINDING
implicit none
integer(C_INT64_T) popcnt2
integer(C_INT) Nwords
integer(C_INT16_T) MAT(2*Nwords)
integer i, j
integer(C_INT8_T), parameter :: IBITC(0:255) = [( &
ibits(i,0,1)+ibits(i,1,1)+ibits(i,2,1)+ibits(i,3,1)+ &
ibits(i,4,1)+ibits(i,5,1)+ibits(i,6,1)+ibits(i,7,1), &
i=0,255)]
integer(C_INT16_T) K1, K2

popcnt2 = 0
do i = 1, 2*Nwords
K1 = MAT(i)
K2 = ishft(K1, -8)
K1 = iand(K1, int(z'ff', C_INT16_T))
popcnt2 = popcnt2+IBITC(K1)+IBITC(K2)
end do
end function popcnt2

C:\gfortran\clf\popcnt>as big_popcnt.s

C:\gfortran\clf\popcnt>c:\gfortran\win64\bin\x86_64-pc-mingw32-gfortran -O2
big_
test.f90 big_popcnt.s -obig_test

C:\gfortran\clf\popcnt>big_test
popcnt1 np = 23000 clocks = 9060
popcnt2 np = 23000 clocks = 82810
popcnt1 np = 23000 clocks = 9230
popcnt2 np = 23000 clocks = 82830
popcnt1 np = 23000 clocks = 9290
popcnt2 np = 23000 clocks = 82770
popcnt1 np = 23000 clocks = 9310
popcnt2 np = 23000 clocks = 82790

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end


From: Gary Scott on
glen herrmannsfeldt wrote:

> Gary Scott wrote:
>
>> That's a nice MS/DEC/Compaq/Intel extension for hexadecimal notation.
>> You can read it as 16#FF as the 16 is the default implied base. (Z'FF')
>
>
> When did DEC do this? The VAX/VMS compilers I knew used the IBM Z
> notation. (No apostrophes, only in DATA statements.)

When they created DVF. I doubt if it was retrofit to VAX.

>
> Before VAX, DEC only had octal for non-decimal constants.
>
> -- glen
>


--

Gary Scott
mailto:garylscott(a)sbcglobal dot net

Fortran Library: http://www.fortranlib.com

Support the Original G95 Project: http://www.g95.org
-OR-
Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html

If you want to do the impossible, don't hire an expert because he knows
it can't be done.

-- Henry Ford
From: Gary Scott on
glen herrmannsfeldt wrote:

> Gary Scott wrote:
>
>> That's a nice MS/DEC/Compaq/Intel extension for hexadecimal notation.
>> You can read it as 16#FF as the 16 is the default implied base. (Z'FF')
>
>
> When did DEC do this? The VAX/VMS compilers I knew used the IBM Z
> notation. (No apostrophes, only in DATA statements.)

Also, I believe it supports bases from 2-36 (or something like that)

2#00000001
3#...
4#...
....
36#...

>
> Before VAX, DEC only had octal for non-decimal constants.
>
> -- glen
>


--

Gary Scott
mailto:garylscott(a)sbcglobal dot net

Fortran Library: http://www.fortranlib.com

Support the Original G95 Project: http://www.g95.org
-OR-
Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html

If you want to do the impossible, don't hire an expert because he knows
it can't be done.

-- Henry Ford