From: Skybuck Flying on
Hello,

There is a bit position which needs to be converted to the longword position
in "byte pointer/offset form".

Method 1:

Divide the bit position by 32 (bits per longword) and then multiply it by 4
(bytes per longword).

The division gets rid of the remainder/fraction; the integer part is then
multiplied by 4 (bytes per longword) to get the byte/memory cell offset.

(In other words "bit space is converted to longword space which is then
converted back to byte space")

Method 2:

Zero out the low 5 bits of the bit position (the value range 0 to 31) by
and-ing it with the complement of a mask that has bits 0 to 4 set.

This gets rid of the remainder as well; the result is then divided by 8
(bits per byte).

(In other words "bit space is longword-nerved :) and converted directly to
byte space")

Bit space has intervals of 1 bit.
Longword space has intervals of 32 bits.
Byte space has intervals of 8 bits.

So it kinda makes sense doesn't it ;) :)
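
A minimal C sketch of the two methods described above, so they can be
compared side by side. The function names are hypothetical (they do not
appear in the Delphi test program), and an unsigned 32-bit bit position is
assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Method 1: bit index -> longword index (>> 5), then
   longword index -> byte offset (<< 2, i.e. times 4 bytes). */
static uint32_t longword_offset_method1(uint32_t bit_position)
{
    return (bit_position >> 5) << 2;
}

/* Method 2: clear bits 0..4 (round down to a multiple of 32 bits),
   then convert bits to bytes (>> 3, i.e. divide by 8). */
static uint32_t longword_offset_method2(uint32_t bit_position)
{
    return (bit_position & ~UINT32_C(31)) >> 3;
}

/* Hypothetical helper: checks that both methods agree on a range of inputs. */
static void check_methods_agree(void)
{
    for (uint32_t bit = 0; bit < 4096; ++bit)
        assert(longword_offset_method1(bit) == longword_offset_method2(bit));
    assert(longword_offset_method1(UINT32_MAX) ==
           longword_offset_method2(UINT32_MAX));
}
```

For example, bit position 37 lies in the second longword, so both functions
return byte offset 4.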

Anyway that's not the weird part =D

The weird part is what Delphi does, and that's where you guys come in ! ;)
:)

Delphi implements method 1 as follows:

TestProgram.dpr.27: mLongwordPosition1 := (mBitPosition shr 5) shl 2; // delphi does something special here...
00408F5D 8BC3 mov eax,ebx
00408F5F C1E805 shr eax,$05
00408F62 03C0 add eax,eax
00408F64 03C0 add eax,eax

The shift right 5 is done with instruction shr 5.
The shift left 2 is done with two adds ?!?.
A total of 4 instructions (including the initializer (?))
For a total of 9 instruction bytes.
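
The two adds are at least a valid replacement: adding a register to itself
doubles it, so doing that twice multiplies by 4, which is what shl 2 does.
A small C sketch of that equivalence (hypothetical function names, unsigned
32-bit arithmetic assumed, so both forms wrap around the same way):

```c
#include <assert.h>
#include <stdint.h>

/* The straightforward form: shift left by 2. */
static uint32_t times4_by_shift(uint32_t x)
{
    return x << 2;
}

/* The form Delphi emitted: two "add eax,eax" steps. */
static uint32_t times4_by_adds(uint32_t x)
{
    x += x; /* x * 2 */
    x += x; /* x * 4 */
    return x;
}

/* Hypothetical helper: checks the two forms agree, including on wraparound. */
static void check_times4_forms_agree(void)
{
    for (uint32_t x = 0; x < 4096; ++x)
        assert(times4_by_shift(x) == times4_by_adds(x));
    assert(times4_by_shift(UINT32_MAX) == times4_by_adds(UINT32_MAX));
}
```

This only shows that the results match; it says nothing about which form is
faster on a given processor.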

Delphi implements method 2 as follows:

TestProgram.dpr.30: mLongwordPosition2 := (mBitPosition and not (1+2+4+8+16)) shr 3;
00408F66 8BD3 mov edx,ebx
00408F68 83E2E0 and edx,-$20
00408F6B C1EA03 shr edx,$03

A total of 3 instructions (including the initializer (?))
For a total of 8 instruction bytes.

Pretty straightforward...

Now some questions for you to answer:

1. Why does Delphi replace the shl 2 with two adds like that ?!?

Is it maybe faster ? If so why ? Maybe pairing ? Maybe shl/shr conflict with
each other ?

2. Which method would be faster and why ?

3. Also if you have any alternative methods of calculating the same results
let me know ! ;) :)
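
One possible alternative, not taken from any of the posts here, is to swap
the order of method 2: convert to byte space first (shift right by 3) and
then clear the low 2 bits to round down to a 4-byte boundary. A hedged C
sketch, with a hypothetical function name and an unsigned 32-bit bit
position assumed:

```c
#include <assert.h>
#include <stdint.h>

/* Alternative sketch: bits -> bytes first (>> 3), then align the byte
   offset down to a multiple of 4 by clearing bits 0..1. */
static uint32_t longword_offset_shift_then_mask(uint32_t bit_position)
{
    return (bit_position >> 3) & ~UINT32_C(3);
}

/* Hypothetical helper: compares against method 1 from the post. */
static void check_matches_method1(void)
{
    for (uint32_t bit = 0; bit < 4096; ++bit)
        assert(longword_offset_shift_then_mask(bit) == ((bit >> 5) << 2));
}
```

It is still one shift and one mask, so it is probably not faster than
method 2; it is just a third way to write the same result.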

Bye,
Skybuck.


From: Branimir Maksimovic on
On Thu, 27 May 2010 07:27:00 +0200
"Skybuck Flying" <IntoTheFuture(a)hotmail.com> wrote:

> Hello,
>
> There is a bit position which needs to be converted to the longword
> position in "byte pointer/offset form".
>
>
> 3. Also if you have any alternative methods of calculating the same
> results let me know ! ;) :)
bmaxa(a)maxa:~/fasm/test$ fasm bytepos.asm
flat assembler version 1.68 (16384 kilobytes memory)
2 passes, 149 bytes.
bmaxa(a)maxa:~/fasm/test$ ./bytepos || $?
No command '116' found, did you mean:
Command 'e16' from package 'e16' (universe)
116: command not found
bmaxa(a)maxa:~/fasm/test$

bmaxa(a)maxa:~/fasm/test$ cat bytepos.asm
format ELF executable

struc bytepos p,o
{
.ptr dd ?
.off db ?
}

macro pos bp,address,off
{
virtual at bp
.ptr dd ?
.off db ?
end virtual
mov dword[.ptr],address
mov byte[.off],off
}
segment readable writeable
exmpl db ?
vr bytepos 0,0
segment executable
entry $
pos vr,exmpl,1

movzx ebx,byte[vr.ptr]
movzx eax,byte[vr.off]
int 0x80
bmaxa(a)maxa:~/fasm/test$


>
> Bye,
> Skybuck.
>
>

Bye, ...


--
http://maxa.homedns.org/

Sometimes online sometimes not

Everyone is "allowed" to be an idiot and benighted, but only some choose it,


From: Skybuck Flying on
I won't be needing this method so often anymore, because of a different
implementation, so no pressure, no hurry at answering the questions...

If there is no answer at all, that's just fine too ;)

Bye,
Skybuck :)


From: Nick Keighley on
The subject is important so it should be included in the post body
Subject: Calculating longword pointer, which method is faster ?

On 27 May, 06:27, "Skybuck Flying" <IntoTheFut...(a)hotmail.com> wrote:


> There is a bit position which needs to be converted to the longword position
> in "byte pointer/offset form".

longword is 32 bits, byte is 8 bits?

Isn't this just quotient and remainder? You could do something fancy
with shifts and masks. Is ldiv from stdlib.h any use to you?
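
A rough sketch of the quotient-and-remainder view using ldiv, assuming a
non-negative bit position and 32-bit longwords. The struct and function
names are hypothetical, made up for illustration:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical result type: where a bit lives in memory. */
struct longword_position {
    long byte_offset; /* byte offset of the containing longword */
    long bit_in_word; /* bit index inside that longword, 0..31 */
};

/* Splits a non-negative bit position into longword byte offset and
   bit-within-longword, using ldiv for quotient and remainder. */
static struct longword_position split_bit_position(long bit_position)
{
    struct longword_position pos;
    ldiv_t qr;

    assert(bit_position >= 0);      /* assumption: no negative positions */
    qr = ldiv(bit_position, 32L);   /* quot = longword index, rem = bit */
    pos.byte_offset = qr.quot * 4;  /* 4 bytes per longword */
    pos.bit_in_word = qr.rem;
    return pos;
}
```

For bit position 37 this gives byte offset 4 and bit 5 within that longword.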

Can't you just time the various techniques? The answer is likely
platform dependent.
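
A very rough way to time the two techniques in C, using clock() from
time.h. The function names and the loop count are assumptions, and an
optimizing compiler may inline or transform the loops, so treat any numbers
as indicative only:

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

static uint32_t method1(uint32_t bit) { return (bit >> 5) << 2; }
static uint32_t method2(uint32_t bit) { return (bit & ~UINT32_C(31)) >> 3; }

/* Calls fn for every input in 0..iterations-1 and returns the CPU time
   used, in seconds. The running sum is written to *checksum so the work
   cannot be discarded and the two methods can be cross-checked. */
static double time_method(uint32_t (*fn)(uint32_t), uint32_t iterations,
                          uint32_t *checksum)
{
    volatile uint32_t sink = 0;
    clock_t start, end;
    uint32_t i;

    start = clock();
    for (i = 0; i < iterations; ++i)
        sink = sink + fn(i);
    end = clock();

    *checksum = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_method(method1, n, &sum1) and time_method(method2, n, &sum2)
with the same n should give equal checksums; the two times can then be
compared on the machine of interest.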

<snip>


From: nedbrek on
Hello all,

"Skybuck Flying" <IntoTheFuture(a)hotmail.com> wrote in message
news:40e23$4bfe029f$54190f09$16210(a)cache2.tilbu1.nb.home.nl...
> 1. Why does Delphi replace the shl 2 with two adds like that ?!?

On the first P4, add was 1/2 cycle (2 dependent adds in 1 clock). On other
machines, adders are often more available than shifters (being cheaper).
Sometimes, shifts cost more than one cycle. Probably, the compiler has a
bias towards P4.

Ned