From: Skybuck Flying on
Hello,

Here is my fast (latency 3) get bit routine:

Enjoy ! ;) =D

// *** Begin of Code ***

program Project1;

{$APPTYPE CONSOLE}

{

Skybuck presents FastGetBit

version 0.01 created on 4 may 2008 by Skybuck Flying.

This routine has a latency of just 3 cycles on an AMD X2 3800+.

Of course it also has some call and ret latency as usual.

It is limited to bit positions 0 to 31.

Could still be a nice and interesting function, or just an asm code
demonstration, when having to deal with these kinds of limited situations ! ;) =D

I wish Delphi supported something like this in high level language !

Would be nice ! =D and fast too ! ;) :)

Currently the Delphi compiler is limited to "test" and "set byte" instructions
for set of bit enum,
which kinda sux anyway... special/fast bit operator/indexing would be nice.

}

uses
SysUtils;

// returns true (1) if bit position set, or false (0) if bit position not set.
// latency of 3 plus of course call + ret latency ;)
// bit position range: 0 to 31
function FastGetBit( Value : longword; BitPosition : longword ) : boolean;
asm
bt eax, edx // latency: 1
mov eax, 0 // latency: 1
adc eax, 0 // latency: 1
end;

// when you just wanna get 0 and 1 and not a boolean.
function FastGetBitInt( Value : longword; BitPosition : longword ) : longword;
asm
bt eax, edx // latency: 1
mov eax, 0 // latency: 1
adc eax, 0 // latency: 1
end;

procedure Main;
var
B : boolean;
L : longword;
begin

B := false;
writeln( longword( B ) ); // 0

B := true;
writeln( longword( B ) ); // 1

L := 2147483648;

if FastGetBit( L, 31 ) then
begin
writeln( 'Bit Position is set');
end;
end;

begin
try
Main;
except
on E:Exception do
Writeln(E.Classname, ': ', E.Message);
end;
readln;
end.

// *** End of Code ***

Bye,
Skybuck.
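
For comparison with the asm routine above, here is a minimal pure-Pascal sketch of the same bit test. GetBitPortable is a hypothetical name, not part of the original post, and the "and 31" is an assumption added to mirror how bt with a register operand only uses the low 5 bits of the bit index. No claim is made about the code the compiler generates for it.

```pascal
program GetBitPortableDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper (not from the original post): pure Pascal
// counterpart of FastGetBitInt. The "and 31" keeps the bit index in
// the range 0..31, like bt does for a 32-bit register operand.
function GetBitPortable( Value : longword; BitPosition : longword ) : longword;
begin
  Result := ( Value shr ( BitPosition and 31 ) ) and 1;
end;

begin
  writeln( GetBitPortable( $80000000, 31 ) ); // 1 (bit 31 is set)
  writeln( GetBitPortable( $80000000, 30 ) ); // 0 (bit 30 is clear)
  writeln( GetBitPortable( 5, 2 ) );          // 1 (5 = binary 101)
end.
```

Comparing its results against FastGetBitInt for the same inputs is a quick sanity check of the asm version.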


From: Wojciech Muła on
"Skybuck Flying" <BloodyShame(a)hotmail.com> wrote:

> function FastGetBit( Value : longword; BitPosition : longword ) : boolean;
> asm
> bt eax, edx // latency: 1
> mov eax, 0 // latency: 1
> adc eax, 0 // latency: 1
> end;

You can replace mov/adc with single setc eax -- this instruction
has 1 cycle latency on modern CPUs.

w.
From: Skybuck Flying on
"Wojciech Mula" <wojciech_mula(a)poczta.null.onet.pl.invalid> wrote in message
news:20080504190913.938d1fff.wojciech_mula(a)poczta.null.onet.pl.invalid...
> "Skybuck Flying" <BloodyShame(a)hotmail.com> wrote:
>
>> function FastGetBit( Value : longword; BitPosition : longword ) :
>> boolean;
>> asm
>> bt eax, edx // latency: 1
>> mov eax, 0 // latency: 1
>> adc eax, 0 // latency: 1
>> end;
>
> You can replace mov/adc with single setc eax -- this instruction
> has 1 cycle latency on modern CPUs.

No, there is a little problem with that solution.

setxx only sets a single byte.

The Delphi 2007 compiler uses the setxx solution and for some reason it is
forced to output:

"and 127" as well.

Which is an extra instruction.

It's probably better to avoid working with bytes, because Delphi likes
adding:
"movzx eax, al" all over the place ;) :) <- which are extra instructions :(

Bye,
Skybuck.
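
To make the byte-only behaviour of setxx concrete, here is a rough sketch for 32-bit Delphi, assuming the default register calling convention (Value in EAX, BitPosition in EDX, result in EAX or AL). GetBitSetc and GetBitSetcInt are hypothetical names. What the compiler adds at the call site has not been checked here.

```pascal
program SetcDemo;

{$APPTYPE CONSOLE}

// A boolean result is returned in AL, so writing only that byte is
// enough and no widening instruction is needed inside the routine.
function GetBitSetc( Value : longword; BitPosition : longword ) : boolean;
asm
  bt   eax, edx   // CF := selected bit of Value
  setc al         // AL := CF (0 or 1), upper bytes of EAX untouched
end;

// For a longword result the upper 24 bits of EAX must be cleared as
// well, which costs one more instruction.
function GetBitSetcInt( Value : longword; BitPosition : longword ) : longword;
asm
  bt    eax, edx  // CF := selected bit of Value
  setc  al        // AL := CF (0 or 1)
  movzx eax, al   // zero-extend AL into the full EAX
end;

begin
  if GetBitSetc( $80000000, 31 ) then
    writeln( 'Bit Position is set' );
  writeln( GetBitSetcInt( $80000000, 31 ) ); // 1
  writeln( GetBitSetcInt( $80000000, 0 ) );  // 0
end.
```

So setc saves an instruction only in the boolean case; the longword variant ends up at three instructions again.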


From: Robert Redelmeier on
In alt.lang.asm Wojciech Muła <wojciech_mula(a)poczta.null.onet.pl.invalid> wrote in part:
> "Skybuck Flying" <BloodyShame(a)hotmail.com> wrote:
>> function FastGetBit( Value : longword; BitPosition : longword ) : boolean;
>> asm
>> bt eax, edx // latency: 1
>> mov eax, 0 // latency: 1
>> adc eax, 0 // latency: 1
>> end;
>
> You can replace mov/adc with single setc eax -- this
> instruction has 1 cycle latency on modern CPUs.

If you can tolerate additional bits set, try:

SBB eax, eax

This is likely over-optimizing -- unless inlined as part of
a larger routine, the control transfer (and any prolog/epilog)
will eat more than a few clocks.

-- Robert
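
A small sketch of the SBB variant, again assuming 32-bit Delphi with the register calling convention. GetBitMask is a hypothetical name. The "additional bits set" show up as an all-ones result instead of 1.

```pascal
program SbbDemo;

{$APPTYPE CONSOLE}

// Returns 0 when the bit is clear and $FFFFFFFF when it is set.
function GetBitMask( Value : longword; BitPosition : longword ) : longword;
asm
  bt  eax, edx   // CF := selected bit of Value
  sbb eax, eax   // EAX := EAX - EAX - CF, i.e. 0 or $FFFFFFFF
end;

begin
  writeln( GetBitMask( $80000000, 31 ) ); // 4294967295
  writeln( GetBitMask( $80000000, 30 ) ); // 0
  // The all-ones result can be used directly as an AND mask:
  writeln( GetBitMask( 5, 2 ) and 1234 ); // 1234
end.
```

If a strict 0/1 value is needed, a following "neg eax" turns $FFFFFFFF into 1, at the cost of the extra instruction.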


From: Wojciech Muła on
"Skybuck Flying" <BloodyShame(a)hotmail.com> wrote:

> > You can replace mov/adc with single setc eax -- this instruction
> > has 1 cycle latency on modern CPUs.
>
> No, there is a little problem with that solution.
>
> setxx only sets a single byte.

Sorry, I was sure that setxx accepts 32-bit registers.
However, if BitPosition lies in the range 0..31 or even 0..255,
you can use the setc instruction.

w.