From: Skybuck Flying on 4 May 2008 16:09

"Wojciech Mula" <wojciech_mula(a)poczta.null.onet.pl.invalid> wrote in message news:20080504210809.45250bc5.wojciech_mula(a)poczta.null.onet.pl.invalid...
> "Skybuck Flying" <BloodyShame(a)hotmail.com> wrote:
>
>> > You can replace mov/adc with single setc eax -- this instruction
>> > has 1 cycle latency on modern CPUs.
>>
>> No, there is a little problem with that solution.
>>
>> setxx only sets a single byte.
>
> Sorry, I was sure that setxx accepts 32-bit registers.
> However if BitPosition lies in range 0..31 or even 0..255
> you can use setc instruction.

Setc doesn't clear the high bits of eax.

Here is an example:

    program Project1;

    {$APPTYPE CONSOLE}

    uses
      SysUtils;

    procedure Main;
    var
      Temp : longword;
    begin
      asm
        mov eax, $11223344
        stc
        setc al
        mov Temp, eax
      end;
      writeln( Temp ); // displays: 287453953
    end;

    begin
      try
        Main;
      except
        on E:Exception do
          Writeln(E.Classname, ': ', E.Message);
      end;
      readln;
    end.

So you see, garbage remains in the high bits of eax.

Delphi doesn't like that ;)

So Delphi adds extra instructions like:

    and eax, 127

to clear out the high bits, I guess, when it translates "pascal sets" and "bit sets" to asm code.

OK, time for a little example of what I think a problem could be in Delphi:

    procedure Bla( Input : longword; var Output : TarrayOfLongword );
    type
      TBitEnum = 0..31;
      TBitSet = set of TBitEnum;
    begin
      Output[0] := Output[0] + longword( 0 in TBitSet(Input) );
      Output[1] := Output[1] + longword( 1 in TBitSet(Input) );
      Output[2] := Output[2] + longword( 2 in TBitSet(Input) );

    // gets compiled to:

    Project1.dpr.299: Output[0] := Output[0] + longword( 0 in TBitSet(Input) );
    004090AC A801      test al,$01
    004090AE 0F95C1    setnz cl
    004090B1 83E17F    and ecx,$7f   // extra instruction ?!?!?
    004090B4 010A      add [edx],ecx

    // 4 instructions, inefficient ?!
    // my method is just 3 instructions ?!

Bye,
Skybuck.
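The behaviour shown in the post above can be modelled in plain C. This is only a sketch of the register arithmetic, not the code Delphi emits; the function names are made up for illustration, and the `movzx` variant is one common way to clear the upper bits, not something proposed in the thread.

```c
#include <stdint.h>

/* Models "setc al": only the low byte of the register is written,
   the upper 24 bits keep whatever was there before. */
static uint32_t setc_low_byte(uint32_t eax, unsigned carry)
{
    return (eax & 0xFFFFFF00u) | (carry & 1u);
}

/* Models "setc al" followed by "movzx eax, al" (an assumption, not from
   the thread): the upper 24 bits are cleared, leaving just 0 or 1. */
static uint32_t setc_then_movzx(uint32_t eax, unsigned carry)
{
    return setc_low_byte(eax, carry) & 0xFFu;
}
```

With eax = 0x11223344 and the carry set, `setc_low_byte` gives 0x11223301, which is 287453953 in decimal, the value the Delphi program prints. `setc_then_movzx` gives 1 for the same inputs.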
From: Skybuck Flying on 4 May 2008 16:32

"Wojciech Mula" <wojciech_mula(a)poczta.null.onet.pl.invalid> wrote in message news:20080504210809.45250bc5.wojciech_mula(a)poczta.null.onet.pl.invalid...
> "Skybuck Flying" <BloodyShame(a)hotmail.com> wrote:
>
>> > You can replace mov/adc with single setc eax -- this instruction
>> > has 1 cycle latency on modern CPUs.
>>
>> No, there is a little problem with that solution.
>>
>> setxx only sets a single byte.
>
> Sorry, I was sure that setxx accepts 32-bit registers.
> However if BitPosition lies in range 0..31 or even 0..255
> you can use setc instruction.

The BT instruction is used to test the bit position.

With a 32-bit register operand, as used here, the BT instruction is limited to bit positions 0 to 31.

How do you propose to test range 32 to 255 ??? <- could still be interesting ! ;)

Bye,
Skybuck.
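One possible answer to the question above, sketched in C rather than assembler: keep the wide value as an array of 32-bit words, pick the word with the upper bits of the position and the bit with the lower five. The helper name and the least-significant-word-first layout are assumptions made for this example; neither poster proposed this code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: tests bit `pos` in a bit string stored as an array
   of 32-bit words, least significant word first. The caller must make sure
   that pos is smaller than 32 times the number of words. */
static unsigned test_bit_wide(const uint32_t *words, size_t pos)
{
    size_t   word_index = pos >> 5;               /* pos / 32: which word */
    unsigned bit_index  = (unsigned)(pos & 31u);  /* pos % 32: which bit  */
    return (unsigned)((words[word_index] >> bit_index) & 1u);
}
```

For positions 0..255 the array needs eight words. On x86, BT with a memory operand and a register bit offset can also address bits beyond position 31 directly, though that form is usually slower than the register form.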
From: Skybuck Flying on 4 May 2008 18:31

"Rob Kennedy" <me3(a)privacy.net> wrote in message news:686ne8F2sa42lU1(a)mid.individual.net...
> Skybuck Flying wrote:
>> "Wojciech Mula" <wojciech_mula(a)poczta.null.onet.pl.invalid> wrote in
>> message
>> news:20080504190913.938d1fff.wojciech_mula(a)poczta.null.onet.pl.invalid...
>>> "Skybuck Flying" <BloodyShame(a)hotmail.com> wrote:
>>>
>>>> function FastGetBit( Value : longword; BitPosition : longword ) :
>>>> boolean;
>>>> asm
>>>> bt eax, edx // latency: 1
>>>> mov eax, 0 // latency: 1
>>>> adc eax, 0 // latency: 1
>>>> end;
>>> You can replace mov/adc with single setc eax -- this instruction
>>> has 1 cycle latency on modern CPUs.
>>
>> No, there is a little problem with that solution.
>>
>> setxx only sets a single byte.
>
> Why is that a problem? That's exactly the size of your function's return
> type.

My mistake, I thought the boolean type was 4 bytes in size ;)

I could have sworn that I saw SizeOf(Boolean) return 4 sometime.

Maybe it was a LongBool in disguise :)

    Tboolean = LongBool; :)

>
>> The delphi 2007 compiler uses the setxx solution and for some reason it
>> is forced to output:
>>
>> "and 127" as well.
>>
>> Which is an extra instruction.
>
> Based on your later message in this thread, it looks as though Delphi
> really adds that as part of converting a Boolean to a LongWord. It has

Yes, now it's starting to make a little bit of sense to me, I was wondering about that.

Still though, why does it do "and 127" ? and not "and 255" ?

Why does it not simply do: "and 1" ?

> nothing to do with the code in your function. Delphi would add the "and
> 127" part to the result of your function call no matter what code you put
> in your function since Delphi doesn't do interprocedural analysis,
> especially not for inline assembler.

Hmmmmm :)

Thanks,

Bye,
Skybuck.
From: Skybuck Flying on 4 May 2008 18:56

Ok, I see what the problem is now.

It's Delphi's "set of" feature.

    type
      TBitEnum = 0..31;
      TBitSet = set of TBitEnum;
    begin
      Output[0] := Output[0] + longword( 0 in TBitSet(Input) );

The following is happening:

The delphi compiler sees this line:

    "0 in TBitSet(Input)"

and interprets it as a boolean (which is 1 byte).

I don't want a boolean, I want a longword, I just want a number, not a boolean.

So I am trying to force Delphi to simply use the TBitSet / TBitEnum as a "bit operator".

Because that is what I really want !

I just want a bit operator in Delphi.

Something like:

    if BitPos(5) in Value is set then
    begin
      writeln('bit position 5 is set');
    end;

This is still a boolean example above.

But I would also like to be able to just get a single bit in a number-like-fashion:

    Byte := GetBitPos(5) of SomeValue; // returns 0 or 1

Anyway... the problem above is shitty.

It's just one more reason why not to use sets.

However this still does not answer my previous questions:

1. "Why is it using "and 127" ?"
2. "Why not "and 255" ?"
3. "Why does it not simply use "and 1" ?"

Bye,
Skybuck ;)
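The "bit operator" wished for above, one that yields the number 0 or 1 instead of a boolean, comes down to a shift and a mask. A minimal C sketch follows; `get_bit` and `accumulate_low_bits` are hypothetical names, and the second function only mirrors the three `Output[...]` lines shown in the earlier post.

```c
#include <stdint.h>

/* Hypothetical get_bit: returns bit `pos` of `value` as the number 0 or 1.
   pos is reduced modulo 32, which mirrors BT with a 32-bit register operand
   and keeps the shift well defined in C. */
static uint32_t get_bit(uint32_t value, unsigned pos)
{
    return (value >> (pos & 31u)) & 1u;
}

/* Adds bits 0, 1 and 2 of `input` to the matching counters, like the
   three Output[...] statements in the earlier post. */
static void accumulate_low_bits(uint32_t input, uint32_t output[3])
{
    for (unsigned i = 0; i < 3; ++i)
        output[i] += get_bit(input, i);
}
```

Because the result is already a full-width integer, no conversion from a 1-byte boolean is involved, so there is nothing that would call for an extra masking step of the kind seen in the Delphi output.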
From: Wolfgang Kern on 5 May 2008 05:00

Robert Redelmeier replied to "The Flying Bucket":

>>> asm
>>> bt eax, edx // latency: 1
>>> mov eax, 0 // latency: 1
>>> adc eax, 0 // latency: 1
>>> end;

>> You can replace mov/adc with single setc eax -- this
>> instruction has 1 cycle latency on modern CPUs.

should read SETc "AL", also possible: SALC

> If you can tolerate additional bits set, try:
> SBB eax, eax
> This is likely over-optimizing -- unless inlined as part of
> a larger routine, the control transfer (and any prolog/epilog)
> will eat more than a few clocks.

SBB is a good solution for all HLLs where 32 bits are needed to tell TRUE or FALSE. We ASMers can use just the carry flag to say good or bad, and use jc, cmovc, setc, or their counterparts after BT.

Perhaps this SBB idiom was the main reason why BT puts its result in the carry flag instead of (more logical at first glance) the zero flag.

__
wolfgang
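What "additional bits set" means for the SBB suggestion can be seen from a small C model. This is a sketch of the arithmetic only; the function names are invented for illustration, and the final masking step is an assumption about how a caller might get back to 0 or 1.

```c
#include <stdint.h>

/* Models "bt eax, edx" followed by "sbb eax, eax": eax - eax - CF.
   The result is 0 when the tested bit is clear and 0xFFFFFFFF
   (all 32 bits set) when it is set. */
static uint32_t bit_to_mask(uint32_t value, unsigned pos)
{
    uint32_t carry = (value >> (pos & 31u)) & 1u;  /* what BT leaves in CF */
    return 0u - carry;
}

/* Models a following "and eax, 1" (an assumption, not from the thread)
   for callers that need exactly 0 or 1. */
static uint32_t mask_to_bit(uint32_t mask)
{
    return mask & 1u;
}
```

An all-ones result is nonzero in every byte, so it reads as true whether the caller looks at 8 or 32 bits of the register, which fits the remark above about languages that need 32 bits to tell TRUE from FALSE.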