From: Bram Bos on 21 Oct 2006 04:58

I'm replacing some of my regular Delphi code with assembler, but I'm still quite a newbie at it, so here's a really basic question.

I am writing a function which limits a 32-bit signed integer to -32768 .. 32767. Currently I do it this way:

    mov EAX, myInt
    cmp EAX, $00007FFF   // 0 .. 32767: in range
    jbe @resQL
    cmp EAX, $FFFF8000   // -32768 .. -1: in range
    jae @resQL
    cmp EAX, $80000000   // out of range: sign bit clear means too large
    jb  @setQL
    mov EAX, $FFFF8000   // -32768
    jmp @resQL
  @setQL:
    mov EAX, $00007FFF   // 32767
  @resQL:
    mov myInt, EAX

I have a feeling I could do it in a branchless way using SBB, but I haven't figured it out yet. Does anyone have a clue how to do this the optimal way?
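For reference, the intended behaviour can be modelled in C. This is only a sketch of the logic, not the Delphi code itself: `clamp16_ref` and `clamp16_unsigned_cmp` are hypothetical names, and the second function mirrors the idea of testing a signed value with unsigned comparisons (JBE/JAE), where the in-range values are the two unsigned intervals 0 .. $00007FFF and $FFFF8000 .. $FFFFFFFF.

```c
#include <stdint.h>

/* Plain signed clamp to the 16-bit range -32768 .. 32767. */
int32_t clamp16_ref(int32_t v)
{
    if (v > 32767)  return 32767;    /* above the upper limit */
    if (v < -32768) return -32768;   /* below the lower limit */
    return v;                        /* already in range */
}

/* Same result, written with unsigned comparisons like JBE / JAE / JB. */
int32_t clamp16_unsigned_cmp(int32_t v)
{
    uint32_t u = (uint32_t)v;                  /* reinterpret the bit pattern */
    if (u <= 0x00007FFFu) return v;            /* 0 .. 32767 */
    if (u >= 0xFFFF8000u) return v;            /* -32768 .. -1 */
    /* Out of range: below 0x80000000 means the sign bit is clear. */
    return (u < 0x80000000u) ? 32767 : -32768;
}
```

Both functions should agree for every input; the second one only makes explicit why two unsigned compares are enough to detect the in-range case.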
From: Bram Bos on 21 Oct 2006 05:45

> I have a feeling I could do it in a branchless way using SBB, but I
> haven't figured it out yet. Anyone have a clue how to do this the
> optimal way?

I think I have figured it out, although I would like to know if there's an even more optimized way of doing it:

    mov EAX, rRi
    ; min
    xor EAX, $80000000
    mov EDX, $FFFF8000
    xor EDX, $80000000
    sub EAX, EDX
    cmc
    sbb ECX, ECX
    and EAX, ECX
    add EAX, EDX
    ; max
    mov EDX, $00007FFF
    xor EDX, $80000000
    sub EAX, EDX
    sbb ECX, ECX
    and EAX, ECX
    add EAX, EDX
    ; eax contains the (saturated) signed value
    xor EAX, $80000000
    mov rRi, EAX

Cheers!
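The branchless sequence can be read as three steps: flip the sign bit so that signed order becomes unsigned order, apply an unsigned "max with lower limit" and "min with upper limit" using a mask derived from the carry, then flip the sign bit back. A rough C model of that reading follows. The helper names are hypothetical, and the final cast back to `int32_t` assumes the usual two's-complement wraparound, which is implementation-defined in older C standards.

```c
#include <stdint.h>

/* Unsigned max(a, lo) without a branch: models sub / cmc / sbb / and / add. */
static uint32_t umax_branchless(uint32_t a, uint32_t lo)
{
    uint32_t diff = a - lo;                    /* sub EAX, EDX */
    uint32_t mask = 0u - (uint32_t)(a >= lo);  /* cmc + sbb ECX, ECX: all ones if a >= lo */
    return (diff & mask) + lo;                 /* and EAX, ECX ; add EAX, EDX */
}

/* Unsigned min(a, hi) without a branch: models sub / sbb / and / add. */
static uint32_t umin_branchless(uint32_t a, uint32_t hi)
{
    uint32_t diff = a - hi;                    /* sub EAX, EDX */
    uint32_t mask = 0u - (uint32_t)(a < hi);   /* sbb ECX, ECX: all ones if a < hi */
    return (diff & mask) + hi;                 /* and EAX, ECX ; add EAX, EDX */
}

int32_t clamp16_branchless(int32_t v)
{
    const uint32_t bias = 0x80000000u;
    uint32_t u = (uint32_t)v ^ bias;               /* signed order -> unsigned order */
    u = umax_branchless(u, 0xFFFF8000u ^ bias);    /* enforce the lower limit, -32768 */
    u = umin_branchless(u, 0x00007FFFu ^ bias);    /* enforce the upper limit, 32767 */
    return (int32_t)(u ^ bias);                    /* undo the bias (assumes two's complement) */
}
```

Note that the biased limits are compile-time constants here, which matches the idea that each `mov`/`xor` pair on EDX could be a single `mov` of a precomputed value.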
From: Herbert Kleebauer on 21 Oct 2006 16:07

Bram Bos wrote:

> I think I have figured it out.. although I would like to know if
> there's an even more optimized way of doing it...
>
> ; min
> xor EAX, $80000000
> mov EDX, $FFFF8000
> xor EDX, $80000000
> sub EAX, EDX
> cmc
> sbb ECX, ECX
> and EAX, ECX
> add EAX, EDX
> ; max
> mov EDX, $00007FFF
> xor EDX, $80000000
> sub EAX, EDX
> sbb ECX, ECX
> and EAX, ECX
> add EAX, EDX
> ; eax contains the (saturated) signed value
> xor EAX, $80000000

Are you sure that this is faster than the conditional jump version? Here you always have to execute 13 instructions ("mov EDX, $FFFF8000" + "xor EDX, $80000000" counts as only one instruction, since the two can be folded into a single mov of the combined constant). If you use the conditional jump

    cmp.l   #$00007fff,r0
    bls.b   _10
    cmp.l   #$ffff8000,r0
    bhs.b   _10
    add.l   r0,r0
    subc.l  r0,r0
    eor.l   #$00007fff,r0
_10:

and we can assume that in most cases (let's say 98%) the value is within the limits, and that positive and negative values have the same probability, then in 49% of cases we have to execute 2 instructions, in another 49% 4 instructions, and only in 2% 7 instructions. It can also be an advantage that no other register content is destroyed.

With an average of about 3 instructions (0.49*2 + 0.49*4 + 0.02*7 = 3.08) this should be faster than your 13-instruction code, even if the conditional jump is slow (but at least for the second branch the branch prediction should be very good).
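The three instructions after the second branch (add, subtract with carry, exclusive-or) turn the sign of an out-of-range value directly into the saturated result. A rough C model of the whole sequence, as a sketch only: `clamp16_signmask` is a hypothetical name, and the final cast assumes two's-complement conversion.

```c
#include <stdint.h>

int32_t clamp16_signmask(int32_t v)
{
    uint32_t u = (uint32_t)v;
    if (u <= 0x00007FFFu) return v;        /* cmp / bls: 0 .. 32767, 2 instructions */
    if (u >= 0xFFFF8000u) return v;        /* cmp / bhs: -32768 .. -1, 4 instructions */

    uint32_t carry = u >> 31;              /* add r0,r0: the sign bit ends up in the carry */
    uint32_t mask  = 0u - carry;           /* subc r0,r0: 0 if positive, 0xFFFFFFFF if negative */
    return (int32_t)(mask ^ 0x00007FFFu);  /* eor: 0x00007FFF or 0xFFFF8000 */
}
```

The comments also show where the 2, 4, and 7 instruction counts for the three paths come from.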
From: Bram Bos on 22 Oct 2006 08:06

> Are you sure that this is faster than the conditional jump
> version? Here you always have to execute 13 instructions

Actually you are absolutely right. I benchmarked both solutions and the conditional jump one turns out to be 20% faster than the non-branched one (at least on my AMD Athlon; I haven't tried on an Intel CPU).

I guess I should not be so obsessed with getting rid of the branches :-)

Thanks!
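From: vid512@gmail.com on 30 Oct 2006 09:55

How about this? (FASM syntax)

        movsx   edx, ax          ; sign-extend the low 16 bits
        cmp     edx, eax         ; equal when eax already fits in 16 bits
        je      .okay
        test    eax, eax         ; set SF from the sign of eax
        js      .negative
    .positive:
        mov     eax, 7FFFh       ; 32767
        jmp     .okay
    .negative:
        mov     eax, 0FFFF8000h  ; -32768
    .okay: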
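A signed 32-bit value fits in 16 bits exactly when bits 15 through 31 are all equal, which is the same condition as sign-extending the low 16 bits and getting the original value back. A minimal C sketch of a clamp built on that test (`clamp16_topbits` is a hypothetical name, not from the thread):

```c
#include <stdint.h>

int32_t clamp16_topbits(int32_t v)
{
    uint32_t top = (uint32_t)v & 0xFFFF8000u;   /* keep bits 15..31 only */
    if (top == 0u)          return v;           /* all clear: 0 .. 32767 */
    if (top == 0xFFFF8000u) return v;           /* all set: -32768 .. -1 */
    return (v < 0) ? -32768 : 32767;            /* saturate by sign */
}
```

Masking with 0FFFF0000h alone would not be enough, because bit 15 also has to match the sign: 32768 has clear upper bits but is out of range, and -1 has set upper bits but is in range.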