From: Bram Bos on
I'm replacing some of my regular Delphi code with assembler, but I'm
still quite a newbie at it, so here's a really basic question.

I am writing a function which limits a 32-bit signed integer to
-32768 .. 32767.
Currently I do it this way:

mov EAX, myInt
cmp EAX, $00007FFF // unsigned compare: 0 .. 32767 is in range
jbe @resQL
cmp EAX, $FFFF8000 // unsigned compare: -32768 .. -1 is in range
jae @resQL
test EAX, EAX // out of range: the sign picks the limit
jns @setQL
mov EAX, $FFFF8000 // -32768
jmp @resQL
@setQL:
mov EAX, $00007FFF // 32767
@resQL:
mov myInt, EAX

I have a feeling I could do it in a branchless way using SBB, but I
haven't figured it out yet. Anyone have a clue how to do this the
optimal way?
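
For checking any of the assembler variants in this thread, a plain C reference of the intended behaviour can help. This is only a sketch: the name clamp16 is made up here, and the limits are taken from the constants in the listing ($FFFF8000 and $00007FFF).

```c
#include <stdint.h>

/* Reference behaviour: saturate a 32-bit signed value to the
   signed 16-bit range. clamp16 is a hypothetical name. */
int32_t clamp16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;   /*  32767 */
    if (x < INT16_MIN) return INT16_MIN;   /* -32768 */
    return x;
}
```

Feeding the same inputs to this function and to an assembler routine, especially values just outside each limit, is a quick way to catch a wrong branch condition.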

From: Bram Bos on
> I have a feeling I could do it in a branchless way using SBB, but I
> haven't figured it out yet. Anyone have a clue how to do this the
> optimal way?

I think I have figured it out.. although I would like to know if
there's an even more optimized way of doing it...

mov EAX, rRi
; min
xor EAX, $80000000
mov EDX, $FFFF8000
xor EDX, $80000000
sub EAX, EDX
cmc
sbb ECX, ECX
and EAX, ECX
add EAX, EDX
; max
mov EDX, $00007FFF
xor EDX, $80000000
sub EAX, EDX
sbb ECX, ECX
and EAX, ECX
add EAX, EDX
; eax contains the (saturated) signed value
xor EAX, $80000000
mov rRi, EAX
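
The sequence above can be read as "flip the sign bit so that unsigned compares order the values correctly, then do a max and a min using an all-ones or all-zero mask built from the borrow". A rough C model of that reading follows. The function names are hypothetical, and the mask is computed from a comparison rather than from a real carry flag.

```c
#include <stdint.h>

/* max(a, lo) for unsigned values, mirroring sub / cmc / sbb / and / add */
static uint32_t umax_mask(uint32_t a, uint32_t lo)
{
    uint32_t diff = a - lo;                    /* sub EAX, EDX              */
    uint32_t keep = 0u - (uint32_t)(a >= lo);  /* cmc, sbb ECX, ECX: -1 or 0 */
    return (diff & keep) + lo;                 /* and EAX, ECX; add EAX, EDX */
}

/* min(a, hi) for unsigned values, mirroring sub / sbb / and / add */
static uint32_t umin_mask(uint32_t a, uint32_t hi)
{
    uint32_t diff = a - hi;                    /* sub EAX, EDX              */
    uint32_t keep = 0u - (uint32_t)(a < hi);   /* sbb ECX, ECX: -1 or 0     */
    return (diff & keep) + hi;                 /* and EAX, ECX; add EAX, EDX */
}

int32_t clamp16_branchless(int32_t x)
{
    const uint32_t bias = 0x80000000u;         /* xor with this flips the sign bit */
    uint32_t v = (uint32_t)x ^ bias;
    v = umax_mask(v, 0xFFFF8000u ^ bias);      /* lower limit, -32768 */
    v = umin_mask(v, 0x00007FFFu ^ bias);      /* upper limit,  32767 */
    /* The conversion back to signed assumes the usual two's complement
       behaviour of mainstream compilers. */
    return (int32_t)(v ^ bias);
}
```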

cheers!

From: Herbert Kleebauer on
Bram Bos wrote:

> I think I have figured it out.. although I would like to know if
> there's an even more optimized way of doing it...

> ; min
> xor EAX, $80000000
> mov EDX, $FFFF8000
> xor EDX, $80000000
> sub EAX, EDX
> cmc
> sbb ECX, ECX
> and EAX, ECX
> add EAX, EDX
> ; max
> mov EDX, $00007FFF
> xor EDX, $80000000
> sub EAX, EDX
> sbb ECX, ECX
> and EAX, ECX
> add EAX, EDX
> ; eax contains the (saturated) signed value
> xor EAX, $80000000

Are you sure that this is faster than the conditional jump
version? Here you always have to execute 13 instructions
("mov EDX, $FFFF8000" + "xor EDX, $80000000" is only one
instruction). If you use the conditional jump

cmp.l #$00007fff,r0
bls.b _10
cmp.l #$ffff8000,r0
bhs.b _10
add.l r0,r0
subc.l r0,r0
eor.l #$00007fff,r0
_10:

and we can assume that in most cases (let's say 98%) the value
is within the limits, and that positive and negative values have the
same probability, then in 49% of cases we have to execute 2 instructions,
in another 49% 4 instructions, and only in 2% 7 instructions.
It can also be an advantage that no other register content
is destroyed. With an average of about 3 instructions
(0.49*2 + 0.49*4 + 0.02*7 = 3.08) this should
be faster than your 13-instruction code, even if the conditional
jump is slow (but at least for the second branch the branch
prediction should be very good).
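
For readers not used to the syntax of the listing above, here is a rough C model of what it appears to do: two unsigned range checks that exit early, then a sign mask XORed with $7fff to produce the limit. The function name is hypothetical, and the mapping of each line to an instruction is an interpretation.

```c
#include <stdint.h>

int32_t clamp16_branchy(int32_t x)
{
    uint32_t u = (uint32_t)x;
    if (u <= 0x00007FFFu) return x;       /* cmp.l / bls.b: 0 .. 32767        */
    if (u >= 0xFFFF8000u) return x;       /* cmp.l / bhs.b: -32768 .. -1      */
    uint32_t sign = 0u - (u >> 31);       /* add.l r0,r0 then subc.l r0,r0:
                                             all ones if negative, else zero  */
    /* eor.l #$7fff: zero becomes 32767, all ones becomes $FFFF8000.
       The conversion back to signed assumes two's complement. */
    return (int32_t)(sign ^ 0x00007FFFu);
}
```

The out-of-range path needs no further compare because the sign bit alone decides which limit applies.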

From: Bram Bos on

> Are you sure that this is faster than the conditional jump
> version? Here you always have to execute 13 instructions

Actually you are absolutely right. I benchmarked both solutions and the
conditional jump one turns out to be 20% faster than the non-branched
one (at least on my AMD Athlon, haven't tried on an Intel CPU). I guess
I should not be so obsessed with getting rid of the branches :-)

Thanks!

From: vid512@gmail.com on
how about this? (FASM syntax)

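lea edx, [eax+8000h]   ; bias: -32768..32767 becomes 0..0FFFFh
test edx, 0FFFF0000h   ; upper word clear means eax is in range
jz .okay
test eax, eax          ; out of range: the sign picks the limit
js .negative
.positive:
mov eax, 7FFFh         ; 32767
jmp .okay
.negative:
mov eax, 0FFFF8000h    ; -32768
.okay: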
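
A check on the upper word only matches the signed 16-bit range if the value is biased by 8000h first. Without the bias, 8000h..0FFFFh would pass as in range and every negative value would count as out of range. A minimal C sketch of the biased check, with hypothetical function names:

```c
#include <stdint.h>

/* Nonzero if x fits in a signed 16-bit value. */
int fits_int16(int32_t x)
{
    /* Adding 0x8000 maps -32768..32767 onto 0..0xFFFF; any other
       input leaves at least one bit set in the upper word. */
    return (((uint32_t)x + 0x8000u) & 0xFFFF0000u) == 0;
}

int32_t clamp16_upper_word(int32_t x)
{
    if (fits_int16(x)) return x;
    return (x < 0) ? -32768 : 32767;   /* the sign picks the limit */
}
```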