From: Guga on
Hi wolfgang

[EDITED] I mistyped the last post. The correct one is:

Thanks for the tip of


" SHL al,4 ;mul by 16 (entry size) and get rid of 030
"


i suppose that using:
shl al 4
shr al 4


is faster than


sub eax 030


right ?
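
For reference, a small C sketch of what the two variants compute. It does not settle which one is faster; it only shows that for the ASCII digits '0'..'9' the shift pair, the subtraction, and a single AND with 0F all give the same value. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Digit value via the shift pair: SHL al,4 then SHR al,4 on an 8-bit register. */
static uint8_t digit_by_shifts(uint8_t c)
{
    uint8_t t = (uint8_t)(c << 4);   /* high nibble (the 3 of 030) falls off */
    return (uint8_t)(t >> 4);        /* low nibble moves back into place */
}

/* Digit value via subtraction: SUB eax 030. Only valid for '0'..'9'. */
static uint8_t digit_by_sub(uint8_t c)
{
    assert(c >= '0' && c <= '9');
    return (uint8_t)(c - 0x30);
}

/* Digit value via a single mask: AND al 0F. */
static uint8_t digit_by_mask(uint8_t c)
{
    return (uint8_t)(c & 0x0F);
}
```

Note that the shift and mask versions do not reject non-digits: 'A' (041) comes out as 1, so the input still has to be range-checked somewhere.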

From: /o//annabee on
>
>
> Oops.. i forgot... i just answered to you...
>
> But.. how do i multiply a value by 10 without using mul ?
>
> Is there a way to use lea to multiply a number by 10 ? The usage of
> lea is only to speed up a little.

lea ecx D$ecx*4+ecx ;*5
lea ecx D$ecx*2 ;*2

>
> Best Regards
> Guga
>



From: �a/b on
On 26 Mar 2007 12:39:48 -0700, "Guga" <GugaGTG(a)gmail.com> wrote:

>Hi guys
>
>someone knows how to convert an null terminated ascii string to
>tword ? (80 bits)

it seems i went the long way
it seems my function stores the number in an array of unsigned 32-bit words,
then changes the base from base 10 to the
base "max_unsigned"

and to do all that it seems i use division (or rather mod)
but this way i need only one routine for conversion
input <-> output with any "base" you want < max_unsigned

i'm not sure, because i did that many years ago in C
and am only looking at the code now
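
A minimal C sketch of the "array of unsigned 32-bit words" idea, for decimal input only. This is not the routine described above (which reportedly uses division and handles any base); it uses the simpler multiply-by-10-and-add-digit scheme with carry propagation. The name `dec_to_words` and the little-endian word order are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Parse a NUL-terminated decimal string into nwords 32-bit words,
   least significant word first.
   Returns 0 on success, -1 on an empty string, a non-digit,
   or a value that does not fit in the buffer. */
static int dec_to_words(const char *s, uint32_t *words, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        words[i] = 0;

    if (*s == '\0')
        return -1;                       /* nothing to convert */

    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9')
            return -1;

        /* value = value * 10 + digit; the digit enters as the first carry */
        uint64_t carry = (uint64_t)(*s - '0');
        for (size_t i = 0; i < nwords; i++) {
            uint64_t t = (uint64_t)words[i] * 10u + carry;
            words[i] = (uint32_t)t;      /* low 32 bits stay in this word */
            carry = t >> 32;             /* high part moves to the next word */
        }
        if (carry != 0)
            return -1;                   /* value overflowed the buffer */
    }
    return 0;
}
```

With `nwords = 3` the buffer holds 96 bits, so an 80-bit result would additionally need a check that the top word is no larger than 0FFFF; with `nwords = 4` the same routine covers 128 bits.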

>I succeeded in converting an ascii to qword, making a similar function to
>atoi64, but i can't extend the conversion to 80 bits.
>
>Some one knows how to convert? Also for 128 bit would be good too :)
>
>Btw: If someone has a C source of those routines and not an assembly
>one.. no problem...i can try to translate it to assembly;
>
>
>Best Regards,
>
>Guga
From: �a/b on
On Tue, 27 Mar 2007, Frank Kotler <fbkotler(a)verizon.net> wrote:
>Guga wrote:
>> I mean.. if using the conversion to 80 bit.. 3 registers can hold the
>> value,
>
>Yeah... two and a half registers, if we want to be stingy -
>cx:edx:eax... or use a segreg for the odd two bytes :)
>
>But what's the point? What are you going to do with this 80-bit integer
>once you've got it? Strange size for an integer - but used for
>extended-precision floats, which is what threw me off.

but an 80-bit unsigned does not fit entirely in an 80-bit "long double"
because a float has a "mantissa" and an "exponent"
e.g. long double == 64-bit mantissa, 1 sign bit, 15-bit exponent

a 64-bit unsigned does fit in an 80-bit long double
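
A small C sketch of that last point, assuming a compiler whose `long double` is the x87 80-bit format (not all are; the code checks `LDBL_MANT_DIG` and gives up otherwise). The function name `u64_roundtrips` is invented for the example.

```c
#include <float.h>
#include <stdint.h>

/* Returns 1 if v survives a round trip through long double unchanged.
   Returns 0 where long double has fewer than 64 significand bits,
   since the test is not meaningful there. */
static int u64_roundtrips(uint64_t v)
{
#if LDBL_MANT_DIG >= 64
    long double x = (long double)v;   /* exact: 64 significand bits available */
    return (uint64_t)x == v;
#else
    (void)v;
    return 0;                         /* long double is too narrow on this target */
#endif
}
```

On a target with the 80-bit format every 64-bit unsigned value passes, including 0FFFF_FFFF_FFFF_FFFF; anything wider than 64 bits would lose its low bits.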

>> 128 bits, it may be 4 registers (or putting the generated data
>> on a structure with the proper size)
From: Wolfgang Kern on

Hello Guga,

[..]
>> Is there a way to use lea to multiply a number by 10 ? The usage of
>> lea is only to speed up a little.

> > for 32 bits it will only work until 0_1999_9999*0A:
> >
> > LEA eax D$eax+eax*4 ;*5
> > ADD eax,eax ;*2
> >
> > but you got larger figures to multiply,
> > so you could either use (similar what Randy posted)
> > a bitwise shift-add which takes its time or
> > you can do it 32-bit wise (faster) instead.
> >
> > Both variants loop a lot and needs much time,
> > so I finally used the LUT solution.

> Ok.. but in cases of overflows, it will place the resultant overflown
> value in edx ?

No, only the 'F6/F7'-group (MUL IMUL) uses dx:ax/edx:eax as result
(Ok, CPUID and RDMSR also use several registers),

LEA is not restricted to eax:

LEA ebp D$ebp+ebp*4 ;*5
ADD ebp,ebp ;*2

Neither the immediate IMUL nor LEA will adjust edx; LEA does not
even affect any flags and truncates the result without indication.

In the code I posted any final Carry in the ADD-result
indicates an overflow error.
So if you haven't limited the input to its max. value, you
will need a larger result buffer (one more register) or at least
"JC error".
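
As a rough C illustration of the limit mentioned above (0_1999_9999 is the largest 32-bit input whose product with 0A still fits), here is a sketch that limits the input first and then does the same *5, *2 steps. `mul10_checked` and `MUL10_MAX_INPUT` are names invented for the example, and checking the input up front is a different technique from testing the carry after the ADD.

```c
#include <stdint.h>

/* 0x19999999 * 10 = 0xFFFFFFFA, the largest product that fits in 32 bits. */
#define MUL10_MAX_INPUT 0x19999999u

/* Multiply x by 10 as (x + x*4) doubled.
   Returns 0 and stores the product in *out,
   or -1 if the product would not fit in 32 bits. */
static int mul10_checked(uint32_t x, uint32_t *out)
{
    if (x > MUL10_MAX_INPUT)
        return -1;                /* would be truncated silently otherwise */
    uint32_t x5 = x + (x << 2);   /* LEA eax D$eax+eax*4 ;*5 */
    *out = x5 + x5;               /* ADD eax,eax ;*2 */
    return 0;
}
```

The range check matters because the *5 step can already wrap around without any flag being set, so a carry test on the final addition alone only helps once the input is known to be small enough for the *5 step.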

If speed is not really of concern, then a 32-bit wise MUL is the
easiest way for 32-bit bound line-multiply.
The MUL instruction isn't that slow anymore, at least on AMDs,
so shift-add methods may not gain too much speed here.

There should be my old fast-MUL example on my homepage,
but it works with buffers, as it can act on very huge figures.

__
wolfgang


