From: Guga on
On Mar 28, 9:02 am, "Wolfgang Kern" <nowh...(a)never.at> wrote:
> Hello Guga,
> [..]
>
> <quote Guga..>
> Hi wolfgang.. tks for the reply.
>
> Have a good party :):):)
>
> I can remove the usage of stack frames. I'm just used to them,
> mainly for readability. I distinguish one function from
> another better when I see the 1st "Proc" at the beginning of a line.
>
> About the checks for errors and limits on the ASCII string.. yes..
> they are done in another routine. The example I provided is meant to
> run only when the string was already checked.
>
> "80 bit conversion could be done in three registers, but for 128 bit
> I'm afraid you either need a few LOCALs or use SSE to speed it up. "
>
> Yes.. this is what I was thinking. Using 3 registers for 80 bit, and for
> 128 bit, using locals to compute the data, and returning it in 4
> registers (or returning it inside global data - like a structure)
>
> "I can extract the method for fix-sized conversion and convert it into
> readable ASM."
>
> Tks.. I'll appreciate it :)
>
> But.. if you succeed in doing it for 128 bit.. is SSE really needed ?
> </quote>
>
> The party ended somehow heavy this morning :)
>
> I found my old (KESYS1998) 80-bit conversion which is short but
> slow (later versions don't support this 'odd' IEEE-754 format anymore).
> It first compressed the string to BCD and used FBLD followed by
> FISTP and needed some rounding overhead,
> but you asked for an 80-bit integer and not the 80-bit FPU format ?
> btw: where is this required ?
>
> For speed reasons I now use my tiny calculator routines, which
> work with a DEC<->2^n LUT on 256-bit variables.
> This table is quite long (78*9 entries, 32 bytes each, ~22KB)
> [at most 77.1 decimal digits can be represented with 256 unsigned bits,
> only nine entries per decade in the table (a partial log-LUT)]
> and so it's also usable for many other calculations.
>
> For the rarely used 512-bit values I use a shorter table
> which contains just every 10th digit value but needs one line multiply.
>
> In your 128/80-bit case the LUT would need 39*9*16 bytes [~5.5 KB]
> and you can use it for 64-bit conversion as well.
>
> this then would work like (somehow fast):
> ___________________
> LUT_CONV_ASCII2BIN:
>
> XOR esi,esi ;result go to three regs for 80/96 bit
> MOV ebx,esi ;
> MOV edx,esi ;
>
> MOV ecx str_len -1 ;this is power10 of 1st digit (MSD)
> L1:
> MOVZX eax B$strptr+ecx ;we start with MSD, just for fun?
> SHL al,4 ;mul by 16 (entry size) and get rid of 030
> JZ L2> ;skip if 0
>
> ; LEA edi,D$ecx+ecx*8+table_ptr ;not sure if RosAsm accept this ?
> ; so you might need to split it into two lines:
>
> LEA edi D$ecx+ecx*8 ;mul digits power by 9
> ADD edi table_ptr ;table offset for power
>
> ADD edi,eax ;table offset for digit
> ; as above, LEA could combine the two ADD lines
>
> ;now just add the table entry to your destination:
> ;ie 80 bit:
> ADD ebx D$edi
> ADC edx D$edi+4
> ADC si W$edi+8
> L2:
> DEC ecx | JNS L1< ;next digit, and we include "+0" .
> done:
> RET
> _____________
>
> For 128-bit the story can be similar, and if you could avoid
> the stack-frames then ebp could be the 'missing fourth' register. ;)
>
> If you need the code for table creation or just the table
> I can mail it to you.
> But it just contains binary expressed decimals starting
> with 1..9,10..90, and so on. So I'm sure you can do it as well.
>
> And no, SSE is not really required, even though it may be faster than
> a plain register/buffer line MUL solution.
>
> __
> wolfgang
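
The table idea described above can be sketched in C to make the indexing explicit. This is only an illustration under stated assumptions, not Wolfgang's actual routine: the names `lut`, `build_lut`, `add_limbs` and `lut_ascii2bin` are hypothetical, the table layout (entry `power*9 + digit-1`, little-endian 32-bit limbs) is assumed from the "1..9, 10..90, and so on" description, and the table is built at run time by repeated addition instead of being stored. The sketch keeps the string index and the power of ten as separate counters, since the first character carries the highest power.

```c
#include <stdint.h>
#include <string.h>

#define LIMBS   4            /* 4 x 32 bits = 128-bit result          */
#define DECADES 39           /* 10^0 .. 10^38 covers 128 bits         */

/* lut[p][d-1] holds d * 10^p as little-endian 32-bit limbs.
   Entries that do not fit in 128 bits simply wrap; a checked
   input string never selects them. */
static uint32_t lut[DECADES][9][LIMBS];

/* dst += src over LIMBS limbs, like ADD followed by a chain of ADC. */
static void add_limbs(uint32_t *dst, const uint32_t *src)
{
    uint64_t carry = 0;
    for (int i = 0; i < LIMBS; i++) {
        uint64_t sum = (uint64_t)dst[i] + src[i] + carry;
        dst[i] = (uint32_t)sum;
        carry  = sum >> 32;
    }
}

/* Fill the table by repeated addition: 1..9, 10..90, 100..900, ... */
static void build_lut(void)
{
    uint32_t pow10[LIMBS] = {1, 0, 0, 0};
    for (int p = 0; p < DECADES; p++) {
        uint32_t acc[LIMBS] = {0, 0, 0, 0};
        for (int d = 1; d <= 9; d++) {
            add_limbs(acc, pow10);                 /* acc = d * 10^p   */
            memcpy(lut[p][d - 1], acc, sizeof acc);
        }
        add_limbs(acc, pow10);                     /* acc = 10^(p+1)   */
        memcpy(pow10, acc, sizeof acc);
    }
}

/* Convert an already checked string (digits only, value < 2^128).
   Returns 0 on success, -1 if the string has too many digits. */
static int lut_ascii2bin(const char *s, uint32_t out[LIMBS])
{
    size_t len = strlen(s);
    memset(out, 0, LIMBS * sizeof out[0]);
    if (len > DECADES)
        return -1;
    for (size_t i = 0; i < len; i++) {
        unsigned d = (unsigned)(s[i] - '0');
        size_t   p = len - 1 - i;                  /* power of ten of s[i] */
        if (d != 0)                                /* like the JZ skip     */
            add_limbs(out, lut[p][d - 1]);
    }
    return 0;
}
```

After one call to `build_lut()`, `lut_ascii2bin("4294967296", r)` should leave `r[1]` equal to 1 and the other limbs 0. There is no multiply in the conversion loop at all, only one table add per non-zero digit, which is the point of the approach.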


Oops.. I forgot... I just answered to you...

But.. how do I multiply a value by 10 without using mul ?

Is there a way to use lea to multiply a number by 10 ? The point of using
lea is only to speed things up a little.

Best Regards
Guga

From: Wolfgang Kern on

Hello Guga,
[..]

For 32 bits it will only work up to 0_1999_9999*0A:

LEA eax D$eax+eax*4 ;*5
ADD eax,eax ;*2

but you have larger figures to multiply,
so you could either use (similar to what Randy posted)
a bitwise shift-add, which takes its time, or
you can do it 32 bits at a time (faster) instead.

Both variants loop a lot and need much time,
so I finally used the LUT solution.

__
wolfgang
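
For reference, the LEA/ADD pair and its 32-bit limit can be modelled in C. This is just a sketch of the arithmetic; the function names `mul10_lea` and `mul10_fits` are made up for the illustration.

```c
#include <stdint.h>

/* x*10 computed as (x + x*4) * 2, the same two steps as
   LEA eax,[eax+eax*4] followed by ADD eax,eax.
   The result is truncated to 32 bits, like the register version. */
static uint32_t mul10_lea(uint32_t x)
{
    uint32_t x5 = x + (x << 2);   /* *5 */
    return x5 + x5;               /* *2 */
}

/* Non-zero when x*10 still fits in 32 bits.
   UINT32_MAX / 10 is 0x19999999, the limit quoted above. */
static int mul10_fits(uint32_t x)
{
    return x <= UINT32_MAX / 10;
}
```

With x = 0x19999999 the result is 0xFFFFFFFA; one step higher, 0x1999999A, the true product is 0x1_0000_0004 and only the low 4 survives.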


From: Guga on
On Mar 28, 10:56 am, "Wolfgang Kern" <nowh...(a)never.at> wrote:
> Hello Guga,
> [..]

Ok.. but in case of overflow, will it place the overflowed part of the
result in edx ?

Best Regards,

Guga
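
On the edx question: LEA and ADD only write their destination register (ADD sets the carry flag on unsigned overflow, LEA touches no flags), so nothing lands in edx with that pair. It is the one-operand MUL that leaves the product in edx:eax. A small C model of that split, with the hypothetical name `mul10_wide`:

```c
#include <stdint.h>

/* Model of "MUL ecx" with ecx = 10: the 64-bit product is split
   into a low half (what ends up in eax) and a high half (edx). */
static void mul10_wide(uint32_t x, uint32_t *lo, uint32_t *hi)
{
    uint64_t product = (uint64_t)x * 10u;
    *lo = (uint32_t)product;
    *hi = (uint32_t)(product >> 32);
}
```

For x = 0x1999999A this gives lo = 4 and hi = 1; the high half can never exceed 9 when multiplying by 10.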

From: Guga on
On Mar 28, 12:14 pm, "Guga" <Guga...(a)gmail.com> wrote:
> Ok.. but in case of overflow, will it place the overflowed part of the
> result in edx ?



Hi wolfgang

This is the preliminary version.

It can work with literally _any_ bit size. The example code I'm
posting converts a decimal ASCII string to 128 bit.

I'm now reviewing the code to see if I can improve its speed, and
to insert comments on how to use it for other bit sizes (512, for
example :) :) )

It seems to be precise.
Here is the code.. it is subject to change.. so I'm posting it here
mainly for testing.


[Value:
Value.Conv32Bit: D$0
Value.Conv64Bit: D$ 0
Value.Conv96Bit: D$0
Value.Conv128Bit: D$ 0]

[Value.Conv32BitDis 0
Value.Conv64BitDis 4
Value.Conv96BitDis 8
Value.Conv128BitDis 12]

[MUL_32BIT 1]
[MUL_64BIT 2]
[MUL_96BIT 3]
[MUL_128BIT 4]

Proc AtoiAnyBit:
Arguments @String

mov edi D@String
mov esi Value

While B$edi <> 0

call alldecmulGuga ; Value = Value * 10

push edi
movsx eax B$edi
sub eax '0' ; ASCII digit -> 0..9
cdq ; edx = 0 for a valid digit

add D$esi+Value.Conv32BitDis eax ; add the new digit to the lowest dword
adc D$esi+Value.Conv64BitDis edx ; propagate the carry into the upper dwords
adc D$esi+Value.Conv96BitDis edx
adc D$esi+Value.Conv128BitDis edx
pop edi

inc edi

End_While


EndP

[OverflowData: D$ 0]

alldecmulGuga:

xor eax eax ; always initialize eax to 0

mov ecx 10

push edi

mov edi MUL_128BIT ; 4 dwords to process

mov ebx Value.Conv128BitDis ; we always start from the last member and keep decreasing

L1:
mov D$OverflowData eax ; low dword of the previous product, added to the high part below
mov eax D$esi+ebx ; the targeted value
mul ecx

mov D$esi+ebx eax ; copy the low dword of the product back to the targeted value
add edx D$OverflowData

cmp edi MUL_128BIT | je L0> ; avoid copying it outside the limits of the structure
mov D$esi+ebx+4 edx ; copy the high part to the next dword of the structure
L0:

dec ebx ; subtract 4 from ebx so it points to the next (lower) member of the structure
dec ebx ; dec is used instead of sub ebx 4 so the carry flag is not affected
dec ebx
dec ebx
dec edi | jne L1<

pop edi
ret


Best Regards,

Guga
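
For comparison, here is a C sketch of the same "times ten, plus digit" step walking from the lowest dword upward with one running carry. It is not a translation of the routine above; `mul10_add` and `atoi_limbs` are hypothetical names, and the limb count is a parameter so the same loop would serve 128, 256 or 512 bits. One reason to try this order: in the posted routine the `add edx D$OverflowData` can itself wrap past 32 bits, and it looks as if that carry is not passed on to the dword above.

```c
#include <stdint.h>
#include <stddef.h>

/* value = value * 10 + digit over n little-endian 32-bit limbs.
   Returns what is carried out of the top limb (0 if it fitted). */
static uint32_t mul10_add(uint32_t *value, size_t n, uint32_t digit)
{
    uint64_t carry = digit;
    for (size_t i = 0; i < n; i++) {
        /* at most 0xFFFFFFFF*10 + 9, which fits easily in 64 bits */
        uint64_t t = (uint64_t)value[i] * 10u + carry;
        value[i] = (uint32_t)t;        /* low half stays in this limb  */
        carry    = t >> 32;            /* high half goes to the next   */
    }
    return (uint32_t)carry;
}

/* Convert an already checked, digits-only string.
   Returns 0 on success, -1 if the value does not fit in n limbs. */
static int atoi_limbs(const char *s, uint32_t *value, size_t n)
{
    for (size_t i = 0; i < n; i++)
        value[i] = 0;
    for (; *s != '\0'; s++) {
        if (mul10_add(value, n, (uint32_t)(*s - '0')) != 0)
            return -1;
    }
    return 0;
}
```

With n = 4, the string "340282366920938463463374607431768211455" (2^128 - 1) should fill all four limbs with 0xFFFFFFFF, and the next value up should be reported as an overflow.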

From: Guga on
Hi wolfgang

tks for the tip of

" SHL al,4 ;mul by 16 (entry size) and get rid of 030 "

i suppose that using:
shl al 4
shr al 4

is faster than

sub al 030

right ?
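
A note on what the two variants compute, sketched in C with made-up function names. In the LUT snippet the single SHL did two jobs at once (scaling by the 16-byte entry size and dropping the 030); a SHL/SHR pair only isolates the low nibble, which is the same thing a single AND with 0F does. Whether either is faster than SUB would need measuring, so no timing claim is made here.

```c
#include <stdint.h>

/* Three ways to turn an ASCII digit '0'..'9' into its value 0..9. */
static uint8_t digit_sub(uint8_t c)   { return (uint8_t)(c - 0x30); }
static uint8_t digit_and(uint8_t c)   { return (uint8_t)(c & 0x0F); }
static uint8_t digit_shift(uint8_t c) { return (uint8_t)((uint8_t)(c << 4) >> 4); }
```

All three agree for valid digit characters. They differ on anything else: for 'A' (041) the subtraction gives 011 while the nibble versions give 1, so the shift or mask forms rely on the string having been checked beforehand.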

