From: ?a/b on
On Mon, 10 Oct 2005 02:11:43 GMT, "Richard Cooper"
<spamandviruses(a)rr.com> wrote:
>On Sun, 09 Oct 2005 15:35:04 -0400, ?a\/b <al(a)f.g> wrote:
>
>> if you want to understand, it is in chapter 14 of the book
>
>Yes, but with stuff like this:
>
>>>> 4. r x.
>
>I wouldn't know where to begin to understand it. I mean, why would I "r
>x" when I could "q t" or "v m" or "p s" or maybe even "d e w g d x v?"

it is an error introduced by the pdf-->text translator

>Not that I understood any of the rest of it...

read the book

>> if "a" is an fnum then "a.sn" is the signed number in it
>> a.sn.sn is the unsigned number in it
>> so I define +-/* for "fixed point numbers"
>>
>> fnum a, b;
>>
>> a+b = a.sn+b.sn;
>> a-b = a.sn-b.sn;
>> a*b =(a.sn*b.sn).sn>>(32*precision)
>> a/b =(a.sn.sn<<(2*32*precision))/b.sn;
>
>Now you see, I have no idea what you just said there.
>
>Now since we're talking about fixed point numbers, I think that you're
>talking about how to add, subtract, multiply and divide them. E.g., if
>you have 32 bit integers, and the 24 most significant bits are the integer
>part, and the 8 least significant bits are the fractional part, then you
>do this for each operation:
>
>add: a + b
>subtract: a - b
>multiply: ( a * b + (1 << 7) ) >> 8

why add 1<<7?

>divide: ( (a << 8) + (1 << 7) ) / b

why add 1<<7?
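For what it's worth, the answer to "why add 1<<7?" is that 1<<7 is half of 1<<8, so adding it just before the right shift rounds the result to nearest instead of truncating it. A minimal C sketch, assuming the 24.8 format discussed above (the fx_* names are mine, not from the thread; the divide variant is left out because its rounding term is revisited later in the thread):

```c
#include <stdint.h>
#include <assert.h>

typedef int32_t fx;  /* 24.8 fixed point: value scaled by 256 */

/* truncating multiply: just drop the extra 8 fractional bits */
fx fx_mul_trunc(fx a, fx b) {
    return (fx)(((int64_t)a * b) >> 8);
}

/* rounding multiply: 1 << 7 is half of the lowest bit that survives
 * the shift, so the result rounds to nearest instead of toward zero */
fx fx_mul_round(fx a, fx b) {
    return (fx)(((int64_t)a * b + (1 << 7)) >> 8);
}
```

For example, 1.5 * (1/256) is 1.5/256: the truncating form gives 1/256 while the rounding form gives 2/256, the nearer representable value.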
[]
>get is 0038. This is the correct answer, since 12.9 / 3.4 = 3.7941, which
>when rounded is 3.8.
>
>> 129 *34 = 4386
>>
>> Do you see some error?
>
>Well, nothing except that you still need to shift the result there. 0129
>* 0034 will be 00004386. You then add the '1 << 7' style term, which in
>our decimal math is 0005, to get 00004391, then you do the '>> 8' step,
>which in our decimal math is a division by 10, which gives you 0439. This
>is the correct answer, since 12.9 * 3.4 is 43.86, which when rounded is
>43.9.

for me, rounding at that stage is an error: there should be no rounding,
only a print_round at the end

>That rounding step is important. I get really annoyed when I see programs
>doing integer math, but not getting good results because the programmer
>didn't bother to round the results and the errors just pile up on
>eachother until the answer is way off.

rounded results are not good results either
From: Richard Cooper on
On Mon, 10 Oct 2005 01:39:52 -0400, ?a\/b <al(a)f.g> wrote:

> for me round in that stage is an error: there is no round,should be only
> a print_round at end

No, you round every time you truncate the precision of your numbers. What
you don't do is reduce the precision of your numbers until the end of your
calculation. However, since we're not free to keep an endless supply of
digits in our numbers, we're frequently forced to truncate the precision
of our numbers even when we're not at the end of our calculation.

For example, add and subtract don't give us any extra bits of precision,
so we don't have to do any rounding when we do an add or subtract, and
doing so would definitely be a bad idea.

However, with multiply, we get extra precision behind the decimal point.
For example, 0.7 times 0.7 gives us 0.49, but we can only have one decimal
place, so we're forced to reduce the precision of our answer in order to
fit it into the fixed point data type we're using.

Now, for example, if you know that you're following up that multiplication
with a division, you would be far better off to not reduce the precision
at all, since the first step of doing a division is increasing the
precision of the number. So if the next step was to divide by 0.3, then
instead of rounding 0.49 to 0.5, then increasing the precision again to
0.50 and dividing by 0.3, you'd be much better off to keep the more
precise 0.49 and divide that by 0.3.

Keeping additional precision is exactly what floating point math is all
about. The exponent field allows you to say how many bits of the mantissa
are the integer portion, and how many are the fractional portion, and so
you can keep the extra precision and just modify the exponent to reflect
that.

However, in fixed point math, we don't have an exponent field in the
number. So we have to reduce the precision of 0.49, and the best way to
do that is to round it to 0.5 rather than to simply truncate it to 0.4.

It's not the rounding that is bad, it's the loss of precision. It's just
that people say "you should round until the end" because people usually
round when they reduce precision, and they usually don't reduce precision
until the end. Everyone uses calculators, and calculators do the
precision reduction for them. When someone puts 2 / 3 into a calculator,
and the calculator says the answer is 0.666666667, no one ever says that
the calculator rounded the result, but that's what it did, because it had
to reduce its precision to 9 decimal places, and that's how that last 6
became a 7.

The rounding that you don't do until the end is when you're calculating a
number to two decimal places, you don't round to two decimal places every
step of the way, but instead just once at the end. However, if your
calculator only has 9 decimal places, then of course you round to 9
decimal places every step of the way. That's what we're doing with the
fixed point math. Numbers in "<< 8" format only have room for 2.4 decimal
places, so we have to round every result along the way to the 2.4th
decimal place.
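The 0.7 * 0.7 followed by a divide-by-0.3 example can be checked in a few lines of C, using the thread's decimal "* 10" format (one fractional digit); the function names are mine. The true answer is 0.49 / 0.3 = 1.633..., so 1.6 is the correctly rounded one-decimal result:

```c
#include <stdint.h>
#include <assert.h>

/* round-to-nearest divide of non-negative integers */
int32_t div_round(int32_t num, int32_t den) {
    return (num + den / 2) / den;
}

/* 0.7 * 0.7 / 0.3, reducing precision right after the multiply:
 * 0.49 rounds to 0.5, and 0.5 / 0.3 then rounds to 1.7 */
int32_t early_round(void) {
    int32_t p = div_round(7 * 7, 10);   /* product back at scale 10: 5 = 0.5 */
    return div_round(p * 10, 3);        /* result at scale 10 */
}

/* keeping the product at scale 100 and folding the precision
 * reduction into the divide gives the better-rounded 1.6 */
int32_t late_round(void) {
    return div_round(7 * 7, 3);         /* 49 / 3 at scale 10 directly */
}
```

The early-rounding path returns 17 (1.7) while the deferred path returns 16 (1.6), which is the point being made: the loss of precision, not the rounding, is what hurts.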

With that said, my method for rounding division is clearly wrong:

10.0 / 2.8 -> 00001005 / 0028 -> 0035 = 3.5, actual answer 3.5714

The goal is to round that 3.5714 to 3.6, but since we never calculate the
extra trailing digits, the only way to do it is to fix up the numerator
before doing the divide so that the result will have an additional 1/2 of
its least significant bit added to it, so that when the division is done,
we get the correct rounded answer. Adding 1/2 of the least significant
bit doesn't do that; you have to add 1/2 of the denominator.

So the correct formula is this: ((a << 8) + (b >> 1)) / b

Using that formula, you get these results:

10.0 / 2.8 -> 00001014 / 0028 -> 0036 = 3.6, actual answer 3.5714

I wrote a test program this time, which tested that formula with 100
million division problems, so I'm sure I got it right this time.

Of course, another way to do it is to use "(a << 8) / b", and then
increment the result if twice the remainder is at least b; the result
is exactly the same either way, however the "((a << 8) + (b >> 1)) / b"
method avoids the need for a conditional jump.
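Both forms can be sketched in C for the "<< 8" format (non-negative operands assumed; the fx_div_* names are mine). Note the tie case: when the remainder is exactly half of an even b, the remainder test must use "at least" (2*r >= b), not strictly greater, for the two forms to agree:

```c
#include <stdint.h>
#include <assert.h>

/* "<< 8" divide, rounded to nearest; branchless form folds half the
 * denominator into the numerator before dividing */
int32_t fx_div_branchless(int32_t a, int32_t b) {
    return (int32_t)((((int64_t)a << 8) + b / 2) / b);
}

/* same result via the remainder, with a conditional increment */
int32_t fx_div_remainder(int32_t a, int32_t b) {
    int64_t n = (int64_t)a << 8;
    int32_t q = (int32_t)(n / b);
    int32_t r = (int32_t)(n % b);
    if (2 * r >= b) q++;   /* round half up; ">=" handles the tie */
    return q;
}
```

For 10.0 / 2.8 in this format (a = 2560, b = 717) both return 914, i.e. about 3.57, the nearest representable quotient.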

> round results are no good results too

Yes, but rounded results are closer to the correct answer than truncated
results. With the "* 10" format numbers, it's the difference between each
calculation having an error of +/- 0.05 if you round, or -0.05 +/- 0.05 if
you don't round. The average error when rounding is 0.00, whereas the
average error when truncating is -0.05.
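Those error figures can be checked by exhausting one decade of the "* 10" format, i.e. reducing every two-decimal value 0.00..0.99 to one decimal place (the helper name is mine; the error is counted in units of 0.01). Truncation's total error comes out to -450 (average -0.045), while round-half-up's comes out to +50 (average +0.005, the small bias of always rounding ties upward):

```c
#include <stdint.h>
#include <assert.h>

/* sum of signed errors (units of 0.01) over all two-decimal values
 * 0.00 .. 0.99 when reduced to one decimal place */
int total_error_x100(int use_round) {
    int total = 0;
    for (int v = 0; v < 100; v++) {
        int reduced = use_round ? (v + 5) / 10 : v / 10;
        total += reduced * 10 - v;   /* reduced value minus true value */
    }
    return total;
}
```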
From: randyhyde@earthlink.net on

Richard Cooper wrote:

>
> For example, add and subtract don't give us any extra bits of precision,
> so we don't have to do any rounding when we do an add or subtract, and
> doing so would definitely be a bad idea.
>

Technically, the sum (or difference) of two n-bit numbers may require
as many as (n+1) bits to hold the result. So you *do* need one extra
bit of precision to hold all possible results from a single addition or
subtraction operation.
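A quick illustration of that carry-out bit, with a small helper of my own (not from the thread): adding two 8-bit values can need a ninth bit.

```c
#include <stdint.h>
#include <assert.h>

/* number of bits needed to represent a non-negative value */
int bits_needed(uint32_t v) {
    int n = 0;
    while (v) { n++; v >>= 1; }
    return n;
}
```

0xFF fits in 8 bits, but 0xFF + 0xFF = 0x1FE needs 9, which is Randy's (n+1) case.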
Cheers,
Randy Hyde

From: ?a/b on
On Mon, 10 Oct 2005 02:11:43 GMT, "Richard Cooper"
<spamandviruses(a)rr.com> wrote:
>On Sun, 09 Oct 2005 15:35:04 -0400, ?a\/b <al(a)f.g> wrote:
>
>> if you want to understand, it is in chapter 14 of the book
>
>Yes, but with stuff like this:
>
>>>> 4. r x.
>
>I wouldn't know where to begin to understand it. I mean, why would I "r
>x" when I could "q t" or "v m" or "p s" or maybe even "d e w g d x v?"
>
>Not that I understood any of the rest of it...
>
>> if "a" is an fnum then "a.sn" is the signed number in it
>> a.sn.sn is the unsigned number in it
>> so I define +-/* for "fixed point numbers"
>>
>> fnum a, b;
>>
>> a+b = a.sn+b.sn;
>> a-b = a.sn-b.sn;
>> a*b =(a.sn*b.sn).sn>>(32*precision)
>> a/b =(a.sn.sn<<(2*32*precision))/b.sn;

should this be a/b =(a.sn.sn<<(32*precision))/b.sn; instead?

>Now you see, I have no idea what you just said there.
>
>Now since we're talking about fixed point numbers, I think that you're
>talking about how to add, subtract, multiply and divide them. E.g., if
>you have 32 bit integers, and the 24 most significant bits are the integer
>part, and the 8 least significant bits are the fractional part, then you
>do this for each operation:
>
>add: a + b
>subtract: a - b
>multiply: ( a * b + (1 << 7) ) >> 8
>divide: ( (a << 8) + (1 << 7) ) / b

>> for see it in 10 base
>> 12.9/3.4 = 3.7[94]
>> 1290/34 = 37.[94]
>> 12.9*3.4 = 43.8[6]
>> 129 *34 = 4386

that part is for my own benefit, to see whether my +-/* definitions above
are right
....
>Again, I'm a bit confused, so I'll write out my own. We'll use four
>digits, one for each byte in the dword sized numbers:

>For 012.9 / 003.4, we've basically got the numbers represented as
>integers, 0129 and 0034. Now if you remember what you learned in 4th
>grade math, since both numbers are multiplied by 10, then you get the
>correct answer when you divide, which is going to be 4 in correctly
>rounded integer math. However, we don't want the correct answer, we want
>the answer times 10, so that we've still got one fractional digit. To do
>this, we just multiply the numerator by 10 before we do the division.
>That way the numerator is multiplied by 100, which is divided by the
>denominator multiplied by 10, and so the answer comes out multiplied by 10.

sorry, but in my life I had no "4th grade math" and no calculus course,
only "Analisi Matematica 1-2", "Fisica 1-2", "Meccanica razionale",
"Geometria", and "Fisica Matematica"; the only computer-related course
would be "Teoria e applicazione delle macchine calcolatrici"

....
>> it should be OK, but I don't know how to find the position of "."
>> in the translation base2^32 --> base10
>> and base10 --> base2^32
>> that serves for reading and writing the fixed point number in input/output.
>> Has someone some idea how to find that position?
>
>I always convert to integer before displaying.
>
>Say we've got the '<< 8' style numbers. 123456.789 in this type of
>representation would be 0x01E240CA, which converted back to decimal is
>123456.789063...
>
>Since 8 bits gets us 256 possible fractional values, that's equivalent to
>2.408 decimal digits. So we'll want to output the number with either two
>or three decimal places. Usually you want to throw away a little bit of
>the number since it likely just contains rounding errors anyway; if we
>display 2 decimal places that's equivalent to discarding the 1.356 least
>significant bits. Normally you'd throw away more than that, but since
>we've only got 32 bits to begin with, it might be nice to keep them even
>if that last digit might be a little inaccurate.
>
>So now we need to convert the number to decimal. To do this, we first
>multiply it by 100 since we want two decimal places, this way the last two
>digits of our number are the fractional digits. Then we shift it 8 bits
>to the right to undo the '<< 8' stuff we've represented the number as. Of
>course, we add '1 << 7' to it first to round it.
>
>Now 100 in our number format is 25600 decimal, or 0x00006400 hex.
>
>multiply: (a * b + 1 << 7) >> 8
>
>(0x01E240CA * 0x00006400 + 0x00000080 ) >> 8 = 0xBC614EE8
>
>Of course, the whole event of taking "100 << 8" and then taking the answer
>">> 8" doesn't have a net effect, even with the "1 << 7" in there, so for
>this specific multiplication you can just do "a * b" and be done with it.
>
>0x01E240CA * 0x00000064 = 0x00000000BC614EE8
>
>(At this point we want it to be a 64 bit number, in case of overflow,
>although in this particular instance, the number still fits in 32 bits as
>long as it isn't signed.)
>
>Now we add 1 << 7 and then shift it right 8 bits. This has the effect of
>dividing it by 256, which we're doing to convert the number to a number
>with no fractional bits, which makes it easy to convert to decimal.
>
>0x00000000BC614EE8 + 1 << 7 = 0xBC614F68
>
>0x00000000BC614F68 >> 8 = 0x00BC614F
>
>(At this point, we're back into 32 bits again, but that's because we chose
>to discard 1.356 bits from our number, and so now our 32 bit number is a
>30.644 bit number. If we had chosen 3 decimal places instead, then we
>would have added 1.966 bits, and so at this point we'd have a 33.966 bit
>number, and so we'd have to continue doing the math in 64 bits.)
>
>(BTW, if you're wondering how I'm calculating how many bits our numbers
>are, all you do is take the base 2 log of the number. For example, to
>represent 2 digits of decimal accuracy, you do log_2(0.01) which is
>-6.6438, and so you need 6.6438 fractional bits. For a more obvious
>example, to have your numbers accurate to 1/64, you calculate log_2(1/64)
>which is -8, and so you need 8 fractional bits.)
>
>Now, 0x00BC614F in decimal is 12345679, and since that is our number times
>100 (because of the step where we multiplied by 100) the last two decimal
>digits are after the decimal point, so the number is 123456.79, which is
>123456.789063 rounded to two decimal places.
>
>To convert from decimal to fixed point, you basically just do the
>reverse. With 123456.789 as input, you convert it to an integer,
>123456789, remembering that you multiplied by 1000 to make it an integer.
>Then you convert 123456789 to binary, which is 0x75BCD15. Then you shift
>that 8 bits to the left, to make it 0x000000075BCD1500 which is the
>multiplied integer in the fixed point representation. To convert it back
>to a fraction, you divide it by 1000 using the formula "(a << 8 + 1 << 7)
>/ 1000" which gives you 0x01E240CA, which is the most accurate
>representation of that number in the fixed point math.
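The two conversions in that walkthrough collapse into a pair of small C helpers for the "<< 8" format (the function names and the single-step multiply-by-100 are mine; the constants follow the worked example, where the multiply-by-25600 and the two shifts cancel down to a plain *100 with one rounded shift):

```c
#include <stdint.h>
#include <assert.h>

/* decimal value scaled to thousandths -> "<< 8" fixed point,
 * i.e. ((a << 8) + (1 << 7)) / 1000 with a = 123456789 */
uint32_t to_fixed_thousandths(uint32_t thousandths) {
    return (uint32_t)((((uint64_t)thousandths << 8) + (1u << 7)) / 1000);
}

/* "<< 8" fixed point -> integer holding the value times 100, rounded,
 * ready to print with the last two digits after the decimal point */
uint32_t to_hundredths(uint32_t fixed) {
    return (uint32_t)(((uint64_t)fixed * 100 + (1u << 7)) >> 8);
}
```

With these, 123456789 thousandths becomes 0x01E240CA, and converting that back gives 12345679 hundredths, i.e. 123456.79, matching the post's numbers.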

i think it is easier than what you say;
for input, something like this
(where Lstring is the string of digits and "a" is the resulting
number):
fnum a, b;
unum c;
char *p1, *p2;
b.sn = read(Lstring, p1);          /* integer part */
if (*p1 == '.') ++p1;
else goto label;
c = read(p1, p2);                  /* fractional part */
if (p1 == p2) {
label:
    a = b;
    return a;
}
a = b << (32*precision);
if (precision < c.len) {           /* drop excess fractional limbs */
    unsigned k = c.len - precision;
    c >>= 32*k;
}
a += c;
return a;

for output, something like the above should be possible
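The fnum above is a multi-limb base-2^32 type, so the following is only a single-limb (24.8 format) illustration of the same input path, with fx_parse as my own name: read the integer part, shift it up, then turn the fractional digits into rounded fractional bits.

```c
#include <stdint.h>
#include <assert.h>

/* parse a non-negative "int.frac" decimal string into 24.8 fixed
 * point; no overflow or error checks, this is a sketch only */
uint32_t fx_parse(const char *s) {
    uint64_t v = 0;
    while (*s >= '0' && *s <= '9')
        v = v * 10 + (uint64_t)(*s++ - '0');    /* integer part */
    v <<= 8;                                    /* into 24.8 */
    if (*s == '.') {
        ++s;
        uint64_t frac = 0, scale = 1;
        while (*s >= '0' && *s <= '9') {
            frac = frac * 10 + (uint64_t)(*s++ - '0');
            scale *= 10;
        }
        v += ((frac << 8) + scale / 2) / scale; /* rounded to 8 bits */
    }
    return (uint32_t)v;
}
```

Pleasingly, fx_parse("123456.789") lands on 0x01E240CA, the same value Richard's decimal-to-fixed walkthrough produces.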
From: ?a/b on
On Tue, 11 Oct 2005 07:27:01 +0200, "?a\\/b" <al(a)f.g> wrote:


>i think it is easier than what you say;
>for input, something like this
>(where Lstring is the string of digits and "a" is the resulting
>number):
>fnum a, b;
>unum c;
>char *p1, *p2;
>b.sn=read(Lstring, p1);

a=b<<(32*precision);

>if(*p1=='.') ++p1;
>else goto label;
>c=read(p1, p2);

>if(p1==p2) {label:;

> return a;
> }


>if(precision<c.len)
> {k=c.len-precision;
> c>>= 32*k;
> }
>a+=c;
>return a;
>
>for output, something like above, should be possible
