From: Rui Maciel on
Martin Vejnár wrote:

> And for the other roundtrip (string->double->string) to be guaranteed to
> work the original string must have at most 15 significant digits.

Don't you mean "the original string must have at least 16 significant
digits" ?


Rui Maciel


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: George Neuner on
On Sun, 27 Jun 2010 09:27:05 CST, Zeljko Vrba
<mordor.nospam(a)fly.srk.fer.hr> wrote:

>On 2010-06-26, George Neuner <gneuner2(a)comcast.net> wrote:
>>
>> I don't have a cite ready, but problems similar to yours have been
>> discussed before in this group. The last one I remember was about
>> exponential notation: someone was reading in values like "0.0003" and
>> "3.e-4" and finding they don't compare equal. This is a known issue
>> with the VC++ library - it goes back at least to VS2002 and maybe
>> further.
>>
>A few citations from this paper: http://doi.acm.org/10.1145/382043.382405
>(a rather old paper, but still..)
>
>"Decimal-to-binary conversions and vice versa stand out from the other
>conversions in IEEE-754 and IEEE-854 in that these conversions need not
>be correctly rounded for all ranges of operands."

Yes, I'm aware of this. However, conversion of "0.0003" and "3.e-4"
should produce the same bits. In VC++, the two conversions produce
different results. This (IMO and that of many others) is wrong.

George


From: Hans Bos on

"Rui Maciel" <rui.maciel(a)gmail.com> wrote in message news:4c24e64a$0$17402$a729d347(a)news.telepac.pt...
> Sorry for nit-picking but that isn't exactly true. According to IEEE 754,
> the single precision floating point data type has a (23+1)-bit mantissa
> while the double precision has a (52+1)-bit mantissa. So the maximum
> number of significant digits for a floating point representation which
> complies with IEEE 754 is:
>
> for single precision: log_10(2^(23+1)) = 7.2247 => 8 significant digits
> for double precision: log_10(2^(52+1)) = 15.955 => 16 significant digits
>
> There is no such thing as 15.9 digits. We either have 15 digits or 16
> digits. If we only handle 15 significant digits with an IEEE 754 double
> precision data type then we are needlessly avoiding taking full advantage
> of double precision's precision. If we opt to handle 16 significant
> digits then we take full advantage of the data type's precision at the
> expense of a precision loss in those cases where an exact conversion
> from a 16-digit decimal representation to a double precision floating
> point representation would require more than 53 bits.

I am not sure what you mean by significant digits here.
When you convert a decimal to an IEEE float and then back to a decimal,
in the worst case only 6 decimals are preserved.
For an IEEE double, 15 decimals are preserved.

E.g. the closest float value for 9444738e15 is 9444737.469338917797888e15.
(8388612 * 2^50)
When converted back to a 7 digit decimal you get 9444737e15 (so only 6
decimals are preserved).

Greetings,
Hans.



From: SG on
On 28 Jun., 14:07, Rui Maciel wrote:
> Martin Vejnár wrote:
> > And for the other roundtrip (string->double->string) to be guaranteed to
> > work the original string must have at most 15 significant digits.
>
> Don't you mean "the original string must have at least 16 significant
> digits" ?

No, he did not. If you allow strings with 16 or more significant
decimal digits there can be no injective mapping from "human readable
strings" to double values. Doubles are just not precise enough for
this.

Here's a more visual example. The crosses (+) refer to numbers on the
number line that can be represented with a "1+4"-bit mantissa:

1.0             1.5             2.0             3.0             4.0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+---+---+---+---+---+---+---+---+

These numbers are of the form

1.bbbb * 2^k (bbbb = four mantissa bits)

You see that in the range 1...2 the spacing between two neighbouring
numbers is 1/16, while in the range 2...4 it is 1/8. If
you try to convert between this representation and a decimal one with
two digits of the form

x.y * 10^k (in base 10, x=1..9, y=0..9)

you'll see that in the range 1...9.9 we have a constant step size of
1/10. So, sometimes two decimal digits are not enough to represent the
floating point value, and sometimes it's the other way around.

That's exactly the problem with 16-digit decimals and IEEE-754 64-bit
floats. In some areas of the number line the decimal string version
has a higher accuracy and in some other areas the binary version has a
higher accuracy -- in terms of the spacings between consecutive
representable numbers.

Cheers!
SG



From: SG on
On 26 Jun., 10:21, Andrew wrote:
> On 25 June, 13:11, SG wrote:
> > >                  VS                 GCC
> > > -937566.2364699869  -937566.2364699868
>
> > 937566.2364699869 =
> > 11100100111001011110.001111001000100101001100000011000 01110...
>
> > The closest representable number with an IEEE-754 64bit float is
>
> > 11100100111001011110.001111001000100101001100000011000 =
> > 937566.2364699868 485...
>
> > The closest representable 16-digit decimal number is
>
> > 937566.2364699868
>
> > So, your program you compiled with GCC did a good job.
>
> I'm not convinced.

I know that you expected to see a different result. What I'm telling
you is that it's not a problem with GCC but a problem with the
accuracy of your 64-bit floats. As I pointed out, the GCC version
finds the closest double value to your input string during the
string->double conversion and it finds the closest 16-digit string to
your double value in the double->string conversion. Truth is, your
64-bit floats are just not accurate enough to represent 16 decimal
digits reliably. Just look at the 4 closest representable double
values I gave you:

937566.2364699867 321...
937566.2364699868 485...
937566.2364699869 649...
937566.2364699870 813...

The step size is
0.0000000001 164...
which is larger than
0.0000000001 000

==> You can NOT represent 16 decimal digits accurately in every case
using IEEE-754 64bit floats. It works for some numbers (those with low
leading digits) but it won't work reliably with other numbers such as
yours starting with the digit 9, for example.

In a nutshell, this table shows which conversion roundtrips can be
lossless and which cannot, assuming IEEE-754 64-bit floats:

conversion \ decimal digits | 15 or less |  16   | 17 or more
----------------------------+------------+-------+-----------
string->double->string      |  lossless  | lossy |   lossy
double->string->double      |   lossy    | lossy |  lossless

The fact that both kinds of conversions are lossy with 16 decimal
digits in strings is due to the different bases. Doubles use a binary
system with base 2; the decimal strings use base 10. Some numbers are
represented less accurately in decimal (in case of low leading digits)
and some more accurately (in case of high leading digits) compared to
their binary counterpart. This makes 16 decimal digits generally
unreliable for these conversions.

> > If you're interested in a lossless double->string->double roundtrip
> > you should use 17 decimal digits and high quality conversions.
>
> See my sample program in this thread that uses the value
> -937566.2364699869. When GCC takes that string, converts it to a
> double, then converts the double back to a string, it gives
> -937566.2364699868. Adding an extra digit of precision gives
> -937566.23646998685. AFAICS this means it is doing the rounding
> incorrectly.

No, it does not. I explained the conversion step by step in my
previous post. Everything was rounded correctly. The problem is the
precision of IEEE-754 64-bit floats: they're just not accurate enough.
In more mathematical terms, a conversion from string to double where
the string contains 16 significant decimal digits CAN NOT be an
injective mapping. You just witnessed a proof for this.

Two possible input strings:
937566.2364699868
937566.2364699869
In both cases the CLOSEST REPRESENTABLE double-value is
937566.2364699868 485...

So, the mapping that picks the double value that is closest to your
input is NOT INJECTIVE. But this injectivity is a requirement for it
to be invertible.

Cheers!
SG

