From: Colin Paul Gloster on
On Mon, 15 Feb 2010, S. J. W. posted:

|---------------------------------------------------------------------------|
|"[..] |
|> |
|> You see now what's happening.  With the gnatn switch the |
|> compiler is smart enough to call the Log just once, rather |
|> than 10**6 times. |
|> |
|> If you remove the -gnatn or -gnatN switches, then it runs in |
|>  0m0.024s again. |
| |
|The trouble is that that benchmark does something other than Colin's!" |
|---------------------------------------------------------------------------|

That is not the problem. The code which I posted at the beginning of
this thread was not an end in itself, but was intended for timing
implementations of base-ten logarithm functions in a manner
representative of real code which I use. The real code is not
dedicated to calculating something approximately equal to 6.3E+08. I
could have written 500 * 1_000_000 calls or 3.14 * 1000 calls or a
single call. A single call might have been overwhelmed by overhead
unrelated to the logarithm function. In the case of the C++ version
when using a particular compilation switch, I failed in the task
because the hardcoded arguments I provided resulted in a trivial and
dramatic optimization which would not happen in the real code.
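
To illustrate the point about hardcoded arguments, here is a minimal C++ sketch (not the code originally posted) of a workload whose arguments are only known at run time. The name `log10_workload` and its parameters are made up for illustration. Reading the start value and the step through `volatile` objects should keep the compiler from folding the logarithm calls into constants; whether a given G++ version would fold them otherwise depends on the version and the switches used.

```cpp
#include <cmath>

// Hypothetical helper, not from the thread: sums log10 over the arguments
// start, start + step, ... (`inner` of them), repeated `outer` times.
double log10_workload(long outer, int inner, double start, double step)
{
    // volatile forces real loads, so the values are not compile-time constants
    // even if the caller passes literals and the call is inlined.
    volatile double runtime_start = start;
    volatile double runtime_step = step;

    double answer = 0.0;
    for (long i = 0; i < outer; ++i) {
        double x = runtime_start;        // re-read on every outer pass
        const double dx = runtime_step;
        for (int j = 0; j < inner; ++j) {
            answer += std::log10(x);
            x += dx;
        }
    }
    return answer;
}
```

Called as `log10_workload(1000000, 500, 0.1, 0.1)`, this would correspond to the 500 * 1_000_000 calls discussed above, at the cost of two extra loads per outer pass.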

While it is unfortunate for Ada code in general that Ada compilers
fail to mimic this optimization of G++'s, that particular optimization
would not benefit the usage of logarithms in the real code I
mentioned. Dr. Jonathan Parker is free to pursue this problem in a
subthread or with vendors.

|---------------------------------------------------------------------------|
|"This might be a more accurate translation: |
| |
|with Ada.Numerics.Generic_Elementary_Functions; |
|with Text_IO; use Text_IO; |
|procedure Log_Bench_0 is |
| type Real is digits 15; |
| package Math is new Ada.Numerics.Generic_Elementary_Functions |
|(Real); |
| use Math; |
| Answer : Real := 0.0; |
| Log_Base_10_Of_E : constant := 0.434_294_481_903_251_827_651_129; |
|begin |
| for I in 1 .. 1_000_000 loop |
| declare |
| X : Real := 0.1; |
| begin |
| for J in 1 .. 500 loop |
| Answer := Answer + Log_Base_10_Of_E * Log (X); |
| X := X + 0.1; |
| end loop; |
| end; |
| end loop; |
| Put (Real'Image(Answer)); |
|end Log_Bench_0; |
| |
|I've tried inlining GNAT's implementation (GCC 4.5.0, x86_64-apple- |
|darwin10.2.0) and even just calling up the C log10 routine using an |
|inline. None was very impressive compared to the g++ result. |
| |
|Colin's time: 37s |
|Jonathan's time (-O3 -ffast-math -gnatp): 16s |
|Jonathan's time (-O3 -ffast-math -gnatp -gnatN -funroll-loops): 14s |
|Jonathan's time (same opts, but using C log10()): 11s" |
|---------------------------------------------------------------------------|

That ordering does not necessarily hold...

GCC4.2.4...

gnatmake -O3 -ffast-math -gnatp Log_Bench_0.adb -o Log_Bench_0_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp

time ./Log_Bench_0_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp
6.34086408606382E+08

real 0m14.328s
user 0m14.329s
sys 0m0.000s


gnatmake -O3 -ffast-math -gnatp -gnatN -funroll-loops Log_Bench_0.adb -o Log_Bench_0_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops

time ./Log_Bench_0_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops
6.34086408606382E+08

real 0m14.346s
user 0m14.341s
sys 0m0.004s


GCC4.4.3 (slower than GCC4.2.4 for this program)...

gnatmake -O3 -ffast-math Log_Bench_0.adb -o Log_Bench_0_compiled_by_GCC4.4.3_with_-ffast-math

time ./Log_Bench_0_compiled_by_GCC4.4.3_with_-ffast-math
6.34086408606382E+08

real 0m14.713s
user 0m14.689s
sys 0m0.000s


gnatmake -O3 -ffast-math -gnatp Log_Bench_0.adb -o Log_Bench_0_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp

time ./Log_Bench_0_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp
6.34086408606382E+08

real 0m14.691s
user 0m14.693s
sys 0m0.000s


gnatmake -O3 -ffast-math -gnatp -gnatN -funroll-loops Log_Bench_0.adb -o Log_Bench_0_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops

time ./Log_Bench_0_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops
6.34086408606382E+08

real 0m14.690s
user 0m14.689s
sys 0m0.000s

|---------------------------------------------------------------------------|
|"so we still have 3 orders of magnitude to go to get to the g++ result: |
|0.02s |
| |
|This is my final version, with the inlined GNAT implementation too: |
| |
|with Ada.Numerics.Generic_Elementary_Functions; |
|with System.Machine_Code; use System.Machine_Code; |
|with Text_IO; use Text_IO; |
|procedure Log_Bench is |
| type Real is digits 15; |
| package Math is new Ada.Numerics.Generic_Elementary_Functions |
|(Real); |
| use Math; |
| Answer : Real := 0.0; |
| Log_Base_10_Of_E : constant := 0.434_294_481_903_251_827_651_129; |
| function LogM (X : Real) return Real; |
| pragma Inline_Always (LogM); |
| function LogM (X : Real) return Real is |
| Result : Real; |
| NL : constant String := ASCII.LF & ASCII.HT; |
| begin |
| Asm (Template => |
| "fldln2 " & NL |
| & "fxch " & NL |
| & "fyl2x " & NL, |
| Outputs => Real'Asm_Output ("=t", Result), |
| Inputs => Real'Asm_Input ("0", X)); |
| return Result; |
| end LogM; |
| function LogL (X : Real) return Real; |
| pragma Import (C, LogL, "log10"); |
|begin |
| for I in 1 .. 1_000_000 loop |
| declare |
| X : Real := 0.1; |
| begin |
| for J in 1 .. 500 loop |
|-- Answer := Answer + Log_Base_10_Of_E * LogM (X); |
| Answer := Answer + LogL (X); |
| X := X + 0.1; |
| end loop; |
| end; |
| end loop; |
| Put (Real'Image(Answer)); |
|end Log_Bench; |
| |
|[..]" |
|---------------------------------------------------------------------------|

Not all of those switches would yield fair proxies for timings of
logarithms in the real code which inspired this thread, but anyway...


64bit GCC4.2.4...
gnatmake -O3 -ffast-math Log_Bench.adb -o Log_Bench_compiled_by_GCC4.2.4_with_-ffast-math -largs /lib/libm.so.6

time ./Log_Bench_compiled_by_GCC4.2.4_with_-ffast-math
6.34086408606382E+08

real 0m34.497s
user 0m34.494s
sys 0m0.004s


gnatmake -gnatp -O3 -ffast-math Log_Bench.adb -o Log_Bench_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp -largs /lib/libm.so.6

time ./Log_Bench_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp
6.34086408606382E+08

real 0m34.503s
user 0m34.506s
sys 0m0.000s


gnatmake -gnatN -funroll-loops -gnatp -O3 -ffast-math Log_Bench.adb -o Log_Bench_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops -largs /lib/libm.so.6

time ./Log_Bench_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops
6.34086408606382E+08

real 0m34.547s
user 0m34.546s
sys 0m0.004s

64bit GCC4.4.3...
gnatmake -O3 -ffast-math Log_Bench.adb -o Log_Bench_compiled_by_GCC4.4.3_with_-ffast-math -largs /lib/libm.so.6

time ./Log_Bench_compiled_by_GCC4.4.3_with_-ffast-math
6.34086408606382E+08

real 0m34.257s
user 0m34.258s
sys 0m0.000s


gnatmake -gnatp -O3 -ffast-math Log_Bench.adb -o Log_Bench_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp -largs /lib/libm.so.6

time ./Log_Bench_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp
6.34086408606382E+08

real 0m34.474s
user 0m34.478s
sys 0m0.000s


gnatmake -gnatN -funroll-loops -gnatp -O3 -ffast-math Log_Bench.adb -o Log_Bench_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops -largs /lib/libm.so.6

time ./Log_Bench_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops
6.34086408606382E+08

real 0m34.188s
user 0m34.182s
sys 0m0.004s
From: Colin Paul Gloster on
On Mon, 15 Feb 2010, William Findlay posted:

|------------------------------------------------------------------------|
|"On 15/02/2010 10:58, in article |
|alpine.LNX.2.00.1002151055530.17315(a)Bluewhite64.example.net, "Colin Paul|
|Gloster" <Colin_Paul_Gloster(a)ACM.org> wrote: |
| |
|> Of the two programs shown, the fastest C++ implementation on one test |
|> platform took less than one millisecond and the fastest Ada |
|> implementation took one minute and 31 seconds and 874 milliseconds on |
|> the same platform. Both g++ and gnatmake were from the same |
|> installation of GCC 4.1.2 20080704 (Red Hat 4.1.2-44). |
| |
|Is that 1 millisecond for 1e6 calls?" |
|------------------------------------------------------------------------|

No, that was less than one millisecond for 500 * 10**6 C++ calls.

|------------------------------------------------------------------------|
|" This implies 1ns per call in C++. |
|I find it incredible that a log function could be so fast. |
|I think the loop body must be evaluated at compile-time in C++." |
|------------------------------------------------------------------------|

The C++ compiler did manage to eliminate almost everything.

|------------------------------------------------------------------------|
|"On my system your Ada code gives: |
| |
|6.34086408536266E+08 |
| |
|real 0m33.918s |
|user 0m33.864s |
|sys 0m0.025s |
| |
|And your original C++ code gives: |
| |
|6.34086e+08 |
|real 0m0.110s |
|user 0m0.003s |
|sys 0m0.003s |
| |
|But if I replace the C++ loop body by: |
| |
| for(int j=1; j<=500; ++j) |
| answer += std::log10(j*0.100000000000000000000); |
|It now gives: |
| |
|6.34086e+08 |
|real 0m18.112s |
|user 0m18.082s |
|sys 0m0.015s |
| |
|This is less than twice as fast as the more generalized Ada code. |
| |
|[..]" |
|------------------------------------------------------------------------|

Thank you for exposing this flaw in the C++ code.

with Ada.Numerics.Generic_Elementary_Functions;
with Interfaces.C;
with Ada.Text_IO;
procedure Logarithmic_Work_In_Ada_with_a_Findlay_loop is
   answer : Interfaces.C.double := 0.0;
   package double_library is new Ada.Numerics.Generic_Elementary_Functions(Interfaces.C.double);
   package double_output_library is new Ada.Text_IO.Float_IO(Interfaces.C.double);
begin

   for I in 1 .. 1_000_000 loop
      for J in 1 .. 500 loop
         answer := Interfaces.C."+"(
            answer, double_library.log(
               Interfaces.C."*"(
                  Interfaces.C.double(J),
                  0.100000000000000000000
               )
               ,
               10.0
            )
         );
      end loop;
   end loop;

   double_output_library.Put(answer);
end;

gnatmake -O3 -ffast-math Logarithmic_Work_In_Ada_with_a_Findlay_loop.adb -o Logarithmic_Work_In_Ada_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math

time ./Logarithmic_Work_In_Ada_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math
6.34086408606382E+08

real 0m31.091s
user 0m31.090s
sys 0m0.004s

time ./Logarithmic_Work_In_Ada_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math
6.34086408606382E+08

real 0m31.094s
user 0m31.094s
sys 0m0.004s


gnatmake -O3 Logarithmic_Work_In_Ada_with_a_Findlay_loop.adb -o Logarithmic_Work_In_Ada_with_a_Findlay_loop_compiled_by_GCC4.4.3

time ./Logarithmic_Work_In_Ada_with_a_Findlay_loop_compiled_by_GCC4.4.3
6.34086408606382E+08

real 0m31.388s
user 0m31.378s
sys 0m0.008s


g++ -O3 -ffast-math logarithmic_work_in_CPlusPlus_with_a_Findlay_loop.cc -o logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math

time ./logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math
6.34086e+08
real 0m38.388s
user 0m38.390s
sys 0m0.000s

time ./logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math
6.34086e+08
real 0m38.547s
user 0m38.546s
sys 0m0.000s


g++ -O3 logarithmic_work_in_CPlusPlus_with_a_Findlay_loop.cc -o logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3

time ./logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3
6.34086e+08
real 0m38.428s
user 0m38.426s
sys 0m0.004s
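
The source of logarithmic_work_in_CPlusPlus_with_a_Findlay_loop.cc is not reproduced in this thread. What follows is a minimal sketch of what its core might look like, assuming it simply wraps the loop body suggested by William Findlay in an outer loop. The function name `findlay_sum` and the `outer_iterations` parameter are made up for illustration and are not taken from the original file.

```cpp
#include <cmath>

// Hypothetical reconstruction, not the file that was actually timed.
// One outer pass sums log10(j * 0.1) for j = 1 .. 500.
double findlay_sum(long outer_iterations)
{
    double answer = 0.0;
    for (long i = 0; i < outer_iterations; ++i) {
        // The argument depends on the loop index instead of on a variable
        // incremented from a hardcoded start value.
        for (int j = 1; j <= 500; ++j) {
            answer += std::log10(j * 0.100000000000000000000);
        }
    }
    return answer;
}
```

A single pass gives roughly 634.086, consistent with the 6.34086e+08 printed above for 1_000_000 passes. Printing `findlay_sum(1000000)` from `main` would correspond to the workload timed above.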






with Ada.Numerics.Generic_Elementary_Functions;
with Interfaces.C;
with Ada.Text_IO;
procedure Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism is
   Log_Base_10_Of_The_Base_Of_The_Natural_Logarithm : constant Interfaces.C.Double := 0.434_294_481_903_251_827_651_129;

   answer : Interfaces.C.double := 0.0;
   package double_library is new Ada.Numerics.Generic_Elementary_Functions(Interfaces.C.double);
   package double_output_library is new Ada.Text_IO.Float_IO(Interfaces.C.double);
begin

   for I in 1 .. 1_000_000 loop
      for J in 1 .. 500 loop
         answer := Interfaces.C."+"
                   (
                    answer, Interfaces.C."*"
                            (
                             double_library.Log
                             (
                              Interfaces.C."*"
                              (
                               Interfaces.C.double(J),
                               0.100000000000000000000
                              )
                             )
                             ,
                             Log_Base_10_Of_The_Base_Of_The_Natural_Logarithm
                            )
                   );
      end loop;
   end loop;

   double_output_library.Put(answer);
end;



gnatmake -O3 -ffast-math Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism.adb -o Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3_with_-ffast-math

time ./Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3_with_-ffast-math
6.34086408606382E+08

real 0m14.434s
user 0m14.433s
sys 0m0.004s
-bash bluewhite64 /home/Colin_Paul/logarithms $


-bash bluewhite64 /home/Colin_Paul/logarithms $
gnatmake -O3 Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism.adb -o Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3

time ./Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3
6.34086408606382E+08

real 0m14.450s
user 0m14.453s
sys 0m0.000s
From: Jeffrey R. Carter on
Colin Paul Gloster wrote:
>
> |---------------------------------------------------------------------------------|
> |"Note that suppressing runtime checks (-gnatp) is needed to be sort of equivalent|
> |to C++." |
> |---------------------------------------------------------------------------------|
>
> Thanks for the tip, but I do not program in Ada to really program in
> C++ with Ada syntax.

I would hope not. But when comparing execution times between Ada and a language
like C++, it's important not to try to compare apples to lugnuts.

--
Jeff Carter
"I don't know why I ever come in here. The
flies get the best of everything."
Never Give a Sucker an Even Break
From: Colin Paul Gloster on
On Tue, 16 Feb 2010, Jeffrey R. Carter sent:

|------------------------------------------------------------------|
|"[..] |
| |
|[..] when comparing execution times between Ada and a language |
|like C++, it's important not to try to compare apples to lugnuts."|
|------------------------------------------------------------------|

Fair enough, but when I say Ada is better than C++ I am not comparing
an apple with an apple.

Anyway, as I mentioned in
news:alpine.LNX.2.00.1002161654110.21651(a)Bluewhite64.example.net
in response to Bill Findlay, G++ has produced much slower code than
GNAT (the GNATism is standard Ada; it is merely not the obvious way to
write this in standard Ada)...

gnatmake -O3 -ffast-math Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism.adb -o Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3_with_-ffast-math

time ./Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3_with_-ffast-math
6.34086408606382E+08

real 0m14.434s
user 0m14.433s
sys 0m0.004s


g++ -O3 -ffast-math logarithmic_work_in_CPlusPlus_with_a_Findlay_loop.cc -o logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math

time ./logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math
6.34086e+08
real 0m38.388s
user 0m38.390s
sys 0m0.000s
From: Colin Paul Gloster on
On Tue, 16 Feb 2010, Colin Paul Gloster alleged:

|------------------------------------------------------------------------------------------------------------------------|
|"[..] |
| |
|with Ada.Numerics.Generic_Elementary_Functions; |
|with Interfaces.C; |
|with Ada.Text_IO; |
|procedure Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism is |
| Log_Base_10_Of_The_Base_Of_The_Natural_Logarithm : constant |
|Interfaces.C.Double := 0.434_294_481_903_251_827_651_129; |
| |
| answer : Interfaces.C.double := 0.0; |
| package double_library is new |
|Ada.Numerics.Generic_Elementary_Functions(Interfaces.C.double); |
| package double_output_library is new |
|Ada.Text_IO.Float_IO(Interfaces.C.double); |
|begin |
| |
| for I in 1 .. 1_000_000 loop |
| for J in 1 .. 500 loop |
| answer := Interfaces.C."+" |
| ( |
| answer, Interfaces.C."*" |
| ( |
| double_library.Log |
| ( |
| Interfaces.C."*" |
| ( |
| Interfaces.C.double(J),|
| 0.100000000000000000000|
| ) |
| ) |
| , |
| Log_Base_10_Of_The_Base_Of_The_Natural_Logarithm |
| ) |
| ); |
| end loop; |
| end loop; |
| |
| double_output_library.Put(answer); |
|end; |
| |
|[..]" |
|------------------------------------------------------------------------------------------------------------------------|

Actually this is not a GNATism. I have noticed that
"*"(Left=>variable,
Right=>Log_Base_10_Of_The_Base_Of_The_Natural_Logarithm) is faster
than log(X=>variable, Base=>10.0) on a number of other compilers.
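
The two forms being compared rest on the identity log10(x) = ln(x) * log10(e). Here is a minimal C++ sketch of that identity. It illustrates only the arithmetic, not the Ada timings, and the names `LOG10_OF_E` and `log10_via_ln` are made up for illustration.

```cpp
#include <cmath>

// log10(e), written with the same digits as the constant in the Ada code above;
// it is rounded to the nearest double when compiled.
const double LOG10_OF_E = 0.434294481903251827651129;

// Base-10 logarithm computed as one natural logarithm and one multiplication,
// mirroring "*"(Log(X), Log_Base_10_Of_The_Base_Of_The_Natural_Logarithm).
double log10_via_ln(double x)
{
    return std::log(x) * LOG10_OF_E;
}
```

The result can differ from a direct `std::log10(x)` in the last bits, because the product adds one more rounding step. That is the price of trading a two-argument logarithm for a multiplication.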