From: Georg Bauhaus on
Olivier Scalbert schrieb:

> $ ./knucleotide.gnat_run < fasta/fasta25m.dat
> $ gprof -b ./knucleotide.gnat_run
>
> Flat profile:
>
> Each sample counts as 0.01 seconds.
> % cumulative self self total
> time seconds seconds calls s/call s/call name
> 100.00 114.75 114.75 1 114.75 114.75 _ada_knucleotide
> 0.00 114.75 0.00 1 0.00 0.00 adainit
>
>
> Call graph
>
>
> granularity: each sample hit covers 4 byte(s) for 0.01% of 114.75 seconds
>
> index % time self children called name
> <spontaneous>
> [1] 100.0 0.00 114.75 main [1]
> 114.75 0.00 1/1 _ada_knucleotide [2]
> 0.00 0.00 1/1 adainit [3]
> -----------------------------------------------
> 460000417 _ada_knucleotide [2]
> 114.75 0.00 1/1 main [1]
> [2] 100.0 114.75 0.00 1+460000417 _ada_knucleotide [2]
> 460000417 _ada_knucleotide [2]
> -----------------------------------------------
> 0.00 0.00 1/1 main [1]
> [3] 0.0 0.00 0.00 1 adainit [3]
> -----------------------------------------------
>
>
> Index by function name
>
> [2] _ada_knucleotide [3] adainit
>
> Olivier


This looks as though gprof's option -a might have been given
(somewhere). Does /full/path/to/normal/gprof -b ...give the same
result?
From: Olivier Scalbert on
Ludovic Brenta wrote:
> I cannot help more with gprof as I've never used it before but it
> seems to me that profiling the unoptimized program is pointless. It is
> much better to profile the fully optimized and inlined program; for
> this I still recommend valgrind because it gives accurate measurements
> for every instruction.
>
> --
> Ludovic Brenta.

Ok, let's switch to valgrind !

First, let's check memory leaks:

$ gnatmake -gnatNp -O3 -g -fomit-frame-pointer -march=native -f
knucleotide.adb -o knucleotide.gnat_run

$ valgrind --max-stackframe=20000000 --leak-check=full
--show-reachable=yes ./knucleotide.gnat_run < fasta/fasta25m.dat
==13802==
==13802== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 21 from 1)
==13802== malloc/free: in use at exit: 12,512,380 bytes in 3 blocks.
==13802== malloc/free: 145 allocs, 142 frees, 396,773,180 bytes allocated.
==13802== For counts of detected errors, rerun with: -v
==13802== searching for pointers to 3 not-freed blocks.
==13802== checked 12,680,612 bytes.
==13802==
==13802== 12,512,380 bytes in 3 blocks are still reachable in loss
record 1 of 1
==13802== at 0x4026FDE: malloc (vg_replace_malloc.c:207)
==13802== by 0x42C1664: __gnat_malloc (in /usr/lib/libgnat-4.3.so.1)
==13802== by 0x406993E: system__task_primitives__operations__new_atcb
(in /usr/lib/libgnarl-4.3.so.1)
==13802== by 0x406C7AD: system__tasking__initialize (in
/usr/lib/libgnarl-4.3.so.1)
==13802== by 0x406C186: system__tasking__initialization__init_rts (in
/usr/lib/libgnarl-4.3.so.1)
==13802== by 0x406C2A6: system__tasking__initialization___elabb (in
/usr/lib/libgnarl-4.3.so.1)
==13802== by 0x804AFA0: adainit (b~knucleotide.adb:162)
==13802== by 0x804AD50: ??? (start.S:119)
==13802==
==13802== LEAK SUMMARY:
==13802== definitely lost: 0 bytes in 0 blocks.
==13802== possibly lost: 0 bytes in 0 blocks.
==13802== still reachable: 12,512,380 bytes in 3 blocks.
==13802== suppressed: 0 bytes in 0 blocks.

Mmmm, 3 blocks are still reachable. 3 workers ?

Second, let's profile:
$ valgrind --max-stackframe=20000000 --leak-check=yes
../knucleotide.gnat_run < fasta/fasta25m.dat

25 minutes later ...
callgrind_annotate --inclusive=yes --tree=both --auto=yes
callgrind.out.14048 > callgrind_out_14048.txt

As the file is quit big, you can find it there:
http://scalbert.dyndns.org/ada/knucleotide/callgrind_out_14048.txt

As it is the first time I use this tool, do not hesitate to ask me a new
run with other options !

Enjoy,

Olivier



From: Ludovic Brenta on
On Sep 4, 12:17 pm, Olivier Scalbert <olivier.scalb...(a)algosyn.com>
wrote:
> Mmmm, 3 blocks are still reachable. 3 workers ?

Yes, these are the task control blocks. Perhaps there is a way to
teach valgrind to ignore those as normal.

> Second, let's profile:
> $ valgrind --max-stackframe=20000000 --leak-check=yes
> ./knucleotide.gnat_run < fasta/fasta25m.dat
>
> 25 minutes later ...
> callgrind_annotate --inclusive=yes --tree=both --auto=yes
> callgrind.out.14048  > callgrind_out_14048.txt

OK, now try "kcachegrind callgrind.out.14048" and you can navigate
graphically in the call graph and see where the time is spent.

--
Ludovic Brenta.
From: Ludovic Brenta on
Olivier Scalbert wrote on comp.lang.ada:
> http://scalbert.dyndns.org/ada/knucleotide/callgrind_out_14048.txt

Let's see:

43,644,857,598 PROGRAM TOTALS
[...]

.
----------------------------------------------------------------------------
. --
. -- Calculate and write data - either a percentage
for all fragments found or - when
. -- Nucleotide_Fragment is given - the count for
that fragment.
. --
262,499,981 procedure Write
8,251,549,172 => ???:system__secondary_stack__ss_mark (7x)
. (Nucleotide_Length : in Frequencies;
. Nucleotide_Fragment : in Fragment :=
Fragments.Null_Bounded_String)
. is

Apparently, passing unconstrained strings to procedure Write involves
allocations on the secondary stack which account for 20% of the entire
execution time. That's hot spot #1. I wonder whether it is feasible to
pass access values to the strings.

--
Ludovic Brenta.
From: Olivier Scalbert on
Ludovic Brenta wrote:
> On Sep 4, 12:17 pm, Olivier Scalbert <olivier.scalb...(a)algosyn.com>
> wrote:
>> Mmmm, 3 blocks are still reachable. 3 workers ?
>
> Yes, these are the task control blocks. Perhaps there is a way to
> teach valgrind to ignore those as normal.
>
>> Second, let's profile:
>> $ valgrind --max-stackframe=20000000 --leak-check=yes
>> ./knucleotide.gnat_run < fasta/fasta25m.dat
>>
>> 25 minutes later ...
>> callgrind_annotate --inclusive=yes --tree=both --auto=yes
>> callgrind.out.14048 > callgrind_out_14048.txt
>
> OK, now try "kcachegrind callgrind.out.14048" and you can navigate
> graphically in the call graph and see where the time is spent.
>
> --
> Ludovic Brenta.

What an impressive program !
Should I rerun with --dump-instr=yes and --trace-jump=yes ?
(I run it on a 32 bits linux)

Olivier