From: Georg Bauhaus on 4 Sep 2009 05:27 Olivier Scalbert schrieb: > $ ./knucleotide.gnat_run < fasta/fasta25m.dat > $ gprof -b ./knucleotide.gnat_run > > Flat profile: > > Each sample counts as 0.01 seconds. > % cumulative self self total > time seconds seconds calls s/call s/call name > 100.00 114.75 114.75 1 114.75 114.75 _ada_knucleotide > 0.00 114.75 0.00 1 0.00 0.00 adainit > > > Call graph > > > granularity: each sample hit covers 4 byte(s) for 0.01% of 114.75 seconds > > index % time self children called name > <spontaneous> > [1] 100.0 0.00 114.75 main [1] > 114.75 0.00 1/1 _ada_knucleotide [2] > 0.00 0.00 1/1 adainit [3] > ----------------------------------------------- > 460000417 _ada_knucleotide [2] > 114.75 0.00 1/1 main [1] > [2] 100.0 114.75 0.00 1+460000417 _ada_knucleotide [2] > 460000417 _ada_knucleotide [2] > ----------------------------------------------- > 0.00 0.00 1/1 main [1] > [3] 0.0 0.00 0.00 1 adainit [3] > ----------------------------------------------- > > > Index by function name > > [2] _ada_knucleotide [3] adainit > > Olivier This looks as though gprof's option -a might have been given (somewhere). Does /full/path/to/normal/gprof -b ...give the same result?
From: Olivier Scalbert on 4 Sep 2009 06:17 Ludovic Brenta wrote: > I cannot help more with gprof as I've never used it before but it > seems to me that profiling the unoptimized program is pointless. It is > much better to profile the fully optimized and inlined program; for > this I still recommend valgrind because it gives accurate measurements > for every instruction. > > -- > Ludovic Brenta. Ok, let's switch to valgrind ! First, let's check memory leaks: $ gnatmake -gnatNp -O3 -g -fomit-frame-pointer -march=native -f knucleotide.adb -o knucleotide.gnat_run $ valgrind --max-stackframe=20000000 --leak-check=full --show-reachable=yes ./knucleotide.gnat_run < fasta/fasta25m.dat ==13802== ==13802== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 21 from 1) ==13802== malloc/free: in use at exit: 12,512,380 bytes in 3 blocks. ==13802== malloc/free: 145 allocs, 142 frees, 396,773,180 bytes allocated. ==13802== For counts of detected errors, rerun with: -v ==13802== searching for pointers to 3 not-freed blocks. ==13802== checked 12,680,612 bytes. ==13802== ==13802== 12,512,380 bytes in 3 blocks are still reachable in loss record 1 of 1 ==13802== at 0x4026FDE: malloc (vg_replace_malloc.c:207) ==13802== by 0x42C1664: __gnat_malloc (in /usr/lib/libgnat-4.3.so.1) ==13802== by 0x406993E: system__task_primitives__operations__new_atcb (in /usr/lib/libgnarl-4.3.so.1) ==13802== by 0x406C7AD: system__tasking__initialize (in /usr/lib/libgnarl-4.3.so.1) ==13802== by 0x406C186: system__tasking__initialization__init_rts (in /usr/lib/libgnarl-4.3.so.1) ==13802== by 0x406C2A6: system__tasking__initialization___elabb (in /usr/lib/libgnarl-4.3.so.1) ==13802== by 0x804AFA0: adainit (b~knucleotide.adb:162) ==13802== by 0x804AD50: ??? (start.S:119) ==13802== ==13802== LEAK SUMMARY: ==13802== definitely lost: 0 bytes in 0 blocks. ==13802== possibly lost: 0 bytes in 0 blocks. ==13802== still reachable: 12,512,380 bytes in 3 blocks. ==13802== suppressed: 0 bytes in 0 blocks. Mmmm, 3 blocks are still reachable. 3 workers ? Second, let's profile: $ valgrind --max-stackframe=20000000 --leak-check=yes ../knucleotide.gnat_run < fasta/fasta25m.dat 25 minutes later ... callgrind_annotate --inclusive=yes --tree=both --auto=yes callgrind.out.14048 > callgrind_out_14048.txt As the file is quit big, you can find it there: http://scalbert.dyndns.org/ada/knucleotide/callgrind_out_14048.txt As it is the first time I use this tool, do not hesitate to ask me a new run with other options ! Enjoy, Olivier
From: Ludovic Brenta on 4 Sep 2009 08:37 On Sep 4, 12:17 pm, Olivier Scalbert <olivier.scalb...(a)algosyn.com> wrote: > Mmmm, 3 blocks are still reachable. 3 workers ? Yes, these are the task control blocks. Perhaps there is a way to teach valgrind to ignore those as normal. > Second, let's profile: > $ valgrind --max-stackframe=20000000 --leak-check=yes > ./knucleotide.gnat_run < fasta/fasta25m.dat > > 25 minutes later ... > callgrind_annotate --inclusive=yes --tree=both --auto=yes > callgrind.out.14048 > callgrind_out_14048.txt OK, now try "kcachegrind callgrind.out.14048" and you can navigate graphically in the call graph and see where the time is spent. -- Ludovic Brenta.
From: Ludovic Brenta on 4 Sep 2009 08:50 Olivier Scalbert wrote on comp.lang.ada: > http://scalbert.dyndns.org/ada/knucleotide/callgrind_out_14048.txt Let's see: 43,644,857,598 PROGRAM TOTALS [...] . ---------------------------------------------------------------------------- . -- . -- Calculate and write data - either a percentage for all fragments found or - when . -- Nucleotide_Fragment is given - the count for that fragment. . -- 262,499,981 procedure Write 8,251,549,172 => ???:system__secondary_stack__ss_mark (7x) . (Nucleotide_Length : in Frequencies; . Nucleotide_Fragment : in Fragment := Fragments.Null_Bounded_String) . is Apparently, passing unconstrained strings to procedure Write involves allocations on the secondary stack which account for 20% of the entire execution time. That's hot spot #1. I wonder whether it is feasible to pass access values to the strings. -- Ludovic Brenta.
From: Olivier Scalbert on 4 Sep 2009 09:50
Ludovic Brenta wrote: > On Sep 4, 12:17 pm, Olivier Scalbert <olivier.scalb...(a)algosyn.com> > wrote: >> Mmmm, 3 blocks are still reachable. 3 workers ? > > Yes, these are the task control blocks. Perhaps there is a way to > teach valgrind to ignore those as normal. > >> Second, let's profile: >> $ valgrind --max-stackframe=20000000 --leak-check=yes >> ./knucleotide.gnat_run < fasta/fasta25m.dat >> >> 25 minutes later ... >> callgrind_annotate --inclusive=yes --tree=both --auto=yes >> callgrind.out.14048 > callgrind_out_14048.txt > > OK, now try "kcachegrind callgrind.out.14048" and you can navigate > graphically in the call graph and see where the time is spent. > > -- > Ludovic Brenta. What an impressive program ! Should I rerun with --dump-instr=yes and --trace-jump=yes ? (I run it on a 32 bits linux) Olivier |