From: Georg Bauhaus on 3 Sep 2009 11:28

Olivier Scalbert wrote:
> One question:
> I have tried to profile knucleotide, so I have recompile it after
> removing all the inlines and optimization:

Thanks for doing so. Ludovic has mentioned -gnatN.
If all goes well I will put a new multitasking version at the given
address later this evening. This will include all new patches we have
collected so far.

In order for gprof to show something that makes sense (to me, at
least), it seems a good idea to use an alphabetically named
Fragments."=" function.

    package Fragments
       ...
       function Equals (Left, Right: Fragment) return Boolean is ...
       function "=" (Left, Right: Fragment) return Boolean renames Equals;

Then, use Equals in the actual parameter list for the table generic.

I have used the single task program from the Shootout collection.

    $ gnatmake -gnato -march=native -g -pg \
         -f knucleotide.adb -o knucleotide.gnat_run

Maybe without -gnato, and gradually increasing optimization.

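A minimal sketch of the renaming idea above, not the actual benchmark source. The Fragment type (a four-character string), the Hash function and the use of Ada.Containers.Hashed_Maps in place of the program's own table generic are all assumptions made only so that the example is self-contained.

```ada
with Ada.Containers.Hashed_Maps;
with Ada.Strings.Hash;
with Ada.Text_IO;

procedure Equals_Demo is

   --  Hypothetical stand-in for the benchmark's Fragments package.
   package Fragments is
      type Fragment is new String (1 .. 4);

      --  Equality under an alphabetic name, so that a profiler
      --  listing has an ordinary identifier to show.
      function Equals (Left, Right : Fragment) return Boolean;

      --  The operator is just another name for Equals.
      function "=" (Left, Right : Fragment) return Boolean renames Equals;

      function Hash (Key : Fragment) return Ada.Containers.Hash_Type;
   end Fragments;

   package body Fragments is
      function Equals (Left, Right : Fragment) return Boolean is
      begin
         --  Compare as plain Strings; using "=" on Fragment here
         --  would call Equals again through the renaming.
         return String (Left) = String (Right);
      end Equals;

      function Hash (Key : Fragment) return Ada.Containers.Hash_Type is
      begin
         return Ada.Strings.Hash (String (Key));
      end Hash;
   end Fragments;

   --  Assumption: a standard container stands in for the table generic.
   --  Equals, not "=", is passed as the actual parameter.
   package Tables is new Ada.Containers.Hashed_Maps
     (Key_Type        => Fragments.Fragment,
      Element_Type    => Natural,
      Hash            => Fragments.Hash,
      Equivalent_Keys => Fragments.Equals);

   Key    : constant Fragments.Fragment := "GGTA";
   Counts : Tables.Map;
begin
   Counts.Insert (Key, 1);
   Counts.Replace (Key, Counts.Element (Key) + 1);
   Ada.Text_IO.Put_Line
     (String (Key) & ":" & Natural'Image (Counts.Element (Key)));
end Equals_Demo;
```

Saved as equals_demo.adb, this should build with the same kind of command as above (gnatmake -g -pg equals_demo.adb). The comparison would then be expected to appear in the gprof listing under a name containing "equals", although a program this small will hardly collect any timing samples.
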
From: sjw on 3 Sep 2009 15:11

On Sep 3, 4:13 pm, Olivier Scalbert <olivier.scalb...(a)algosyn.com> wrote:
> Ludovic Brenta wrote:
> > You enabled front-end inlining with -gnatN; GNAT turned the whole
> > program into one procedure. Even then, it should be possible to run
> > the program under valgrind's callgrind tool to get accurate, per-line
> > (indeed per-instruction) execution costs. I don't know whether gprof
> > has such granularity or not.
...
> Anyway with:
> $ gnatmake -pg -f knucleotide.adb -o knucleotide.gnat_run

You need -ftest-coverage -fprofile-arcs for gprof to give per-line
coverage. Though not sure what it will do with really massive
inlining! (Our experience on PowerPC has been that inlining usually
makes things worse, but that's for a large program with much logic,
little maths.)

From: Georg Bauhaus on 3 Sep 2009 21:24

Georg Bauhaus wrote:
> If all goes well I will put a new multitasking
> version at the given address later this evening.

It went almost well. Here is a new single tasking version which
incorporates many if not all of the latest patches, including a new
Line_IO. Interesting to play with, for example, is

    Bytes_Per_Word : constant := ? ;

in KNucleotide's String_Fragments, where ? = 4 or ? = 8.
Seems to make a difference in some environments.

To see for yourself you would need these sources:

http://home.arcor.de/bauhaus/Ada/knucleotide.single.gnat
http://home.arcor.de/bauhaus/Ada/line_io.ada

(The final Line_IO for K-Nucleotide will be with a null Write
implementation only, as Write isn't used. Should be a few lines
shorter, then.)

From: Olivier Scalbert on 4 Sep 2009 02:11

sjw wrote:
> On Sep 3, 4:13 pm, Olivier Scalbert <olivier.scalb...(a)algosyn.com>
> wrote:
>> Ludovic Brenta wrote:
>>> You enabled front-end inlining with -gnatN; GNAT turned the whole
>>> program into one procedure. Even then, it should be possible to run
>>> the program under valgrind's callgrind tool to get accurate, per-line
>>> (indeed per-instruction) execution costs. I don't know whether gprof
>>> has such granularity or not.
> ..
>> Anyway with:
>> $ gnatmake -pg -f knucleotide.adb -o knucleotide.gnat_run
>
> You need -ftest-coverage -fprofile-arcs for gprof to give per-line
> coverage. Though not sure what it will do with really massive
> inlining! (our experience on PowerPC has been that inlining usually
> makes things worse - but that's for a large program with much logic,
> little maths)

Thanks Simon.
But same result!

    $ gnat -version
    GNAT 4.3.3
    Copyright 1996-2007, Free Software Foundation, Inc.

    $ gnatmake -pg -f -ftest-coverage -fprofile-arcs knucleotide.adb -o knucleotide.gnat_run
    $ ./knucleotide.gnat_run < fasta/fasta25m.dat
    $ gprof -b ./knucleotide.gnat_run
    Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls   s/call   s/call  name
    100.00    114.75   114.75        1   114.75   114.75  _ada_knucleotide
      0.00    114.75     0.00        1     0.00     0.00  adainit

                            Call graph

    granularity: each sample hit covers 4 byte(s) for 0.01% of 114.75 seconds

    index % time    self  children    called          name
                                                          <spontaneous>
    [1]    100.0    0.00  114.75                      main [1]
                  114.75    0.00       1/1                _ada_knucleotide [2]
                    0.00    0.00       1/1                adainit [3]
    -----------------------------------------------
                                 460000417                _ada_knucleotide [2]
                  114.75    0.00       1/1                main [1]
    [2]    100.0  114.75    0.00       1+460000417    _ada_knucleotide [2]
                                 460000417                _ada_knucleotide [2]
    -----------------------------------------------
                    0.00    0.00       1/1                main [1]
    [3]      0.0    0.00    0.00       1              adainit [3]
    -----------------------------------------------

    Index by function name

       [2] _ada_knucleotide        [3] adainit

Olivier

From: Ludovic Brenta on 4 Sep 2009 04:18

I cannot help more with gprof, as I've never used it before, but it
seems to me that profiling the unoptimized program is pointless. It is
much better to profile the fully optimized and inlined program; for
this I still recommend valgrind because it gives accurate measurements
for every instruction.

--
Ludovic Brenta.