From: Georg Bauhaus on
Olivier Scalbert wrote:

> One question:
> I have tried to profile knucleotide, so I have recompiled it after
> removing all the inlines and optimization:

Thanks for doing so. Ludovic has mentioned -gnatN.

If all goes well I will put a new multitasking
version at the given address later this evening.
This will include all new patches we have collected
so far.

In order for gprof to show something that makes sense
(to me, at least), it seems a good idea to give the
Fragments."=" function an alphabetic name.

   package Fragments ...

      function Equals (Left, Right : Fragment) return Boolean is ...
      function "=" (Left, Right : Fragment) return Boolean
         renames Equals;

Then, use Equals in the actual parameter list for the table
generic. I have used the single task program from
the Shootout collection.

$ gnatmake -gnato -march=native -g -pg \
-f knucleotide.adb -o knucleotide.gnat_run

Maybe without -gnato, and gradually increasing optimization.
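
For anyone who wants to try the renaming idea outside the benchmark, here is a minimal sketch. It is not the Shootout source: the Fragment record, the Hash function, and the Equals_Demo procedure are made up for illustration, and Ada.Containers.Hashed_Maps only stands in for whatever table generic the real program instantiates. The point is just that the generic receives the alphabetically named Equals, while "=" remains available as a renaming.

```ada
with Ada.Containers.Hashed_Maps;
with Ada.Strings.Hash;
with Ada.Text_IO;

procedure Equals_Demo is

   --  Hypothetical stand-in for the benchmark's Fragments package.
   package Fragments is
      type Fragment is record
         Data : String (1 .. 4);
      end record;

      function Hash (Item : Fragment) return Ada.Containers.Hash_Type;

      --  The alphabetic name is the one handed to the generic.
      function Equals (Left, Right : Fragment) return Boolean;

      --  "=" stays usable by clients as a renaming of Equals.
      function "=" (Left, Right : Fragment) return Boolean renames Equals;
   end Fragments;

   package body Fragments is
      function Hash (Item : Fragment) return Ada.Containers.Hash_Type is
      begin
         return Ada.Strings.Hash (Item.Data);
      end Hash;

      function Equals (Left, Right : Fragment) return Boolean is
      begin
         --  Compares the String components, so there is no recursion
         --  into the user-defined "=" on Fragment.
         return Left.Data = Right.Data;
      end Equals;
   end Fragments;

   --  Assumption: a standard hashed map in place of the real table generic.
   package Tables is new Ada.Containers.Hashed_Maps
     (Key_Type        => Fragments.Fragment,
      Element_Type    => Natural,
      Hash            => Fragments.Hash,
      Equivalent_Keys => Fragments.Equals);

   Counts : Tables.Map;
   Key    : constant Fragments.Fragment := (Data => "GGTA");
begin
   Counts.Insert (Key, 1);
   --  Look the fragment up again through a fresh, equal key.
   Ada.Text_IO.Put_Line
     (Boolean'Image (Counts.Contains ((Data => "GGTA"))));
end Equals_Demo;
```

Built with something like "gnatmake -g -pg equals_demo.adb", the comparison should then appear in the profile under a name containing "equals", provided it has not been inlined away.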
From: sjw on
On Sep 3, 4:13 pm, Olivier Scalbert <olivier.scalb...(a)algosyn.com>
wrote:
> Ludovic Brenta wrote:
> > You enabled front-end inlining with -gnatN; GNAT turned the whole
> > program into one procedure. Even then, it should be possible to run
> > the program under valgrind's callgrind tool to get accurate, per-line
> > (indeed per-instruction) execution costs. I don't know whether gprof
> > has such granularity or not.
...
> Anyway with:
> $ gnatmake -pg -f knucleotide.adb -o knucleotide.gnat_run

You need -ftest-coverage -fprofile-arcs for gprof to give per-line
coverage. Though not sure what it will do with really massive
inlining! (our experience on PowerPC has been that inlining usually
makes things worse - but that's for a large program with much logic,
little maths)
From: Georg Bauhaus on
Georg Bauhaus wrote:

> If all goes well I will put a new multitasking
> version at the given address later this evening.

It went almost well. Here is a new single tasking
version which incorporates many if not all of the
latest patches, including a new Line_IO.

Interesting to play with, for example, is

Bytes_Per_Word : constant := ? ;

in KNucleotide's String_Fragments, where ? = 4 or ? = 8.
Seems to make a difference in some environments.

To see for yourself you would need these sources:
http://home.arcor.de/bauhaus/Ada/knucleotide.single.gnat
http://home.arcor.de/bauhaus/Ada/line_io.ada

(The final Line_IO for K-Nucleotide will have only a null
Write implementation, as Write isn't used. It should
be a few lines shorter, then.)
From: Olivier Scalbert on
sjw wrote:
> On Sep 3, 4:13 pm, Olivier Scalbert <olivier.scalb...(a)algosyn.com>
> wrote:
>> Ludovic Brenta wrote:
>>> You enabled front-end inlining with -gnatN; GNAT turned the whole
>>> program into one procedure. Even then, it should be possible to run
>>> the program under valgrind's callgrind tool to get accurate, per-line
>>> (indeed per-instruction) execution costs. I don't know whether gprof
>>> has such granularity or not.
> ..
>> Anyway with:
>> $ gnatmake -pg -f knucleotide.adb -o knucleotide.gnat_run
>
> You need -ftest-coverage -fprofile-arcs for gprof to give per-line
> coverage. Though not sure what it will do with really massive
> inlining! (our experience on PowerPC has been that inlining usually
> makes things worse - but that's for a large program with much logic,
> little maths)

Thanks Simon.
But I get the same result!
$ gnat -version
GNAT 4.3.3
Copyright 1996-2007, Free Software Foundation, Inc.

$ gnatmake -pg -f -ftest-coverage -fprofile-arcs knucleotide.adb \
-o knucleotide.gnat_run

$ ./knucleotide.gnat_run < fasta/fasta25m.dat
$ gprof -b ./knucleotide.gnat_run

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
100.00    114.75   114.75        1   114.75   114.75  _ada_knucleotide
  0.00    114.75     0.00        1     0.00     0.00  adainit


Call graph


granularity: each sample hit covers 4 byte(s) for 0.01% of 114.75 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0    0.00  114.75                 main [1]
              114.75    0.00       1/1           _ada_knucleotide [2]
                0.00    0.00       1/1           adainit [3]
-----------------------------------------------
                           460000417             _ada_knucleotide [2]
              114.75    0.00       1/1           main [1]
[2]    100.0  114.75    0.00       1+460000417 _ada_knucleotide [2]
                           460000417             _ada_knucleotide [2]
-----------------------------------------------
                0.00    0.00       1/1           main [1]
[3]      0.0    0.00    0.00       1         adainit [3]
-----------------------------------------------


Index by function name

[2] _ada_knucleotide [3] adainit

Olivier
From: Ludovic Brenta on
I cannot help more with gprof, as I've never used it before, but it
seems to me that profiling the unoptimized program is pointless. It is
much better to profile the fully optimized and inlined program; for
this I still recommend valgrind because it gives accurate measurements
for every instruction.

--
Ludovic Brenta.