From: Olivier Scalbert on
Ludovic Brenta wrote:
> Georg Bauhaus wrote on comp.lang.ada:
>> Georg Bauhaus wrote:
>>
>>> Ludovic Brenta wrote:
>>>> Apparently, passing unconstrained strings to procedure Write involves
>>>> allocations on the secondary stack which account for 20% of the entire
>>>> execution time. That's hot spot #1.
>>> Indeed, and this particular hot spot had been cooled down twice:
>>> Step 1 - we replaced Bounded_String with our own Bounded_String
>>> Step 2 - we replaced this new Bounded_String with plain
>>> constrained strings of suitable fixed length (using generics)
>> I should add that the current program spends much of its time
>> in equality comparison of fragment strings,
>> and then some in the hash function.
>> So not only are the bounded_strings gone;
>> Jonathan has also contributed a highly efficient hashing
>> function and a cute string equality function.
>>
>> (As mentioned, to actually see the effects in the current
>> program, String_Fragments."=" should be a renaming of Equals.
>> Operator subprograms seem to confuse the profiling programs,
>> or am I missing some setting?)
>
> So I gather that Olivier was profiling an old version of the program.
> Correct?
>
> --
> Ludovic Brenta.

Oops, sorry about that...

Today I can provide profiles for the latest version on:
- 32-bit: Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz, gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
- 64-bit: AMD Athlon(tm) 64 Processor 3000+, gcc version 4.3.4 (Debian 4.3.4-1)

Can it help?

Olivier
From: Georg Bauhaus on
Olivier Scalbert wrote:

> Today I can provide profiles for the latest version on:
> - 32-bit: Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz, gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
> - 64-bit: AMD Athlon(tm) 64 Processor 3000+, gcc version 4.3.4 (Debian 4.3.4-1)
>
> Can it help?

Yes, since we have few measurements of what happens on 4-core and 1-core
CPUs, or with GCC 4.3.3.

If you like, try arranging the task starts in a different order:
placing Work_On_1.Writer.Set (1) first seems to be a must.
The following two (12, 18) have run longest.
From: Olivier Scalbert on
Georg Bauhaus wrote:
> Olivier Scalbert wrote:
>
>> Today I can provide profiles for the latest version on:
>> - 32-bit: Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz, gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
>> - 64-bit: AMD Athlon(tm) 64 Processor 3000+, gcc version 4.3.4 (Debian 4.3.4-1)
>>
>> Can it help?
>
> Yes, since we have few measurements of what happens on 4-core and 1-core
> CPUs, or with GCC 4.3.3.
>
> If you like, try arranging the task starts in a different order:
> placing Work_On_1.Writer.Set (1) first seems to be a must.
> The following two (12, 18) have run longest.

Here are the results!

On 32-bit: Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz, gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
---------------------------------------
$ gnatmake -f -g -gnatnp -O3 knucleotide.adb -o knucleotide.gnat_run
$ time ./knucleotide.gnat_run < fasta/fasta250m.dat
real 0m14.607s
user 0m32.798s
sys 0m0.568s

$ valgrind --tool=callgrind --dump-instr=yes --trace-jump=yes \
  ./knucleotide.gnat_run < fasta/fasta25m.dat

See: http://scalbert.dyndns.org/ada/knucleotide/callgrind.out.32bits.1

--

On 64-bit: AMD Athlon(tm) 64 Processor 3000+, gcc version 4.3.4 (Debian 4.3.4-1)
---------------------------------------
$ gnatmake -f -g -gnatnp -O3 knucleotide.adb -o knucleotide.gnat_run
$ time ./knucleotide.gnat_run < fasta/fasta250m.dat
real 1m10.190s
user 1m9.252s
sys 0m0.724s

$ valgrind --tool=callgrind --dump-instr=yes --trace-jump=yes \
  ./knucleotide.gnat_run < fasta/fasta25m.dat

See: http://scalbert.dyndns.org/ada/knucleotide/callgrind.out.64bits.1

Olivier
From: jonathan on
On Sep 7, 2:31 pm, Olivier Scalbert <olivier.scalb...(a)algosyn.com> wrote:

> On 64 bits - AMD Athlon(tm) 64 Processor 3000+ - gcc version 4.3.4
> (Debian 4.3.4-1)
> $ gnatmake -f -g -gnatnp -O3 knucleotide.adb -o knucleotide.gnat_run
> $ time ./knucleotide.gnat_run < fasta/fasta250m.dat
> real    1m10.190s
> user    1m9.252s
> sys     0m0.724s
> $ valgrind --tool=callgrind --dump-instr=yes --trace-jump=yes
> ./knucleotide.gnat_run < fasta/fasta25m.dat
>
> see:http://scalbert.dyndns.org/ada/knucleotide/callgrind.out.64bits.1
>
> Olivier

This last test is worrisome. Are you sharing the machine with other
processes? Here is what I get when I have 8 cores to myself
(using GNAT 4.3.4 (GPL2009)):

time ./knucleotide.gnat_run < /tmp/fasta250.dat

real 0m6.647s
user 0m17.273s
sys 0m0.448s

and here is what I get when I share with another (heavy)
user of the machine:

time ./knucleotide.gnat_run < /tmp/fasta250.dat

real 0m25.475s
user 0m24.766s
sys 0m0.708s


Jonathan
From: Olivier Scalbert on
jonathan wrote:

> This last test is worrisome. Are you sharing the machine with other
> processes? Here is what I get when I have 8 cores to myself
> (using GNAT 4.3.4 (GPL2009)):
>
> time ./knucleotide.gnat_run < /tmp/fasta250.dat
>
> real 0m6.647s
> user 0m17.273s
> sys 0m0.448s
>
> and here is what I get when I share with another (heavy)
> user of the machine:
>
> time ./knucleotide.gnat_run < /tmp/fasta250.dat
>
> real 0m25.475s
> user 0m24.766s
> sys 0m0.708s
>
>
> Jonathan

My 64-bit machine is an "old" single-core Athlon (512 KB cache), so the
tasks cannot run in parallel. Perhaps it is normal!

Olivier
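The `time` figures quoted in this thread can be compared through the ratio of user CPU time to real (wall-clock) time, which roughly indicates how many cores were kept busy. The POSIX shell sketch below computes that ratio; `to_seconds` and `cpu_ratio` are hypothetical helpers, not part of the k-nucleotide program, and they assume `time` output in the `XmY.YYYs` form shown above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the k-nucleotide program).

# Convert a `time` field such as "1m9.252s" into seconds ("69.252").
to_seconds() {
  printf '%s\n' "$1" | awk -F'm' '{ sub(/s$/, "", $2); print $1 * 60 + $2 }'
}

# Ratio of user CPU time to real (wall-clock) time, rounded to 2 decimals.
# Arguments: user time, real time, both in the "XmY.YYYs" format of `time`.
cpu_ratio() {
  awk -v u="$(to_seconds "$1")" -v r="$(to_seconds "$2")" \
    'BEGIN { printf "%.2f\n", u / r }'
}

cpu_ratio 0m32.798s 0m14.607s   # 32-bit Q9300 (4 cores): prints 2.25
cpu_ratio 1m9.252s 1m10.190s    # 64-bit Athlon (1 core): prints 0.99
cpu_ratio 0m17.273s 0m6.647s    # Jonathan, 8 cores to himself: prints 2.60
cpu_ratio 0m24.766s 0m25.475s   # Jonathan, shared machine: prints 0.97
```

A ratio near 1, as on the single-core Athlon or Jonathan's shared machine, suggests the work effectively ran on one core at a time; ratios above 2 show the tasks overlapping on several cores. It is only a rough indicator, since it ignores sys time and any time the program spends waiting.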