Optimize integer program in speed [C++]

Prev: Review of a safer memory management approach for C++?
Next: Static variable in template function

From: albert kao on 5 Jun 2010 05:50

I want to optimize the following small integer program in speed.
My g++ compiler version is 4.3.4 on 32 bit linux.
Please suggest or comment the following ideas.
Some ideas are:
1. use compile flag such as -O3
2. rewrite the bigger function with function object

#include <algorithm>
#include <cstdlib> // for abs()
using namespace std;

const int N = 50;
const int D = 20;
bool bigger (int i) { return abs(i) >= D; }

int main()
{
// initialize seq
// ...
adjacent_difference(seq, seq + N, diff);
int count = count_if(diff + 1, diff + N, bigger);
// ...
return 0;
}

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Saeed Amrollahi on 6 Jun 2010 07:15

On Jun 5, 11:50 pm, albert kao <albertk...(a)gmail.com> wrote:
> I want to optimize the following small integer program in speed.
> My g++ compiler version is 4.3.4 on 32 bit linux.
> Please suggest or comment the following ideas.
> Some ideas are:
> 1. use compile flag such as -O3
> 2. rewrite the bigger function with function object
>
> #include <algorithm>
> #include <cstdlib> // for abs()
> using namespace std;
>
> const int N = 50;
> const int D = 20;
> bool bigger (int i) { return abs(i) >= D; }
>
> int main()
> {
> // initialize seq
> // ...
> adjacent_difference(seq, seq + N, diff);
> int count = count_if(diff + 1, diff + N, bigger);
> // ...
> return 0;
>
> }

{ quoted banner removed; please do it yourself. -mod }

Hi Albert
1. I compiled and ran the following program with O, O1, O2 and O3
optimization options:

#include <algorithm>
#include <cstdlib>
#include <numeric> // for adjacent_difference
#include <ctime>
#include <iostream>

using namespace std;

const int N = 50 * 10000;
const int D = 20;
bool bigger(int i) { return abs(i) >= D; }
int main()
{
clock_t t = clock();
// initialize sequence
int seq[N];
for (int i = 0; i < N; ++i)
seq[i] = rand();
int diff[N];
adjacent_difference(seq, seq + N, diff);
int count = count_if(diff + 1, diff + N, bigger);
cout << '\n' << count << '\n';
t = clock() - t;
cout << t << '\n';
return 0;
}

You have to include <numeric>, also the N should be
a large number like 50,000, for optimization make sense.
As you see, I use the clock_t and time difference.
Under gcc 4.4.0, the results for -O and -O1 is same and the results
for
-O2 and -O3 is equal. The -O2 and -O3 reduce the program execution
time
about %20 percents, so I think -O3 is the best option
for optimizing your program.
2. the adjacent_difference and count_if algorithms aren't
complex per se. The complexity of count_if is constant: last - first
applications of predicate (in this case bigger() call). Also the
complexity of adjacent_difference is constant: last -first + 1
applications of binary function.
3. There is no performance gain by rewriting bigger() with function
object. Calling member function is almost as efficient as
calling global function.

Regards,
-- Saeed Amrollahi

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Keith H Duggar on 6 Jun 2010 17:14

On Jun 6, 6:15 pm, Saeed Amrollahi <amrollahi.sa...(a)gmail.com> wrote:
> On Jun 5, 11:50 pm, albert kao <albertk...(a)gmail.com> wrote:
> 3. There is no performance gain by rewriting bigger() with function
> object. Calling member function is almost as efficient as
> calling global function.

That is wrong. For most implementations of the STL you will find
that function objects can be /faster/ than passing free function
references/pointers. In the ops case

struct Bigger
{
bool operator ( ) ( int i ) const { return abs(i) >= D ; }
} ;
...
count += count_if(diff + 1, diff + N, Bigger());

is likely to be much faster than

bool bigger(int i) { return abs(i) >= D; }
...
count += count_if(diff + 1, diff + N, bigger);

Also, on many platforms

struct Bigger
{
bool operator ( ) ( int i ) const { return i >= D || i <= -
D ; }
} ;

will be faster still. Give it a try and you will see. By the way,
if you put the count_if into a loop you will get better sampling
and higher signal to noise. For example:

int count = 0 ;
for ( int k = 0 ; k < 1024 ; ++k ) {
count += count_if(diff + 1, diff + N, Bigger()) ;
}

KHD

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Ulrich Eckhardt on 7 Jun 2010 00:02

Note up front: This has _very_ little to do with C++, as such things always
depend on the actual implementation etc.

albert kao wrote:
> I want to optimize the following small integer program in speed.
[...]
> // initialize seq
> // ...
> adjacent_difference(seq, seq + N, diff);
> int count = count_if(diff + 1, diff + N, bigger);

You are computing a sequence of values from another sequence and then count
the number of values in the generated sequence fitting certain criteria. If
the sequences are large, a significant amount of time will be used by the
CPU/MMU to transfer the values between RAM and the various cache levels. If
you generate and count in one loop, the pressure on the memory IO system
will be lower. In particular you then don't generate data, push it to RAM
only to later load it again from RAM.

Uli

--
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Saeed Amrollahi on 7 Jun 2010 00:09

On Jun 7, 11:14 am, Keith H Duggar <dug...(a)alum.mit.edu> wrote:
> On Jun 6, 6:15 pm, Saeed Amrollahi <amrollahi.sa...(a)gmail.com> wrote:
>
> > On Jun 5, 11:50 pm, albert kao <albertk...(a)gmail.com> wrote:
> > 3. There is no performance gain by rewriting bigger() with function
> > object. Calling member function is almost as efficient as
> > calling global function.
>
> That is wrong. For most implementations of the STL you will find
> that function objects can be /faster/ than passing free function
> references/pointers. In the ops case
>
> struct Bigger
> {
> bool operator ( ) ( int i ) const { return abs(i) >= D ; }
> } ;
> ...
> count += count_if(diff + 1, diff + N, Bigger());
>
> is likely to be much faster than
>
> bool bigger(int i) { return abs(i) >= D; }
> ...
> count += count_if(diff + 1, diff + N, bigger);
>
> Also, on many platforms
>
> struct Bigger
> {
> bool operator ( ) ( int i ) const { return i >= D || i <= -
> D ; }
> } ;
>
> will be faster still. Give it a try and you will see. By the way,
> if you put the count_if into a loop you will get better sampling
> and higher signal to noise. For example:
>
> int count = 0 ;
> for ( int k = 0 ; k < 1024 ; ++k ) {
> count += count_if(diff + 1, diff + N, Bigger()) ;
> }
>
> KHD

Hi Keith

You are right. Of course we take advantage of inline expansion of
operator(). In this case we have %25 performance improvement.
For non-inline defintion (I mean defining function outside
structure), the results are identical.
The interesting question is:
In case #1, when I declare bigger() with inline specifier, there is
no performance improvement. Why?

Regards,
-- Saeed Amrollahi

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

| Next | Last
Pages: 1 2 3
Prev: Review of a safer memory management approach for C++?
Next: Static variable in template function