From: C++ Newbie on
Hello,

In another exercise I was trying, I am trying to write an efficient
sorting algorithm.

Here's mine, with a random number generator at the end to make
testfiles. How do I improve the efficiency of the sorting process?
I've read about the various techniques but without statistical
analysis it is not immediately clear which is best or if there is
indeed a better way than the known best.

Example: Number of operations with a random seed of 0 and a 100,000
line file: 704982704 (assuming my definition of operations is the same
as people who claim nlog(n) - n^2 operations)

Thanks for any input.

// This program sorts a vector of numbers in ascending order
#include <fstream>
#include <stdlib.h>
#include <string>
#include <iostream>
#include <math.h>

using namespace std;

int main()
{
string line;
int i=0, count = 0, temp_int2;

fstream myfile("vector_in.txt");
// Read in number of lines
while (!myfile.eof())
{
getline(myfile,line);
count++;
}

count = count - 1;
myfile.clear(); // forget we hit the end of file
myfile.seekg(0, ios::beg); // move to the start of the file
cout << "Number of file lines = "<< count << "\n";
int vector_in[count];

// Read in data
while (i < count)
{
getline(myfile,line);
vector_in[i] = atoi(line.c_str());
i++;
}

// Swapping sort

int temp_int = 0, operations = 0;

for (int j = 0; j < count; j++)
{
for (i = j+1; i < count; i++)
{
operations++;
if (vector_in[i] < vector_in[j])
{
temp_int = vector_in[i];
vector_in[i] = vector_in[j];
vector_in[j] = temp_int;
}
}
}

for (i = 0; i < count; i++)
cout << vector_in[i] << "\n";
cout << "Number of operations = " << operations << "\n";

return 0;
}

Random number generator:
===================

// This program sorts a vector of numbers in ascending order
#include <fstream>
#include <stdlib.h>
#include <string>
#include <iostream>
#include <math.h>

using namespace std;

int main()
{
string line;
int i=0, count = 0, temp_int2, seed, length;

// Randomly generate file?
fstream myfile("vector_in.txt", ios::out);
cout << "Random seed value less than " << RAND_MAX << "\n";
cin >> seed;
cout << "Length of testfile?" <<"\n";
cin >> length;
// system ("rm vector_in.txt");
{
for (i = 0; i < length; i++)
{
temp_int2 = rand();
myfile << temp_int2 << "\n";
}
myfile<<std::flush;
myfile.close();
}

cout << "File generated.\n";
return 0;
}




--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Thomas Maeder on
"C++ Newbie" <newbie.cpp(a)googlemail.com> writes:

> Here's mine, with a random number generator at the end to make
> testfiles. How do I improve the efficiency of the sorting process?
> I've read about the various techniques but without statistical
> analysis it is not immediately clear which is best or if there is
> indeed a better way than the known best.
>
> Example: Number of operations with a random seed of 0 and a 100,000
> line file: 704982704 (assuming my definition of operations is the same
> as people who claim nlog(n) - n^2 operations)
>
> Thanks for any input.
>
> // This program sorts a vector of numbers in ascending order
> #include <fstream>
> #include <stdlib.h>
> #include <string>
> #include <iostream>
> #include <math.h>
>
> using namespace std;
>
> int main()
> {
> string line;
> int i=0, count = 0, temp_int2;
>
> fstream myfile("vector_in.txt");
> // Read in number of lines
> while (!myfile.eof())
> {
> getline(myfile,line);
> count++;
> }
>
> count = count - 1;
> myfile.clear(); // forget we hit the end of file
> myfile.seekg(0, ios::beg); // move to the start of the file
> cout << "Number of file lines = "<< count << "\n";
> int vector_in[count];
>
> // Read in data
> while (i < count)
> {
> getline(myfile,line);
> vector_in[i] = atoi(line.c_str());
> i++;
> }

First, this is an inefficient way of determining the number of lines,
and it may also be off by one (if the file doesn't end with an end of
line character (sequence)) or result in an eternal loop (if input
fails for a reason other than end-of-file).

Second, atoi() has no way of reporting conversion failure; better use
the function std::strtol() or std::istream.

E.g.

std::vector<long> numbers;
while (getline(myfile,line))
{
std::istringstream is(line);
long number;
if (is >> number)
numbers.push_back(number);
else
; // deal with the erroneous line
}

or, somewhat more compact,

std::ifstream myfile("testfile");
std::istream_iterator<int> ii(myfile);
std::vector<int> numbers(ii,std::istream_iterator<int>());
if (myfile.eof())
; // entire file consumed without error; proceed
else
; // deal with error



> // Swapping sort
> int temp_int = 0, operations = 0;
>
> for (int j = 0; j < count; j++)
> {
> for (i = j+1; i < count; i++)
> {
> operations++;
> if (vector_in[i] < vector_in[j])
> {
> temp_int = vector_in[i];
> vector_in[i] = vector_in[j];
> vector_in[j] = temp_int;
> }
> }
> }

This algorithm is named "bubble sort". It's very easily implemented,
and very inefficient in general.

To find more efficient algorithms, get your hands on a good algorithms
book, or search the internet for e.g. "quicksort", "heap sort" or
"merge sort" (http://en.wikipedia.org/ has nice articles on all these
algorithms (including bubble sort)).

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Martin York on

> for (int j = 0; j < count; j++)
> {
> for (i = j+1; i < count; i++)
> {
> operations++;
> if (vector_in[i] < vector_in[j])
> {
> temp_int = vector_in[i];
> vector_in[i] = vector_in[j];
> vector_in[j] = temp_int;
> }
> }
> }

AKA: Bubble sort.
Sorting algorithms are measured in term of their complexity:
Bubble sort has a complexity of O(n^2)

In bubble sort, if you had an array of n elements, th sort requires
two nested loops that both basically iterate over the whole length of
the array (though you have tried to optimizeing this by limiting the
inner loop this does not affect the complexity) [PS your optimization
fails].

There are multiple sorts available (do a google on sorting
algorithms), The most optimal sort is called a bucket sort. Under
optimal conditions has a complexity of O(n).

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Markus Moll on
Hi

Martin York wrote:

> There are multiple sorts available (do a google on sorting
> algorithms), The most optimal sort is called a bucket sort. Under
> optimal conditions has a complexity of O(n).

However, I think that one should add that O(n) can only be achieved because
bucket sort imposes restrictions on the input and that in general the lower
bound is Omega(n log n).

Markus


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]