|
From: C++ Newbie on 22 Apr 2008 06:21 Hello, In another exercise I was trying, I am trying to write an efficient sorting algorithm. Here's mine, with a random number generator at the end to make testfiles. How do I improve the efficiency of the sorting process? I've read about the various techniques but without statistical analysis it is not immediately clear which is best or if there is indeed a better way than the known best. Example: Number of operations with a random seed of 0 and a 100,000 line file: 704982704 (assuming my definition of operations is the same as people who claim nlog(n) - n^2 operations) Thanks for any input. // This program sorts a vector of numbers in ascending order #include <fstream> #include <stdlib.h> #include <string> #include <iostream> #include <math.h> using namespace std; int main() { string line; int i=0, count = 0, temp_int2; fstream myfile("vector_in.txt"); // Read in number of lines while (!myfile.eof()) { getline(myfile,line); count++; } count = count - 1; myfile.clear(); // forget we hit the end of file myfile.seekg(0, ios::beg); // move to the start of the file cout << "Number of file lines = "<< count << "\n"; int vector_in[count]; // Read in data while (i < count) { getline(myfile,line); vector_in[i] = atoi(line.c_str()); i++; } // Swapping sort int temp_int = 0, operations = 0; for (int j = 0; j < count; j++) { for (i = j+1; i < count; i++) { operations++; if (vector_in[i] < vector_in[j]) { temp_int = vector_in[i]; vector_in[i] = vector_in[j]; vector_in[j] = temp_int; } } } for (i = 0; i < count; i++) cout << vector_in[i] << "\n"; cout << "Number of operations = " << operations << "\n"; return 0; } Random number generator: =================== // This program sorts a vector of numbers in ascending order #include <fstream> #include <stdlib.h> #include <string> #include <iostream> #include <math.h> using namespace std; int main() { string line; int i=0, count = 0, temp_int2, seed, length; // Randomly generate file? fstream myfile("vector_in.txt", ios::out); cout << "Random seed value less than " << RAND_MAX << "\n"; cin >> seed; cout << "Length of testfile?" <<"\n"; cin >> length; // system ("rm vector_in.txt"); { for (i = 0; i < length; i++) { temp_int2 = rand(); myfile << temp_int2 << "\n"; } myfile<<std::flush; myfile.close(); } cout << "File generated.\n"; return 0; } -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Thomas Maeder on 22 Apr 2008 13:19 "C++ Newbie" <newbie.cpp(a)googlemail.com> writes: > Here's mine, with a random number generator at the end to make > testfiles. How do I improve the efficiency of the sorting process? > I've read about the various techniques but without statistical > analysis it is not immediately clear which is best or if there is > indeed a better way than the known best. > > Example: Number of operations with a random seed of 0 and a 100,000 > line file: 704982704 (assuming my definition of operations is the same > as people who claim nlog(n) - n^2 operations) > > Thanks for any input. > > // This program sorts a vector of numbers in ascending order > #include <fstream> > #include <stdlib.h> > #include <string> > #include <iostream> > #include <math.h> > > using namespace std; > > int main() > { > string line; > int i=0, count = 0, temp_int2; > > fstream myfile("vector_in.txt"); > // Read in number of lines > while (!myfile.eof()) > { > getline(myfile,line); > count++; > } > > count = count - 1; > myfile.clear(); // forget we hit the end of file > myfile.seekg(0, ios::beg); // move to the start of the file > cout << "Number of file lines = "<< count << "\n"; > int vector_in[count]; > > // Read in data > while (i < count) > { > getline(myfile,line); > vector_in[i] = atoi(line.c_str()); > i++; > } First, this is an inefficient way of determining the number of lines, and it may also be off by one (if the file doesn't end with an end of line character (sequence)) or result in an eternal loop (if input fails for a reason other than end-of-file). Second, atoi() has no way of reporting conversion failure; better use the function std::strtol() or std::istream. E.g. std::vector<long> numbers; while (getline(myfile,line)) { std::istringstream is(line); long number; if (is >> number) numbers.push_back(number); else ; // deal with the erroneous line } or, somewhat more compact, std::ifstream myfile("testfile"); std::istream_iterator<int> ii(myfile); std::vector<int> numbers(ii,std::istream_iterator<int>()); if (myfile.eof()) ; // entire file consumed without error; proceed else ; // deal with error > // Swapping sort > int temp_int = 0, operations = 0; > > for (int j = 0; j < count; j++) > { > for (i = j+1; i < count; i++) > { > operations++; > if (vector_in[i] < vector_in[j]) > { > temp_int = vector_in[i]; > vector_in[i] = vector_in[j]; > vector_in[j] = temp_int; > } > } > } This algorithm is named "bubble sort". It's very easily implemented, and very inefficient in general. To find more efficient algorithms, get your hands on a good algorithms book, or search the internet for e.g. "quicksort", "heap sort" or "merge sort" (http://en.wikipedia.org/ has nice articles on all these algorithms (including bubble sort)). -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Martin York on 23 Apr 2008 04:10 > for (int j = 0; j < count; j++) > { > for (i = j+1; i < count; i++) > { > operations++; > if (vector_in[i] < vector_in[j]) > { > temp_int = vector_in[i]; > vector_in[i] = vector_in[j]; > vector_in[j] = temp_int; > } > } > } AKA: Bubble sort. Sorting algorithms are measured in term of their complexity: Bubble sort has a complexity of O(n^2) In bubble sort, if you had an array of n elements, th sort requires two nested loops that both basically iterate over the whole length of the array (though you have tried to optimizeing this by limiting the inner loop this does not affect the complexity) [PS your optimization fails]. There are multiple sorts available (do a google on sorting algorithms), The most optimal sort is called a bucket sort. Under optimal conditions has a complexity of O(n). -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Markus Moll on 24 Apr 2008 06:19 Hi Martin York wrote: > There are multiple sorts available (do a google on sorting > algorithms), The most optimal sort is called a bucket sort. Under > optimal conditions has a complexity of O(n). However, I think that one should add that O(n) can only be achieved because bucket sort imposes restrictions on the input and that in general the lower bound is Omega(n log n). Markus -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
|
Pages: 1 Prev: pitfalls of string for binary data Next: a concurrent programming abstraction |