From: Chris Riesbeck on 23 Jun 2010 21:44

M Stefan wrote:
> On Jun 22, 2:34 am, Brad <byte8b...(a)gmail.com> wrote:
>>
>> Basically, I'm comparing a randomly generated string to a vector of
>> strings by iterating through the vector (as seen above). Here is the
>> problem: The vector may have one or millions of strings, and execution
>> time slows considerably when I go above a few thousand strings.
>>
>> Is there a more efficient way to approach this? ...
>>
>> Brad
>
> Implementing a trie data structure will provide very efficient
> substring lookup after the strings have been inserted into the trie.
>
> http://en.wikipedia.org/wiki/Trie
> http://en.wikipedia.org/wiki/Patricia_trie

Let me second the trie approach. One programming exercise I give is to
create a trie for an English dictionary to speed up a Boggle word finder.
The speedup over linear searching is quite impressive.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
From: Brad on 24 Jun 2010 10:02

On Jun 24, 8:44 am, Chris Riesbeck <Chris.Riesb...(a)gmail.com> wrote:
> Let me second the trie approach. One programming exercise I give is to
> create a trie for an English dictionary to speed up a Boggle word finder.
> The speedup is quite impressive over searching.

I will test a trie as well. I just experimented with boost::unordered_set,
and it made a significant difference on large sets, but was a bit slower
than a sorted vector with binary search on small sets. Here are some
numbers:

1 string:
binary_search (28 seconds)
unordered_set (36 seconds)

1,000,000 strings:
binary_search (127 seconds)
unordered_set (60 seconds)
From: dhruv on 10 Jul 2010 04:26

On Jun 22, 4:34 am, Brad <byte8b...(a)gmail.com> wrote:
> I think there are better (more efficient) ways to do this, but I've
> only ever needed this simple approach:
>
> void compare(const std::vector<std::string>& strings)
> {
>     // Normally, this is randomly generated. It is a C-style string.
>     char the_string[4] = "abc";
>
>     std::vector<std::string>::const_iterator send(strings.end());
>
>     for (std::vector<std::string>::const_iterator sit = strings.begin();
>          sit != send; ++sit)
>     {
>         if (the_string == *sit)
>             std::clog << "MATCH:\t" << *sit << std::endl;
>     }
> }
>
> Basically, I'm comparing a randomly generated string to a vector of
> strings by iterating through the vector (as seen above). Here is the
> problem: The vector may have one or millions of strings, and execution
> time slows considerably when I go above a few thousand strings.
>
> Is there a more efficient way to approach this? My code works fine on
> smaller vectors; I just never expected to do millions, and it may not
> be possible to do that many as quickly as a few, but I wanted to ask.
> Perhaps there is something in <algorithm> that I could use. Just point
> me to it. I prefer standard C++ wherever possible.

This seems to be more of an algorithm-related question than a C++-related
one. Many approaches have been mentioned on this thread.

1. I would, however, suggest hashing every string and then doing the
brute-force compare only with those strings that hash to the same value
and have the same length.

2. The trie approach may not work for strings more than a handful (10-15)
bytes long.

3. There is another data structure called a suffix tree which will help
you in this case, if you can afford to build it for each of the strings
you want to compare with. A suffix array is an equivalent data structure,
but much easier to implement, and it increases the runtime complexity by
a factor of just log(n). You might want to check them out as well.

Regards,
-Dhruv.