|
From: Joe Gottman on 10 Apr 2008 11:39 Why is there no operator== defined for the unordered containers? Assuming a good hash function it is possible to define one for unordered_set and unordered_map that takes O(N) time (where N = the size of the collection): template <class X, class H, class P, class A> bool operator==(const unorderd_set<T, H, P, A> &lhs, const unordered_set<T, H, P, A> &rhs) { if (lhs.size() != rhs.size()) return false; for (auto it1 = lhs.begin(); it1!= lhs.end(); ++it1) { auto it2 = rhs.find(*it1); if ((it2 == rhs.end()) || (*it1 != *it2)) return false; } return true; } Whether this is possible for unordered_multiset and unordered_multiset depends on how unordered we want the unordered container to be. Consider two hash_multimap<int, char> objects h1 and h2, such that h1 contains the elements { {1, 'a'}, {1, 'b'} and h2 contains the same elements in the opposite order. The efficiency of operator== on hash_multimap and hash_multiset depends on whether s1 and s2 should be considered equal. If the order of elements with equivalent keys matters, then the unordered_multiset and unordered_multimap can still be compared in O(N) time: template <class X, class H, class P, class A> bool operator==(const unorderd_multiset<T, H, P, A> &lhs, const unordered_multiset<T, H, P, A> &rhs) { if (lhs.size() != rhs.size()) return false; for auto it1= lsh.begin(); it1!= lhs.end(); /*No increment*/) { if (lhs.count(*it1) != rhs.count(*it1)) return false; auto rng = rhs.equal_range(*it); for (auto it2 = rng.first; it2 != rng.second; ++it1, ++it2) { if (*it1 != *it2) return false; } } return true; } If the order of elements with equivalent keys doesn't matter then unordered_multiset and unordered_multimap can be compared in O(Nk) time where k is the maximum number of elements with equivalent keys. template <class X, class H, class P, class A> bool operator==(const unorderd_multiset<T, H, P, A> &lhs, const unordered_multiset<T, H, P, A> &rhs) { if (lhs.size() != rhs.size()) return false; for auto it1= lsh.begin(); it1!= lhs.end(); /*No increment*/) { if (lhs.count(*it1) != rhs.count(*it1)) return false; //Can't just search for all the elements equivalent to *it1 in rhs because the multiplicities might differ. auto rng = rhs.equal_range(*it); list<unordered_multiset::const_iterator> rh_list; for (auto it2 = rh_rng.first; it2 != rng.second; ++ it2) { rh_list.insert(it); } for (/*No init */ ; !rh_list.empty(); ++it1) { auto it2 = rh_list.begin(); for (/*No init */; (it2 != rh_list.end()) && (*it1 != *it2); ++it2) {} if (it2 == rh_list.end()) { //None found equal to *it1 return false; } else { rh_list.erase(it2); // Make sure we don't match the same element twice } } } } Note that all these equality operators can blow up if the unordered container has a bad hash function. But if an unordered container has a bad hash function then ALL of it's operations become inefficient. I still think it's worthwhile to be able to compare two hashed containers for equality. Joe Gottman -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Tony Delroy on 11 Apr 2008 02:36 On Apr 11, 11:39 am, Joe Gottman <jgott...(a)carolina.rr.com> wrote: > Why is there no operator== defined for the unordered containers? Ignoring a few things about correcting / improving the algorithms above, or alternatives. The STL doesn't normally implement inefficient operators (think [] for linked lists), as it encourages end-users to use them casually, then C++ gets blamed for the terrible performance. Kind of a "If you're smart enough to understand what it really means, then you're definitely smart enough to implement it" attitude. Often some other insights about the problem domain suggest a more efficient solution anyway. Tony -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Pete Becker on 11 Apr 2008 02:47 On 2008-04-10 16:39:49 -0400, Joe Gottman <jgottman(a)carolina.rr.com> said: > Why is there no operator== defined for the unordered containers? > Assuming a good hash function it is possible to define one for > unordered_set and unordered_map that takes O(N) time (where N = the size > of the collection): The reason they're called "unordered" is that there's no guaranteed ordering. Two unordered sets that use the same hash function and hold the same elements and have the same number of buckets will probably store the elements in the same order (that's not formally required, but it would take a rather perverse mindset to write an unordered container that didn't do that). But if they have different numbers of buckets, then the elements can be distributed differently among the buckets, and the order of the elements as presented by the containers' iterators can be different. The number of buckets isn't specified. It's affected by the load factor, and by how the elements got there. If one container is built by adding the elements, and the other is built by adding many more elements then removing some, they're likely to have a different number of buckets, and, as a result, probably a different order. -- Pete Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The Standard C++ Library Extensions: a Tutorial and Reference (www.petebecker.com/tr1book) [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Joe Gottman on 12 Apr 2008 03:10 Pete Becker wrote: > On 2008-04-10 16:39:49 -0400, Joe Gottman <jgottman(a)carolina.rr.com> said: > >> Why is there no operator== defined for the unordered containers? >> Assuming a good hash function it is possible to define one for >> unordered_set and unordered_map that takes O(N) time (where N = the size >> of the collection): > > The reason they're called "unordered" is that there's no guaranteed > ordering. Two unordered sets that use the same hash function and hold > the same elements and have the same number of buckets will probably > store the elements in the same order (that's not formally required, but > it would take a rather perverse mindset to write an unordered container > that didn't do that). But if they have different numbers of buckets, > then the elements can be distributed differently among the buckets, and > the order of the elements as presented by the containers' iterators can > be different. The number of buckets isn't specified. It's affected by > the load factor, and by how the elements got there. If one container is > built by adding the elements, and the other is built by adding many > more elements then removing some, they're likely to have a different > number of buckets, and, as a result, probably a different order. > And that's why the usual definition of container equality, which is based on std::equal, doesn't work for the unordered containers. However, if we define that two unordered_sets are equal if they contain the same elements in any order, it is still possible to compute operator== on two unordered_sets in linear time as long as the hash function is good, by first comparing sizes, then searching for every element of the first unordered_set in the second one. Of course if the hash function is bad then this is inefficient, but if the hash function is bad then EVERY operation on the unordered_set, including find(), insert() and erase(), becomes inefficient. This is just the way hashed containers work. But we still use them because if the hash is good they are extremely efficient. Joe Gottman -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Tony Delroy on 17 Apr 2008 21:13 On Apr 12, 2:47 am, Pete Becker <p...(a)versatilecoding.com> wrote: > ... Two unordered sets that use the same hash function and hold > the same elements and have the same number of buckets will probably > store the elements in the same order (that's not formally required, but > it would take a rather perverse mindset to write an unordered container > that didn't do that). [ But if they have different numbers of buckets... Even with the same number of buckets preallocated, the order in which the elements are inserted into the hash may well affect the order of traversal (i.e. elements hashing to the same bucket may be rehashed/ linked/displaced etc in another order) .... Tony -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
|
Pages: 1 Prev: C++0X Its about (y)time Next: static const integral members |