From: Joe Gottman on
Why is there no operator== defined for the unordered containers?
Assuming a good hash function it is possible to define one for
unordered_set and unordered_map that takes O(N) time (where N = the size
of the collection):

template <class X, class H, class P, class A>
bool operator==(const unorderd_set<T, H, P, A> &lhs,
const unordered_set<T, H, P, A> &rhs)
{
if (lhs.size() != rhs.size()) return false;
for (auto it1 = lhs.begin(); it1!= lhs.end(); ++it1) {
auto it2 = rhs.find(*it1);
if ((it2 == rhs.end()) || (*it1 != *it2)) return false;
}
return true;
}

Whether this is possible for unordered_multiset and unordered_multiset
depends on how unordered we want the unordered container to be.
Consider two hash_multimap<int, char> objects h1 and h2, such that h1
contains the elements { {1, 'a'}, {1, 'b'} and h2 contains the same
elements in the opposite order. The efficiency of operator== on
hash_multimap and hash_multiset depends on whether s1 and s2 should be
considered equal.

If the order of elements with equivalent keys matters, then the
unordered_multiset and unordered_multimap can still be compared in O(N)
time:

template <class X, class H, class P, class A>
bool operator==(const unorderd_multiset<T, H, P, A> &lhs,
const unordered_multiset<T, H, P, A> &rhs)
{
if (lhs.size() != rhs.size()) return false;
for auto it1= lsh.begin(); it1!= lhs.end(); /*No increment*/) {
if (lhs.count(*it1) != rhs.count(*it1)) return false;
auto rng = rhs.equal_range(*it);
for (auto it2 = rng.first; it2 != rng.second; ++it1, ++it2) {
if (*it1 != *it2) return false;
}
}
return true;
}


If the order of elements with equivalent keys doesn't matter then
unordered_multiset and unordered_multimap can be compared in O(Nk) time
where k is the maximum number of elements with equivalent keys.

template <class X, class H, class P, class A>
bool operator==(const unorderd_multiset<T, H, P, A> &lhs,
const unordered_multiset<T, H, P, A> &rhs)
{
if (lhs.size() != rhs.size()) return false;
for auto it1= lsh.begin(); it1!= lhs.end(); /*No increment*/) {
if (lhs.count(*it1) != rhs.count(*it1)) return false;
//Can't just search for all the elements equivalent to *it1 in
rhs because the multiplicities might differ.
auto rng = rhs.equal_range(*it);
list<unordered_multiset::const_iterator> rh_list;

for (auto it2 = rh_rng.first; it2 != rng.second; ++ it2) {
rh_list.insert(it);
}
for (/*No init */ ; !rh_list.empty(); ++it1) {
auto it2 = rh_list.begin();
for (/*No init */; (it2 != rh_list.end()) && (*it1 != *it2);
++it2) {}
if (it2 == rh_list.end()) { //None found equal to *it1
return false;
} else {
rh_list.erase(it2); // Make sure we don't match the
same element twice
}
}
}
}

Note that all these equality operators can blow up if the unordered
container has a bad hash function. But if an unordered container has a
bad hash function then ALL of it's operations become inefficient. I
still think it's worthwhile to be able to compare two hashed containers
for equality.


Joe Gottman

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Tony Delroy on
On Apr 11, 11:39 am, Joe Gottman <jgott...(a)carolina.rr.com> wrote:
> Why is there no operator== defined for the unordered containers?

Ignoring a few things about correcting / improving the algorithms
above, or alternatives. The STL doesn't normally implement
inefficient operators (think [] for linked lists), as it encourages
end-users to use them casually, then C++ gets blamed for the terrible
performance. Kind of a "If you're smart enough to understand what it
really means, then you're definitely smart enough to implement it"
attitude. Often some other insights about the problem domain suggest
a more efficient solution anyway.

Tony

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Pete Becker on
On 2008-04-10 16:39:49 -0400, Joe Gottman <jgottman(a)carolina.rr.com> said:

> Why is there no operator== defined for the unordered containers?
> Assuming a good hash function it is possible to define one for
> unordered_set and unordered_map that takes O(N) time (where N = the size
> of the collection):

The reason they're called "unordered" is that there's no guaranteed
ordering. Two unordered sets that use the same hash function and hold
the same elements and have the same number of buckets will probably
store the elements in the same order (that's not formally required, but
it would take a rather perverse mindset to write an unordered container
that didn't do that). But if they have different numbers of buckets,
then the elements can be distributed differently among the buckets, and
the order of the elements as presented by the containers' iterators can
be different. The number of buckets isn't specified. It's affected by
the load factor, and by how the elements got there. If one container is
built by adding the elements, and the other is built by adding many
more elements then removing some, they're likely to have a different
number of buckets, and, as a result, probably a different order.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)



[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Joe Gottman on
Pete Becker wrote:
> On 2008-04-10 16:39:49 -0400, Joe Gottman <jgottman(a)carolina.rr.com> said:
>
>> Why is there no operator== defined for the unordered containers?
>> Assuming a good hash function it is possible to define one for
>> unordered_set and unordered_map that takes O(N) time (where N = the size
>> of the collection):
>
> The reason they're called "unordered" is that there's no guaranteed
> ordering. Two unordered sets that use the same hash function and hold
> the same elements and have the same number of buckets will probably
> store the elements in the same order (that's not formally required, but
> it would take a rather perverse mindset to write an unordered container
> that didn't do that). But if they have different numbers of buckets,
> then the elements can be distributed differently among the buckets, and
> the order of the elements as presented by the containers' iterators can
> be different. The number of buckets isn't specified. It's affected by
> the load factor, and by how the elements got there. If one container is
> built by adding the elements, and the other is built by adding many
> more elements then removing some, they're likely to have a different
> number of buckets, and, as a result, probably a different order.
>

And that's why the usual definition of container equality, which is
based on std::equal, doesn't work for the unordered containers.
However, if we define that two unordered_sets are equal if they contain
the same elements in any order, it is still possible to compute
operator== on two unordered_sets in linear time as long as the hash
function is good, by first comparing sizes, then searching for every
element of the first unordered_set in the second one. Of course if the
hash function is bad then this is inefficient, but if the hash function
is bad then EVERY operation on the unordered_set, including find(),
insert() and erase(), becomes inefficient. This is just the way hashed
containers work. But we still use them because if the hash is good they
are extremely efficient.

Joe Gottman

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Tony Delroy on
On Apr 12, 2:47 am, Pete Becker <p...(a)versatilecoding.com> wrote:
> ... Two unordered sets that use the same hash function and hold
> the same elements and have the same number of buckets will probably
> store the elements in the same order (that's not formally required, but
> it would take a rather perverse mindset to write an unordered container
> that didn't do that). [ But if they have different numbers of buckets...

Even with the same number of buckets preallocated, the order in which
the elements are inserted into the hash may well affect the order of
traversal (i.e. elements hashing to the same bucket may be rehashed/
linked/displaced etc in another order) ....

Tony

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]