From: Lie Ryan on 7 Apr 2010 01:32 On 04/07/10 04:11, Gustavo Narea wrote:> Hello! > > Could you please confirm whether my understanding of equality > operations in sets and lists is correct? This is how I think things > work, partially based on experimentation and the online documentation > for Python: > > When you compare two lists, *every* element of one of the lists is > compared against the element at the same position in the other list; > that comparison is done by the __eq__() method (or the equivalent for > builtin types). This is interrupted when a result is False. > > When you compare two sets, there's a loop over all the elements of the > first set, where the hash for that element is looked up in the second > set: > - If this hash matches the hash for one or more elements in the > second set, the element in the first set is compared (with __eq__ or > equivalent) against the elements in the second set which have the same > hash. When a result is True, nothing else is done on that element and > the loop takes the next element in the first set; when all the results > are False, the loop ends and the two sets are not equivalent. > - If the hash doesn't match that of an element in the second set, > then the loop ends and the two sets are not equivalent. I have not seen python's set implementation, but if you keep a bitmap of hashes that already exist in a set, you can compare 32 or 64 items (i.e. the computer's native word-size) at a time instead of comparing items one-by-one[1]; this could marginally improve set operation's performance for doing comparisons, difference, update, etc. Anyone that have seen python's set source code can confirm whether such thing are implemented in python's set? [1] the possibility of hash collision complicates this a little bit, I haven't fully thought out the the consequences of the interaction of such bitmap with hash collision handling (there was a PyCon 2010 talk "The Mighty Dictionary" by Brandon Craig Rhodes describing collision handling http://python.mirocommunity.org/video/1591/pycon-2010-the-mighty-dictiona). From: Raymond Hettinger on 7 Apr 2010 13:55 [Gustavo Nare]> In other words: The more different elements two collections have, the > faster it is to compare them as sets. And as a consequence, the more > equivalent elements two collections have, the faster it is to compare > them as lists. > > Is this correct? If two collections are equal, then comparing them as a set is always slower than comparing them as a list. Both have to call __eq__ for every element, but sets have to search for each element while lists can just iterate over consecutive pointers. If the two collections have unequal sizes, then both ways immediately return unequal. If the two collections are unequal but have the same size, then the comparison time is data dependent (when the first mismatch is found). Raymond From: Steven D'Aprano on 7 Apr 2010 21:41 On Wed, 07 Apr 2010 10:55:10 -0700, Raymond Hettinger wrote: > [Gustavo Nare] >> In other words: The more different elements two collections have, the >> faster it is to compare them as sets. And as a consequence, the more >> equivalent elements two collections have, the faster it is to compare >> them as lists. >> >> Is this correct? > > If two collections are equal, then comparing them as a set is always > slower than comparing them as a list. Both have to call __eq__ for > every element, but sets have to search for each element while lists can > just iterate over consecutive pointers. > > If the two collections have unequal sizes, then both ways immediately > return unequal. Perhaps I'm misinterpreting what you are saying, but I can't confirm that behaviour, at least not for subclasses of list: >>> class MyList(list): .... def __len__(self): .... return self.n ....>>> L1 = MyList(range(10)) >>> L2 = MyList(range(10)) >>> L1.n = 9 >>> L2.n = 10 >>> L1 == L2 True>>> len(L1) == len(L2) False -- Steven From: Patrick Maupin on 7 Apr 2010 21:53 On Apr 7, 8:41 pm, Steven D'Aprano wrote:> On Wed, 07 Apr 2010 10:55:10 -0700, Raymond Hettinger wrote: > > [Gustavo Nare] > >> In other words: The more different elements two collections have, the > >> faster it is to compare them as sets. And as a consequence, the more > >> equivalent elements two collections have, the faster it is to compare > >> them as lists. > > >> Is this correct? > > > If two collections are equal, then comparing them as a set is always > > slower than comparing them as a list.  Both have to call __eq__ for > > every element, but sets have to search for each element while lists can > > just iterate over consecutive pointers. > > > If the two collections have unequal sizes, then both ways immediately > > return unequal. > > Perhaps I'm misinterpreting what you are saying, but I can't confirm that > behaviour, at least not for subclasses of list: > > >>> class MyList(list): > > ...     def __len__(self): > ...             return self.n > ...>>> L1 = MyList(range(10)) > >>> L2 = MyList(range(10)) > >>> L1.n = 9 > >>> L2.n = 10 > >>> L1 == L2 > True > >>> len(L1) == len(L2) > > False > > -- > Steven I think what he is saying is that the list __eq__ method will look at the list lengths first. This may or may not be considered a subtle bug for the edge case you are showing. If I do the following: >>> L1 = range(10000000) >>> L2 = range(10000000) >>> L3 = range(10000001) >>> L1 == L2 True>>> L1 == L3 False>>> I don't even need to run timeit -- the "True" takes awhile to print out, while the "False" prints out immediately. Regards, Pat From: Raymond Hettinger on 7 Apr 2010 23:14 [Raymond Hettinger]> > If the two collections have unequal sizes, then both ways immediately > > return unequal. [Steven D'Aprano]> Perhaps I'm misinterpreting what you are saying, but I can't confirm that > behaviour, at least not for subclasses of list: For doubters, see list_richcompare() in http://svn.python.org/view/python/trunk/Objects/listobject.c?revision=78522&view=markup if (Py_SIZE(vl) != Py_SIZE(wl) && (op == Py_EQ || op == Py_NE)) { /* Shortcut: if the lengths differ, the lists differ */ PyObject *res; if (op == Py_EQ) res = Py_False; else res = Py_True; Py_INCREF(res); return res; } And see set_richcompare() in http://svn.python.org/view/python/trunk/Objects/setobject.c?revision=78886&view=markup case Py_EQ: if (PySet_GET_SIZE(v) != PySet_GET_SIZE(w)) Py_RETURN_FALSE; if (v->hash != -1 && ((PySetObject *)w)->hash != -1 && v->hash != ((PySetObject *)w)->hash) Py_RETURN_FALSE; return set_issubset(v, w); Raymond First  |  Prev  |  Next  |  Last Pages: 1 2 3 4 Prev: imports againNext: python as pen and paper substitute