From: Lie Ryan on
On 04/07/10 04:11, Gustavo Narea wrote:
> Hello!
>
> Could you please confirm whether my understanding of equality
> operations in sets and lists is correct? This is how I think things
> work, partially based on experimentation and the online documentation
> for Python:
>
> When you compare two lists, *every* element of one of the lists is
> compared against the element at the same position in the other list;
> that comparison is done by the __eq__() method (or the equivalent for
> builtin types). This is interrupted when a result is False.
>
> When you compare two sets, there's a loop over all the elements of the
> first set, where the hash for that element is looked up in the second
> set:
> - If this hash matches the hash for one or more elements in the
> second set, the element in the first set is compared (with __eq__ or
> equivalent) against the elements in the second set which have the same
> hash. When a result is True, nothing else is done on that element and
> the loop takes the next element in the first set; when all the results
> are False, the loop ends and the two sets are not equivalent.
> - If the hash doesn't match that of an element in the second set,
> then the loop ends and the two sets are not equivalent.

I have not seen python's set implementation, but if you keep a bitmap of
hashes that already exist in a set, you can compare 32 or 64 items (i.e.
the computer's native word-size) at a time instead of comparing items
one-by-one[1]; this could marginally improve set operation's performance
for doing comparisons, difference, update, etc. Anyone that have seen
python's set source code can confirm whether such thing are implemented
in python's set?

[1] the possibility of hash collision complicates this a little bit, I
haven't fully thought out the the consequences of the interaction of
such bitmap with hash collision handling (there was a PyCon 2010 talk
"The Mighty Dictionary" by Brandon Craig Rhodes describing collision
handling
http://python.mirocommunity.org/video/1591/pycon-2010-the-mighty-dictiona).
From: Raymond Hettinger on
[Gustavo Nare]
> In other words: The more different elements two collections have, the
> faster it is to compare them as sets. And as a consequence, the more
> equivalent elements two collections have, the faster it is to compare
> them as lists.
>
> Is this correct?

If two collections are equal, then comparing them as a set is always
slower than comparing them as a list. Both have to call __eq__ for
every element, but sets have to search for each element while lists
can just iterate over consecutive pointers.

If the two collections have unequal sizes, then both ways immediately
return unequal.

If the two collections are unequal but have the same size, then
the comparison time is data dependent (when the first mismatch
is found).


Raymond
From: Steven D'Aprano on
On Wed, 07 Apr 2010 10:55:10 -0700, Raymond Hettinger wrote:

> [Gustavo Nare]
>> In other words: The more different elements two collections have, the
>> faster it is to compare them as sets. And as a consequence, the more
>> equivalent elements two collections have, the faster it is to compare
>> them as lists.
>>
>> Is this correct?
>
> If two collections are equal, then comparing them as a set is always
> slower than comparing them as a list. Both have to call __eq__ for
> every element, but sets have to search for each element while lists can
> just iterate over consecutive pointers.
>
> If the two collections have unequal sizes, then both ways immediately
> return unequal.


Perhaps I'm misinterpreting what you are saying, but I can't confirm that
behaviour, at least not for subclasses of list:

>>> class MyList(list):
.... def __len__(self):
.... return self.n
....
>>> L1 = MyList(range(10))
>>> L2 = MyList(range(10))
>>> L1.n = 9
>>> L2.n = 10
>>> L1 == L2
True
>>> len(L1) == len(L2)
False




--
Steven
From: Patrick Maupin on
On Apr 7, 8:41 pm, Steven D'Aprano
<ste...(a)REMOVE.THIS.cybersource.com.au> wrote:
> On Wed, 07 Apr 2010 10:55:10 -0700, Raymond Hettinger wrote:
> > [Gustavo Nare]
> >> In other words: The more different elements two collections have, the
> >> faster it is to compare them as sets. And as a consequence, the more
> >> equivalent elements two collections have, the faster it is to compare
> >> them as lists.
>
> >> Is this correct?
>
> > If two collections are equal, then comparing them as a set is always
> > slower than comparing them as a list.  Both have to call __eq__ for
> > every element, but sets have to search for each element while lists can
> > just iterate over consecutive pointers.
>
> > If the two collections have unequal sizes, then both ways immediately
> > return unequal.
>
> Perhaps I'm misinterpreting what you are saying, but I can't confirm that
> behaviour, at least not for subclasses of list:
>
> >>> class MyList(list):
>
> ...     def __len__(self):
> ...             return self.n
> ...>>> L1 = MyList(range(10))
> >>> L2 = MyList(range(10))
> >>> L1.n = 9
> >>> L2.n = 10
> >>> L1 == L2
> True
> >>> len(L1) == len(L2)
>
> False
>
> --
> Steven

I think what he is saying is that the list __eq__ method will look at
the list lengths first. This may or may not be considered a subtle
bug for the edge case you are showing.

If I do the following:

>>> L1 = range(10000000)
>>> L2 = range(10000000)
>>> L3 = range(10000001)
>>> L1 == L2
True
>>> L1 == L3
False
>>>

I don't even need to run timeit -- the "True" takes awhile to print
out, while the "False" prints out immediately.

Regards,
Pat
From: Raymond Hettinger on
[Raymond Hettinger]
> > If the two collections have unequal sizes, then both ways immediately
> > return unequal.

[Steven D'Aprano]
> Perhaps I'm misinterpreting what you are saying, but I can't confirm that
> behaviour, at least not for subclasses of list:

For doubters, see list_richcompare() in
http://svn.python.org/view/python/trunk/Objects/listobject.c?revision=78522&view=markup

if (Py_SIZE(vl) != Py_SIZE(wl) && (op == Py_EQ || op == Py_NE)) {
/* Shortcut: if the lengths differ, the lists differ */
PyObject *res;
if (op == Py_EQ)
res = Py_False;
else
res = Py_True;
Py_INCREF(res);
return res;
}

And see set_richcompare() in
http://svn.python.org/view/python/trunk/Objects/setobject.c?revision=78886&view=markup

case Py_EQ:
if (PySet_GET_SIZE(v) != PySet_GET_SIZE(w))
Py_RETURN_FALSE;
if (v->hash != -1 &&
((PySetObject *)w)->hash != -1 &&
v->hash != ((PySetObject *)w)->hash)
Py_RETURN_FALSE;
return set_issubset(v, w);


Raymond
First  |  Prev  |  Next  |  Last
Pages: 1 2 3 4
Prev: imports again
Next: python as pen and paper substitute