list.pop(0) vs. collections.dequeue [Python]

Prev: distutils not finding all of my pure python modules
Next: iterating lists

From: Steve Howell on 24 Jan 2010 14:26

On Jan 23, 3:04 pm, Terry Reedy <tjre...(a)udel.edu> wrote:
> On 1/23/2010 12:17 PM, Steve Howell wrote:
>
> > Terry Reedy said:
>
> > '''
> > If you try writing a full patch, as I believe someone did, or at least
> > a
> > prototype thereof, when the idea was discussed, you might have a
> > better
> > idea of what the tradeoffs are and why it was rejected.
> > '''
>
> > I have to run, but tomorrow I will try to dig through python-dev
> > archives and find the patch. If anybody has hints on where to look
> > for it (anybody remember the author, for example?), it would be much
> > appreciated.
>
> The approach you outlined in your other response to me is, I believe,
> what was considered, investigated, and then rejected (by Guido, with
> agreement). The discussion may have been on the now-closed and
> (misspelled) pyk3 (?), or maybe on python-ideas, but my memory is more
> likely the former. I am sure that Raymond H. was involved also.
>
> > If the patch looks simple, I will try to pitch the idea that its time
> > has come. Now that the specification of the language itself is
> > frozen, I think there might be more room for improving
> > implementations. Also, I might be able to make the argument that
> > tradeoffs of memory vs. CPU vs. code complexity have different forces
> > in the 2010s.
>
> I am not opposed to a possible change, just hasty, ill-informed
> criticism. If there is not a PEP on this issue, it would be good to have
> one that recorded the proposal and the pros and cons, regardless of the
> outcome, so there would be something to refer people to. If that had
> been already done, it would have shortened this thread considerably.
>

I think it's a good idea to write a PEP on this issue, and I will
attempt a first draft. I think I should submit the first draft to
python-ideas, correct?

I expect the PEP to be at least initially, if not permanently,
rejected, but it would not be an exercise in futility, as I agree it's
good to record pros and cons of the proposal in one place. The PEP
probably would not include a proposed patch until there was a little
bit of consensus behind it, but it would not take me a lot of time to
present the basic argument.

Here is my sketch of what the PEP would look like.

Proposal: Improve list's implementation so that deleting elements from
the front of the list does not require an O(N) memmove operation.

Rationale: Some Python programs that process lists have multiple
methods that consume the first element of the list and pop it off.
The pattern comes up with parsers in particular, but there are other
examples. It is possible now, of course, to use a data structure in
Python that has O(1) for deleting off the top of the list, but none of
the alternatives fully replicate the benefits of list itself.

Specification: Improving CPython's performance does not affect the
language itself, so there are no bikeshed arguments to be had with
respect to syntax, etc. Any patch would, of course, affect the
performance of nearly every Python program in existence, so any patch
would have to, at a bare minimum:

1) Not increase the time or memory complexity of any other list
operation.
2) Not affect list access at all.
3) Minimally affect list operations that mutate the list.
4) Be reasonably simple within CPython itself.
5) Not be grossly wasteful of memory.

Backwards Compatibility:

See above. An implementation of this PEP would not change the
definition of the language in any way, but it would have to minimally
impact the performance of lists for the normal use cases.

Implementation:

There are two ways to make deleting the first item of the list run
more efficiently.

The most ambitious proposal is to fix the memory manager itself to
allow the release of memory from the start of the chunk. The
advantage of this proposal is that it would simplify the changes to
list itself, and possibly have collateral benefits for other CPython
internal data structures. The disadvantage of the proposal is that
there is a strong tradition in CPython to use native memory
management, particularly with respect to the fact that it runs on many
platforms.

The less ambitious proposal is to change the memory management scheme
within list itself. There is already precedent in list_resize() to
optimistically allocate memory, so it is not a great break from
tradition to optimistically defer the release of memory. But it would
complicate things.

References:

I would refer to this thread on comp.lang.python for discussion, and I
would also try to dig up older threads on python-dev or elsewhere.

From: Aahz on 24 Jan 2010 14:28

In article <b4440231-f33f-49e1-9d6f-5fbce0a63bd2(a)b2g2000yqi.googlegroups.com>,
Steve Howell <showell30(a)yahoo.com> wrote:
>
>Even with realloc()'s brokenness, you could improve pop(0) in a way
>that does not impact list access at all, and the patch would not change
>the time complexity of any operation; it would just add negligible
>extract bookkeeping within list_resize() and a few other places.

Again, your responsibility is to provide a patch and a spectrum of
benchmarking tests to prove it. Then you would still have to deal with
the objection that extensions use the list internals -- that might be an
okay sell given the effort otherwise required to port extensions to
Python 3, but that's not the way to bet.

Have you actually read the discussions you were pointed at?
--
Aahz (aahz(a)pythoncraft.com) <*> http://www.pythoncraft.com/

import antigravity

From: Steve Howell on 24 Jan 2010 14:53

On Jan 24, 11:28 am, a...(a)pythoncraft.com (Aahz) wrote:
> In article <b4440231-f33f-49e1-9d6f-5fbce0a63...(a)b2g2000yqi.googlegroups.com>,
> Steve Howell <showel...(a)yahoo.com> wrote:
>
>
>
> >Even with realloc()'s brokenness, you could improve pop(0) in a way
> >that does not impact list access at all, and the patch would not change
> >the time complexity of any operation; it would just add negligible
> >extract bookkeeping within list_resize() and a few other places.
>
> Again, your responsibility is to provide a patch and a spectrum of
> benchmarking tests to prove it. Then you would still have to deal with
> the objection that extensions use the list internals -- that might be an
> okay sell given the effort otherwise required to port extensions to
> Python 3, but that's not the way to bet.

Ok.

> Have you actually read the discussions you were pointed at?

I don't think anybody provided an actual link, but please correct me
if I overlooked it.

From: Terry Reedy on 24 Jan 2010 15:13

On 1/24/2010 2:26 PM, Steve Howell wrote:

> I think it's a good idea to write a PEP on this issue, and I will
> attempt a first draft. I think I should submit the first draft to
> python-ideas, correct?

That is not a *requirement* for drafts in general, but it is a good idea
for a community or community-person generated proposal, such as this one.

> I expect the PEP to be at least initially, if not permanently,
> rejected,

Guido sometimes rejects 'no-chance' proposals without waiting to be
asked, but he usually waits until the PEP author feels the issue is ripe
and asks for a pronouncement.

tjr

From: Paul Rubin on 24 Jan 2010 15:44

Steve Howell <showell30(a)yahoo.com> writes:
> Proposal: Improve list's implementation so that deleting elements from
> the front of the list does not require an O(N) memmove operation. ...
> It is possible now, of course, to use a data structure in
> Python that has O(1) for deleting off the top of the list, but none of
> the alternatives fully replicate the benefits of list itself.

I think you are mostly referring to deque. Why don't you instead
say what you think is wrong with using deque, and how deque can
be improved?

> See above. An implementation of this PEP would not change the
> definition of the language in any way, but it would have to minimally
> impact the performance of lists for the normal use cases.

But you're talking about adding one or two words to EVERY list, and many
normal use cases allocate a LOT of lists. Those use cases are likely
more common than use cases that pop from the front of the list but for
some reason can't use deque.

> The most ambitious proposal is to fix the memory manager itself to
> allow the release of memory from the start of the chunk.

That's inappropriate given the memory fragmentation it would cause.

Really, you're describing a problem that arises in a few programs but up
til now, as far as I know, everyone has found deque to be an adequate
solution. Your approach of snarling against list is not persuading
anyone that list needs to be changed, because most everyone is satisfied
with the existing solution. You might change approaches and discuss
deque, what's wrong with it, and whether it can be fixed. Getting a
change approved for deque is probably much easier than getting one
approved for list, just because nowhere near as many things depend on
deque's performance. And when discussing performance in this context,
additive constants do matter.

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Prev: distutils not finding all of my pure python modules
Next: iterating lists