list.pop(0) vs. collections.dequeue [Python]

Prev: distutils not finding all of my pure python modules
Next: iterating lists

From: Terry Reedy on 23 Jan 2010 18:04

On 1/23/2010 12:17 PM, Steve Howell wrote:

> Terry Reedy said:
>
> '''
> If you try writing a full patch, as I believe someone did, or at least
> a
> prototype thereof, when the idea was discussed, you might have a
> better
> idea of what the tradeoffs are and why it was rejected.
> '''
>
> I have to run, but tomorrow I will try to dig through python-dev
> archives and find the patch. If anybody has hints on where to look
> for it (anybody remember the author, for example?), it would be much
> appreciated.

The approach you outlined in your other response to me is, I believe,
what was considered, investigated, and then rejected (by Guido, with
agreement). The discussion may have been on the now-closed and
(misspelled) pyk3 (?), or maybe on python-ideas, but my memory is more
likely the former. I am sure that Raymond H. was involved also.

> If the patch looks simple, I will try to pitch the idea that its time
> has come. Now that the specification of the language itself is
> frozen, I think there might be more room for improving
> implementations. Also, I might be able to make the argument that
> tradeoffs of memory vs. CPU vs. code complexity have different forces
> in the 2010s.

I am not opposed to a possible change, just hasty, ill-informed
criticism. If there is not a PEP on this issue, it would be good to have
one that recorded the proposal and the pros and cons, regardless of the
outcome, so there would be something to refer people to. If that had
been already done, it would have shortened this thread considerably.

Terry Jan Reedy

From: Raymond Hettinger on 23 Jan 2010 23:00

[Steve Howell]
> Why wouldn't you get a competent C programmer simply make
> list_ass_slice smart enough to make list.pop(0) O(1)?

When this suggestion was discussed on python-dev years ago,
it was rejected. One reason is that it was somewhat
common for C code to access the list data structure directly
(bypassing API accessor functions). Changing the list to have
a starting offset would break existing C extensions.

Another reason is that Guido is non-tolerant of space or time
trade-offs for lists and tuples because they pervade the language
and are heavily used internally. Any additional space or time
requirement however small would impact the language performance
as a whole. FWIW, that is also the reason that lists are not
weak-referenceable (it would cost one extra pointer field per
instance and that wasn't deemed to be worth it).

> The brilliant computer scientist, Christian Heimes, provides the
> answers, and I am paraphrasing here, of course:

IMHO, Christian IS a brilliant computer scientist, so I'll ignore
the rude intention and take the sentence literally.

> 1) You can save 64 bits for every list by not storing an extra
> pointer!
> 2) You can simplify the CPython implementation!
> 3) You can avoid the oh-so-expensive check "if ilow == 1" for all
> operations that don't need this optimization!
>
> Sounds like two micro-optimizations to me (and a copout to boot).

Micro or not, these reasons were part of Guido's rationale.
Tim Peters and I also participated in the conversation and did not
disagree.

So, collections.deque() was born and the problem stopped being an
issue.
Also, Daniel Stuzbach has published a blist implementation that offers
high performance insertions and deletions and fast handling of sparse
lists.

Raymond

From: Steve Howell on 24 Jan 2010 05:33

On Jan 23, 8:00 pm, Raymond Hettinger <pyt...(a)rcn.com> wrote:
> [Steve Howell]
>
> > Why wouldn't you get a competent C programmer simply make
> > list_ass_slice smart enough to make list.pop(0) O(1)?
>
> When this suggestion was discussed on python-dev years ago,
> it was rejected. One reason is that it was somewhat
> common for C code to access the list data structure directly
> (bypassing API accessor functions). Changing the list to have
> a starting offset would break existing C extensions.
>
> Another reason is that Guido is non-tolerant of space or time
> trade-offs for lists and tuples because they pervade the language
> and are heavily used internally. Any additional space or time
> requirement however small would impact the language performance
> as a whole. FWIW, that is also the reason that lists are not
> weak-referenceable (it would cost one extra pointer field per
> instance and that wasn't deemed to be worth it).
>
> > The brilliant computer scientist, Christian Heimes, provides the
> > answers, and I am paraphrasing here, of course:
>
> IMHO, Christian IS a brilliant computer scientist, so I'll ignore
> the rude intention and take the sentence literally.
>

You are also a brilliant computer scientist, despite the fact that you
are defending a list implemenation that can't pop the first element
off the list in O(1) time.

From: Steven D'Aprano on 24 Jan 2010 06:20

On Sun, 24 Jan 2010 02:33:36 -0800, Steve Howell wrote:

> You are also a brilliant computer scientist, despite the fact that you
> are defending a list implemenation that can't pop the first element off
> the list in O(1) time.

You say that like it's a bad thing.

It's very simple: the trade-offs that the Python development team have
deliberately chosen aren't the same trade-offs that you prefer. That
doesn't make your trade-offs right and Python's wrong. They're just
different, and if Python lists had your preferred implementation, I
guarantee that somebody would be complaining about it right now.

If you're serious about wanting O(1) pops from the start of the list,
write your own list implementation and use it. You might even like to
make it public, so others can use it as well. But please stop with the
snide remarks and poorly disguised insults and back-handed compliments,
it's getting tedious.

Or just change your algorithm slightly -- it's not hard to turn an
algorithm that pops from the start of a list to one that pops from the
end of the list.

--
Steven

From: Steve Howell on 24 Jan 2010 13:59

On Jan 24, 3:20 am, Steven D'Aprano <st...(a)REMOVE-THIS-
cybersource.com.au> wrote:
> On Sun, 24 Jan 2010 02:33:36 -0800, Steve Howell wrote:
> > You are also a brilliant computer scientist, despite the fact that you
> > are defending a list implemenation that can't pop the first element off
> > the list in O(1) time.
>
> You say that like it's a bad thing.

It is.

> It's very simple: the trade-offs that the Python development team have
> deliberately chosen aren't the same trade-offs that you prefer. That
> doesn't make your trade-offs right and Python's wrong. They're just
> different, and if Python lists had your preferred implementation, I
> guarantee that somebody would be complaining about it right now.
>
> If you're serious about wanting O(1) pops from the start of the list,
> write your own list implementation and use it. You might even like to
> make it public, so others can use it as well. But please stop with the
> snide remarks and poorly disguised insults and back-handed compliments,
> it's getting tedious.

I will stop being snide, but I will be blunt, and if anybody
interprets my criticism as an insult, so be it.

The current algorithm is broken. It's a 20th century implementation
of lists built on top of a 20th century memory manager. It's at least
ten years behind the times.

> Or just change your algorithm slightly -- it's not hard to turn an
> algorithm that pops from the start of a list to one that pops from the
> end of the list.
>

The fact that you are proposing to reverse a list to work around its
performance deficiencies just confirms to me that the algorithm is
broken. I will concede the fact that most of CPython's tradeoffs are
driven by the limitations of the underlying memory manager. If realloc
() allowed you to easily release memory from the front of a previously
allocated block, we'd be talking about maybe a 10-line patch here, and
it wouldn't impact even list_resize() in a meaningful way.

Even with realloc()'s brokenness, you could improve pop(0) in a way
that does not impact list access at all, and the patch would not
change the time complexity of any operation; it would just add
negligible extract bookkeeping within list_resize() and a few other
places.

The objection that the extra pointer would increase the size of list
objects is totally 20th century thinking. It would be totally
negligible for any real world program.

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Prev: distutils not finding all of my pure python modules
Next: iterating lists