From: Steve Howell on
On Jan 22, 2:54 pm, Dave Angel <da...(a)ieee.org> wrote:
> Steve Howell wrote:
> > On Jan 22, 12:40 pm, Christian Heimes <li...(a)cheimes.de> wrote:
>
> >> Steve Howell wrote:
>
> >>> Is that really true in CPython?  It seems like you could advance the
> >>> pointer instead of shifting all the elements.  It would create some
> >>> nuances with respect to reclaiming the memory, but it seems like an
> >>> easy way to make lists perform better under a pretty reasonable use
> >>> case.
>
> >>> Does anybody know off the top of their head if the "have-to-be-shifted-
> >>> by-one" warning is actually valid?
>
> >> Why do you think the documentation has such obvious errors?
>
> > I wasn't making any assumptions, hence the question mark.  The Python
> > docs are very good, but even the best of projects make advances that
> > aren't reflected in the docs.
>
> >> CPython's list type uses an array of pointers to store its members. The
> >> type is optimized for the most common list operations in Python:
> >> iteration and appending. Python code rarely changes the head or middle
> >> of a list. The dequeue type is an optimized data structure for popping
> >> and inserting data at both ends.
>
> > I disagree that Python code rarely pops elements off the top of a
> > list.  There are perfectly valid use cases for wanting a list over a
> > dequeue without having to pay O(N) for pop(0).  Maybe we are just
> > quibbling over the meaning of "rarely."
>
> >> When you list.pop() or list.insert() the pointers in the internal array
> >> must be shifted. appending is much faster because the internal array is
> >> overallocated meaning it contains free slots at the tail of the data
> >> structure. A linked list of pointers requires more memory and iteration
> >> is slower since since it can't utilize the CPU's cache as good as an array.
>
> > I am not proposing a linked list of pointers.  I am wondering about
> > something like this:
>
> > p =p[1];
> > (and then reclaim p[0] as free memory, I already said I understood
> > that was the tricky bit)
>
> > The pointer arithmetic for accessing each element would still work in O
> > (1), right?
>
> I think it was Dijkstr (sp?) who said you can accomplish anything with
> just one more level of indirection.  Clearly each attribute (variable)
> that has a binding to a given list must point to a fixed piece of
> memory, that cannot safely be moved, because there's no efficient way to
> find all the attributes.   That fixed piece is the list object, and I
> expect it's 16 or 20 bytes, on a 32bit implementation.  It must in turn
> point to the actual malloc'ed block that contains pointers to all the
> elements of the list.  So that block will be 4*n where n is the number
> of reserved cells, at least as large as len().  This is the block where
> copying takes place when you insert or delete from the beginning.
>

The indirection is already in Python, as it (at least appears to me)
that everything is deferenced off of an ob_item pointer:

http://svn.python.org/view/python/trunk/Objects/listobject.c?view=markup

> The list object must contain a pointer to the beginning of this block,
> or it wouldn't be able to free() it later.  So you'd be suggesting a
> second pointer that actually points to the current 0th pointer.  And a
> pop would simply increment this second pointer.
>

Yes, ob_item would point to the 0th element pointer and pop would
simply increment it.

The additional bookkeeping would be the original pointer.

> Such an approach might be worthwhile if you expect lots of pops and
> pushes, with a minimal net change.  But of course each time you did a
> pop, you'd have to decide whether it was time to normalize/compact  the
> block.
>

Yes, and that of course is the tricky bit.

> As you say, reclaiming the 0th element of this block to the memory pool
> would be tricky.  

I should clarify that a bit. Reclaiming the 0th element *cheaply* is
tricky, unless you want to rewrite the memory manager. But I also
think you can, of course, defer reclaiming the element.

> Doubly so, because  1) the C memory allocator has no
> such notion as resizing the beginning of a block. and 2) it would have
> nothing useful to do with the 4 bytes freed.  The minimum allocated
> block in Python is probably 16 bytes of actual address space.  I'd guess
> that's 4 bytes for malloc's overhead, and 8 bytes for the minimum object
> header, and 4 bytes for data.  To check me, try:
>
>  >>> a = 5.3
>  >>> b = 49.6
>  >>> id(a), id(b)
> (11074136, 11074152)
>
> Anyway, this could be done, where once the two pointers get some
> distance apart, you do a realloc, and copy everything.  But of course
> you'd want to build some hysteresis into it, to avoid thrashing.
> , but

There wouldn't be any additional thrashing beyond what happens now.
You'd simply avoid the first N-1 memmoves of up to kN bytes at the
cost of not reclaiming those k(N-1) bytes right away. I suppose it's
a more "bursty" penalty you'd be paying, but the peak of the "bursty"
curve is no wider than the "constant" curve, it's just N times
narrower!

> There wouldn't be much of a performance hit, but it would increase every
> list size by 4 bytes minimum.  So I doubt it would be a list replacement.
>

I don't think that's the reason people would oppose this, although you
are true about the penalty. Memory's cheap, you'd need about a
quarter million lists just to fill up a meg.

CPU cycles, on the other hand, are expensive, as users' demand for
responsive programs seems to do a better job of keeping up with
Moore's Law.

I'd also argue that the memory you keep around to avoid unnecessary
memmoves() will almost always be dwarfed by the memory used by the
list elements themselves.


From: Steve Howell on
On Jan 22, 3:17 pm, Christian Heimes <li...(a)cheimes.de> wrote:
> Steve Howell wrote:
> > That maybe would be an argument for just striking the paragraph from
> > the tutorial.  If it's rare that people pop the head off the list in
> > code that is performance critical or prominent, why bother to even
> > mention it in the tutorial?
>
> How else are you going to teach new Python developers that they should
> favor append() and pop() in place of insert() and pop(0)? :)
>
> > Impossible is a strong word.  You could be lazy about giving the
> > memory back, and just wait till the whole list is garbage collected.
> > I don't think there's anything in Python's contract that says memory
> > has to be freed the exact moment you stop using it, especially since
> > we're talking about doing an O(N) memmove just to free up one
> > pointer's worth of memory.
>
> Your proposal would increase the size of every list object of
> sizeof(ptr) or ssize_t (32 or 64bits) in order to store the information
> where the first element is. It would also unnecessarily complicate the
> code and possible slow down a lot of list operations.
>

64 bits per list is negligible.

Adding a check for (ilow == 0) is even more negligible.

You are not "unnecessarily" complicating code for something as widely
used as Python lists if it achieves the desired benefit at minimal
cost. You are complicating the code, but not "unnecessarily."

> > I know the Python programmer could simply iterate through the list
> > rather than popping off unused elements, but that just means that you
> > not only tie up the memory for the pointers just as long, but you also
> > prevent the objects themselves from being garbage collected.
>
> > In my case I'm not really concerned about giving the memory back right
> > away, it's more about keeping my code simple.  Once I'm done with an
> > element, I do just want to pop it off and keep all the simplicity of
> > the lists, but I am just concerned enough speed that for a 1000
> > element list, I don't want to be doing 1000 memmoves for an average of
> > 500 pointers, which effectively moves about a million bytes around for
> > no reason.
>
> The simplicity of Python is gained with some performance drawbacks. You
> have to learn to use Python algorithms. You can't simply re implement a
> fast C algorithm and expect it to be fast in Python, too.
>

I actually do expect Python to solve performance problems for me that
are more easily solved in core than in Python itself. So I guess
that's where we differ.
From: Steven D'Aprano on
On Fri, 22 Jan 2010 14:38:18 -0800, Steve Howell wrote:

> I know the Python programmer could simply iterate through the list
> rather than popping off unused elements, but that just means that you
> not only tie up the memory for the pointers just as long, but you also
> prevent the objects themselves from being garbage collected.
>
> In my case I'm not really concerned about giving the memory back right
> away, it's more about keeping my code simple. Once I'm done with an
> element, I do just want to pop it off and keep all the simplicity of the
> lists, but I am just concerned enough speed that for a 1000 element
> list, I don't want to be doing 1000 memmoves for an average of 500
> pointers, which effectively moves about a million bytes around for no
> reason.
>
> I suppose the solution is to either give up the sugar of lists, or try
> to wrap something like deque or list-with-offset into a sequence.


I don't understand what the actual problem you're trying to solve is.
Despite your disclaimer about not being concerned about reclaiming the
memory, it sounds like you're trying to micro-optimize memory usage. That
is, you give me the impression that you prefer this:

while alist:
x = alist.pop(0)
process(x)

over this:

for x in alist:
process(x)
# allow alist to be garbage collected when it goes out of scope


That strikes me as a pointlessly trivial optimization, even if deleting
at the start of the list was efficient.

But whatever your reason, if you want to insert and delete efficiently
from both ends of the sequence, use a deque. If you are only doing a
small number of insertions/deletions at the beginning, and so don't care
about inefficiency, use a list.

If you only want to insert/delete from one end, use a list. Instead of:

alist.insert(0, x)
alist.pop(0)

use:

alist.append(x)
alist.pop()

That's fast and efficient. In some cases it doesn't matter which order
the list is, but if it does matter, the worst case is that you need to
call alist.reverse() occasionally, which is quite fast. Or iterate over
the list in reverse order, which is even faster.

So what am I missing?



--
Steven
From: Roy Smith on
In article <mailman.1283.1264192814.28905.python-list(a)python.org>,
Christian Heimes <lists(a)cheimes.de> wrote:

> Steve Howell wrote:
> > Is that really true in CPython? It seems like you could advance the
> > pointer instead of shifting all the elements. It would create some
> > nuances with respect to reclaiming the memory, but it seems like an
> > easy way to make lists perform better under a pretty reasonable use
> > case.
> >
> > Does anybody know off the top of their head if the "have-to-be-shifted-
> > by-one" warning is actually valid?
>
> Why do you think the documentation has such obvious errors?
>
> CPython's list type uses an array of pointers to store its members. The
> type is optimized for the most common list operations in Python:
> iteration and appending. Python code rarely changes the head or middle
> of a list. The dequeue type is an optimized data structure for popping
> and inserting data at both ends.
>
> When you list.pop() or list.insert() the pointers in the internal array
> must be shifted.

Well, at least in theory you could make pop(0) run in O(1). All you need
to do is have each list store an offset. Initially it's 0, and pop(0)
would just increment the offset. Then, all references to alist[i] would
turn into array[i+offset].

Of course, that's a lot of complexity to optimize a relatively rare use
case. You're probably better off just using a dequeue :-)
From: Roy Smith on
In article
<3ac173bd-4124-434d-b726-0b9baaeec752(a)36g2000yqu.googlegroups.com>,
Steve Howell <showell30(a)yahoo.com> wrote:

> In my case I'm not really concerned about giving the memory back right
> away, it's more about keeping my code simple. Once I'm done with an
> element, I do just want to pop it off and keep all the simplicity of
> the lists, but I am just concerned enough speed that for a 1000
> element list, I don't want to be doing 1000 memmoves for an average of
> 500 pointers, which effectively moves about a million bytes around for
> no reason.

If you really want to pop all the elements from a long list, reverse the
list and pop them off the end. Popping every element off the beginning of
the list is O(n^2), as you pointed out. Reversing the list is O(n), and
each pop after that is O(1), so the overall complexity is O(n).