From: Terry Reedy on
On 1/23/2010 3:23 AM, Steve Howell wrote:
> On Jan 23, 12:13 am, Terry Reedy<tjre...(a)udel.edu> wrote:
>>
>> Challenge yes, mock no.
>>
>> Part of writing good basic data structures is not adding needless
>> complication from featuritis and not penalizing 99.99% of access to
>> satisfy a .01% need better satisfied another way.
>>
>
> I would like to challenge your assertion that advancing ob_item
> instead of doing memmove during list_ass_slice would impact the
> performance of list accesses in any way. It would only slow down
> operations that add/insert items into the list by, and then only by a
> single conditional statement, and those add/insert operations are
> already O(N) to begin with.

If you try writing a full patch, as I believe someone did, or at least a
prototype thereof, when the idea was discussed, you might have a better
idea of what the tradeoffs are and why it was rejected.

For instance, when you append to a full list, it is resized. I believe
it is now doubled, but the policy has varied over the years. If you turn
list from essentially a stack to a deque, you complicate the resizing
policy and have to consider the addition of a shift policy. Do you split
the over-allocated fore and aft or increase the overallocation from
double to, say, triple? If the former, then for normal usage that never
uses the fore part, the over-allocation has been effectively reduced
from 2x to 1.5x (which is about what it once was, I believe). This means
more frequent resizings and copyings, which means slower operation for
most use cases. Adding extra usually wasted space is also a nuisance.
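
To make the trade-off concrete, here is a toy simulation (not CPython's actual resize code; the factors are purely illustrative) of how cutting the effective aft over-allocation from 2x to 1.5x increases the number of resizes, and hence copies, for plain append-only usage:

```python
def count_resizes(n_appends, growth_factor):
    """Count reallocations needed to append n_appends items one at a time."""
    size = 0        # logical length of the list
    capacity = 0    # allocated slots
    resizes = 0
    for _ in range(n_appends):
        if size == capacity:
            capacity = max(4, int(capacity * growth_factor))
            resizes += 1
        size += 1
    return resizes

print(count_resizes(1000000, 2.0))  # doubling policy
print(count_resizes(1000000, 1.5))  # effective factor if fore space is split off
```

The 1.5x policy needs noticeably more reallocations for the same number of appends, which is the slowdown for ordinary stack-like usage described above.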

Terry Jan Reedy

From: Steve Howell on
On Jan 23, 12:13 am, Terry Reedy <tjre...(a)udel.edu> wrote:
> On 1/23/2010 12:58 AM, Steve Howell wrote:
>
> > I really want to use list *normally* with all its perfectly good
> > semantics and reasonable implementation, except for its blind spot
> > with respect to popping the first element off the list.
>
> It was not designed for that. .pop() was added to lists about 10 years
> ago because I asked for it (with no parameter, pop off end only) and
> wrote what would now be a PEP -- and because Tim Peters later supported
> the idea. Adding the optional parameter was something of an afterthought
> (never publicly discussed as far as I know) for occasional use for
> 'short' lists where O(n) is tolerable. You have half persuaded me that
> the parameter addition was a mistake. Perhaps it is too attractive
> a nuisance for some people ;=).
>

pop(0) is a useful idiom in parsers. You can see examples in
ElementTree and lib2to3.

Even without pop(0), people would still write code like this, found in
pstats.py:

arg = args[0]
args = args[1:]

It is sometimes overkill (and even inappropriate) to use a queue when
really you just want a list. Iterators are great, but they also have
slightly different semantics than the list itself.
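
When front-pops dominate and full list semantics aren't needed, collections.deque already offers O(1) pops from both ends (the argument values here are made up for illustration):

```python
from collections import deque

# Hypothetical argument list, just for illustration.
args = deque(['profile.out', '-v', 'callers'])

first = args.popleft()    # O(1); list.pop(0) would shift every element
print(first)
print(list(args))
```

The catch, as the paragraph above says, is that a deque is not a list: it has no slicing, for instance, so it is not always a drop-in replacement.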

There is nothing wrong with a language specification that allows users
to do insert, delete, and pop on a list. Once you freeze the language
specification, then you can turn your attention to improving the
implementation.

From: Alf P. Steinbach on
* Steve Howell:
> On Jan 23, 12:32 am, "Alf P. Steinbach" <al...(a)start.no> wrote:
>> * Steve Howell:
>>
>>> On Jan 23, 12:13 am, Terry Reedy <tjre...(a)udel.edu> wrote:
>>>> Challenge yes, mock no.
>>>> Part of writing good basic data structures is not adding needless
>>>> complication from featuritis and not penalizing 99.99% of access to
>>>> satisfy a .01% need better satisfied another way.
>>> I would like to challenge your assertion that advancing ob_item
>>> instead of doing memmove during list_ass_slice would impact the
>>> performance of list accesses in any way. It would only slow down
>>> operations that add/insert items into the list by, and then only by a
>>> single conditional statement, and those add/insert operations are
>>> already O(N) to begin with.
>> I'm sorry, no, the last part is incorrect.
>>
>> Appending to a 'list' can currently be constant time, if OS reallocation is
>> constant time (as the string '+' optimization relies on).
>>
>
> That's true, but it's also irrelevant, as the pop optimization would
> happen in a branch of code that never gets executed during list
> appending:
>
> if (d < 0) { /* Delete -d items */
>     /* ADD CODE HERE TO AVOID memmove
>        when ilow == 0 (i.e. ihigh + d == 0) */
>     memmove(&item[ihigh+d], &item[ihigh],
>             (Py_SIZE(a) - ihigh)*sizeof(PyObject *));
>     list_resize(a, Py_SIZE(a) + d);
>     item = a->ob_item;
> }
>
>
>> With the pop optimization it can no longer be constant time without risking an
>> accumulation of unused memory, a memory leak, although it can be amortized
>> constant time, at the cost of wasting some percentage of memory.
>
> list_resize already overallocates memory to allow room for growth.
> Whenever you did an append to the list that would force a realloc, you
> could first check to see if there is unused stuff at the front and do
> the memmove then and reclaim the unfreed memory. So instead of doing
> a paying for memmove on every pop [at front], you only pay for it when
> the list goes to size 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, etc.

Oh. If 'list' already uses a doubling buffer then the only overhead from the
optimization would, AFAICS, be a single add in every indexing. Which might be
acceptable (at least it sounds very reasonable in the context of Python).

Re terminology: I write "doubling buffer" to mean increase of buffer size by a
factor. It's often 2, but might be e.g. 1.5, whatever. The point of using a
constant factor is to avoid quadratic time for loops doing appending, i.e. the
constant factor size increase yields amortized constant time per append.
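
The distinction between constant-factor growth and small fixed increases can be checked with a toy copy-counter (purely illustrative; neither function is real allocator code):

```python
def total_copies_geometric(n_appends, factor=2.0):
    """Element copies from reallocations under constant-factor growth."""
    size = capacity = copies = 0
    for _ in range(n_appends):
        if size == capacity:
            capacity = max(1, int(capacity * factor))
            copies += size          # existing elements moved to the new buffer
        size += 1
    return copies

def total_copies_fixed(n_appends, increment=16):
    """Element copies when the buffer grows by a small fixed amount."""
    size = capacity = copies = 0
    for _ in range(n_appends):
        if size == capacity:
            capacity += increment
            copies += size
        size += 1
    return copies

n = 100000
print(total_copies_geometric(n) / float(n))  # stays bounded: amortized O(1)
print(total_copies_fixed(n) / float(n))      # grows with n: amortized O(n)
```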

The sizes that you quote above, on the other hand, look like rather arbitrary
buffer size increases where the size to increase by is limited to a certain
small range. With copying or moving of everything at each buffer size
increase, that would not yield amortized constant time. It would yield
linear time per append, and quadratic time for a loop doing appends.

But, anyway, if 'list' already uses a doubling buffer then the only overhead
from the pop optimization would, AFAICS, be a single add in every indexing.

On the third & gripping hand, however, a proof-of-concept actual implementation
(patch) would be needed to ensure that one doesn't overlook any showstopper or
serious problem, and to provide timings. And the special case would have to be
documented as a special case. Hm, it would be nice if the Python docs offered
complexity (time) guarantees in general...


Cheers,

- Alf
From: Steve Howell on
On Jan 23, 1:24 am, Terry Reedy <tjre...(a)udel.edu> wrote:
> On 1/23/2010 3:23 AM, Steve Howell wrote:
>
> > On Jan 23, 12:13 am, Terry Reedy<tjre...(a)udel.edu>  wrote:
>
> >> Challenge yes, mock no.
>
> >> Part of writing good basic data structures is not adding needless
> >> complication from featuritis and not penalizing 99.99% of access to
> >> satisfy a .01% need better satisfied another way.
>
> > I would like to challenge your assertion that advancing ob_item
> > instead of doing memmove during list_ass_slice would impact the
> > performance of list accesses in any way.  It would only slow down
> > operations that add/insert items into the list by, and then only by a
> > single conditional statement, and those add/insert operations are
> > already O(N) to begin with.
>
> If you try writing a full patch, as I believe someone did, or at least a
> prototype thereof, when the idea was discussed, you might have a better
> idea of what the tradeoffs are and why it was rejected.
>
> For instance, when you append to a full list, it is resized. I believe
> it is now doubled, but the policy has varied over the years. If you turn
> list from essentially a stack to a deque, you complicate the resizing
> policy and have to consider the addition of a shift policy. Do you split
> the over-allocated fore and aft or increase the overallocation from
> double to, say, triple? If the former, then for normal usage that never
> uses the fore part, the over-allocation has been effectively reduced
> from 2x to 1.5x (which is about what it once was, I believe). This means
> more frequent resizings and copyings, which means slower operation for
> most use cases. Adding extra usually wasted space is also a nuisance.
>

It looks like most of the complication would be in list_resize.

I'm gonna oversimplify a bit, but tell me if this is the gist of it.

I would have ob_item continue to always refer to the first element of the
list, and I'd introduce another variable, ob_start_memory, referring to
the start of the allocated block, to be used whenever you do a
realloc/free/malloc. I'd have a notion of fore_wastage, which would
either be a variable I maintain or something I just calculate as needed
from the difference between ob_item and ob_start_memory.

In deciding whether I want to give memory back to the memory manager,
I just need to adjust my calculations to account for fore and aft
wastage to see if it's time to do a shrink, and before shrinking, I do
the memmove.

On growth, I would just always do a memmove right away if there is
fore_wastage, and then do the normal procedure for aft wastage.

For the most common scenario of append, append, append, the only
penalty is having to skip over fore_wastage logic by checking for
fore_wastage == 0 or ob_item == ob_start_memory.

For the scenario of several appends followed by several pops, I get the
big win of only doing about log2(N) memmoves instead of N as I shrink
the list down to zero.

If I start alternating between pops and appends, it's basically a
wash...instead of doing the memmove on the pop, I do it on the next
append.

If I were to pop the top element and then prepend a new element, to be
truly efficient I'd want to use the reserved space at the front right
away, but for simplicity I would probably not complicate list_ass_slice
and would just do the normal resize() dance, which would give me a
memmove in one direction followed immediately by a memmove in the other
direction when I come back to list_ass_slice. (It would all still be a
wash, since the current implementation would have to do the same number
of memmoves.)

A lot of the essential complexity here seems to come from the fact
that realloc() isn't a very robust abstraction. It seems to be
expensive to tell it you want to shrink, and it also does not have an
interface to tell it to give you a little growing room. On the other
hand, the code within list_resize() actually provides a nice framework
for amortizing memmoves exponentially.
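
The design above can be sketched in pure Python (the class name, method names, and the compaction threshold are all invented for illustration; the real change would live in C, in list_resize and list_ass_slice):

```python
class GapList:
    """List with a front gap so that popping the first element is O(1)
    except when the gap is reclaimed (the 'fore_wastage' memmove)."""

    def __init__(self, items=()):
        self._buf = list(items)   # backing store (ob_start_memory)
        self._start = 0           # offset of first element (advanced ob_item)

    def __len__(self):
        return len(self._buf) - self._start

    def append(self, item):
        self._buf.append(item)

    def pop_front(self):
        item = self._buf[self._start]
        self._buf[self._start] = None   # drop the reference
        self._start += 1
        # Reclaim the fore wastage only occasionally (threshold made up),
        # standing in for the deferred memmove described above.
        if self._start * 2 > len(self._buf):
            del self._buf[:self._start]
            self._start = 0
        return item

g = GapList(range(6))
print(g.pop_front(), g.pop_front(), len(g))
```

As in the sketch above, the common append path pays only the gap check, and the memmove cost is amortized across many front-pops instead of paid on each one.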
From: Steven D'Aprano on
On Fri, 22 Jan 2010 21:42:43 -0800, Steve Howell wrote:

> This innocent program here literally moves about a million bytes of
> memory around for no good reason:
>
> lst = []
> for i in range(2000):
>     lst.append(i)
> while lst:
>     print lst.pop(0)
>
> Why? Because list.pop(0) is implemented in O(N) instead of O(1).
>
> Why wouldn't you get a competent C programmer simply make list_ass_slice
> smart enough to make list.pop(0) O(1)?

Because there are always trade-offs, and the competent C programmers who
coded the implementation for lists chose different trade-offs from the
ones you would prefer.

Seems to me that the simple solution to your problem is for you to
implement your own data structure that makes whatever trade-offs you
like. If it is good enough and popular enough, it might even replace the
existing list implementation.



>> That is, you give me the impression that you prefer this:
>>
>> while alist:
>>     x = alist.pop(0)
>>     process(x)
>>
>> over this:
>>
>> for x in alist:
>>     process(x)
>> # allow alist to be garbage collected when it goes out of scope
>>
>>
> No, to be more precise, I prefer this implementation of a recursive
> parser (using lists) to one that would have to use deque's because of
> the lameness of Python's list implementation:
>
> https://bitbucket.org/showell/shpaml_website/src/tip/shpaml.py


That's a lot of code. Am I supposed to study the whole module, or can you
give me a hint as to what you're referring to? The lack of docstrings and
comments doesn't fill me with enthusiasm for reading it.

Nevertheless, on the basis of a quick scan, I suppose that you're
probably talking about the nested function called recurse:

def recurse(prefix_lines):
    while prefix_lines:
        prefix, line = prefix_lines[0]
        if line == '':
            prefix_lines.pop(0)
            append('')
        else:
            block_size = get_block(prefix_lines)
            if block_size == 1:
                prefix_lines.pop(0)
                if line == pass_syntax:
                    pass
                elif line.startswith(flush_left_syntax):
                    append(line[len(flush_left_syntax):])
                elif line.startswith(flush_left_empty_line):
                    append('')
                else:
                    append(prefix + leaf_method(line))
            else:
                block = prefix_lines[:block_size]
                prefix_lines = prefix_lines[block_size:]
                branch_method(output, block, recurse)
    return


Since you're not even looking at the results of the pop, why don't you
just call del prefix_lines[0]? It probably won't perform any better, but
it is more idiomatic.

An alternative would be to do exactly what you want lists to do: track
the start of the list. Untested:


def recurse(prefix_lines):
    start = 0
    end = len(prefix_lines)
    while start < end:
        prefix, line = prefix_lines[start]
        if line == '':
            start += 1
            append('')
        else:
            # get_block must also see only the unconsumed tail
            block_size = get_block(prefix_lines[start:])
            if block_size == 1:
                start += 1
                if line == pass_syntax:
                    pass
                elif line.startswith(flush_left_syntax):
                    append(line[len(flush_left_syntax):])
                elif line.startswith(flush_left_empty_line):
                    append('')
                else:
                    append(prefix + leaf_method(line))
            else:
                block = prefix_lines[start:start + block_size]
                start += block_size
                branch_method(output, block, recurse)
    return


No more O(N) deletions. Problem solved.




--
Steven