From: Paul Rubin on
Steve Howell <showell30(a)yahoo.com> writes:
> I haven't profiled deque vs. list, but I think you are correct about
> pop() possibly being a red herring....
> For really large lists, I suppose memmove() would eventually start to
> become a bottleneck, but it's brutally fast when it just moves a
> couple kilobytes of data around.

One way to think of Python is as a scripting wrapper around a bunch of C
functions, rather than as a full-fledged programming language. Viewed
that way, list operations like pop(0) are essentially constant time
unless the list is quite large. By that I mean you can implement
classic structures like doubly-linked lists using Python tuples, but
even though inserting into the middle of them is theoretically O(1), the
memmove's of the native list operations will be much faster in practice.
Programs dealing with large lists (more than a few thousand elements)
are obviously different and if your program is using such large lists,
you have to plan a little differently when writing the code.

>> I've often run applications with millions of lists
> I bet even in your application, the amount of memory consumed by the
> PyListObjects themselves is greatly dwarfed by other objects, notably
> the list elements themselves

Such lists often would just one element or even be empty. For example,
you might have a dictionary mapping names to addresses. Most people
have just one address, but some might have no address, and a few might
have more than one address, so you would have a list of addresses for
each name. Of course the dictionary slots in that example would also
use space.

> Well, I am not trying to solve problems wastefully here. CPU cycles
> are also scarce, so it seems wasteful to do an O(N) memmove that could
> be avoided by storing an extra pointer per list.

Realistically the CPython interpreter is so slow that the memmove is
unnoticable, and Python (at least CPython) just isn't all that
conductive to writing fast code. It makes up for this in programmer
productivity for the many sorts of problems in which moderate speed
is acceptable.

> Thanks for your patience in responding to me, despite the needlessly
> abrasive tone of my earlier postings.

I wondered whether you might have come over from the Lisp newsgroups,
which are pretty brutal. We try to be friendlier here (not that we're
always successful). Anyway, welcome.

> 1) Summarize all this discussion and my lessons learned in some kind
> of document. It does not have to be a PEP per se, but I could provide
> a useful service to the community by listing pros/cons/etc.

I suppose that can't hurt, but there are probably other areas (multicore
parallelism is a perennial one) of much higher community interest.

http://wiki.python.org/moin/ is probably a good place to put such
a document.

> 2) I would still advocate for removing the warning against list.pop
> (0) from the tutorial. I agree with Steven D'Aprano that docs really
> should avoid describing implementation details in many instances

On general principles I agree with Alex Stepanov that the running time
of a function should be part of its interface (nobody wants to use a
stack of popping an element takes quadratic time) and therefore should
be stated in the docs. Python just has a weird incongruence between the
interpreter layer and the C layer, combined with a library well-evolved
for everyday problem sizes, so the traditional asymptotic approach to
algorithm selection often doesn't give the best practical choice.

I don't feel like looking up what the tutorial says about pop(0), but if
it just warns against it without qualification, it should probably
be updated.
From: Steve Howell on
On Jan 24, 11:28 am, a...(a)pythoncraft.com (Aahz) wrote:
> In article <b4440231-f33f-49e1-9d6f-5fbce0a63...(a)b2g2000yqi.googlegroups.com>,
> Steve Howell  <showel...(a)yahoo.com> wrote:
>
>
>
> >Even with realloc()'s brokenness, you could improve pop(0) in a way
> >that does not impact list access at all, and the patch would not change
> >the time complexity of any operation; it would just add negligible
> >extract bookkeeping within list_resize() and a few other places.
>
> Again, your responsibility is to provide a patch and a spectrum of
> benchmarking tests to prove it.  Then you would still have to deal with
> the objection that extensions use the list internals -- that might be an
> okay sell given the effort otherwise required to port extensions to
> Python 3, but that's not the way to bet.
>

Ok, I just submitted a patch to python-dev that illustrates a 100x
speedup on an admittedly artificial program. It still has a long way
to go, but it demonstrates proof of concept. I'm done for the day,
but tomorrow I will try to polish it up and improve it, even if its
doomed for rejection. Apologies to all I have offended in this
thread. I frankly found some of the pushback to be a bit hasty and
disrespectful, but I certainly overreacted to some of the criticism.
And now I'm in the awkward position of asking the people I offended to
help me with the patch. If anybody can offer me a hand in
understanding some of CPython's internals, particularly with regard to
memory management, it would be greatly appreciated.

(Sorry I don't have a link to the python-dev posting; it is not
showing up in the archives yet for some reason.)


From: Steve Howell on
On Jan 25, 8:31 pm, Paul Rubin <no.em...(a)nospam.invalid> wrote:
> Steve Howell <showel...(a)yahoo.com> writes:
> > I haven't profiled deque vs. list, but I think you are correct about
> > pop() possibly being a red herring....
> > For really large lists, I suppose memmove() would eventually start to
> > become a bottleneck, but it's brutally fast when it just moves a
> > couple kilobytes of data around.
>
> One way to think of Python is as a scripting wrapper around a bunch of C
> functions, rather than as a full-fledged programming language.  Viewed
> that way, list operations like pop(0) are essentially constant time
> unless the list is quite large.  By that I mean you can implement
> classic structures like doubly-linked lists using Python tuples, but
> even though inserting into the middle of them is theoretically O(1), the
> memmove's of the native list operations will be much faster in practice.
> Programs dealing with large lists (more than a few thousand elements)
> are obviously different and if your program is using such large lists,
> you have to plan a little differently when writing the code.

Thanks. That is a good way of looking at.

>
> Realistically the CPython interpreter is so slow that the memmove is
> unnoticable, and Python (at least CPython) just isn't all that
> conductive to writing fast code.  It makes up for this in programmer
> productivity for the many sorts of problems in which moderate speed
> is acceptable.
>

Definitely, and moderate speed is enough in a surprisingly large
number of applications.


> > Thanks for your patience in responding to me, despite the needlessly
> > abrasive tone of my earlier postings.  
>
> I wondered whether you might have come over from the Lisp newsgroups,
> which are pretty brutal.  We try to be friendlier here (not that we're
> always successful).  Anyway, welcome.
>

:)

> >   1) Summarize all this discussion and my lessons learned in some kind
> > of document.  It does not have to be a PEP per se, but I could provide
> > a useful service to the community by listing pros/cons/etc.
>
> I suppose that can't hurt, but there are probably other areas (multicore
> parallelism is a perennial one) of much higher community interest.
>
> http://wiki.python.org/moin/is probably a good place to put such
> a document.
>

Ok, that's where I'll start.

> >   2) I would still advocate for removing the warning against list.pop
> > (0) from the tutorial.  I agree with Steven D'Aprano that docs really
> > should avoid describing implementation details in many instances
>
> On general principles I agree with Alex Stepanov that the running time
> of a function should be part of its interface (nobody wants to use a
> stack of popping an element takes quadratic time) and therefore should
> be stated in the docs.  Python just has a weird incongruence between the
> interpreter layer and the C layer, combined with a library well-evolved
> for everyday problem sizes, so the traditional asymptotic approach to
> algorithm selection often doesn't give the best practical choice.
>
> I don't feel like looking up what the tutorial says about pop(0), but if
> it just warns against it without qualification, it should probably
> be updated.

Here it is:

http://docs.python.org/tutorial/datastructures.html#using-lists-as-queues

My opinion is that the warning should be either removed or qualified,
but it is probably fine as written.

'''
It is also possible to use a list as a queue, where the first element
added is the first element retrieved (“first-in, first-out”); however,
lists are not efficient for this purpose. While appends and pops from
the end of list are fast, doing inserts or pops from the beginning of
a list is slow (because all of the other elements have to be shifted
by one).
'''

The qualifications would be that deque lacks some features that list
has, and that the shift-by-one operation is actually a call to memmove
() and may not apply to all implementations.

From: Steve Holden on
Steve Howell wrote:
> On Jan 24, 11:28 am, a...(a)pythoncraft.com (Aahz) wrote:
>> In article <b4440231-f33f-49e1-9d6f-5fbce0a63...(a)b2g2000yqi.googlegroups.com>,
>> Steve Howell <showel...(a)yahoo.com> wrote:
>>
>>
>>
>>> Even with realloc()'s brokenness, you could improve pop(0) in a way
>>> that does not impact list access at all, and the patch would not change
>>> the time complexity of any operation; it would just add negligible
>>> extract bookkeeping within list_resize() and a few other places.
>> Again, your responsibility is to provide a patch and a spectrum of
>> benchmarking tests to prove it. Then you would still have to deal with
>> the objection that extensions use the list internals -- that might be an
>> okay sell given the effort otherwise required to port extensions to
>> Python 3, but that's not the way to bet.
>>
>
> Ok, I just submitted a patch to python-dev that illustrates a 100x
> speedup on an admittedly artificial program. It still has a long way
> to go, but it demonstrates proof of concept. I'm done for the day,
> but tomorrow I will try to polish it up and improve it, even if its
> doomed for rejection. Apologies to all I have offended in this
> thread. I frankly found some of the pushback to be a bit hasty and
> disrespectful, but I certainly overreacted to some of the criticism.
> And now I'm in the awkward position of asking the people I offended to
> help me with the patch. If anybody can offer me a hand in
> understanding some of CPython's internals, particularly with regard to
> memory management, it would be greatly appreciated.
>
> (Sorry I don't have a link to the python-dev posting; it is not
> showing up in the archives yet for some reason.)
>
>
Fortunately for you this is a very welcoming group, and particularly
responsive to individuals who have seen the error of their ways ;-)

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010 http://us.pycon.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS: http://holdenweb.eventbrite.com/
From: Arnaud Delobelle on
Steve Howell <showell30(a)yahoo.com> writes:

> On Jan 25, 1:32 pm, Arnaud Delobelle <arno...(a)googlemail.com> wrote:
>> Steve Howell <showel...(a)yahoo.com> writes:
>>
>> [...]
>>
>> > My algorithm does exactly N pops and roughly N list accesses, so I
>> > would be going from N*N + N to N + N log N if switched to blist.
>>
>> Can you post your algorithm?  It would be interesting to have a concrete
>> use case to base this discussion on.
>>
>
> I just realized you meant the Python code itself. It is here:
>
> https://bitbucket.org/showell/shpaml_website/src/tip/shpaml.py

Hi - I didn't have the time to look at it before today. You scan
through a list called prefix_lines, and you use pop(0) as a means to
keep track of your position in this list. It seems to me that it would
be more effective to explicitely keep track of it - and it would remove
the need to the numerous copies of sublists of indent_lines that you
have to create.

I have rewritten part of the three relevant functions (html_block_tag,
get_indented_block and recurse, within indent_lines) to show you what I
mean (see below). I have tried to keep the changes to a minimum - you
can see that the code is no more complex like this. The advantage is
that only one list is created, prefix_lines, and there is no need to
mutate it or copy parts of it during the algorithm. I have not had the
time to test it though (only on one of the examples on the examples
webpage).

Code follows:

[...]

def html_block_tag(output, block, start, end, recurse):
append = output.append
prefix, tag = block[start]
if RAW_HTML.regex.match(tag):
append(prefix + tag)
recurse(block, start + 1, end)
else:
start_tag, end_tag = apply_jquery_sugar(tag)
append(prefix + start_tag)
recurse(block, start + 1, end)
append(prefix + end_tag)

[...]

def get_indented_block(prefix_lines, start, end):
prefix, line = prefix_lines[start]
len_prefix = len(prefix)
i = start + 1
while i < end:
new_prefix, line = prefix_lines[i]
if line and len(new_prefix) <= len_prefix:
break
i += 1
while i-1 > start and prefix_lines[i-1][1] == '':
i -= 1
return i

[...]

def indent_lines(lines,
output,
branch_method,
leaf_method,
pass_syntax,
flush_left_syntax,
flush_left_empty_line,
indentation_method,
get_block,
):
append = output.append
def recurse(prefix_lines, start, end):
while start < end:
prefix, line = prefix_lines[start]
if line == '':
start += 1
append('')
else:
block_end = get_block(prefix_lines, start, end)
if block_end == start + 1:
start += 1
if line == pass_syntax:
pass
elif line.startswith(flush_left_syntax):
append(line[len(flush_left_syntax):])
elif line.startswith(flush_left_empty_line):
append('')
else:
append(prefix + leaf_method(line))
else:
branch_method(output, prefix_lines, start, block_end, recurse)
start = block_end
return
prefix_lines = map(indentation_method, lines)
recurse(prefix_lines, 0, len(prefix_lines))