From: Arnaud Delobelle on
Steve Howell <showell30(a)yahoo.com> writes:
[...]
> My algorithm does exactly N pops and roughly N list accesses, so I
> would be going from N*N + N to N + N log N if switched to blist.

Can you post your algorithm? It would be interesting to have a concrete
use case to base this discussion on.

--
Arnaud
From: Steve Howell on
On Jan 25, 1:32 pm, Arnaud Delobelle <arno...(a)googlemail.com> wrote:
> Steve Howell <showel...(a)yahoo.com> writes:
>
> [...]
>
> > My algorithm does exactly N pops and roughly N list accesses, so I
> > would be going from N*N + N to N + N log N if switched to blist.
>
> Can you post your algorithm?  It would be interesting to have a concrete
> use case to base this discussion on.
>

It is essentially this, in list_ass_slice:

if (d < 0) { /* Delete -d items */
    if (ilow == 0) {
        a->popped -= d;
        /* advance the data pointer instead of memmoving; pointer
           arithmetic on PyObject ** is already in elements, so no
           sizeof() scaling is needed */
        a->ob_item -= d;
        list_resize(a, Py_SIZE(a));
    }
    else {
        memmove(&item[ihigh+d], &item[ihigh],
                (Py_SIZE(a) - ihigh)*sizeof(PyObject *));
        list_resize(a, Py_SIZE(a) + d);
    }
    item = a->ob_item;
}

I am still working through the memory management issues, but when I
have a complete working patch, I will give more detail.
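In the meantime, here is a rough pure-Python sketch of the idea behind the patch (the `popped`/`ob_item` bookkeeping above): popping from the front just advances an offset instead of shifting every element, and the dead prefix is reclaimed during a later resize. The class name and the compaction threshold here are my own inventions for illustration, not part of the patch.

```python
class OffsetList:
    """Sketch of the patch's approach in pure Python (the real patch
    is in C inside listobject.c): pop(0) advances an offset instead of
    calling memmove, making front pops O(1) amortized."""

    def __init__(self, items):
        self._items = list(items)
        self._offset = 0  # number of items already popped from the front

    def pop_front(self):
        item = self._items[self._offset]
        self._items[self._offset] = None  # drop the reference early
        self._offset += 1
        # Occasionally compact so the dead prefix can't grow without
        # bound, loosely mirroring the list_resize() call in the patch.
        if self._offset * 2 > len(self._items):
            del self._items[:self._offset]
            self._offset = 0
        return item

    def __len__(self):
        return len(self._items) - self._offset
```

A front pop here touches only one slot plus the occasional compaction, instead of moving every remaining pointer on each call.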

From: Steve Howell on
On Jan 25, 1:32 pm, Arnaud Delobelle <arno...(a)googlemail.com> wrote:
> Steve Howell <showel...(a)yahoo.com> writes:
>
> [...]
>
> > My algorithm does exactly N pops and roughly N list accesses, so I
> > would be going from N*N + N to N + N log N if switched to blist.
>
> Can you post your algorithm?  It would be interesting to have a concrete
> use case to base this discussion on.
>

I just realized you meant the Python code itself. It is here:

https://bitbucket.org/showell/shpaml_website/src/tip/shpaml.py

From: Steve Howell on
On Jan 25, 1:00 pm, Paul Rubin <no.em...(a)nospam.invalid> wrote:
> Steve Howell <showel...(a)yahoo.com> writes:
> > These are the reasons I am not using deque:
>
> Thanks for these.  Now we are getting somewhere.
>
> >   1) I want to use native lists, so that downstream methods can use
> > them as lists.
>
> It sounds like that could be fixed by making the deque API a proper
> superset of the list API.

That is probably a good idea.

> >   2) Lists are faster for accessing elements.
>
> It sounds like that could be fixed by optimizing deque somewhat.  Also,
> have you profiled your application to show that accessing list elements
> is actually using a significant fraction of its runtime and that it
> would be slowed down noticably by deque?  If not, it's a red herring.

I haven't profiled deque vs. list, but I think you are correct about
pop() possibly being a red herring.

It appears that the main bottleneck might still be the processing I do
on each line of text, which in my case is regexes.

For really large lists, I suppose memmove() would eventually start to
become a bottleneck, but it's brutally fast when it just moves a
couple kilobytes of data around.

> >   3) I want to be able to insert elements into the middle of the list.
>
> I just checked, and was surprised to find that deque doesn't support
> this.  I'd say go ahead and file a feature request to add it to deque.
>

It might be a good thing to add just for consistency's sake. If
somebody first implements an algorithm with lists, then discovers it
has overhead relating to inserting/appending at the end of the list,
then the more deque behaves like a list, the more easily they could
switch their code over to deque. Not knowing much about deque's
internals, I assume its performance for insert() would be O(N), just
like list's, although maybe a tiny bit slower.
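For what it's worth, deque did eventually gain an insert() method (added in Python 3.5), and it is indeed O(n), the same big-O as for a list. A quick sketch of the two operations side by side, with purely illustrative values:

```python
from collections import deque

items = deque(["a", "b", "d"])
items.insert(2, "c")  # O(n) for a deque, same big-O as list.insert
assert list(items) == ["a", "b", "c", "d"]

# The payoff is at the ends: popleft() is O(1),
# where list.pop(0) is O(n).
assert items.popleft() == "a"
```

So with a modern deque, the "behaves like a list" switch-over is mostly a matter of replacing pop(0) with popleft().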

> >   4) I have no need for rotating elements.
>
> That's unpersuasive since you're advocating adding a feature to list
> that many others have no need for.  
>

To be precise, I wasn't really advocating for a new feature but an
internal optimization of a feature that already exists.

> > Adding a word or two to a list is an O(1) addition to a data structure
> > that takes O(N) memory to begin with.
>
> Yes, as mentioned, additive constants matter.
>
> > Another way of looking at it is that you would need to have 250 or so
> > lists in memory at the same time before the extra pointer was even
> > costing you kilobytes of memory.
>
> I've often run applications with millions of lists, maybe tens of
> millions.  Of course it would be 100's of millions if the machines were
> big enough.
>

I bet even in your application, the amount of memory consumed by the
PyListObjects themselves is greatly dwarfed by other objects, notably
the list elements themselves, not to mention any dictionaries that
your app uses.

> > My consumer laptop has 3027908k of memory.
>
> I thought the idea of buying bigger machines was to solve bigger
> problems, not to solve the same problems more wastefully.

Well, I am not trying to solve problems wastefully here. CPU cycles
are also scarce, so it seems wasteful to do an O(N) memmove that could
be avoided by storing an extra pointer per list. I also think that
encouraging the use of pop(0) would actually make many programs more
memory efficient, in the sense that you can garbage collect list
elements earlier.
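A small sketch of that garbage-collection point: when lines are consumed with pop(0), each string can be freed as soon as it is processed, whereas plain iteration keeps the whole list alive until the end. (The `consume` function is an illustrative name of mine, not code from the thread.)

```python
def consume(lines):
    # Destructive consumption: the list shrinks as we go, so each
    # popped line becomes garbage right after it is processed.
    out = []
    while lines:
        out.append(lines.pop(0).upper())
    return out

lines = ["div", "span"]
result = consume(lines)
# `lines` is now empty; only `result` keeps references alive.
```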

Thanks for your patience in responding to me, despite the needlessly
abrasive tone of my earlier postings. I am coming around to this
thinking:

1) Summarize all this discussion and my lessons learned in some kind
of document. It does not have to be a PEP per se, but I could provide
a useful service to the community by listing pros/cons/etc.

2) I would still advocate for removing the warning against
list.pop(0) from the tutorial. I agree with Steven D'Aprano that docs really
should avoid describing implementation details in many instances
(although I do not know what he thinks about this particular case). I
also think that the performance penalty for pop(0) is negligible for
most medium-sized programs. For large-sized programs where you really
want to swap in deque, I think most authors are beyond reading the
tutorial and are looking elsewhere for insight on Python data
structures.

3) I am going to try to implement the patch anyway for my own
edification.

4) I do think that there are ways that deque could be improved, but
it is not high on my priority list. I will try to mention it in the
PEP, though.

From: Steve Howell on
On Jan 25, 1:32 pm, Arnaud Delobelle <arno...(a)googlemail.com> wrote:
> Steve Howell <showel...(a)yahoo.com> writes:
>
> [...]
>
> > My algorithm does exactly N pops and roughly N list accesses, so I
> > would be going from N*N + N to N + N log N if switched to blist.
>
> Can you post your algorithm?  It would be interesting to have a concrete
> use case to base this discussion on.
>

These are the profile results for an admittedly very large file
(430,000 lines), which shows that pop() consumes more time than any
other low level method. So pop() is not a total red herring. But I
have to be honest and admit that I grossly overestimated the penalty
for smaller files. Typical files are a couple hundred lines, and for
that use case, pop()'s expense gets totally drowned out by regex
handling. In other words, it's a lot cheaper to move a couple hundred
pointers per list element pop than it is to apply a series of regexes
to them, which shouldn't be surprising.

ncalls           tottime  percall  cumtime  percall  filename:lineno(function)
230001/1         149.508    0.001  222.432  222.432  /home/showell/workspace/shpaml_website/shpaml.py:192(recurse)
429999            17.667    0.000   17.667    0.000  {method 'pop' of 'list' objects}
530000             8.428    0.000   14.125    0.000  /home/showell/workspace/shpaml_website/shpaml.py:143(get_indented_block)
3700008            7.877    0.000    7.877    0.000  {built-in method match}
5410125/5410121    5.697    0.000    5.697    0.000  {len}
300000             3.938    0.000   22.286    0.000  /home/showell/workspace/shpaml_website/shpaml.py:96(convert_line)
959999             3.847    0.000    6.759    0.000  /home/showell/workspace/shpaml_website/shpaml.py:29(INDENT)
959999             3.717    0.000   12.547    0.000  /home/showell/workspace/shpaml_website/shpaml.py:138(find_indentation)
370000             3.495    0.000   20.204    0.000  /home/showell/workspace/shpaml_website/shpaml.py:109(apply_jquery)
370000             3.322    0.000    6.528    0.000  {built-in method sub}
1469999            2.575    0.000    2.575    0.000  {built-in method groups}
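The same drowning-out effect is easy to reproduce with cProfile on a toy version of the loop; the pattern and line counts below are stand-ins of mine, not shpaml's actual code:

```python
import cProfile
import re

WORD = re.compile(r"\s*(\w+)")  # stand-in pattern, not shpaml's regexes

def process(lines):
    out = []
    while lines:
        line = lines.pop(0)    # O(n) memmove per front pop
        m = WORD.match(line)   # per-line regex work
        if m:
            out.append(m.group(1))
    return out

# Profile a few hundred lines; in the printed stats the regex rows
# dominate the 'pop' row, as in the real profile above.
prof = cProfile.Profile()
result = prof.runcall(process, ["alpha beta"] * 300)
prof.print_stats()
```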

As an aside, I am a little surprised by how often I call len() and
that it also takes a large chunk of time, but that's my problem to
fix.
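For what it's worth, the usual fix when len() shows up that high in a profile is simply to stop calling it in the hot loop; a minimal sketch (the function is mine, not from shpaml):

```python
def count_indented(lines):
    # Indexing with a len()-bounded counter costs a len() call (plus a
    # bounds-checked index) on every iteration:
    #     i = 0
    #     while i < len(lines): ...; i += 1
    # Iterating directly avoids both calls entirely:
    n = 0
    for line in lines:
        if line.startswith(" "):
            n += 1
    return n
```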