sum for sequences? [Python]

Prev: Advice needed on parallel processing in python
Next: coding contest

From: Patrick Maupin on 28 Mar 2010 13:27

On Mar 28, 11:56 am, "Alf P. Steinbach" <al...(a)start.no> wrote:
> From a more practical point of view, the sum efficiency could be improved by
> doing the first addition using '+' and the rest using '+=', without changing the
> behavior.

Mod parent up!!!! :-)

From: Steve Howell on 28 Mar 2010 13:34

On Mar 28, 10:21 am, Steve Howell <showel...(a)yahoo.com> wrote:
> import timeit
> M = 10
> N = 8000
>
> def in_place(
> start = [],
> sublists = ([[None] * M]) * N
> ):
> # only macro-optimized
> i = 0
> for sublist in sublists:
> if i == 0:
> accum = start + sublist
> i += 1
> else:
> accum.extend(sublist)

FYI I later obtained similar results with the more general:
accum += sublist

> if i == 0:
> return 'whatever' # semantics here?
> return accum
>
> def with_intermediates(
> start = [],
> sublists = ([[None] * M]) * N
> ):
> accum = start
> for sublist in sublists:
> accum = accum + sublist
> return accum
>

From: Patrick Maupin on 28 Mar 2010 14:16

On Mar 28, 12:34 pm, Steve Howell <showel...(a)yahoo.com> wrote:
> FYI I later obtained similar results with the more general:
> accum += sublist

Yeah, but you still have to create an object of the correct type for
accum. And for summing small lists, that will actually increase the
runtime. (Worst case, a list of a single item: you have to create the
accum object based on the type of the start value, then do two += into
it, for the start value and the first item in the list, vs just doing
a single + which automatically creates an object.)

Personally, I think the most general approach, if sum were open to
enhancements, would be to automatically apply Alf's suggestion of "+"
on summing the first item to the start value, and "+=" on subsequent
items. Also, you *could* add a parameter to sum(), for example,
default "partialsums=False" that would allow the use of partial sums
in those cases where that might give better behavior (such as floats
and tuples).

From: Steve Howell on 28 Mar 2010 14:50

On Mar 28, 11:16 am, Patrick Maupin <pmau...(a)gmail.com> wrote:
> On Mar 28, 12:34 pm, Steve Howell <showel...(a)yahoo.com> wrote:
>
> > FYI I later obtained similar results with the more general:
> > accum += sublist
>
> Yeah, but you still have to create an object of the correct type for
> accum.

I think you overlooked the surrounding code.

Here is the code again:

def in_place(
start = [],
sublists = ([[None] * M]) * N
):
# only macro-optimized
i = 0
for sublist in sublists:
if i == 0:
accum = start + sublist
i += 1
else:
accum += sublist
if i == 0:
return 'whatever' # semantics here?
return accum

No need to infer the type.

> And for summing small lists, that will actually increase the
> runtime. (Worst case, a list of a single item: you have to create the
> accum object based on the type of the start value, then do two += into
> it, for the start value and the first item in the list, vs just doing
> a single + which automatically creates an object.)
>

For small lists, the performance of any sane implementation would
probably be so fast as to be negligible, unless you are in a really
tight loop. If you are in a tight loop, then your use case probably
eventually sums large lists. Even if it doesn't, the slowdown is just
a constant.

For M=5, I get these results on my machine:

M N t1 t2 (t2/t1)
5 1 3.50475311279e-05 3.2901763916e-05 0.938775510204
5 2 2.00271606445e-05 1.59740447998e-05 0.797619047619
5 4 6.79492950439e-05 6.31809234619e-05 0.929824561404
5 8 0.000124931335449 0.000126123428345 1.00954198473
5 64 0.000530958175659 0.00226187705994 4.25999101931
5 1024 0.00262904167175 0.27246594429 103.636981953

t1 = time with add only first element
t2 = time with add all elements

Very small penalty for small lists, very large gain for large lists.

> Personally, I think the most general approach, if sum were open to
> enhancements, would be to automatically apply Alf's suggestion of "+"
> on summing the first item to the start value, and "+=" on subsequent
> items.

See above. That's the approach I would use as well.

From: Paul Rubin on 28 Mar 2010 16:49

Steve Howell <showell30(a)yahoo.com> writes:
> The documentation is pretty clear on the intention that sum() is
> intended for numbers: ...

I've never been big on the idea of duck-typing addition. Who would have
thought that (1,2,3)+(4.5.6) was something other than the vector sum?

I think itertools.chain more directly expresses what the OP was trying
to do, and is preferable for readability, independently of these
performance questions.

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11
Prev: Advice needed on parallel processing in python
Next: coding contest