From: Steven D'Aprano on
On Sun, 28 Mar 2010 13:49:32 -0700, Paul Rubin wrote:

> Steve Howell <showell30(a)yahoo.com> writes:
>> The documentation is pretty clear on the intention that sum() is
>> intended for numbers: ...
>
> I've never been big on the idea of duck-typing addition. Who would have
> thought that (1,2,3)+(4,5,6) was something other than the vector sum?

But your arguments are tuples, not vectors.

There are languages which do treat arithmetic operations on vectors as
vector operations, as do (e.g.) H-P scientific calculators. That's a fine
design choice, and it works for languages where the emphasis is on
scientific calculations. But Python is a more generalist language, so in
my mind it is more appropriate to treat lists as generic lists, and not
numeric vectors:

[1, 2, 3] + [4, "a"] => [1, 2, 3, 4, "a"]

just as

"123" + "4a" => "1234a"

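A minimal sketch of the distinction, runnable as-is. The helper name
vector_add is hypothetical (it is not a built-in); it only illustrates what
an element-wise sum would have to look like if you wanted one.

```python
# Tuples (like lists and strings) concatenate under +; nothing is added
# element-wise.
concatenated = (1, 2, 3) + (4, 5, 6)

# An element-wise "vector sum" has to be spelled out explicitly.
# vector_add is a hypothetical helper name, not a standard function.
def vector_add(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return tuple(x + y for x, y in zip(a, b))

elementwise = vector_add((1, 2, 3), (4, 5, 6))

print(concatenated)  # (1, 2, 3, 4, 5, 6)
print(elementwise)   # (5, 7, 9)
```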


--
Steven
From: Steven D'Aprano on
On Sun, 28 Mar 2010 18:56:26 +0200, Alf P. Steinbach wrote:

> From a more practical point of view, the sum efficiency could be
> improved by doing the first addition using '+' and the rest using '+=',
> without changing the behavior.

But that would change the behaviour. The __iadd__ method is not the same
as the __add__ method, and you can't guarantee that they will behave the
same -- or even similarly.
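
For the built-in list type the difference is easy to see. The sketch below
uses only plain lists; the variable names are just for illustration.

```python
# + goes through list.__add__, which builds a new list.
a = [1, 2]
alias_a = a
a = a + [3]
print(alias_a)     # [1, 2]  (the original object is untouched)

# += goes through list.__iadd__, which extends the existing list in place.
b = [1, 2]
alias_b = b
b += [3]
print(alias_b)     # [1, 2, 3]  (the alias sees the change)

# __iadd__ also accepts any iterable, while __add__ insists on a list.
c = [1, 2]
c += (3, 4)        # fine: c is now [1, 2, 3, 4]
try:
    [1, 2] + (3, 4)
except TypeError as exc:
    add_error = exc  # list + tuple is rejected
```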

And what about tuples? And subclasses of list/tuples? How many different
types need to be optimized?

In practical terms, does anyone actually ever use sum on more than a
handful of lists? I don't believe this is more than a hypothetical
problem.

The primary use case for sum is adding numbers when floating point
accuracy is not critical. If you need float accuracy, use math.fsum. If
you need to add more than a handful of small lists, don't use sum: just
calling extend in a loop is probably fast enough, or use itertools.chain.
Trying to turn sum into an all-singing all-dancing function optimised for
an ever-increasing number of objects is a fool's errand. The
implementation of sum in C is already complex enough: 95 lines, excluding
comments, blanks and braces, for something which is a lightweight
function with very simple semantics.
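
As a rough sketch of the alternatives mentioned above (the sample data is
made up for illustration). Note that the exact float printed by sum()
depends on the Python version: releases from 3.12 onward use a more accurate
summation for floats, so the gap to math.fsum may not show up there.

```python
import itertools
import math

list_of_lists = [[1, 2], [3], [4, 5, 6]]

# Option 1: extend in a loop; each element is copied once.
flat_extend = []
for sub in list_of_lists:
    flat_extend.extend(sub)

# Option 2: itertools.chain.from_iterable walks the sublists lazily.
flat_chain = list(itertools.chain.from_iterable(list_of_lists))

# For comparison: sum() gives the same result, but each addition copies
# the whole list built so far.
flat_sum = sum(list_of_lists, [])

# Floats: math.fsum tracks the rounding error that naive addition loses.
values = [0.1] * 10
print(sum(values))        # version-dependent; older Pythons show the error
print(math.fsum(values))  # 1.0
```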

But if anyone wants to submit a patch to the bug tracker, go right ahead.
Without a patch though, I'd say that Python-Dev will consider this a non-
issue.



--
Steven
From: Alf P. Steinbach on
* Steven D'Aprano:
> On Sun, 28 Mar 2010 18:56:26 +0200, Alf P. Steinbach wrote:
>
>> From a more practical point of view, the sum efficiency could be
>> improved by doing the first addition using '+' and the rest using '+=',
>> without changing the behavior.
>
> But that would change the behaviour. The __iadd__ method is not the same
> as the __add__ method, and you can't guarantee that they will behave the
> same -- or even similarly.

Hm, I don't think it's documented (except if the reference implementation serves
as documentation) which one is currently used.


> And what about tuples? And subclasses of list/tuples? How many different
> types need to be optimized?

Point. One would need to check for availability of '+='.
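
A pure-Python sketch of the "first '+', then '+='" idea, to make the
trade-off concrete. The function name sum_plus_then_iadd is hypothetical and
this is not how the built-in is implemented. At the Python level no explicit
check is needed, since '+=' falls back to __add__ when a type has no
__iadd__. The sketch assumes that 'start + first' returns a fresh object,
which holds for lists but is not guaranteed for arbitrary types.

```python
def sum_plus_then_iadd(iterable, start=0):
    """Hypothetical variant of sum(): first addition with +, rest with +=."""
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        return start
    # The first + is assumed to create a new object, so the in-place
    # additions below do not mutate `start` or the first item.
    total = start + first
    for item in it:
        total += item  # uses __iadd__ if defined, otherwise __add__
    return total


start = []
parts = [[1], [2, 3]]
print(sum_plus_then_iadd(parts, start))  # [1, 2, 3]
print(start, parts)                      # [] [[1], [2, 3]]

# The behaviour does differ from sum() in at least one visible way:
# list += tuple is accepted, while list + tuple raises TypeError.
print(sum_plus_then_iadd([[1], (2,)], []))  # [1, 2]
```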


> In practical terms, does anyone actually ever use sum on more than a
> handful of lists? I don't believe this is more than a hypothetical
> problem.

Agreed.


Cheers,

- Alf
From: Patrick Maupin on
On Mar 28, 9:45 pm, Steven D'Aprano
<ste...(a)REMOVE.THIS.cybersource.com.au> wrote:
> And what about tuples? And subclasses of list/tuples? How many different
> types need to be optimized?

One of the beautiful things about Python is that, for most things,
there are few surprises for even new users. "There should be one
obvious way to do it" for the user means that, sometimes, under the
hood, there are a lot of special cases for the implementers.

> In practical terms, does anyone actually ever use sum on more than a
> handful of lists? I don't believe this is more than a hypothetical
> problem.

Right now, it's probably not, because when somebody sums a large list
and gets thwacked on the head by the lack of efficiency, they then
come here and get thwacked because "everybody knows" they should use
itertools or something else, not sum().
>
> The primary use case for sum is adding numbers when floating point
> accuracy is not critical. If you need float accuracy, use math.fsum.

See, I think the very existence of math.fsum() already violates "there
should be one obvious way to do it."

> But if anyone wants to submit a patch to the bug tracker, go right ahead.
> Without a patch though, I'd say that Python-Dev will consider this a non-
> issue.

Agreed. Wish I had the time to do this sort of cleanup.

Regards,
Pat
From: Steve Howell on
On Mar 29, 7:40 am, Patrick Maupin <pmau...(a)gmail.com> wrote:
> On Mar 28, 9:45 pm, Steven D'Aprano
>
> <ste...(a)REMOVE.THIS.cybersource.com.au> wrote:
> > And what about tuples? And subclasses of list/tuples? How many different
> > types need to be optimized?
>
> One of the beautiful things about Python is that, for most things,
> there are few surprises for even new users.  "There should be one
> obvious way to do it" for the user means that, sometimes, under the
> hood, there are a lot of special cases for the implementers.
>

If nothing else, I think it's reasonable for users to expect symmetry.

If you can use "+" to concatenate lists, then it seems reasonable
that something spelled "sum" would concatenate lists as well, and in
reasonable time.

> > In practical terms, does anyone actually ever use sum on more than a
> > handful of lists? I don't believe this is more than a hypothetical
> > problem.
>
> Right now, it's probably not, because when somebody sums a large list
> and gets thwacked on the head by the lack of efficiency, they then
> come here and get thwacked because "everybody knows" they should use
> itertools or something else, not sum().
>

Indeed. It would be nice if the docs for sum() at least pointed to
list(itertools.chain.from_iterable(list_of_lists)), or whatever the most
kosher alternative is supposed to be.

It only takes a handful of sublists, about ten on my box, to expose
the limitation of the Shlemiel-the-Painter O(M*N*N) algorithm that's
under the hood. It only takes 200 sublists to start getting a 10x
degradation in performance.
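
A small sketch for reproducing this kind of measurement. The sublist sizes
and repeat counts are arbitrary assumptions, and no particular timings are
claimed; the numbers will vary by machine. The helper
elements_copied_by_sum is hypothetical and simply counts how many element
copies repeated list addition performs for N sublists of M items each,
which works out to M*N*(N+1)/2.

```python
import itertools
import timeit


def flatten_with_sum(list_of_lists):
    # Each + builds a new list holding everything accumulated so far.
    return sum(list_of_lists, [])


def flatten_with_chain(list_of_lists):
    # Each element is visited and copied once.
    return list(itertools.chain.from_iterable(list_of_lists))


def elements_copied_by_sum(n_sublists, sublist_len):
    """Count element copies made by repeated list + list."""
    copied = 0
    size = 0
    for _ in range(n_sublists):
        size += sublist_len   # size of the new list built by this +
        copied += size        # every element of it is copied
    return copied


if __name__ == "__main__":
    for n in (10, 200, 1000):
        data = [[0] * 10 for _ in range(n)]
        t_sum = timeit.timeit(lambda: flatten_with_sum(data), number=20)
        t_chain = timeit.timeit(lambda: flatten_with_chain(data), number=20)
        print(n, elements_copied_by_sum(n, 10), t_sum, t_chain)
```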

> > The primary use case for sum is adding numbers when floating point
> > accuracy is not critical. If you need float accuracy, use math.fsum.
>
> See, I think the very existence of math.fsum() already violates "there
> should be one obvious way to do it."
>

The nice thing about math.fsum() is that the docs for sum() at least
point to it, although I suspect some users try sum() without even
consulting the docs.

You could appease all users with an API where the most obvious choice,
sum(), never behaves badly, and where users can still call more
specialized versions (math.fsum() and friends) directly if they know
what they are doing. This goes back to the statement that Patrick
makes--under the hood, this means more special cases for implementers,
but fewer pitfalls for users.
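
To make the idea concrete, here is a rough sketch of what such a
special-casing front end could look like. The name friendly_sum is
hypothetical and nothing like it exists in Python; it only adds a fast path
for exact lists and otherwise defers to the built-in sum(), so everything
else (including error cases) behaves as before. It materializes the input
first, which is an assumption that the input fits in memory.

```python
import itertools


def friendly_sum(iterable, start=0):
    """Hypothetical wrapper around sum(); not part of Python."""
    items = list(iterable)
    # Fast path: only when start and every item are exact lists, so the
    # result is the same as repeated + would give.
    if type(start) is list and all(type(x) is list for x in items):
        return start + list(itertools.chain.from_iterable(items))
    # Everything else keeps the ordinary sum() semantics.
    return sum(items, start)


print(friendly_sum([[1], [2, 3]], []))  # [1, 2, 3]
print(friendly_sum([1, 2, 3]))          # 6
```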