Kindly show me a better way to do it [Python]

From: Steven D'Aprano on 9 May 2010 02:17

On Sat, 08 May 2010 14:06:33 -0700, Oltmans wrote:

> On May 9, 1:53 am, superpollo <ute...(a)esempio.net> wrote:
>
>> add = lambda a,b: a+b
>> for i in reduce(add,a):
>>     print i
>
> This is very neat. Thank you. Sounds like magic to me. Can you please
> explain how does that work? Many thanks again.

Don't use this except for small lists, it is very inefficient and will be
slow for large lists. It is a Shlemiel The Painter algorithm:

http://www.joelonsoftware.com/articles/fog0000000319.html

The most idiomatic solution is a simple, straightforward nested iteration:

for sublist in a:
    for item in sublist:
        do_something_with(item)

Say that there are 10 sublists with 10 items each. Then nested iteration
will iterate 100 times in total. The solution with reduce will iterate:

10+10 # add the first sublist and the second sublist
20+10 # add the third sublist
30+10 # add the fourth sublist
40+10 # and so on...
50+10
60+10
70+10
80+10
90+10 # add the last sublist
100 # and now iterate over the combined list

or 640 times in total. If there are 100 sublists of 10 items each, the
performance is even worse: 51,490 for the reduce solution, versus 1000
for the nested iteration.
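
The counts above can be checked with a few lines of Python. This is only a sketch: the helper names reduce_work and nested_work are made up for illustration, and they count list elements handled (each concatenation is assumed to cost the length of the new list it builds), not wall-clock time.

```python
def reduce_work(num_sublists, sublist_len):
    """Count elements handled when flattening with reduce(add, data).

    Each concatenation builds a brand-new list, so its cost is taken to be
    the length of the result. The final pass iterates over the flat list.
    """
    total = 0
    size = sublist_len                  # length of the running result
    for _ in range(num_sublists - 1):   # one concatenation per extra sublist
        size += sublist_len
        total += size                   # elements copied into the new list
    return total + size                 # plus one pass over the final list


def nested_work(num_sublists, sublist_len):
    """Nested iteration visits every item exactly once."""
    return num_sublists * sublist_len


print(reduce_work(10, 10), nested_work(10, 10))      # 640 100
print(reduce_work(100, 10), nested_work(100, 10))    # 51490 1000
```

The reduce count grows roughly with the square of the number of sublists, while the nested count grows linearly.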

Admittedly those iterations will be in fast C code instead of slow Python
code, which is why you might not notice the difference at first, but
you're still doing a lot of unnecessary work which takes time. How much
time? Python makes it easy to find out.

>>> from timeit import Timer
>>> setup = "data = [range(10) for i in range(10)]"
>>> t1 = Timer("""for sublist in data:
...     for item in sublist:
...         pass""", setup)
>>> t2 = Timer("""for item in reduce(lambda x,y: x+y, data):
...     pass""", setup)
>>>
>>> min(t1.repeat(number=100000))
0.94107985496520996
>>> min(t2.repeat(number=100000))
1.7509880065917969

So for ten sublists of ten items each, the solution using reduce is
nearly twice as slow as the nested iteration. If we use ten times as many
sublists, the nested for-loop solution takes roughly ten times longer
(about eleven times, by this measurement), as you would expect:

>>> setup = "data = [range(10) for i in range(100)]"
>>> t1 = Timer("""for sublist in data:
...     for item in sublist:
...         pass""", setup)
>>> min(t1.repeat(number=100000))
10.349304914474487

But the reduce solution slows down by a factor of about thirty-three
rather than ten:

>>> t2 = Timer("""for item in reduce(lambda x,y: x+y, data):
...     pass""", setup)
>>> min(t2.repeat(number=100000))
58.116463184356689

If we were to increase the number of sublists further, the reduce
solution would perform even worse.
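
The session above is Python 2. A rough Python 3 equivalent is sketched below, under two assumptions: reduce has to be imported from functools, and each range has to be turned into a list before it can be concatenated with +. The helper name make_timers and the small repeat/number values are illustrative choices so the script finishes quickly; the timings it prints will differ from the figures quoted above and from machine to machine.

```python
from timeit import Timer


def make_timers(num_sublists):
    """Build timers for both approaches over num_sublists lists of 10 items."""
    setup = (
        "from functools import reduce\n"
        "data = [list(range(10)) for i in range(%d)]" % num_sublists
    )
    nested = Timer(
        "for sublist in data:\n"
        "    for item in sublist:\n"
        "        pass",
        setup,
    )
    reduced = Timer(
        "for item in reduce(lambda x, y: x + y, data):\n"
        "    pass",
        setup,
    )
    return nested, reduced


results = {}
for n in (10, 100):
    nested, reduced = make_timers(n)
    # Take the minimum of several runs, as in the session above.
    t_nested = min(nested.repeat(repeat=3, number=200))
    t_reduced = min(reduced.repeat(repeat=3, number=200))
    results[n] = (t_nested, t_reduced)
    print(n, "sublists: nested", t_nested, "reduce", t_reduced)
```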

--
Steven

From: Steven D'Aprano on 9 May 2010 05:18

On Sun, 09 May 2010 15:17:38 +1000, Lie Ryan wrote:

> On 05/09/10 07:09, Günther Dietrich wrote:
>>
>> Why not this way?
>>
>> >>> a = [[1,2,3,4], [5,6,7,8]]
>> >>> for i in a:
>> ...     for j in i:
>> ...         print(j)
>> ...
>> 1
>> 2
>> 3
>> 4
>> 5
>> 6
>> 7
>> 8
>>
>> Too simple?
>
> IMHO that's more complex due to the nested loop,

What's so complex about a nested loop? And why are you saying that it is
"more complex" than the Original Poster's solution, which also had a
nested loop, plus a pointless list comprehension?

> though I would
> personally do it as:
>
> a = [ [1,2,3,4], [5,6,7,8] ]
> from itertools import chain
> for i in chain.from_iterable(a):
>     print i
>
> so it won't choke when 'a' is an infinite stream of iterables.

Neither will a nested for-loop.
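
To illustrate, here is a minimal Python 3 sketch. The generator infinite_sublists is a made-up stand-in for an infinite stream of iterables, and flatten is just the nested for-loop wrapped in a generator; islice is used only to stop after a few items. By contrast, a call of the form chain(*a) has to unpack all of 'a' before chain is even called, so it cannot cope with an endless 'a'.

```python
from itertools import chain, count, islice


def infinite_sublists():
    """Hypothetical infinite stream: [0, 1], [2, 3], [4, 5], ..."""
    for start in count(0, 2):
        yield [start, start + 1]


def flatten(iterables):
    """Nested for-loop as a generator; pulls sublists one at a time."""
    for sublist in iterables:
        for item in sublist:
            yield item


# Both approaches consume the stream lazily, so taking five items terminates.
print(list(islice(flatten(infinite_sublists()), 5)))              # [0, 1, 2, 3, 4]
print(list(islice(chain.from_iterable(infinite_sublists()), 5)))  # [0, 1, 2, 3, 4]
```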

--
Steven

From: Lie Ryan on 9 May 2010 08:52

On 05/09/10 19:18, Steven D'Aprano wrote:
> On Sun, 09 May 2010 15:17:38 +1000, Lie Ryan wrote:
>
>> On 05/09/10 07:09, Günther Dietrich wrote:
>>>
>>> Why not this way?
>>>
>>> >>> a = [[1,2,3,4], [5,6,7,8]]
>>> >>> for i in a:
>>> ...     for j in i:
>>> ...         print(j)
>>> ...
>>> 1
>>> 2
>>> 3
>>> 4
>>> 5
>>> 6
>>> 7
>>> 8
>>>
>>> Too simple?
>>
>> IMHO that's more complex due to the nested loop,
>
> What's so complex about a nested loop?

One more nested tab. That extra whitespace is quite irritating.

> And why are you saying that it is
> "more complex" than the Original Poster's solution, which also had a
> nested loop, plus a pointless list comprehension?

You misunderstood. Tycho Anderson posted an itertools.chain(*chain)
solution, to which Günther Dietrich replied "why not a nested loop?". I
was replying to Günther Dietrich's nested loop with "because a nested
loop is more complex than chain()", and added that the original (Tycho
Anderson's) chain solution has a subtle bug when faced with an infinite
generator of iterables.

>> though I would
>> personally do it as:
>>
>> a = [ [1,2,3,4], [5,6,7,8] ]
>> from itertools import chain
>> for i in chain.from_iterable(a):
>>     print i
>>
>> so it won't choke when 'a' is an infinite stream of iterables.
>
> Neither will a nested for-loop.

From: Steven D'Aprano on 9 May 2010 13:58

On Sun, 09 May 2010 22:52:55 +1000, Lie Ryan wrote:

>>> IMHO that's more complex due to the nested loop,
>>
>> What's so complex about a nested loop?
>
> One more nested tab. That extra whitespace is quite irritating.

Then say you don't like it; don't try to make a subjective dislike seem
objectively bad with a spurious claim of complexity. There's nothing
complex about an extra level of indentation. It's *one token*, with zero
run-time cost and virtually no compile-time cost.

--
Steven

From: Jean-Michel Pichavant on 10 May 2010 07:30

Oltmans wrote:
> On May 9, 1:53 am, superpollo <ute...(a)esempio.net> wrote:
>
>
>> add = lambda a,b: a+b
>> for i in reduce(add,a):
>>     print i
>>
>
> This is very neat. Thank you. Sounds like magic to me. Can you please
> explain how does that work? Many thanks again.
>
>
shorter <> nicer IMO.
Those alternatives are interesting from a tech point of view, but
nothing can beat the purity of a vintage 'for' loop with *meaningful names*.

salads = [['apple', 'banana'], ['apple', 'lemon', 'kiwi']]

ingredients = []

for salad in salads:
    for fruit in salad:
        ingredients.append(fruit)

print 'Remember to buy %s' % ingredients

Lame & effective (1st adjective is irrelevant outside a geek contest)

JM
