From: Henning_Thornblad on
What can be the cause of the large difference between re.search and
grep?

This script takes about 5 min to run on my computer:
#!/usr/bin/env python
import re

row=""
for a in range(156000):
row+="a"
print re.search('[^ "=]*/',row)


While doing a simple grep:
grep '[^ "=]*/' input (input contains 156.000 a in
one row)
doesn't even take a second.

Is this a bug in python?

Thanks...
Henning Thornblad
From: Bruno Desthuilliers on
Henning_Thornblad a �crit :
> What can be the cause of the large difference between re.search and
> grep?
>
> This script takes about 5 min to run on my computer:
> #!/usr/bin/env python
> import re
>
> row=""
> for a in range(156000):
> row+="a"
> print re.search('[^ "=]*/',row)
>
>
> While doing a simple grep:
> grep '[^ "=]*/' input (input contains 156.000 a in
> one row)
> doesn't even take a second.
>
> Is this a bug in python?

Please re-read carefully your python code. Don't you think there's a
subtle difference between reading a file and buildin 156000 string objects ?

From: Bruno Desthuilliers on
Bruno Desthuilliers a �crit :
> Henning_Thornblad a �crit :
>> What can be the cause of the large difference between re.search and
>> grep?
>>
>> This script takes about 5 min to run on my computer:
>> #!/usr/bin/env python
>> import re
>>
>> row=""
>> for a in range(156000):
>> row+="a"
>> print re.search('[^ "=]*/',row)
>>
>>
>> While doing a simple grep:
>> grep '[^ "=]*/' input (input contains 156.000 a in
>> one row)
>> doesn't even take a second.
>>
>> Is this a bug in python?
>
> Please re-read carefully your python code. Don't you think there's a
> subtle difference between reading a file and buildin 156000 string
> objects ?
>

Mmm... This set aside, after testing it (building the string in a
somewhat more efficient way), the call to re.search effectively takes
ages to return. Please forget my previous post.
From: Peter Otten on
Henning_Thornblad wrote:

> What can be the cause of the large difference between re.search and
> grep?

grep uses a smarter algorithm ;)

> This script takes about 5 min to run on my computer:
> #!/usr/bin/env python
> import re
>
> row=""
> for a in range(156000):
> row+="a"
> print re.search('[^ "=]*/',row)
>
>
> While doing a simple grep:
> grep '[^ "=]*/' input (input contains 156.000 a in
> one row)
> doesn't even take a second.
>
> Is this a bug in python?

You could call this a performance bug, but it's not common enough in real
code to get the necessary brain cycles from the core developers.
So you can either write a patch yourself or use a workaround.

re.search('[^ "=]*/', row) if "/" in row else None

might be good enough.

Peter
From: Paddy on
On Jul 4, 1:36 pm, Peter Otten <__pete...(a)web.de> wrote:
> Henning_Thornblad wrote:
> > What can be the cause of the large difference between re.search and
> > grep?
>
> grep uses a smarter algorithm ;)
>
>
>
> > This script takes about 5 min to run on my computer:
> > #!/usr/bin/env python
> > import re
>
> > row=""
> > for a in range(156000):
> >     row+="a"
> > print re.search('[^ "=]*/',row)
>
> > While doing a simple grep:
> > grep '[^ "=]*/' input                  (input contains 156.000 a in
> > one row)
> > doesn't even take a second.
>
> > Is this a bug in python?
>
> You could call this a performance bug, but it's not common enough in real
> code to get the necessary brain cycles from the core developers.
> So you can either write a patch yourself or use a workaround.
>
> re.search('[^ "=]*/', row) if "/" in row else None
>
> might be good enough.
>
> Peter

It is not a smarter algorithm that is used in grep. Python RE's have
more capabilities than grep RE's which need a slower, more complex
algorithm.
You could argue that if the costly RE features are not used then maybe
simpler, faster algorithms should be automatically swapped in but ....

- Paddy.