|
From: Henning_Thornblad on 4 Jul 2008 07:43

What can be the cause of the large difference between re.search and grep?

This script takes about 5 min to run on my computer:

#!/usr/bin/env python
import re

row=""
for a in range(156000):
    row+="a"
print re.search('[^ "=]*/',row)

While doing a simple grep:

grep '[^ "=]*/' input (input contains 156.000 a in one row)

doesn't even take a second.

Is this a bug in python?

Thanks...
Henning Thornblad

From: Bruno Desthuilliers on 4 Jul 2008 08:29

Henning_Thornblad wrote:
> What can be the cause of the large difference between re.search and
> grep?
>
> This script takes about 5 min to run on my computer:
> #!/usr/bin/env python
> import re
>
> row=""
> for a in range(156000):
>     row+="a"
> print re.search('[^ "=]*/',row)
>
>
> While doing a simple grep:
> grep '[^ "=]*/' input (input contains 156.000 a in
> one row)
> doesn't even take a second.
>
> Is this a bug in python?

Please re-read your Python code carefully. Don't you think there's a
subtle difference between reading a file and building 156000 string
objects?

From: Bruno Desthuilliers on 4 Jul 2008 08:34

Bruno Desthuilliers wrote:
> Henning_Thornblad wrote:
>> What can be the cause of the large difference between re.search and
>> grep?
>>
>> This script takes about 5 min to run on my computer:
>> #!/usr/bin/env python
>> import re
>>
>> row=""
>> for a in range(156000):
>>     row+="a"
>> print re.search('[^ "=]*/',row)
>>
>>
>> While doing a simple grep:
>> grep '[^ "=]*/' input (input contains 156.000 a in
>> one row)
>> doesn't even take a second.
>>
>> Is this a bug in python?
>
> Please re-read carefully your python code. Don't you think there's a
> subtle difference between reading a file and buildin 156000 string
> objects ?
>

Mmm... That aside, after testing it (building the string in a somewhat
more efficient way), the call to re.search does indeed take ages to
return. Please forget my previous post.

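The "somewhat more efficient way" of building the string is not shown in the post. A minimal sketch of one possibility, assuming Python 3 and string repetition, is below. The names build_row and time_search are made up for illustration, and the row sizes are deliberately kept far below 156000 so the run finishes quickly. If the search cost grows roughly with the square of the row length, doubling n should roughly quadruple the time, but the actual numbers depend on the machine and the Python version.

```python
import re
import time

# Same pattern as in the original post, compiled once.
PATTERN = re.compile(r'[^ "=]*/')


def build_row(n):
    """Build a row of n 'a' characters in one step (hypothetical helper)."""
    return "a" * n


def time_search(n):
    """Return (match, seconds) for one search over a row of n characters."""
    row = build_row(n)
    start = time.perf_counter()
    match = PATTERN.search(row)
    return match, time.perf_counter() - start


if __name__ == "__main__":
    # Small sizes only; the original 156000 may take minutes.
    for n in (1000, 2000, 4000):
        match, seconds = time_search(n)
        print(n, match, "%.4f s" % seconds)
```

Timing the search separately from the string construction makes it easier to see that the time goes into re.search and not into building the row.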
From: Peter Otten on 4 Jul 2008 08:36

Henning_Thornblad wrote:
> What can be the cause of the large difference between re.search and
> grep?

grep uses a smarter algorithm ;)

> This script takes about 5 min to run on my computer:
> #!/usr/bin/env python
> import re
>
> row=""
> for a in range(156000):
>     row+="a"
> print re.search('[^ "=]*/',row)
>
>
> While doing a simple grep:
> grep '[^ "=]*/' input (input contains 156.000 a in
> one row)
> doesn't even take a second.
>
> Is this a bug in python?

You could call this a performance bug, but it's not common enough in real
code to get the necessary brain cycles from the core developers.
So you can either write a patch yourself or use a workaround.

    re.search('[^ "=]*/', row) if "/" in row else None

might be good enough.

Peter

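The one-line workaround above can be wrapped in a small function. This is only a sketch, assuming Python 3; the name search_with_guard is hypothetical. It skips the regex entirely when the row contains no slash, which is the case from the original post.

```python
import re

# Same pattern as in the original post, compiled once.
PATTERN = re.compile(r'[^ "=]*/')


def search_with_guard(row):
    """Run the regex only if the row contains a '/' at all (hypothetical helper)."""
    if "/" not in row:
        # Without a slash the pattern cannot match, so skip the slow search.
        return None
    return PATTERN.search(row)


if __name__ == "__main__":
    print(search_with_guard("a" * 156000))      # None
    print(search_with_guard("abc/def").group())  # abc/
```

The guard only covers rows with no slash at all. A row with a long slash-free run followed by a space and then a slash would pass the check and may still be slow, which is presumably why the workaround is described as something that "might be good enough".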
From: Paddy on 4 Jul 2008 12:40

On Jul 4, 1:36 pm, Peter Otten <__pete...(a)web.de> wrote:
> Henning_Thornblad wrote:
> > What can be the cause of the large difference between re.search and
> > grep?
>
> grep uses a smarter algorithm ;)
>
> > This script takes about 5 min to run on my computer:
> > #!/usr/bin/env python
> > import re
>
> > row=""
> > for a in range(156000):
> >     row+="a"
> > print re.search('[^ "=]*/',row)
>
> > While doing a simple grep:
> > grep '[^ "=]*/' input (input contains 156.000 a in
> > one row)
> > doesn't even take a second.
>
> > Is this a bug in python?
>
> You could call this a performance bug, but it's not common enough in real
> code to get the necessary brain cycles from the core developers.
> So you can either write a patch yourself or use a workaround.
>
> re.search('[^ "=]*/', row) if "/" in row else None
>
> might be good enough.
>
> Peter

It is not a smarter algorithm that is used in grep. Python REs have
more capabilities than grep REs, and those capabilities need a slower,
more complex algorithm.

You could argue that if the costly RE features are not used then maybe
simpler, faster algorithms should be automatically swapped in but ....

- Paddy.
|