From: Ken D'Ambrosio on
Hi, all. I've got a file which, in turn, contains a couple thousand
filenames. I'm writing a web front-end, and I want to return all the
filenames that match a user-input value. In Perl, this would be something
like,

if (/$value/){print "$_ matches\n";}

But trying to put a variable into regex in Python is challenging me --
and, indeed, I've seen a bit of scorn cast upon those who would do so in
my Google searches ("You come from Perl, don't you?").

Here's what I've got (in ugly, prototype-type code):

file=open('/tmp/event_logs_listing.txt' 'r') # List of filenames
seek = form["serial"].value # Value from web form
for line in file:
    match = re.search((seek)",(.*),(.*)", line) # Stuck here

Clearly, the line, above, is just plain ol' wrong, but I'm including it to
give a hint of what I'm trying to do. What's the correct Python-esque way
to go about this?

Thanks!

-Ken



From: Thomas Jollans on
On 06/28/2010 07:29 PM, Ken D'Ambrosio wrote:
> Hi, all. I've got a file which, in turn, contains a couple thousand
> filenames. I'm writing a web front-end, and I want to return all the
> filenames that match a user-input value. In Perl, this would be something
> like,
>
> if (/$value/){print "$_ matches\n";}
>
> But trying to put a variable into regex in Python is challenging me --
> and, indeed, I've seen a bit of scorn cast upon those who would do so in
> my Google searches ("You come from Perl, don't you?").
>
> Here's what I've got (in ugly, prototype-type code):
>
> file=open('/tmp/event_logs_listing.txt' 'r') # List of filenames
> seek = form["serial"].value # Value from web form
> for line in file:
>     match = re.search((seek)",(.*),(.*)", line) # Stuck here


without re:

for line in file:
    if line.startswith(seek): # or: if seek in line...
        # do stuff

with re and classic string formatting:

for line in file:
    # or use re.match to look only at the beginning of the line
    match = re.search('(%s),(.*),(.*)' % seek, line)
    if match:
        # use match.group(n) for further processing

You could also concatenate strings to get the regexp, as in:
'('+seek+'),(.*),(.*)'
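
One caveat with either form: any regex metacharacters in seek are interpreted as part of the pattern. Below is a minimal sketch of the same search with the value passed through re.escape first, so that it is matched literally. The helper name find_fields and the sample lines are made up for illustration and are not from the original post.

```python
import re

def find_fields(seek, line):
    """Return the two captured fields after `seek`, or None if nothing matches."""
    # re.escape makes characters such as '.' or '*' in seek match literally.
    pattern = '(%s),(.*),(.*)' % re.escape(seek)
    match = re.search(pattern, line)
    if match is None:
        return None
    return match.group(2), match.group(3)

# Hypothetical sample lines, for illustration only.
print(find_fields("a.b", "a.b,first,second"))
print(find_fields("a.b", "axb,first,second"))
```

Without the re.escape call, the '.' in "a.b" would match any character, so the second line would match as well.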

Note that while using re is the Perl way to handle strings, Python has a bunch
of string methods that are often a better choice, such as str.startswith
or the ("xyz" in "vwxyz!") syntax.

If you just want to get the matching lines, you could use a list
comprehension:

matching = [ln for ln in file if ln.startswith(seek)]
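
For reference, here is a small self-contained sketch of the string-method approaches above. The sample lines and the seek value are invented for illustration, and a plain list of strings stands in for the open file.

```python
# Hypothetical sample data standing in for the lines of the listing file.
lines = [
    "SN1001,2010-06-01,boot.log\n",
    "SN1002,2010-06-02,crash.log\n",
    "XSN1001,2010-06-03,other.log\n",
]
seek = "SN1001"  # stands in for the value from the web form

# Prefix match: only lines that begin with the value.
prefix_matches = [ln for ln in lines if ln.startswith(seek)]

# Substring match: lines that contain the value anywhere.
substring_matches = [ln for ln in lines if seek in ln]

print(len(prefix_matches), len(substring_matches))
```

Which of the two tests is appropriate depends on whether a hit in the middle of a line should count.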

Just to present some of the ways you can do this in Python. I hope
you're enjoying the language.

-- Thomas

From: Stephen Hansen on
On 6/28/10 10:29 AM, Ken D'Ambrosio wrote:
> Hi, all. I've got a file which, in turn, contains a couple thousand
> filenames. I'm writing a web front-end, and I want to return all the
> filenames that match a user-input value. In Perl, this would be something
> like,
>
> if (/$value/){print "$_ matches\n";}
>
> But trying to put a variable into regex in Python is challenging me --
> and, indeed, I've seen a bit of scorn cast upon those who would do so in
> my Google searches ("You come from Perl, don't you?").

First of all, if you're doing this, you have to be aware that it is
*very* possible to write a pathological regular expression which
can kill your app and maybe your web server.

So if you're letting them write regular expressions and they aren't
like, smartly-trusted-people, be wary.

> Here's what I've got (in ugly, prototype-type code):
>
> file=open('/tmp/event_logs_listing.txt' 'r') # List of filenames
> seek = form["serial"].value # Value from web form
> for line in file:
>     match = re.search((seek)",(.*),(.*)", line) # Stuck here

Now, if you don't need the full power of regular expressions, then what
about:

name, foo, bar = line.split(",")
if seek in name:
    # do something with foo and bar

That'll return True if the word 'seek' appears in the first field of
what appears to be the comma-delimited line.

Or maybe, if you're worried about case sensitivity:

if seek.lower() in name.lower():
    # la la la

You can do a lot without ever bothering to mess around with regular
expressions. A lot.

It's also faster if you're doing simpler things :)
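
A runnable sketch of the split-based check, for illustration. The helper name first_field_matches and the sample lines are hypothetical. Unlike the three-name unpacking above, which raises ValueError when a line does not have exactly three fields, this variant only pulls out the first field.

```python
def first_field_matches(line, seek):
    """True if seek occurs, ignoring case, in the first comma-separated field."""
    # Split at most once; element 0 is everything before the first comma.
    name = line.split(",", 1)[0]
    return seek.lower() in name.lower()

# Hypothetical sample lines, for illustration only.
lines = ["Server01.log,alpha,beta", "client.log,gamma,delta"]
hits = [ln for ln in lines if first_field_matches(ln, "SERVER")]
print(hits)
```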

If they don't need the full power of regular expressions, just simple
globbing, then maybe change it to:

seeking = re.escape(seek)
seeking = seeking.replace("\\*", ".*")
seeking = seeking.replace("\\?", ".")

match = re.search(seeking + ",(.*),(.*)", line)

First, we escape the user input so they can't put in any crazy regular
expression characters. Then we go and *unescape* "\\*" and turn it into
a ".*" -- because when a user enters "*", they really mean ".*" in
traditional glob-esque terms. Then we do the same with the question mark,
turning it into a dot.

Then! We go and run our highly restricted regular expression through
basically just as you were doing before-- you just didn't concatenate
the 'seek' string to the rest of your expression.
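
The escape-then-unescape steps can be wrapped up roughly as follows. This is only a sketch: the function names glob_to_regex and search_line and the sample lines are made up for illustration.

```python
import re

def glob_to_regex(seek):
    """Translate a glob-style value ('*' and '?') into a regex fragment."""
    seeking = re.escape(seek)               # neutralise all metacharacters
    seeking = seeking.replace("\\*", ".*")  # '*' -> any run of characters
    seeking = seeking.replace("\\?", ".")   # '?' -> exactly one character
    return seeking

def search_line(seek, line):
    """Search a line for the translated value followed by two more fields."""
    return re.search(glob_to_regex(seek) + ",(.*),(.*)", line)

# Hypothetical sample line, for illustration only.
m = search_line("ab*", "abc123,foo,bar")
print(m.groups() if m else None)
```

Everything other than "*" and "?" stays escaped, so an input such as "a.c" only matches a literal "a.c". The standard library's fnmatch module offers comparable glob matching, if that turns out to be a better fit.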

If you must give them full regex power and you know they won't try to
bomb you, just leave out the 'seeking = ' lines and cross your fingers.


--

... Stephen Hansen
... Also: Ixokai
... Mail: me+list/python (AT) ixokai (DOT) io
... Blog: http://meh.ixokai.io/

From: Rami Chowdhury on
On Monday 28 June 2010 10:29:35 Ken D'Ambrosio wrote:
> Hi, all. I've got a file which, in turn, contains a couple thousand
> filenames. I'm writing a web front-end, and I want to return all the
> filenames that match a user-input value. In Perl, this would be something
> like,
>
> if (/$value/){print "$_ matches\n";}

I presume you're suitably sanitizing $value before you arbitrarily use it ;-)?

> file=open('/tmp/event_logs_listing.txt' 'r') # List of filenames
> seek = form["serial"].value # Value from web form
> for line in file:
>     match = re.search((seek)",(.*),(.*)", line) # Stuck here

Not sure what you're trying to do, here, with the extra regex groups? If you
just want to naively check for the presence of that user-input value, you can
do:

seek = sanitize(form["serial"].value)
for line in file:
    match = re.search(seek, line) # re.search(pattern, string, flags)
    if match is not None:
        print("%s matches" % line)

One thing that certainly confused me when I learned Python (coming from, among
other things, Perl) was the lack of implicit variables -- is that what's
tripping you up?

Hope that helps,
Rami
----
Rami Chowdhury
"Given enough eyeballs, all bugs are shallow." -- Linus' Law
+1-408-597-7068 / +44-7875-841-046 / +88-01819-245544
From: Sion Arrowsmith on
Stephen Hansen <me+list/python(a)ixokai.io> wrote:
>On 6/28/10 10:29 AM, Ken D'Ambrosio wrote:
>> for line in file:
>>     match = re.search((seek)",(.*),(.*)", line) # Stuck here
> [ ... ]
> name, foo, bar = line.split(",")
> if seek in name:
>     # do something with foo and bar
>
>That'll return True if the word 'seek' appears in the first field of
>what appears to be the comma-delimited line.

If the file input is comma-delimited, then the OP might very well
want a look at the csv module. Something like:

from csv import reader

for line in reader(file):
    if line[0] == seek:
        # first field matches, do something with line[-1] and line[-2]
        # -- I'm not quite sure what the semantics of a pair of greedy
        # (.*)s would be
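
A self-contained version of the csv idea, as a sketch only. The sample data is invented, and io.StringIO stands in for the open file.

```python
import csv
import io

# Hypothetical sample data; io.StringIO stands in for the open file.
data = io.StringIO("SN1001,2010-06-01,boot.log\nSN1002,2010-06-02,crash.log\n")
seek = "SN1001"

matches = []
for row in csv.reader(data):
    # Skip blank lines, then compare the first field exactly.
    if row and row[0] == seek:
        matches.append(row)

print(matches)
```

Each row comes back as a list of fields, so quoted fields that contain commas are handled by the csv module rather than by hand.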

--
\S
