From: Johann Spies on
I am overlooking something stupid.

I have two files: one with keywords and another with data (one record per line).

I want to determine for each keyword which lines in the second file
contain that keyword.

The following code is not working. It loops through the second file
but only uses the first keyword in the first file.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import re

keywords = open("sleutelwoorde",'r')
data = open("sarua_marine_sleutelwoorde.csv",'r')

remove_quotes = re.compile('"')


for sw in keywords:
    for r in data:
        swc = remove_quotes('',sw)[:-1]
        if swc in r.lower():
            print swc + ' ---> ' + r
            print swc

What am I missing?

Regards
Johann

--
"Finally, brethren, whatsoever things are true, whatsoever things are
honest, whatsoever things are just, whatsoever things are pure,
whatsoever things are lovely, whatsoever things are of good report; if
there be any virtue, and if there be any praise, think on these
things." Philippians 4:8
From: Ian Kelly on
On Fri, Jul 16, 2010 at 8:34 AM, Johann Spies <johann.spies(a)gmail.com> wrote:
> I am overlooking something stupid.
>
> I have two files: one with keywords and another with data (one record per line).
>
> I want to determine for each keyword which lines in the second file
> contains that keyword.
>
> The following code is not working.  It loops through the second file
> but only uses the first keyword in the first file.
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
>
> import re
>
> keywords = open("sleutelwoorde",'r')
> data = open("sarua_marine_sleutelwoorde.csv",'r')
>
> remove_quotes = re.compile('"')
>
>
> for sw in keywords:
>    for r in data:
>        swc = remove_quotes('',sw)[:-1]
>        if swc in r.lower():
>                print swc + ' ---> ' + r
>                print swc
>
> What am I missing?

Not sure about the loop, but this line looks incorrect:

swc = remove_quotes('',sw)[:-1]

I don't think a compiled regular expression object is callable; you
have to call one of its methods, such as sub().
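For illustration, here is a minimal sketch of the keyword clean-up using the pattern's sub() method. It is written for Python 3 (the original post uses Python 2 print statements), and the helper name clean_keyword is hypothetical. It also swaps [:-1] for rstrip('\n'), on the assumption that the last keyword line may lack a trailing newline.

```python
import re

# Same pattern as in the original post: matches a double quote.
remove_quotes = re.compile('"')


def clean_keyword(raw_line):
    """Strip double quotes and the trailing newline from one keyword line."""
    # A compiled pattern is not callable; use its sub() method instead.
    without_quotes = remove_quotes.sub('', raw_line)
    # rstrip('\n') only removes a newline if there is one, whereas [:-1]
    # would chop a real character off a last line that has no newline.
    return without_quotes.rstrip('\n')


print(clean_keyword('"marine"\n'))  # prints: marine
print(clean_keyword('"ocean"'))     # prints: ocean
```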

HTH,
Ian
From: MRAB on
Johann Spies wrote:
> I am overlooking something stupid.
>
> I have two files: one with keywords and another with data (one record per line).
>
> I want to determine for each keyword which lines in the second file
> contains that keyword.
>
> The following code is not working. It loops through the second file
> but only uses the first keyword in the first file.
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
>
> import re
>
> keywords = open("sleutelwoorde",'r')
> data = open("sarua_marine_sleutelwoorde.csv",'r')
>
> remove_quotes = re.compile('"')
>
>
> for sw in keywords:
>     for r in data:
>         swc = remove_quotes('',sw)[:-1]
>         if swc in r.lower():
>             print swc + ' ---> ' + r
>             print swc
>
> What am I missing?
>
The line:

for r in data:

reads through the file until it reaches the end. The next time around the outer
loop it's already at the end of the file. You need to reset it to the
start of the file with:

data.seek(0)
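A minimal Python 3 sketch of where the rewind goes, assuming both files are small. io.StringIO objects with made-up contents stand in for the two files so the example runs on its own, and the quote stripping is left out for brevity:

```python
import io

# Hypothetical stand-ins for the keyword file and the data file.
keywords = io.StringIO('reef\nkelp\n')
data = io.StringIO('Coral reef survey\nKelp forest study\nReef fish census\n')

matches = []
for sw in keywords:
    kw = sw.rstrip('\n')
    data.seek(0)  # rewind; without this the second pass starts at EOF
    for r in data:
        if kw in r.lower():
            matches.append((kw, r.rstrip('\n')))

for kw, line in matches:
    print(kw + ' ---> ' + line)
```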

Incidentally, it would be faster if you read the keywords into a list
first (assuming that there isn't a huge number of keywords) and then
scanned through the file once.
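A rough Python 3 sketch of that suggestion, again with hypothetical in-memory files and assuming the keyword list fits comfortably in memory. It strips quotes with str.replace instead of a regular expression, which is a swapped-in simplification:

```python
import io

# Hypothetical stand-ins for the keyword file and the data file.
keyword_file = io.StringIO('"reef"\n"kelp"\n')
data_file = io.StringIO('Coral reef survey\nKelp forest study\nReef fish census\n')

# Read all keywords into a list once, removing quotes and newlines.
keywords = [line.replace('"', '').rstrip('\n') for line in keyword_file]

# Scan the data file a single time, recording matching lines per keyword.
hits = {kw: [] for kw in keywords}
for line in data_file:
    lowered = line.lower()
    for kw in keywords:
        if kw in lowered:
            hits[kw].append(line.rstrip('\n'))

# Report grouped by keyword, like the nested-loop version would.
for kw in keywords:
    for line in hits[kw]:
        print(kw + ' ---> ' + line)
```

Each data line is read only once; the dict keyed by keyword keeps the output grouped the same way as the original nested loops.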
From: Dave Angel on
Johann Spies wrote:
> I am overlooking something stupid.
>
> I have two files: one with keywords and another with data (one record per line).
>
> I want to determine for each keyword which lines in the second file
> contains that keyword.
>
> The following code is not working. It loops through the second file
> but only uses the first keyword in the first file.
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
>
> import re
>
> keywords = open("sleutelwoorde",'r')
> data = open("sarua_marine_sleutelwoorde.csv",'r')
>
> remove_quotes = re.compile('"')
>
>
> for sw in keywords:
>     for r in data:
>         swc = remove_quotes('',sw)[:-1]
>         if swc in r.lower():
>             print swc + ' ---> ' + r
>             print swc
>
> What am I missing?
>
> Regards
> Johann
>
>
Once you've read all the data from 'data' in the first inner loop,
there's no more for the second keyword.

The easiest answer is to do something like:
data.seek(0)
just before the inner loop. That will (re)position to the beginning of the
'data' file.

DaveA

From: Alf P. Steinbach /Usenet on
* Johann Spies, on 16.07.2010 16:34:
> I am overlooking something stupid.
>
> I have two files: one with keywords and another with data (one record per line).
>
> I want to determine for each keyword which lines in the second file
> contains that keyword.
>
> The following code is not working. It loops through the second file
> but only uses the first keyword in the first file.
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
>
> import re
>
> keywords = open("sleutelwoorde",'r')
> data = open("sarua_marine_sleutelwoorde.csv",'r')
>
> remove_quotes = re.compile('"')
>
>
> for sw in keywords:
>     for r in data:
>         swc = remove_quotes('',sw)[:-1]
>         if swc in r.lower():
>             print swc + ' ---> ' + r
>             print swc
>
> What am I missing?

For the inner loop, 'data' is an object that represents a file and keeps track
of the current read position in the file. The first execution of the loop moves
that read position all the way to the end of the file (EOF). The second time this
loop is attempted, which would be for the second keyword, the 'data' object's
read position is already at end of file, and thus nothing's done.

One way to just make it work is to open and close the data file within the outer
loop. Actually, with CPython the old file object is closed automatically once
nothing refers to it any more, as far as I can recall, so you only need to reopen
it, but this (if true) is less than completely documented. This way is
inefficient, but for a small data set it works.
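A sketch of the reopen-per-keyword approach in Python 3. A with block closes each file deterministically, so this version does not depend on CPython's automatic closing. The temporary file and its contents are hypothetical, created only so the example is self-contained:

```python
import os
import shutil
import tempfile

# Hypothetical data file, created only so the sketch runs on its own.
tmpdir = tempfile.mkdtemp()
data_path = os.path.join(tmpdir, 'data.csv')
with open(data_path, 'w') as f:
    f.write('Coral reef survey\nKelp forest study\nReef fish census\n')

keywords = ['reef', 'kelp']
matches = []
for kw in keywords:
    # Reopen the data file for each keyword so every pass starts at the top;
    # the with block closes it again as soon as the pass is finished.
    with open(data_path) as data:
        for r in data:
            if kw in r.lower():
                matches.append((kw, r.rstrip('\n')))

for kw, line in matches:
    print(kw + ' ---> ' + line)

shutil.rmtree(tmpdir)  # remove the temporary directory again
```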

In order to get a better handle on the general problem -- not the Python
technicalities -- google up "KWIC" (KeyWord In Context). It's a common exercise
problem given to first or second-year students. So I think there should be an
abundance of answers and discussion, although I haven't googled.
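As a rough illustration of what a KWIC listing looks like, here is a Python 3 sketch (not taken from any particular textbook solution; the function name kwic and the width parameter are hypothetical). Each keyword hit is printed with some left and right context, aligned on the keyword:

```python
def kwic(lines, keywords, width=20):
    """Yield (left, match, right) for every case-insensitive keyword hit.

    Keywords are assumed to be lowercase already. Slicing the original
    text with offsets found in its lowercased copy assumes lower() keeps
    the length unchanged, which holds for ASCII text.
    """
    for line in lines:
        text = line.rstrip('\n')
        lowered = text.lower()
        for kw in keywords:
            if not kw:
                continue  # an empty keyword would match everywhere forever
            start = lowered.find(kw)
            while start != -1:
                end = start + len(kw)
                left = text[max(0, start - width):start]
                right = text[end:end + width]
                yield left, text[start:end], right
                # Keep searching after this hit for further occurrences.
                start = lowered.find(kw, end)


WIDTH = 20
records = ['Coral reef survey near the reef edge\n', 'Kelp forest study\n']
for left, match, right in kwic(records, ['reef', 'kelp'], width=WIDTH):
    # Right-align the left context so the keywords line up in one column.
    print(f'{left:>{WIDTH}} [{match}] {right}')
```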


Cheers & hth.,

- Alf

--
blog at <url: http://alfps.wordpress.com>