From: Qnmt Mndy on
I am trying to find a set of keys within specific files under a specific
directory. I read the keys from a file and iterate through them, opening
and searching all the files under the specified directory. However, only
the last key seems to be found in the files.

srcFiles = Dir.glob(File.join("**", "*.txt"))
keys = File.readlines("sp.txt")

keys.each{ |key|
  srcFiles.each{ |src|
    linenumber = 0
    File.readlines(src).each{ |line|
      linenumber += 1
      if line.include? key then
        puts "found #{key}"
      end
    }
  }
}
--
Posted via http://www.ruby-forum.com/.

From: Robert Klemme on
2010/5/3 Qnmt Mndy <quantum.17(a)hotmail.com>:
> I am trying to find a set of keys within specific files under a specific
> directory. I read the keys from a file and iterate through them, opening
> and searching all the files under the specified directory. However, only
> the last key seems to be found in the files.
>
> srcFiles = Dir.glob(File.join("**", "*.txt"))
> keys = File.readlines("sp.txt")
>
> keys.each{ |key|
>   srcFiles.each{ |src|
>     linenumber = 0
>     File.readlines(src).each{ |line|
>       linenumber += 1
>       if line.include? key then
>         puts "found #{key}"
>       end
>     }
>   }
> }

This is likely caused by the fact that you do not postprocess what
you get from File.readlines:

$ echo 111 >| x
$ echo 222 >> x
$ ruby19 -e 'p File.readlines("x")'
["111\n", "222\n"]
$

Note the trailing line delimiter.
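
A minimal sketch of that postprocessing, assuming the keys file holds one
key per line. The helper name load_keys is made up for illustration, and
skipping blank lines is an extra assumption, not something the original
script did:

```ruby
# Hypothetical helper: read the keys and strip the trailing line
# delimiter from each one. Blank lines are dropped (an assumption),
# since an empty key would match every line.
def load_keys(path)
  File.readlines(path).map(&:chomp).reject(&:empty?)
end

# Why the unprocessed key fails: with the "\n" still attached, the key
# only matches when it sits at the very end of a line.
p "abc 111 def\n".include?("111\n")        # false
p "abc 111 def\n".include?("111\n".chomp)  # true
```

This would also explain why only the last key was found: if the last
line of sp.txt has no trailing newline, that key is the only one without
the delimiter attached.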

Also, your approach is very inefficient: you open and read every file
once per key. You had better swap the outer and inner loops and open
each file only once, searching each line for all keys.
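
The swapped loop order could look roughly like this. It is only a sketch:
the method name find_keys and the [key, file, line number] result
triples are assumptions for illustration, not part of the original
script.

```ruby
# Sketch: read every file once and test all keys against each line.
# find_keys is a hypothetical name; it returns an array of
# [key, file, line_number] triples instead of printing directly.
def find_keys(keys, src_files)
  hits = []
  src_files.each do |src|
    # File.foreach streams the file line by line; with_index(1)
    # replaces the manual line counter.
    File.foreach(src).with_index(1) do |line, line_number|
      keys.each do |key|
        hits << [key, src, line_number] if line.include?(key)
      end
    end
  end
  hits
end

# Usage, guarded so it only runs where sp.txt actually exists.
if File.exist?("sp.txt")
  keys = File.readlines("sp.txt").map(&:chomp).reject(&:empty?)
  src_files = Dir.glob(File.join("**", "*.txt"))
  find_keys(keys, src_files).each do |key, src, line_number|
    puts "found #{key} in #{src}:#{line_number}"
  end
end
```

With this order each file is opened once, no matter how many keys there
are.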

Btw, what you attempt to do can be done by GNU find and fgrep already:

$ find . -type f -name '*.txt' -print0 | xargs -r0 fgrep -f sp.txt

Or, with a shell that knows "**" expansion, e.g. zsh

$ fgrep -f sp.txt **/*.txt

If you are only interested in file names you can add option -l to fgrep.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

From: Qnmt Mndy on
Thanks for your reply and advice, Robert.

The problem really was the missing postprocessing of the File.readlines
result, and your idea of switching the loop order improved performance
significantly.

As for doing the same thing with GNU commands: I wrote this for a Windows
environment and am not sure whether it has such a command-line utility.

cem
