From: Danny Challis on
Hello everyone,
I need to count the number of times a substring occurs in a string.
I am currently doing this using the scan method, but it is simply too
slow. I feel there should be a faster way to do this since the scan
method is really designed for more advanced things than this. I do not
need to do regex matching or to process the matches, just count
substrings. So what I want is something like this:

s = "you like to play with your yo-yo"
s.magical_count_method("yo") => 4

Once again, what I'm really looking for is something fast. I've tried
using external linux commands such as awk, but that was much much
slower. Any ideas?
Thanks,

Danny.
--
Posted via http://www.ruby-forum.com/.

From: Jesús Gabriel y Galán on
On Thu, Jun 24, 2010 at 5:04 PM, Danny Challis <dannychallis(a)gmail.com> wrote:
> Hello everyone,
>   I need to count the number of times a substring occurs in a string.
> I am currently doing this using the scan method, but it is simply too
> slow.  I feel there should be a faster way to do this since the scan
> method is really designed for more advanced things than this.  I do not
> need to do regex matching or to process the matches, just count
> substrings.  So what I want is something like this:
>
> s = "you like to play with your yo-yo"
> s.magical_count_method("yo") => 4
>
> Once again, what I'm really looking for is something fast.  I've tried
> using external linux commands such as awk, but that was much much
> slower. Any ideas?

I don't know how slow is scan for you. An implementation using
String#index and a loop is a little bit faster, but not too much:

require 'benchmark'

TIMES = 100_000
s = "you like to play with your yo-yo"

Benchmark.bmbm do |x|
x.report("scan") do
TIMES.times do
s.scan("yo").size
end
end
x.report("while") do
TIMES.times do
index = -1
count = 0
while (index = s.index("yo", index+1))
count += 1
end
count
end
end
end

$ ruby scan_vs_while.rb
Rehearsal -----------------------------------------
scan 0.560000 0.020000 0.580000 ( 0.585972)
while 0.440000 0.060000 0.500000 ( 0.492969)
-------------------------------- total: 1.080000sec

user system total real
scan 0.510000 0.010000 0.520000 ( 0.519078)
while 0.470000 0.020000 0.490000 ( 0.493562)

Don't know if this is enough for you, probably not :-)

Jesus.

From: Danny Challis on
Thanks Jesus,
This method actually decreased the runtime by quite a bit, so thanks
for the help! However, I still need something even faster if it exists,
so any other ideas would be appreciated. I may have to just implement
this part is C or something.

Danny.

Jesús Gabriel y Galán wrote:
> On Thu, Jun 24, 2010 at 5:04 PM, Danny Challis <dannychallis(a)gmail.com>
> wrote:
>>
>> Once again, what I'm really looking for is something fast. �I've tried
>> using external linux commands such as awk, but that was much much
>> slower. Any ideas?
>
> I don't know how slow is scan for you. An implementation using
> String#index and a loop is a little bit faster, but not too much:
>...
>
> Don't know if this is enough for you, probably not :-)
>
> Jesus.
--
Posted via http://www.ruby-forum.com/.

From: Dave Baldwin on
http://en.wikipedia.org/wiki/Boyer–Moore_string_search_algorithm

If written in Ruby may not beat using the underlying library functions as they are written in C.

I have vague recollections of a ruby quiz being based on something like this
Dave.

On 24 Jun 2010, at 16:04, Danny Challis wrote:

> Hello everyone,
> I need to count the number of times a substring occurs in a string.
> I am currently doing this using the scan method, but it is simply too
> slow. I feel there should be a faster way to do this since the scan
> method is really designed for more advanced things than this. I do not
> need to do regex matching or to process the matches, just count
> substrings. So what I want is something like this:
>
> s = "you like to play with your yo-yo"
> s.magical_count_method("yo") => 4
>
> Once again, what I'm really looking for is something fast. I've tried
> using external linux commands such as awk, but that was much much
> slower. Any ideas?
> Thanks,
>
> Danny.
> --
> Posted via http://www.ruby-forum.com/.
>


From: Jesús Gabriel y Galán on
On Thu, Jun 24, 2010 at 5:45 PM, Danny Challis <dannychallis(a)gmail.com> wrote:
> Thanks Jesus,
>    This method actually decreased the runtime by quite a bit, so thanks
> for the help!  However, I still need something even faster if it exists,
> so any other ideas would be appreciated.  I may have to just implement
> this part is C or something.

I suppose that if you implement a C method that does what I did in
Ruby, that would be faster.
I mean doing the loop in C and calling String#index from there.

Jesus.