What is the fastest way to check whether any word from one string is in another string?

Mik*_*cic 7 ruby regex performance ruby-on-rails

I have a string of words; let's call them bad:

bad = "foo bar baz"

I can keep this as a space-separated string or as a list:

bad = bad.split(" ");

If I have another string, like this:

str = "This is my first foo string"

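What is the fastest way to check whether any word from the bad string is in my comparison string, and what is the fastest way to remove that word if it is found?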

# Find if a word is there
found = false
bad.split(" ").each do |word|
  found = true if str.include?(word)
end

# Remove the word
bad.split(" ").each do |word|
  str.gsub!(/#{word}/, "")
end

ste*_*lag 9

If the list of bad words gets large, a hash is much faster:

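    require 'benchmark'

    # Build a large bad-word list: every three-letter string from 'aaa' to 'zzz'.
    bad = ('aaa'..'zzz').to_a    # 17576 words
    str = "What's the fasted way to check if any word from the bad string is within my "
    str += "comparison string, and what's the fastest way to remove said word if it's "
    str += "found"
    str *= 10

    # Approach 1: one big case-insensitive alternation of all the bad words.
    badex = /\b(#{bad.join('|')})\b/i

    # Approach 2: a hash keyed by bad word for constant-time lookup.
    bad_hash = {}
    bad.each { |w| bad_hash[w] = true }

    n = 10
    Benchmark.bm(10) do |x|

      x.report('regex:') {n.times do
        str.gsub(badex,'').squeeze(' ')
      end}

      # Note: this lookup is case-sensitive, unlike the /i regex above.
      x.report('hash:') {n.times do
        str.gsub(/\b\w+\b/){|word| bad_hash[word] ? '': word}.squeeze(' ')
      end}

    end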
                user     system      total        real
regex:     10.485000   0.000000  10.485000 ( 13.312500)
hash:       0.000000   0.000000   0.000000 (  0.000000)
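
For the example in the question, the same hash-lookup idea could be wrapped up roughly as follows. This is only a sketch: the helper names `contains_bad_word?` and `remove_bad_words` and the `BAD_WORDS` constant are made up for illustration, a `Set` stands in for the hash, and it assumes matching should be case-insensitive and on whole words only.

```ruby
require 'set'

# Hypothetical constant and helper names, not part of the benchmark above.
BAD_WORDS = Set.new(%w[foo bar baz])

# True if any whole word in str is in the bad-word set (case-insensitive).
def contains_bad_word?(str, bad_words = BAD_WORDS)
  str.scan(/\w+/).any? { |word| bad_words.include?(word.downcase) }
end

# Returns a copy of str with the bad words removed, runs of spaces
# collapsed, and leading/trailing whitespace stripped.
def remove_bad_words(str, bad_words = BAD_WORDS)
  cleaned = str.gsub(/\w+/) { |word| bad_words.include?(word.downcase) ? '' : word }
  cleaned.squeeze(' ').strip
end

str = "This is my first foo string"
puts contains_bad_word?(str)   # prints: true
puts remove_bad_words(str)     # prints: This is my first string
```

Because each word is looked up individually, "food" is not treated as a match for "foo", which differs from the `include?` check in the question.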