Mik*_*cic 7 ruby regex performance ruby-on-rails
I have a string of words; let's call it bad:
bad = "foo bar baz"
I can keep this as a space-separated string or turn it into a list:
bad = bad.split(" ");
Suppose I have another string, like this:
str = "This is my first foo string"
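What is the fastest way to check whether any word from the bad string occurs in my comparison string, and what is the fastest way to remove that word if it is found?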
#Find if a word is there
bad.split(" ").each do |word|
found = str.include?(word)
end
#Remove the word
bad.split(" ").each do |word|
str.gsub!(/#{word}/, "")
end
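For a short list like this one, both steps can also be sketched without a loop. This is a minimal sketch, not a benchmarked solution: `Regexp.union` escapes each word so regex metacharacters are treated literally, and the `\b` anchors are an assumption about the intended behavior (they keep `foo` from matching inside `food`). The names `pattern` and `cleaned` are illustrative.

```ruby
bad = "foo bar baz".split(" ")
str = "This is my first foo string"

# Build one pattern from all bad words; Regexp.union escapes metacharacters.
pattern = /\b(?:#{Regexp.union(bad).source})\b/

# Find if any bad word is there (String#match? needs Ruby 2.4 or later)
found = str.match?(pattern)

# Remove the bad words, then collapse the doubled spaces left behind
cleaned = str.gsub(pattern, "").squeeze(" ").strip
```

Unlike the loops above, this scans the string once per operation and leaves the original `str` unmodified.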
If the list of bad words gets large, a hash lookup is much faster than a regex alternation:
require 'benchmark'
bad = ('aaa'..'zzz').to_a # 17576 words
str = "What's the fasted way to check if any word from the bad string is within my "
str += "comparison string, and what's the fastest way to remove said word if it's "
str += "found"
str *= 10
# One big alternation of every bad word
badex = /\b(#{bad.join('|')})\b/i
# Hash of bad words for constant-time lookup
bad_hash = {}
bad.each { |w| bad_hash[w] = true }
n = 10
Benchmark.bm(10) do |x|
  x.report('regex:') do
    n.times { str.gsub(badex, '').squeeze(' ') }
  end
  x.report('hash:') do
    n.times { str.gsub(/\b\w+\b/) { |word| bad_hash[word] ? '' : word }.squeeze(' ') }
  end
end
                 user     system      total        real
regex:      10.485000   0.000000  10.485000 ( 13.312500)
hash:        0.000000   0.000000   0.000000 (  0.000000)
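Note that the hash lookup above is case-sensitive, while the regex uses the `i` flag, so the two versions do not remove exactly the same words. A minimal sketch of a case-insensitive variant follows; the method name `remove_bad_words` and the use of `Set` instead of a hash are illustrative choices, not part of the benchmark, and this variant has not been timed.

```ruby
require 'set'

# Hypothetical helper: removes every word found in bad_words, ignoring case.
def remove_bad_words(str, bad_words)
  # Downcase once up front so each lookup is a single set membership test
  lookup = bad_words.map(&:downcase).to_set
  str.gsub(/\b\w+\b/) { |word| lookup.include?(word.downcase) ? '' : word }
     .squeeze(' ')
     .strip
end

result = remove_bad_words("This is my first Foo string", %w[foo bar baz])
```

Here `result` keeps every word except `Foo`, which matches `foo` once it is downcased.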