Fastest way to check if a string matches a regexp in Ruby?

require 'benchmark'

"test123" =~ /1/
=> 4
Benchmark.measure{ 1000000.times { "test123" =~ /1/ } }
=>   0.610000   0.000000   0.610000 (  0.578133)

"test123"[/1/]
=> "1"
Benchmark.measure{ 1000000.times { "test123"[/1/] } }
=>   0.718000   0.000000   0.718000 (  0.750010)

"test123".match(/1/)
=> #<MatchData "1">
Benchmark.measure{ 1000000.times { "test123".match(/1/) } }
=>   1.703000   0.000000   1.703000 (  1.578146)


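So =~ is faster, but it depends on what you want as the return value. If you only want to check whether the text contains a match for the regexp, use =~.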
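To make the "depends on the return value" point concrete, here is a minimal sketch of what each of the three forms hands back on a hit and on a miss. The variable names are made up for illustration only.

```ruby
str = "test123"

# On a hit, each form returns a different kind of value.
index = str =~ /1/       # Integer offset of the first match (4 here)
slice = str[/1/]         # the matched substring ("1" here)
data  = str.match(/1/)   # a MatchData object

# On a miss, all three return nil, so any of them works in a condition.
miss_index = str =~ /x/
miss_slice = str[/x/]
miss_data  = str.match(/x/)

puts "contains a 1" if str =~ /1/
```

If only a yes/no answer is needed, the extra objects built by `[]` and `match` are wasted work, which is consistent with the timings above.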

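As I wrote, I had already found that `=~` is faster than `match`, with a less dramatic performance gain when running on larger regexps. What I'm wondering is whether there is any odd way to make this check even faster, perhaps by exploiting some obscure method in Regexp or some unusual construct. (2 upvotes)
You're right. I tested it. It isn't. (2 upvotes)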

Answer 3

gio*_*ele 39

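This is the benchmark I ran after finding some articles around the net.

In 2.4.0 the winner is re.match?(str) (as suggested by @wiktor-stribiżew). In previous versions, re =~ str seems to be the fastest, although str =~ re is almost as fast.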

#!/usr/bin/env ruby
require 'benchmark'

str = "aacaabc"
re = Regexp.new('a+b').freeze

N = 4_000_000

Benchmark.bm do |b|
    b.report("str.match re\t") { N.times { str.match re } }
    b.report("str =~ re\t")    { N.times { str =~ re } }
    b.report("str[re]  \t")    { N.times { str[re] } }
    b.report("re =~ str\t")    { N.times { re =~ str } }
    b.report("re.match str\t") { N.times { re.match str } }
    if re.respond_to?(:match?)
        b.report("re.match? str\t") { N.times { re.match? str } }
    end
end


Results for MRI 1.9.3-o551:

$ ./bench-re.rb  | sort -t $'\t' -k 2
       user     system      total        real
re =~ str         2.390000   0.000000   2.390000 (  2.397331)
str =~ re         2.450000   0.000000   2.450000 (  2.446893)
str[re]           2.940000   0.010000   2.950000 (  2.941666)
re.match str      3.620000   0.000000   3.620000 (  3.619922)
str.match re      4.180000   0.000000   4.180000 (  4.180083)


Results for MRI 2.1.5:

$ ./bench-re.rb  | sort -t $'\t' -k 2
       user     system      total        real
re =~ str         1.150000   0.000000   1.150000 (  1.144880)
str =~ re         1.160000   0.000000   1.160000 (  1.150691)
str[re]           1.330000   0.000000   1.330000 (  1.337064)
re.match str      2.250000   0.000000   2.250000 (  2.255142)
str.match re      2.270000   0.000000   2.270000 (  2.270948)


Results for MRI 2.3.3 (there seems to be a regression in regexp matching):

$ ./bench-re.rb  | sort -t $'\t' -k 2
       user     system      total        real
re =~ str         3.540000   0.000000   3.540000 (  3.535881)
str =~ re         3.560000   0.000000   3.560000 (  3.560657)
str[re]           4.300000   0.000000   4.300000 (  4.299403)
re.match str      5.210000   0.010000   5.220000 (  5.213041)
str.match re      6.000000   0.000000   6.000000 (  6.000465)


Results for MRI 2.4.0:

$ ./bench-re.rb  | sort -t $'\t' -k 2
       user     system      total        real
re.match? str     0.690000   0.010000   0.700000 (  0.682934)
re =~ str         1.040000   0.000000   1.040000 (  1.035863)
str =~ re         1.040000   0.000000   1.040000 (  1.042963)
str[re]           1.340000   0.000000   1.340000 (  1.339704)
re.match str      2.040000   0.000000   2.040000 (  2.046464)
str.match re      2.180000   0.000000   2.180000 (  2.174691)

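For reference, a small sketch of how `match?` (Ruby 2.4 and later) behaves differently from `=~`: it returns a plain boolean and leaves the last-match global `$~` alone. The local variable names are illustrative only, and this is not a claim about why the benchmark numbers come out the way they do.

```ruby
re  = Regexp.new('a+b')
str = "aacaabc"

# Prime the last-match global with an unrelated match.
"xyz" =~ /y/
last_before = $~[0]

# match? returns true or false, never an index or a MatchData.
matched   = re.match?(str)
unmatched = re.match?("zzz")

# $~ still refers to the earlier, unrelated match.
last_after = $~[0]

# String has the same method, with the receiver and argument swapped.
matched_from_string = str.match?(re)

puts "matched=#{matched} unmatched=#{unmatched} last_after=#{last_after}"
```

Because nothing is stored in `$~`, code that relies on `$1` or `Regexp.last_match` after the check still needs `=~` or `match`.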

Answer 4

小智 7

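What about re === str (case equality)?

Since it evaluates to true or false and has no need to store matches, return a match index and so on, I wondered whether it would be an even faster way of matching than =~.

OK, I tested this. =~ is still faster, even if you have multiple capture groups, but === is faster than the other options.

By the way, what good is freeze? I couldn't measure any performance gain from it.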
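A minimal sketch of what `===` gives you, assuming the same pattern as in the benchmark above. The `classify` helper is hypothetical and only there to show that `case`/`when` uses `===` under the hood.

```ruby
re = /a+b/

hit     = re === "aacaabc"   # true
miss    = re === "xyz"       # false
nil_arg = re === nil         # false, no exception for a non-string

# Hypothetical helper: each `when` clause calls pattern === str.
def classify(str)
  case str
  when /\A\d+\z/ then :digits
  when /a+b/     then :a_then_b
  else                :other
  end
end

puts classify("12345")
puts classify("aacaabc")
puts classify("xyz")
```

Note that the receiver must be the Regexp: `"aacaabc" === re` calls String#=== instead, which compares the two operands for equality rather than matching the pattern.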

Answer 5

小智 5

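Depending on how complicated your regular expression is, you could possibly just use simple string slicing. I'm not sure how practical this is for your application, or whether it would actually offer any speed improvement.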

'testsentence'['stsen']
=> 'stsen' # evaluates to true
'testsentence'['koala']
=> nil # evaluates to false

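As a rough sketch of the same idea, `String#include?` is swapped in below as an alternative to slicing when a strict true/false is wanted. The `contains?` helper is hypothetical and not part of the answer; whether any of this is faster for your input would need to be measured.

```ruby
str = 'testsentence'

found   = str['stsen']            # "stsen", truthy
missing = str['koala']            # nil, falsy
strict  = str.include?('stsen')   # true, a real boolean

# Hypothetical helper: use a plain substring check when the pattern
# is a String, and fall back to =~ when it is a Regexp.
def contains?(str, pattern)
  if pattern.is_a?(Regexp)
    !(str =~ pattern).nil?
  else
    str.include?(pattern)
  end
end

puts contains?(str, 'stsen')
puts contains?(str, /s+e/)
puts contains?(str, 'koala')
```

This only applies when the pattern has no metacharacters; anything like anchors, classes, or repetition still needs a Regexp.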
