Why does my Tcl regular expression perform so poorly compared to Perl?

Question

set fr [open "x.txt" r]
set fw [open "y.txt" w]
set myRegex {^([0-9]+) ([0-9:]+\.[0-9]+).* ABC\.([a-zA-Z]+)\[([0-9]+)\] DEF\(([a-zA-Z]+)\) HIJ\(([0-9]+)\) KLM\(([0-9\.]+)\) NOP\(([0-9]+)\) QRS\(([0-9]+)\)}
while { [gets $fr line] >= 0 } {
   if { [regexp $myRegex $line match x y w z]} {
       if { [expr $z >> 32] == [lindex $argv 0]} {
         puts $fw "$x"
       }
   }
}
close $fr
close $fw


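The Tcl code above takes forever (32 s or more) to execute. Doing essentially the same thing in Perl runs in 3 s or less. I know Perl performs better on some regular expressions, but is Tcl's performance really this bad by comparison? More than 10 times worse?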

I'm using Tcl 8.4, by the way.

Here are the timings for running the code above with the full regular expression and with progressively shortened versions of it:

32s is the time taken for the above code to execute
22s after removing: QRS\(([0-9]+)\) 
17s after removing: NOP\(([0-9]+)\) QRS\(([0-9]+)\)
13s after removing: KLM\(([0-9\.]+)\) NOP\(([0-9]+)\) QRS\(([0-9]+)\)
9s  after removing: HIJ\(([0-9]+)\) KLM\(([0-9\.]+)\) NOP\(([0-9]+)\) QRS\(([0-9]+)\)
6s  after removing: DEF\(([a-zA-Z]+)\) HIJ\(([0-9]+)\) KLM\(([0-9\.]+)\) NOP\(([0-9]+)\) QRS\(([0-9]+)\)


Answer 1

Don*_*ows 6

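The problem is that you have a lot of capturing and backtracking in the RE; that particular combination works poorly with the Tcl RE engine. The reason, at one level, is that Tcl uses a completely different type of RE engine from Perl (though it does better on other REs; this area is far from trivial).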

If you can, get rid of that early .* in the RE:

^([0-9]+) ([0-9:]+\.[0-9]+).* ABC\.([a-zA-Z]+)\[([0-9]+)\] DEF\(([a-zA-Z]+)\) HIJ\(([0-9]+)\) KLM\(([0-9\.]+)\) NOP\(([0-9]+)\) QRS\(([0-9]+)\)
                           ^^

That is the real cause of the trouble. Replace it with something more precise, such as:

(?:[^A]|A[^B]|AB[^C])*
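
As a rough sanity check on that substitution, the sketch below runs the original pattern and a variant using the more precise filler against one made-up log line, then prints the fields each one captured. The sample line and the variable names (line, slowRE, fastRE, s1..s4, f1..f4) are assumptions for illustration and do not come from the asker's data.

```tcl
# Hypothetical sample line shaped like the input the pattern expects.
# The filler deliberately contains "A" and "AB" to exercise every
# alternative of the replacement sub-pattern.
set line {12345 10:22:33.123 some A filler AB text ABC.mod[4294967296] DEF(xyz) HIJ(12) KLM(1.5) NOP(7) QRS(9)}

# Original pattern from the question, with the early ".*".
set slowRE {^([0-9]+) ([0-9:]+\.[0-9]+).* ABC\.([a-zA-Z]+)\[([0-9]+)\] DEF\(([a-zA-Z]+)\) HIJ\(([0-9]+)\) KLM\(([0-9\.]+)\) NOP\(([0-9]+)\) QRS\(([0-9]+)\)}

# Same pattern with ".*" replaced by the more precise filler.
set fastRE {^([0-9]+) ([0-9:]+\.[0-9]+)(?:[^A]|A[^B]|AB[^C])* ABC\.([a-zA-Z]+)\[([0-9]+)\] DEF\(([a-zA-Z]+)\) HIJ\(([0-9]+)\) KLM\(([0-9\.]+)\) NOP\(([0-9]+)\) QRS\(([0-9]+)\)}

# "->" is just a throwaway variable name for the whole match.
set slowOK [regexp $slowRE $line -> s1 s2 s3 s4]
set fastOK [regexp $fastRE $line -> f1 f2 f3 f4]

puts "original:  matched=$slowOK first=$s1 fourth=$s4"
puts "rewritten: matched=$fastOK first=$f1 fourth=$f4"
```

Both patterns should agree on well-formed lines like this one; it is worth running the same comparison over a sample of real input before switching, since the filler is stricter than .* about what may precede " ABC.".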

Also, reduce the number of capturing groups in the RE to just the ones you need. You could convert the whole code to:

set fr [open "x.txt" r]
set fw [open "y.txt" w]
set myRegex {^([0-9]+) (?:[0-9:]+\.[0-9]+)(?:[^A]|A[^B]|AB[^C])* ABC\.(?:[a-zA-Z]+)\[([0-9]+)\] DEF\((?:[a-zA-Z]+)\) HIJ\((?:[0-9]+)\) KLM\((?:[0-9\.]+)\) NOP\((?:[0-9]+)\) QRS\((?:[0-9]+)\)}
while { [gets $fr line] >= 0 } {
    # I've combined the [if]s and the [expr]
    if { [regexp $myRegex $line -> A D] && $D >> 32 == [lindex $argv 0]} {
        puts $fw "$A"
    }
}
close $fr
close $fw

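
To see whether the rewrite helps on a particular Tcl build, one option is the built-in time command. The following is a minimal sketch, assuming a made-up sample line and an arbitrary iteration count; the variable names (line, slowRE, fastRE, iterations) are hypothetical, and timings on one short line will not necessarily mirror a run over the full file.

```tcl
# Hypothetical sample line; substitute a representative line from real input.
set line {12345 10:22:33.123 some filler text ABC.mod[4294967296] DEF(xyz) HIJ(12) KLM(1.5) NOP(7) QRS(9)}

# Original pattern: early ".*" and nine capturing groups.
set slowRE {^([0-9]+) ([0-9:]+\.[0-9]+).* ABC\.([a-zA-Z]+)\[([0-9]+)\] DEF\(([a-zA-Z]+)\) HIJ\(([0-9]+)\) KLM\(([0-9\.]+)\) NOP\(([0-9]+)\) QRS\(([0-9]+)\)}

# Rewritten pattern: precise filler, only the two groups that are used.
set fastRE {^([0-9]+) (?:[0-9:]+\.[0-9]+)(?:[^A]|A[^B]|AB[^C])* ABC\.(?:[a-zA-Z]+)\[([0-9]+)\] DEF\((?:[a-zA-Z]+)\) HIJ\((?:[0-9]+)\) KLM\((?:[0-9\.]+)\) NOP\((?:[0-9]+)\) QRS\((?:[0-9]+)\)}

# Arbitrary repeat count, chosen only to smooth out timer noise.
set iterations 1000

# [time] reports the average number of microseconds per iteration.
puts "original:  [time {regexp $slowRE $line -> a b c d} $iterations]"
puts "rewritten: [time {regexp $fastRE $line -> A D} $iterations]"
```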

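Note also that if { [expr ...] } is a suspicious code smell, as is any expression that isn't braced. (It is sometimes necessary in very specific circumstances, but it almost always indicates that the code is over-complicated.)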
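
A small sketch of why bracing matters, using made-up values (D, wanted and a are hypothetical variables): without braces, Tcl substitutes the variables first and expr then parses the resulting string a second time, which costs time and can change the meaning of the expression.

```tcl
set D 4294967296
set wanted 1

# Unbraced: $D is substituted by Tcl, then expr re-parses the text "4294967296 >> 32".
set unbraced [expr $D >> 32]

# Braced: expr receives the expression intact and reads $D itself.
set braced [expr {$D >> 32}]

# An if condition is already an expression, so no nested expr call is needed.
if {$D >> 32 == $wanted} {
    puts "high word matches"
}

# Double substitution changing the meaning of an expression:
set a {1 + 1}
set surprising [expr $a * 2]              ;# parsed as 1 + 1 * 2
set failed [catch {expr {$a * 2}} msg]    ;# braced: "1 + 1" is not a number, so expr raises an error
puts "unbraced=$unbraced braced=$braced surprising=$surprising failed=$failed"
```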
