What causes the performance difference between "index()" and a regex for substring search in Perl?

Asked by DVK (score 3). Tags: string, perl, search

I assume there may be an efficiency difference between:

if (index($string, "abc") < 0) {}

if ($string !~ /abc/) {}

Can someone confirm whether this is the case based on how the two are actually implemented in Perl (as opposed to pure benchmarking)?

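I can obviously guess how both are implemented (based on how I would write them in C), but ideally I'd like a more informed answer based on the actual perl source code.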


Here is my own sample benchmark:

                          Rate regex.FIND_AT_END    index.FIND_AT_END
regex.FIND_AT_END     639345/s                   --                 -88%
index.FIND_AT_END    5291005/s                 728%                   --
                          Rate regex.NOFIND         index.NOFIND
regex.NOFIND          685260/s                   --                 -88%
index.NOFIND         5515720/s                 705%                   --
                          Rate regex.FIND_AT_START  index.FIND_AT_START
regex.FIND_AT_START   672269/s                   --                 -90%
index.FIND_AT_START  7032349/s                 946%                   --
##############################
use strict;
use warnings;
use Benchmark qw(:all);

my $count = 10_000_000;
my $re = qr/abc/;    # literal pattern with no interpolation, so /o is unnecessary
my %tests = (
    "NOFIND        " => "cvxcvidgds.sdfpkisd[s",
    "FIND_AT_END   " => "cvxcvidgds.sdfpabcd[s",
    "FIND_AT_START " => "abccvidgds.sdfpkisd[s",
);

# Compare index() against a precompiled regex on each test string.
foreach my $type (keys %tests) {
    my $str = $tests{$type};
    cmpthese($count, {
        "index.$type" => sub { my $idx = index($str, "abc"); },   # offset, or -1
        "regex.$type" => sub { my $found = ($str =~ $re); },      # true/false
    });
}

Answer by Sin*_*nür (score 5)

Look at the function Perl_instr:

char *
Perl_instr(register const char *big, register const char *little)
{
    register I32 first;

    PERL_ARGS_ASSERT_INSTR;

    if (!little)
        return (char*)big;
    first = *little++;
    if (!first)
        return (char*)big;
    while (*big) {
        register const char *s, *x;
        if (*big++ != first)
            continue;
        for (x=big,s=little; *s; /**/ ) {
            if (!*x)
                return NULL;
            if (*s != *x)
                break;
            else {
                s++;
                x++;
            }
        }
        if (!*s)
            return (char*)(big-1);
    }
    return NULL;
}

Compare that with S_regmatch. It seems to me that regmatch has a little overhead compared to index ;-)

  • You forgot to put "<british accent>" on the understatement :) (2 upvotes)