Print letters with the number of their occurrences in Perl using a regex

Cod*_*tor 2 regex perl

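Implement a method to perform string compression using counts of repeated characters. For example, aabcccccaaaaaaa would become a2b1c5a7. Also decompress the string back to the original.

I tried the code below, but I'm looking for a one-liner regex solution -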

sub print_word{
   my $s=shift;
   my @a=split(//, $s);
   my $c=1;
   my $r='';

   my $t=$a[0];
   for( my $i=1; $i<=$#a; $i++) {
       if($t eq $a[$i]) {
           $c++;
       }else{
           $r.=$t."$c";
           $t=$a[$i];
           $c=1;
       }
   }  
   $r.=$t."$c";
   return $r;
}
print print_word('aabcccccaaaaaaa') . "\n";

Please provide something that uses a regex in a single line.

Sob*_*que 6

OK, the trick here is to match a backreference within the string:

my $string = 'aabcccccaaaaaaa';

$string =~ s/((\w)\2*)/ "$2". length ($1) /eg;
print $string;

This gives:

a2b1c5a7
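
We 'capture' a single word character with (\w), and we use \2* to mean 'zero or more of that same character' (so, together with the first letter, that makes 'one or more').

We then wrap the whole thing in another capture group, which means we have \2 or $2 as our single letter, and \1 or $1 as the substring made up of that same letter repeated.

We print $2 and then, because we set the e flag on the regex, it evaluates length($1) and inserts the result.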
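
As a minimal sketch of the round trip the question asks for, the substitution can be wrapped in a function and paired with an inverse. The names compress and decompress are hypothetical, the /r flag assumes Perl 5.14 or later, and the decompressor assumes the original string contained no digits (otherwise the encoding is ambiguous).

```perl
use strict;
use warnings;

# Hypothetical helper: run-length encode runs of word characters.
sub compress {
    my ($s) = @_;
    # $2 is the single character, $1 is the whole run of it.
    return $s =~ s/((\w)\2*)/$2 . length($1)/egr;
}

# Hypothetical inverse: expand each "character + count" pair.
# Assumes the original text contained no digits.
sub decompress {
    my ($s) = @_;
    return $s =~ s/(\D)(\d+)/$1 x $2/egr;
}

my $packed = compress('aabcccccaaaaaaa');
print "$packed\n";                 # a2b1c5a7
print decompress($packed), "\n";   # aabcccccaaaaaaa
```

The decompression side uses the same e flag, this time to evaluate the repetition operator x on the captured character and count.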

To expand on what I said about efficiency: we need to turn to a code profiler.

Using something like Devel::NYTProf:

perl -d:NYTProf script.pl
nytprofhtml --open

Your code as written:

[Profiler screenshot: your loop]

My example:

[Profiler screenshot: my example]

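Now, there is a question of scale here. By that I mean, if you run this repeatedly, you may find that the regex solution starts to "win". There is an overhead to using a regex at all, and some regexes can be very "expensive". See: http://blog.codinghorror.com/regex-performance/

Try the same test run, for example, 100,000 times in a loop, and the numbers start to even out.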

Mine:

[Profiler screenshot: regex x 100,000]

Yours:

[Profiler screenshot: yours x 100,000]
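
For a quick comparison without a full profiler, a rough sketch using the core Benchmark module might look like the following. The function names by_loop and by_regex are hypothetical, by_loop is a condensed rewrite of the loop from the question rather than the exact code that was profiled, and the timings will vary by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $input = 'aabcccccaaaaaaa';

# The loop-based approach from the question, condensed.
sub by_loop {
    my ($s) = @_;
    my @chars = split //, $s;
    return '' unless @chars;
    my ($prev, $count, $out) = (shift @chars, 1, '');
    for my $ch (@chars) {
        if ($ch eq $prev) { $count++ }
        else { $out .= $prev . $count; ($prev, $count) = ($ch, 1) }
    }
    return $out . $prev . $count;
}

# The regex approach from this answer.
sub by_regex {
    my ($s) = @_;
    $s =~ s/((\w)\2*)/$2 . length($1)/eg;
    return $s;
}

# Run each candidate 100,000 times and print a comparison table.
cmpthese(100_000, {
    loop  => sub { by_loop($input) },
    regex => sub { by_regex($input) },
});
```

Benchmark measures whole calls, so it complements rather than replaces the line-level view that Devel::NYTProf gives.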

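But I would still suggest: don't worry about performance until you are sure you need to. Until then, write whatever is easiest to read and understand.

I wasn't sure at first, having run into catastrophic backtracking as a result of an answer to another question, which is why "be careful with regex usage" was high in my mind.

Regexes look neat and they are clever, but sometimes they are a bit too clever. In this case, though, that doesn't seem to apply. The regex engine has an overhead, but once it is "up and running" it does just fine.

One of the useful tricks for finding out how "clever" a regex is being is that you can use re 'debug';

Taking my example, this prints: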
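
A minimal sketch of switching the pragma on for just one substitution, assuming you only want tracing for that statement; the bare block is an illustrative choice, not something the original code used:

```perl
use strict;
use warnings;

my $string = 'aabcccccaaaaaaa';

{
    # The pragma is lexically scoped, so only regexes compiled
    # inside this block produce debug output (written to STDERR).
    use re 'debug';
    $string =~ s/((\w)\2*)/$2 . length($1)/eg;
}

print "$string\n";
```

Because the trace goes to STDERR, it can be redirected to a file separately from the script's normal output.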

Compiling REx "((\w)\2*)"
Final program:
   1: OPEN1 (3)
   3:   OPEN2 (5)
   5:     POSIXD[\w] (6)
   6:   CLOSE2 (8)
   8:   CURLYX[2] {0,32767} (13)
  10:     REF2 (12)
  12:   WHILEM[1/1] (0)
  13:   NOTHING (14)
  14: CLOSE1 (16)
  16: END (0)
stclass POSIXD[\w] minlen 1 
Matching REx "((\w)\2*)" against "aabcccccaaaaaaa"
Matching stclass POSIXD[\w] against "aabcccccaaaaaaa" (15 bytes)
   0 <> <aabcccccaa>         |  1:OPEN1(3)
   0 <> <aabcccccaa>         |  3:OPEN2(5)
   0 <> <aabcccccaa>         |  5:POSIXD[\w](6)
   1 <a> <abcccccaaa>        |  6:CLOSE2(8)
   1 <a> <abcccccaaa>        |  8:CURLYX[2] {0,32767}(13)
   1 <a> <abcccccaaa>        | 12:  WHILEM[1/1](0)
                                    whilem: matched 0 out of 0..32767
   1 <a> <abcccccaaa>        | 10:    REF2: "a"(12)
   2 <aa> <bcccccaaaa>       | 12:    WHILEM[1/1](0)
                                      whilem: matched 1 out of 0..32767
   2 <aa> <bcccccaaaa>       | 10:      REF2: "a"(12)
                                        failed...
                                      whilem: failed, trying continuation...
   2 <aa> <bcccccaaaa>       | 13:      NOTHING(14)
   2 <aa> <bcccccaaaa>       | 14:      CLOSE1(16)
   2 <aa> <bcccccaaaa>       | 16:      END(0)
Match successful!
Matching REx "((\w)\2*)" against "bcccccaaaaaaa"
Matching stclass POSIXD[\w] against "bcccccaaaaaaa" (13 bytes)
   2 <aa> <bcccccaaaa>       |  1:OPEN1(3)
   2 <aa> <bcccccaaaa>       |  3:OPEN2(5)
   2 <aa> <bcccccaaaa>       |  5:POSIXD[\w](6)
   3 <aab> <cccccaaaaa>      |  6:CLOSE2(8)
   3 <aab> <cccccaaaaa>      |  8:CURLYX[2] {0,32767}(13)
   3 <aab> <cccccaaaaa>      | 12:  WHILEM[1/1](0)
                                    whilem: matched 0 out of 0..32767
   3 <aab> <cccccaaaaa>      | 10:    REF2: "b"(12)
                                      failed...
                                    whilem: failed, trying continuation...
   3 <aab> <cccccaaaaa>      | 13:    NOTHING(14)
   3 <aab> <cccccaaaaa>      | 14:    CLOSE1(16)
   3 <aab> <cccccaaaaa>      | 16:    END(0)
Match successful!
Matching REx "((\w)\2*)" against "cccccaaaaaaa"
Matching stclass POSIXD[\w] against "cccccaaaaaaa" (12 bytes)
   3 <aab> <cccccaaaaa>      |  1:OPEN1(3)
   3 <aab> <cccccaaaaa>      |  3:OPEN2(5)
   3 <aab> <cccccaaaaa>      |  5:POSIXD[\w](6)
   4 <aabc> <ccccaaaaaa>     |  6:CLOSE2(8)
   4 <aabc> <ccccaaaaaa>     |  8:CURLYX[2] {0,32767}(13)
   4 <aabc> <ccccaaaaaa>     | 12:  WHILEM[1/1](0)
                                    whilem: matched 0 out of 0..32767
   4 <aabc> <ccccaaaaaa>     | 10:    REF2: "c"(12)
   5 <aabcc> <cccaaaaaaa>    | 12:    WHILEM[1/1](0)
                                      whilem: matched 1 out of 0..32767
   5 <aabcc> <cccaaaaaaa>    | 10:      REF2: "c"(12)
   6 <abccc> <ccaaaaaaa>     | 12:      WHILEM[1/1](0)
                                        whilem: matched 2 out of 0..32767
   6 <abccc> <ccaaaaaaa>     | 10:        REF2: "c"(12)
   7 <bcccc> <caaaaaaa>      | 12:        WHILEM[1/1](0)
                                          whilem: matched 3 out of 0..32767
   7 <bcccc> <caaaaaaa>      | 10:          REF2: "c"(12)
   8 <ccccc> <aaaaaaa>       | 12:          WHILEM[1/1](0)
                                            whilem: matched 4 out of 0..32767
   8 <ccccc> <aaaaaaa>       | 10:            REF2: "c"(12)
                                              failed...
                                            whilem: failed, trying continuation...
   8 <ccccc> <aaaaaaa>       | 13:            NOTHING(14)
   8 <ccccc> <aaaaaaa>       | 14:            CLOSE1(16)
   8 <ccccc> <aaaaaaa>       | 16:            END(0)
Match successful!
Matching REx "((\w)\2*)" against "aaaaaaa"
Matching stclass POSIXD[\w] against "aaaaaaa" (7 bytes)
   8 <ccccc> <aaaaaaa>       |  1:OPEN1(3)
   8 <ccccc> <aaaaaaa>       |  3:OPEN2(5)
   8 <ccccc> <aaaaaaa>       |  5:POSIXD[\w](6)
   9 <ccccca> <aaaaaa>       |  6:CLOSE2(8)
   9 <ccccca> <aaaaaa>       |  8:CURLYX[2] {0,32767}(13)
   9 <ccccca> <aaaaaa>       | 12:  WHILEM[1/1](0)
                                    whilem: matched 0 out of 0..32767
   9 <ccccca> <aaaaaa>       | 10:    REF2: "a"(12)
  10 <cccccaa> <aaaaa>       | 12:    WHILEM[1/1](0)
                                      whilem: matched 1 out of 0..32767
  10 <cccccaa> <aaaaa>       | 10:      REF2: "a"(12)
  11 <cccccaaa> <aaaa>       | 12:      WHILEM[1/1](0)
                                        whilem: matched 2 out of 0..32767
  11 <cccccaaa> <aaaa>       | 10:        REF2: "a"(12)
  12 <cccccaaaa> <aaa>       | 12:        WHILEM[1/1](0)
                                          whilem: matched 3 out of 0..32767
  12 <cccccaaaa> <aaa>       | 10:          REF2: "a"(12)
  13 <cccccaaaaa> <aa>       | 12:          WHILEM[1/1](0)
                                            whilem: matched 4 out of 0..32767
  13 <cccccaaaaa> <aa>       | 10:            REF2: "a"(12)
  14 <cccccaaaaaa> <a>       | 12:            WHILEM[1/1](0)
                                              whilem: matched 5 out of 0..32767
  14 <cccccaaaaaa> <a>       | 10:              REF2: "a"(12)
  15 <cccccaaaaaaa> <>       | 12:              WHILEM[1/1](0)
                                                whilem: matched 6 out of 0..32767
  15 <cccccaaaaaaa> <>       | 10:                REF2: "a"(12)
                                                  failed...
                                                whilem: failed, trying continuation...
  15 <cccccaaaaaaa> <>       | 13:                NOTHING(14)
  15 <cccccaaaaaaa> <>       | 14:                CLOSE1(16)
  15 <cccccaaaaaaa> <>       | 16:                END(0)
Match successful!
Matching REx "((\w)\2*)" against ""
Regex match can't succeed, so not even tried
Freeing REx: "((\w)\2*)"

As you can see, it is actually doing quite a lot of work in this example. But because it never needs to backtrack at any point to match your string, it isn't really wasting any effort.