Does Perl's `(?PARNO)` discard its own named captures when it's done?

bri*_*foy 32 regex perl named-captures

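Do recursive regexes understand named captures? There's a note in the docs for (??{ code }) that it's an independent subpattern with its own set of captures that are discarded when the subpattern is done, and there's a note in (?PARNO) that it is "similar to (??{ code })". Is (?PARNO) discarding its own named captures when it's done?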

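I'm writing about recursive regular expressions for Mastering Perl. perlre already has a balanced-parens example (I show it in "Matching balanced parenthesis in Perl regex"), so I thought I'd try balanced quote marks: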

#!/usr/bin/perl
# quotes-nested.pl

use v5.10;

$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE

say "Matched!" if m/
    (
        ['"]
            ( 
                (?: 
                    [^'"]+
                    | 
                    ( (?1) ) 
                )* 
            )
        ['"]
    )
    /xg;

print "
1 => $1
2 => $2
3 => $3
4 => $4
5 => $5
";

This works, and the two quoted strings show up in $1 and $3:

Matched!
1 => 'Amelia said "I am a camel"'
2 => Amelia said "I am a camel"
3 => "I am a camel"
4 => 
5 => 

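That's fine. I understand that. However, I don't want to know the numbers. So I make the first capture group a named capture and look in %-, expecting to see the two substrings I previously saw in $1 and $3: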

use v5.10;

$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE

say "Matched [$+{said}]!" if m/
    (?<said>
        ['"]
            ( 
                (?: 
                    [^'"]+
                    | 
                    (?1) 
                )* 
            )
        ['"]
    )
    /xg;

use Data::Dumper;
print Dumper( \%- );

I only see the first one:

Matched ['Amelia said "I am a camel"']!
$VAR1 = {
          'said' => [
                      '\'Amelia said "I am a camel"\''
                    ]
        };

I expected (?1) to repeat everything in the first capture group, including the named capture into said. I can fix that by naming a new capture:

use v5.10;

$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE

say "Matched [$+{said}]!" if m/
    (?<said>
        ['"]
            ( 
                (?: 
                    [^'"]+
                    | 
                    (?<said> (?1) ) 
                )* 
            )
        ['"]
    )
    /xg;

use Data::Dumper;
print Dumper( \%- );

Now I get what I expected:

Matched ['Amelia said "I am a camel"']!
$VAR1 = {
          'said' => [
                      '\'Amelia said "I am a camel"\'',
                      '"I am a camel"'
                    ]
        };
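
One possible reading of that result, going by how perlvar describes %-: each key holds one array slot per group with that name in the pattern, in the order the groups appear, not one slot per time a group matched. The sketch below is only an illustration of that reading; the group name letter and the sample string are made up for the example.

```perl
use v5.10;
use strict;
use warnings;

# Two groups share the name "letter", so %- gets two slots for it.
my @two_groups;
if ( 'ab' =~ m/ (?<letter> a ) (?<letter> b ) /x ) {
    @two_groups = @{ $-{letter} };
}

# One group named "letter" that matches twice inside a loop still
# gets a single slot, holding the value from the last iteration.
my @one_group;
if ( 'ab' =~ m/ (?: (?<letter> [ab] ) )+ /x ) {
    @one_group = @{ $-{letter} };
}

say scalar(@two_groups), " slot(s): @two_groups";
say scalar(@one_group),  " slot(s): @one_group";
```

Under that reading, the second entry for said appears because the pattern now contains a second group named said, and that group sits at the outer level around (?1).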

I thought I could fix this by moving the named capture one level deeper, so that it sits inside the group that (?1) refers to:

use v5.10;

$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE

say "Matched [$+{said}]!" if m/
    (
        (?<said>
        ['"]
            ( 
                (?: 
                    [^'"]+
                    | 
                    (?1)
                )* 
            )
        ['"]
        )
    )
    /xg;

use Data::Dumper;
print Dumper( \%- );

But this doesn't capture the smaller substring in said either:

Matched ['Amelia said "I am a camel"']!
$VAR1 = {
          'said' => [
                      '\'Amelia said "I am a camel"\''
                    ]
        };

I think I understand this, but I also know there are people here who actually touch the C code that makes it happen. :)

And, as I write this, I think I should override the tied STORE for %- to find out, but then I'd have to figure out how to do that.

bri*_*foy 4

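After playing around with this, I'm satisfied that what I said in the question is correct. Each call to (?PARNO) gets a complete and separate set of match variables, which it discards at the end of its run.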
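
One way to observe this is a small test with numbered groups. This is a sketch: the balanced-parentheses pattern and the sample string are illustrative and do not come from the question. Group 2 is set to a at the outer level before (?1) recurses into (b). If the recursion kept its captures, $2 would end up as b. If they are discarded on return, $2 stays a.

```perl
use v5.10;
use strict;
use warnings;

my $text = '(a(b))';
my ( $whole, $inner );

if ( $text =~ m/
        (                   # group 1: a balanced parenthesized chunk
            \(
            (?:
                ([^()]+)    # group 2: plain text at the current level
                |
                (?1)        # recurse into group 1 for a nested chunk
            )*
            \)
        )
    /x ) {
    ( $whole, $inner ) = ( $1, $2 );
    say "1 => $whole";
    say "2 => $inner";
}
```

This should print (a(b)) for $1 and a for $2, so the b captured inside the recursive call does not survive its return.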

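You can get everything that matched in each subpattern by using an array outside the pattern match operator and pushing onto it at the end of the repeated subpattern, as in this example: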

#!/usr/bin/perl
# nested_carat_n.pl

use v5.10;

$_ =<<'HERE';
Outside "Top Level 'Middle Level "Bottom Level" Middle' Outside"
HERE

my @matches;

say "Matched!" if m/
    (?(DEFINE)
        (?<QUOTE_MARK> ['"])
        (?<NOT_QUOTE_MARK> [^'"])
    )
    (
    (?<quote>(?&QUOTE_MARK))
        (?:
            (?&NOT_QUOTE_MARK)++
            |
            (?R)
        )*
    \g{quote}
    )
    (?{ push @matches, $^N })
    /x;

say join "\n", @matches;
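
If embedding code in the pattern is not an option, a different technique also collects every level: match the outermost quoted strings with the recursive pattern, then run the same match again on the text between the quote marks from an ordinary Perl subroutine. This is a sketch of that alternative rather than the approach shown above, and nested_quotes is a hypothetical helper name.

```perl
use v5.10;
use strict;
use warnings;

# Hypothetical helper: return every quoted substring in $text,
# each outer string followed by the strings nested inside it.
sub nested_quotes {
    my ($text) = @_;
    my @found;
    while ( $text =~ m/
        (                       # group 1: one complete quoted string
            (['"])              # group 2: the opening quote mark
            (                   # group 3: everything between the marks
                (?: [^'"]++ | (?1) )*
            )
            \g{2}               # the same mark closes the string
        )
        /xg ) {
        my ( $whole, $inside ) = ( $1, $3 );   # copy before matching again
        push @found, $whole, nested_quotes($inside);
    }
    return @found;
}

my $input = q{Outside "Top Level 'Middle Level "Bottom Level" Middle' Outside"};
my @found = nested_quotes($input);
say for @found;
```

Here the Perl-level recursion does the collecting, so nothing depends on captures surviving a (?1) call. The trade-off is that each nested string is scanned once per level of nesting.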

I cover recursive regular expressions in more depth in Chapter 2 of Mastering Perl, which you can read for free (at least for a while).