Matching aⁿbⁿcⁿ for n > 0 with PCRE

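(?:a…)+ matches one character of subpattern a per iteration. With (?=a*, we skip directly to the "counter".
(\1?+b) The capturing group first re-consumes, via \1, whatever it matched on the previous iteration, if anything. That match is possessive, so it cannot be backtracked into, and the lookahead fails if the counter goes out of sync, that is, if the b's run out before the a's do. On the first iteration the group has not captured anything yet, so \1?+ matches nothing. Then one character of subpattern b is matched. It is added to the capturing group, effectively "counting" one more b in the group. With b*, we skip directly to the next "counter".
(\2?+c) Capturing group 2 likewise re-consumes, via \2, whatever it matched previously. Because this extra character capture works just like the previous group, the groups stay in sync in length. Assuming contiguous runs of a..b..c..: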

(Excuse my art.)

First iteration:

| The first 'a' is matched by the 'a' in '^(?:a…)'.
| The pointer is stuck after it as we begin the lookahead.
v,- Matcher pointer
aaaa...bbbbbbbb...cccc...
 ^^^   |^^^       ^
skipped| skipped  Matched by c in (\2?+c);
by a*  | by b*         \2 was "nothing",
       |               now it is "c".
       Matched by b
       in (\1?+b).
     \1 was "nothing", now it is "b".


Second iteration:

 | The second 'a' is matched by the 'a' in '^(?:a…)'.
 | The pointer is stuck after it as we begin the lookahead.
 v,- Matcher pointer
aaaa...bbbbbbbb...cccc...
       /|^^^      |^
eaten by| skipped |Matched by c in (\2?+c);
\1?+    | by b*   |     '\2' was "c",
  ^^    |      \2?+     now it is "cc".
 skipped|
 by a*  \ Matched by b
          in (\1?+b).
          '\1' was "b", now it is "bb".


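As discussed above, the three groups each "consume" one of a, b and c respectively; they are matched round-robin and "counted" by the (?:a…)+, (\1?+b) and (\2?+c) groups respectively. With the additional anchoring and the captures built up along the way, we can assert that we match xyz (one part per group above), where x, y and z are aⁿ, bⁿ and cⁿ respectively.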
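The round-robin counting can also be traced outside a regex engine. Below is a minimal plain-Python sketch of what the pattern does step by step, assuming the base pattern is ^(?:a(?=a*(\1?+b)b*(\2?+c)))+\1\2$ (the unscaled form of the bonus variants in this answer). The function name simulate_counters and its variables are hypothetical, and the code imitates the engine for this one pattern only; it does not use a regex library.

```python
def simulate_counters(s: str) -> bool:
    r"""Imitate ^(?:a(?=a*(\1?+b)b*(\2?+c)))+\1\2$ by hand (illustrative only)."""
    g1 = ""  # contents of capture group 1, grows by one 'b' per iteration
    g2 = ""  # contents of capture group 2, grows by one 'c' per iteration
    pos = 0  # matcher pointer of the outer (?:a...)+ loop

    while pos < len(s) and s[pos] == "a":
        look = pos + 1  # the lookahead starts right after the consumed 'a'

        # a* : skip the remaining a's
        while look < len(s) and s[look] == "a":
            look += 1

        # \1?+ : possessively re-consume the previous capture if it fits here
        start1 = look
        if g1 and s.startswith(g1, look):
            look += len(g1)
        # b : one more 'b' must follow, otherwise the counter is out of sync.
        # The regex would then retry \1\2$ from an 'a', which cannot succeed.
        if look >= len(s) or s[look] != "b":
            return False
        look += 1
        new_g1 = s[start1:look]

        # b* : skip the remaining b's
        while look < len(s) and s[look] == "b":
            look += 1

        # \2?+ followed by c : same counting step for the c's
        start2 = look
        if g2 and s.startswith(g2, look):
            look += len(g2)
        if look >= len(s) or s[look] != "c":
            return False
        look += 1

        g1, g2 = new_g1, s[start2:look]
        pos += 1  # only the 'a' is consumed; the lookahead is zero-width

    # \1\2$ : the rest of the input must be exactly the two captures
    return pos > 0 and s[pos:] == g1 + g2


for sample in ["abc", "aaabbbccc", "aabcc", "aabbccc"]:
    print(sample, simulate_counters(sample))
```

Note how "aabcc" is rejected inside the loop (the possessive step has no spare b to give back), while "aabbccc" is rejected only by the final \1\2$ comparison.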

As a bonus, to "count" more, one can do this:

Pattern: ^(?:a(?=a*(\1?+b)b*(\2?+c)))+\1{3}\2$
Matches: abbbc
         aabbbbbbcc
         aaabbbbbbbbbccc

Pattern: ^(?:a(?=a*(\1?+bbb)b*(\2?+c)))+\1\2$
Matches: abbbc
         aabbbbbbcc
         aaabbbbbbbbbccc
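Both bonus patterns appear to describe the same language, n a's, then 3n b's, then n c's. As a quick sanity check that does not run the PCRE patterns themselves, here is a small Python sketch that counts the runs directly. The helper name has_counts and its b_factor parameter are made up for illustration.

```python
import re


def has_counts(s: str, b_factor: int = 1) -> bool:
    """True if s is n a's, b_factor*n b's, then n c's, for some n > 0."""
    m = re.fullmatch(r"(a+)(b+)(c+)", s)
    if m is None:
        return False  # wrong overall shape
    n = len(m.group(1))
    return len(m.group(2)) == b_factor * n and len(m.group(3)) == n


# The strings listed as matches for the bonus patterns
for sample in ["abbbc", "aabbbbbbcc", "aaabbbbbbbbbccc"]:
    print(sample, has_counts(sample, b_factor=3))
```

With b_factor left at 1, the same helper checks the plain aⁿbⁿcⁿ case.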

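[*Trick naming rant*] Is this really called the Qtax trick? Qtax's answer cites PolygeneLubricants' answer as its source. Either way, I think "self-referencing capture group" is clearer. [*/Trick naming rant*] [*Credit and respect*] Respect to everyone involved, including you, Unihedron - great answer! [*/Credit and respect*] (2 upvotes)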

Answer 2

Ham*_*mZa 11

First, let's explain the pattern you have:

^               # Assert begin of line
    (           # Capturing group 1
        a       # Match a
        (?1)?   # Recurse group 1 optionally
        b       # Match b
    )           # End of group 1
$               # Assert end of line


With the following modifiers:

g: global, match all
m: multiline, match start and end of line with ^ and $ respectively
x: extended, whitespace in the pattern is ignored and comments can be added with #


The recursive part is optional so that the otherwise "infinite" recursion eventually terminates.
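To see why the optional recursion terminates, it may help to write group 1 as an ordinary recursive function. This is a hand-written Python sketch of (a(?1)?b), not a use of any regex recursion feature (as far as I know the standard re module has none); the names match_group1 and is_anbn are hypothetical.

```python
def match_group1(s: str, i: int = 0) -> int:
    """Imitate group 1, (a(?1)?b), starting at index i.

    Returns the index just past the match, or -1 if there is no match.
    """
    if i >= len(s) or s[i] != "a":
        return -1  # 'a' is required; this is where the recursion bottoms out
    j = i + 1
    inner = match_group1(s, j)  # (?1)? : try the optional recursion first
    if inner != -1:
        j = inner
    # Skipping a successful recursion could not help: the next character
    # would be 'a', not 'b', so there is nothing to retry here.
    if j < len(s) and s[j] == "b":
        return j + 1
    return -1


def is_anbn(s: str) -> bool:
    """Imitate the anchored pattern ^(a(?1)?b)$ on a single line."""
    return match_group1(s) == len(s)


for sample in ["ab", "aaabbb", "aab", "abab"]:
    print(sample, is_anbn(sample))
```

Each recursive call starts one character further right, so the depth is bounded by the number of leading a's.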

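We can use the pattern above to solve the problem. We need to add some regex to match the c part. The problem is that once aabb has been matched in aabbcc, it has been consumed, which means we can't go back over it.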

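The solution? Use lookaheads! A lookahead is zero-width, which means it does not consume anything or move the match position forward. Take a look at this: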

^                    # Assert begin of line
    (?=              # First zero-width lookahead
        (            # Capturing group 1
            a        # Match a
            (?1)?    # Recurse group 1 optionally
            b        # Match b
        )            # End of group 1
        c+           # Match c one or more times
    )                # End of the first lookahead

    (?=              # Second zero-width lookahead
        a+           # Match a one or more times
        (            # Capturing group 2
            b        # Match b
            (?2)?    # Recurse group 2 optionally
            c        # Match c
        )            # End of group 2
        $            # Assert end of line, so that no extra c can follow
    )                # End of the second lookahead
a+b+c+               # Match each of a,b and c one or more times
$                    # Assert end of line


Online demo

Basically, we first assert that there is aⁿbⁿ, then we assert bⁿcⁿ, which results in aⁿbⁿcⁿ.
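The "two assertions over the same text" idea can be sketched in plain Python as two independent checks that both start from the beginning of the string. This is only an illustration of the reasoning, not a translation of the regex: the helper names nested, first_check, second_check and is_anbncn are hypothetical, and the second check is assumed to run to the end of the string so that extra trailing c's are rejected.

```python
def nested(s: str, i: int, open_ch: str, close_ch: str) -> int:
    """Imitate (x(?N)?y): match open_ch^k close_ch^k at index i.

    Returns the index just past the match, or -1 if there is no match.
    """
    if i >= len(s) or s[i] != open_ch:
        return -1
    j = i + 1
    inner = nested(s, j, open_ch, close_ch)  # optional recursion
    if inner != -1:
        j = inner
    if j < len(s) and s[j] == close_ch:
        return j + 1
    return -1


def first_check(s: str) -> bool:
    """First assertion: a^n b^n from the start, followed by at least one c."""
    end = nested(s, 0, "a", "b")
    return end != -1 and end < len(s) and s[end] == "c"


def second_check(s: str) -> bool:
    """Second assertion: some a's, then b^m c^m reaching the end (assumed)."""
    i = 0
    while i < len(s) and s[i] == "a":
        i += 1
    return i > 0 and nested(s, i, "b", "c") == len(s)


def is_anbncn(s: str) -> bool:
    # Both checks read the same text from index 0, like two lookaheads.
    return first_check(s) and second_check(s)


for sample in ["abc", "aabbcc", "aabbc", "aabbccc"]:
    print(sample, is_anbncn(sample))
```

Neither check alone is enough: "aabbc" passes the first and "abbcc" passes the second, but only strings of the form aⁿbⁿcⁿ pass both.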

Couldn't you simplify this by dropping the a+b+c+ and not using a lookahead for the second part? (2 upvotes)

Archived:	10 years, 7 months ago
Views:	2806
Last activity:	10 years, 7 months ago