Why do I get different backtracking with these Raku regexes?

Question

Jul*_*lio 8 regex rakudo quantifiers raku

I am getting unexpected backtracking with the `+` quantifier in a Raku regex.

In this regex:

```
'abc' ~~ m/(\w+) {say $0}  <?{ $0.substr(*-1) eq 'b' }>/;

say $0;
```

I get the expected result:

```
｢abc｣  # inner say
｢ab｣   # inner say

｢ab｣   # final say
```

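That is, the (greedy) `+` quantifier takes all the letters, and then the conditional fails. After that it starts backtracking, giving up the last letter it took, until the conditional evaluates to true.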
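
The order in which the greedy quantifier offers its candidates can also be recorded rather than printed. This is a minimal sketch of the same regex, assuming the behaviour shown in the output above; the array name `@tried` is introduced here for illustration only.

```
# @tried is a hypothetical name used only for this illustration.
my @tried;

# Same shape as the regex above: the code block runs every time the
# quantifier hands over a candidate, before the assertion is checked.
'abc' ~~ m/ (\w+) { @tried.push: ~$0 } <?{ $0.Str.ends-with('b') }> /;

say @tried;   # the candidates, longest first
say ~$0;      # the capture that finally satisfied the assertion
```

Each entry in `@tried` is one attempt, so the list makes the "release one letter and retry" sequence visible.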

However, when I put the quantifier outside the capture group, backtracking does not seem to work the same way:

```
'abc' ~~ m/[(\w)]+ {say $0}  <?{ $0.tail eq 'b' }>/;

say $0;
```

Result:

```
[｢a｣ ｢b｣ ｢c｣]  # inner say
[｢a｣ ｢b｣ ｢c｣]  # why this extra inner say? Shouldn't this backtrack to [｢a｣ ｢b｣]?
[｢a｣ ｢b｣ ｢c｣]  # why this extra inner say? Shouldn't this backtrack to [｢a｣ ｢b｣]?
[｢b｣ ｢c｣]      # since we could not backtrack successfully, matching continues from the next position
[｢b｣ ｢c｣]      # previous conditional fails; we get this extra inner say
[｢c｣]          # since we could not backtrack successfully, matching continues from the next position

Nil            # final say, no match because we could not find a final 'b'
```

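Is this behaviour expected? If so, why do the two forms work differently? Is it possible to mimic the first regex while still keeping the quantifier outside the capture group?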

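Note:

Using a lazy quantifier "solves" the problem. This is expected, because the difference seems to arise during backtracking, and backtracking does not happen with the lazy quantifier.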

```
'abc' ~~ m/[(\w)]+? {say $0}  <?{ $0.tail eq 'b' }>/;

[｢a｣]
[｢a｣ ｢b｣]

[｢a｣ ｢b｣]
```

But for performance reasons I would rather use a greedy quantifier (the example in this question is a simplification).

Answer 1

Pra*_*nna 7

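I don't think the problem is with backtracking itself. It looks like the intermediate `$0` that is exposed retains the captures from previous iterations. Consider this expression: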

```
'abc' ~~ m/[(\w)]+ {say "Match:",$/.Str,";\tCapture:",$0}  <?{ False }>/;
```

This is the output:

```
Match:abc;  Capture:[｢a｣ ｢b｣ ｢c｣]
Match:ab;   Capture:[｢a｣ ｢b｣ ｢c｣]
Match:a;    Capture:[｢a｣ ｢b｣ ｢c｣]
Match:bc;   Capture:[｢b｣ ｢c｣]
Match:b;    Capture:[｢b｣ ｢c｣]
Match:c;    Capture:[｢c｣]
```

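As you can see, the matches come in the right order: `abc`, `ab`, `a`, and so on. But the capture array for the `ab` match is still `[｢a｣ ｢b｣ ｢c｣]`. I suspect this is a bug.

For your case, there are a couple of approaches.

Use only `$/` for the conditional check: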
```
'abc' ~~ m/[(\w)]+  <?{ $/.Str.substr(*-1) eq 'b' }>/;
```
Or, additionally capture the group together with its quantifier:
```
'abc' ~~ m/([(\w)]+) <?{ $0[0][*-1] eq 'b' }>/;
```
Here `$0` matches the outer group, `$0[0]` matches the first inner group, and `$0[0][*-1]` matches the character that was matched last in this iteration.
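
As a hedged illustration of the first approach, the sketch below keeps the greedy quantifier outside the capture group and checks only the text matched so far, so the contents of `$0` never enter the condition. The variable names `$input` and `$m` are made up for this example, and it assumes `$/` inside the assertion reflects the current attempt, as the output above suggests.

```
# $input and $m are hypothetical names used only for this illustration.
my $input = 'abc';

# The assertion looks at the text matched so far ($/) rather than at the
# $0 list, so stale sub-captures cannot influence the outcome.
my $m = $input ~~ m/ [(\w)]+ <?{ $/.Str.ends-with('b') }> /;

say $m ?? "matched: $m" !! 'no match';
```

If the individual sub-captures are still needed after the match, they are best rebuilt from the matched text, since `$0` may contain extra entries.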

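Simpler than `$/.Str.substr(*-1) eq 'b'` is `$/.ends-with: 'b'` (3 upvotes)
I have filed an issue: [With `(foo)+`, the corresponding sub-captures are not removed during backtracking](https://github.com/rakudo/rakudo/issues/4105). (2 upvotes)
@jubilatious1, the regex given is a simplified one meant to demonstrate the problem. Based on the demo output, I hope you agree that there is a discrepancy. Yes, the problem appears only when we combine backtracking with `$0`. (2 upvotes)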

Archived:	5 years, 2 months ago
Views:	154
Last recorded:	5 years, 2 months ago