ses*_*sta 5 c# regex balancing-groups expresso
Suppose I have this text input:
tes{}tR{R{abc}aD{mnoR{xyz}}}
I want to extract the following output:
R{abc}
R{xyz}
D{mnoR{xyz}}
R{R{abc}aD{mnoR{xyz}}}
Currently, I can only extract the contents inside the {} groups, using the balancing group approach from MSDN. Here is the pattern:
^[^{}]*(((?'Open'{)[^{}]*)+((?'Target-Open'})[^{}]*)+)*(?(Open)(?!))$
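Does anyone know how to include the R{} and D{} in the output?
I think a different approach is needed here. Once you match the first, larger group R{R{abc}aD{mnoR{xyz}}} (see my comment about a possible typo), you cannot get at the subgroups inside it: those characters have already been consumed, and the regex does not let you capture the individual R{ ... } groups.
So there has to be some way to capture without consuming, and the obvious way is a positive lookahead. Inside it you can put the expression you were using, with some changes to fit the new focus. I came up with: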
(?=([A-Z](?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)+(?(O)(?!))))
[I also renamed 'Open' to 'O' and removed the named capture for the closing brace, to make it shorter and to avoid noise in the matches]
On regexhero.net (the only free .NET regex tester I know of so far), I got the following capture groups:
1: R{R{abc}aD{mnoR{xyz}}}
1: R{abc}
1: D{mnoR{xyz}}
1: R{xyz}
Breakdown of the regex:
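To try the lookahead pattern outside an online tester, here is a minimal C# sketch. The class name `BraceGroups` and the helper `Extract` are hypothetical names chosen for illustration; only the pattern itself comes from the answer. Because the whole pattern sits inside a lookahead, every match is zero-width, so the text of interest is read from group 1 rather than from the match value.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BraceGroups
{
    // The lookahead pattern from the answer, unchanged.
    private const string Pattern =
        @"(?=([A-Z](?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)+(?(O)(?!))))";

    // Hypothetical helper: collects group 1 of every zero-width match.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            // m.Value is empty (the lookahead consumes nothing);
            // the balanced block is in the unnamed capture group 1.
            results.Add(m.Groups[1].Value);
        }
        return results;
    }

    public static void Main()
    {
        foreach (string s in Extract("tes{}tR{R{abc}aD{mnoR{xyz}}}"))
        {
            Console.WriteLine(s);
        }
    }
}
```

Run against the sample input, this should print the same four values as the capture list above, in the order the matches start in the input.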
(?= # Opening positive lookahead
([A-Z] # Opening capture group and any uppercase letter (to match R & D)
(?: # First non-capture group opening
(?: # Second non-capture group opening
(?'O'{) # Get the named opening brace
[^{}]* # Any non-brace
)+ # Close of second non-capture group and repeat over as many times as necessary
(?: # Third non-capture group opening
(?'-O'}) # Removal of named opening brace when encountered
[^{}]*? # Any other non-brace characters in case there are more nested braces
)+ # Close of third non-capture group and repeat over as many times as necessary
)+ # Close of first non-capture group and repeat as many times as necessary for multiple side by side nested braces
(?(O)(?!)) # Condition to prevent unbalanced braces
) # Close capture group
) # Close positive lookahead
I actually wanted to see how this would work on the PCRE engine, since it offers recursive regexes. I find that easier because I am more familiar with it, and it produced a shorter regex :)
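The commented layout above can also be kept in source code. The sketch below assumes `RegexOptions.IgnorePatternWhitespace`, which lets .NET ignore unescaped whitespace in the pattern and treat `#` as the start of a comment running to the end of the line. The class name `VerboseBraceGroups` and the helper `Extract` are hypothetical; the pattern is meant to be the same one as in the breakdown, only spread over several lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VerboseBraceGroups
{
    // Same pattern as in the breakdown, written with inline comments.
    private const string Pattern = @"
(?=                             # positive lookahead: match without consuming
  (                             # capture group 1: letter plus balanced braces
    [A-Z]                       # the uppercase letter before the brace
    (?:
      (?: (?'O'{) [^{}]* )+     # push each opening brace onto stack O
      (?: (?'-O'}) [^{}]*? )+   # pop one entry from O per closing brace
    )+
    (?(O)(?!))                  # fail if an opening brace is still unmatched
  )
)";

    // Hypothetical helper: collects group 1 of every zero-width match.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        var regex = new Regex(Pattern, RegexOptions.IgnorePatternWhitespace);
        foreach (Match m in regex.Matches(input))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }

    public static void Main()
    {
        foreach (string s in Extract("tes{}tR{R{abc}aD{mnoR{xyz}}}"))
        {
            Console.WriteLine(s);
        }
    }
}
```

Keeping the comments next to the pattern is a design choice, not a requirement; the single-line form behaves the same way and is easier to paste into a tester.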
(?=([A-Z]{(?:[^{}]|(?1))+}))
(?= # Opening positive lookahead
([A-Z] # Opening capture group and any uppercase letter (to match R & D)
{ # Opening brace
(?: # Opening non-capture group
[^{}] # Matches non braces
| # OR
(?1) # Recurse first capture group
)+ # Close non-capture group and repeat as many times as necessary
} # Closing brace
) # Close of capture group
) # Close of positive lookahead