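I am currently working on a project that makes heavy use of regular expressions. The input strings are already uppercase, so the regular expressions have the IgnoreCase flag set. However, the internal MS RegEx engine then converts everything back to lowercase, which is an unnecessary performance hit. Changing the regular expression patterns to uppercase and removing the flag helps performance.
Does anyone know of an algorithm or library that can uppercase regex patterns without affecting group names or escaped characters?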
You could search for lowercase letters that are not preceded by an odd number of backslashes:
(?<!(?<!\\)(?:\\\\)*\\)\p{Ll}+
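Then pass each match to a MatchEvaluator, uppercase it, and replace the text in the original string. I don't know C#, so this might not work right away (the code snippet was taken from RegexBuddy and modified a bit), but it's a start: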
using System.Text.RegularExpressions;

string resultString = Regex.Replace(subjectString,
    @"(?<!                     # Negative lookbehind:
        (?<!\\)(?:\\\\)*\\     # Is there no odd number of backslashes
      |                        # nor
        \(\?<?\p{L}*           # (?<tags or (?modifiers
      )                        # before the current position?
      \p{Ll}+                  # Then match one or more lowercase letters",
    new MatchEvaluator(ComputeReplacement), RegexOptions.IgnorePatternWhitespace);

public static string ComputeReplacement(Match m) {
    // You can vary the replacement text for each match on-the-fly.
    // m.Value is the matched text; a literal "\0" is not expanded inside an evaluator.
    return m.Value.ToUpperInvariant();
}
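For trying this out, the replacement can be wrapped in a small self-contained program. This is only a sketch: the class name PatternCase, the helper UppercasePattern, and the sample pattern are made up for illustration, and the expression is the same one as above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternCase
{
    // Same expression as in the answer, built once and reused.
    private static readonly Regex LowercaseRun = new Regex(
        @"(?<!
            (?<!\\)(?:\\\\)*\\   # an odd number of backslashes
          |
            \(\?<?\p{L}*         # (?<name or (?modifiers
          )
          \p{Ll}+                # a run of lowercase letters",
        RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: uppercases the literal letters of a regex pattern.
    public static string UppercasePattern(string pattern)
    {
        return LowercaseRun.Replace(pattern, m => m.Value.ToUpperInvariant());
    }

    public static void Main()
    {
        // The group name and the \s and \d escapes are left alone.
        Console.WriteLine(UppercasePattern(@"(?<word>[a-z]+)\s+foo\d"));
        // prints (?<word>[A-Z]+)\s+FOO\d
    }
}
```

One caveat worth testing for: only the letter directly after a backslash is protected, so multi-letter constructs such as \p{Ll} or \k<name> would still be altered by this expression and may need extra handling.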
Explanation:
(?<! # assert that the string before the current position doesn't match:
(?<!\\) # assert that we start at the first backslash in the series
(?:\\\\)* # match an even number of backslashes
\\ # match one backslash
)
\p{Ll}+ # now match any sequence of lowercase letters
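The odd/even backslash rule is easiest to see on a few tiny inputs. The following is a rough sketch using the simpler expression explained above; the class name BackslashParityDemo and the helper Upper are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackslashParityDemo
{
    // Lowercase letters not preceded by an odd number of backslashes.
    private static readonly Regex Unescaped =
        new Regex(@"(?<!(?<!\\)(?:\\\\)*\\)\p{Ll}+");

    public static string Upper(string pattern)
    {
        return Unescaped.Replace(pattern, m => m.Value.ToUpperInvariant());
    }

    public static void Main()
    {
        Console.WriteLine(Upper(@"\d"));    // \d   : one backslash, d belongs to the escape
        Console.WriteLine(Upper(@"\\d"));   // \\D  : escaped backslash, d is a literal letter
        Console.WriteLine(Upper(@"\\\d"));  // \\\d : escaped backslash followed by \d
    }
}
```

An even number of backslashes means they escape each other, so the following letter is an ordinary literal and is safe to uppercase.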