How to uppercase a regex pattern?

gou*_*dos 6 c# regex

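I'm currently working on a project that makes heavy use of regular expressions. The input strings are already uppercase, so the regex IgnoreCase flag is set. However, the internal MS Regex engine converts everything back to lowercase, which is an unnecessary performance hit. Changing the regex patterns to uppercase and removing the flag helps performance.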
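A minimal sketch of the check behind this idea, assuming (as the question states) that every input string is already uppercase: a lowercase pattern run with RegexOptions.IgnoreCase and a hand-uppercased copy run without it should return the same matches. The class CaseDemo and its methods are hypothetical helpers, and the sketch only compares results; it does not measure how much faster the flag-free version is.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: compares IgnoreCase matching against an
// uppercased pattern without the flag, on already-uppercase input.
public static class CaseDemo
{
    // Collects the text of every match, in order.
    public static string[] Matches(string input, string pattern, RegexOptions options) =>
        Regex.Matches(input, pattern, options).Cast<Match>().Select(m => m.Value).ToArray();

    // True when the lowercase pattern with IgnoreCase and its uppercased
    // counterpart without the flag find exactly the same matches.
    public static bool SameMatches(string input, string lowerPattern, string upperPattern) =>
        Matches(input, lowerPattern, RegexOptions.IgnoreCase)
            .SequenceEqual(Matches(input, upperPattern, RegexOptions.None));
}
```

For example, SameMatches("ORDER 42: WIDGET; ORDER 7: GIZMO", @"order (\d+)", @"ORDER (\d+)") should be true. The \d has to stay lowercase, because \D means "any non-digit", which is exactly why a blind ToUpper on the whole pattern does not work.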

Does anyone know of an algorithm or library that can uppercase a regex pattern without affecting group names or escaped characters?

Tim*_*ker 1

You could search for lowercase letters that are not preceded by an odd number of backslashes:

(?<!(?<!\\)(?:\\\\)*\\)\p{Ll}+

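Then pass each match to a MatchEvaluator that uppercases it, and let Regex.Replace put it back into the original string. The version below also skips the letters of (?<name> group names and (?modifiers inline options. I don't know C#, so this might not work right away (I took some code snippets from RegexBuddy and adapted them), but it's a start: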

using System.Text.RegularExpressions;

string resultString = Regex.Replace(subjectString,
    @"(?<!                 # Negative lookbehind:
       (?<!\\)(?:\\\\)*\\  # is there an odd number of backslashes
      |                    # or
       \(\?<?\p{L}*        # a (?<tag or (?modifiers prefix
      )                    # right before the current position? If not,
      \p{Ll}+              # match one or more lowercase letters",
    new MatchEvaluator(ComputeReplacement), RegexOptions.IgnorePatternWhitespace);

static string ComputeReplacement(Match m)
{
    // The evaluator's return value is inserted literally (no $0-style
    // substitutions), so uppercase the matched text itself.
    return m.Value.ToUpperInvariant();
}
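A self-contained sketch of the same approach packaged as a reusable helper, assuming .NET's System.Text.RegularExpressions. The class name PatternUppercaser and its Uppercase method are hypothetical; the lookbehind is the one from this answer, and a lambda replaces the named MatchEvaluator method.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the lookbehind approach from this answer.
public static class PatternUppercaser
{
    // Matches a run of lowercase letters unless it is directly preceded by
    // an odd number of backslashes (the letter of an escape such as \d)
    // or by a "(?<name" / "(?modifiers" prefix.
    private static readonly Regex LowercaseRun = new Regex(
        @"(?<!
             (?<!\\)(?:\\\\)*\\   # an odd number of backslashes
           | \(\?<?\p{L}*        # or a (?<name / (?modifiers prefix
          )
          \p{Ll}+                # one or more lowercase letters",
        RegexOptions.IgnorePatternWhitespace);

    // Returns the pattern with every unprotected lowercase run uppercased.
    public static string Uppercase(string pattern) =>
        LowercaseRun.Replace(pattern, m => m.Value.ToUpperInvariant());
}
```

For example, Uppercase(@"\d+[a-z]+(?<word>\w+)\s*end") should return \d+[A-Z]+(?<word>\w+)\s*END. The backslash counting matters: \\d (an escaped backslash followed by a literal d) becomes \\D, while \\\d (an escaped backslash followed by \d) is left alone.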

Explanation:

(?<!        # assert that the string before the current position doesn't match:
 (?<!\\)    # assert that we start at the first backslash in the series
 (?:\\\\)*  # match an even number of backslashes
 \\         # match one backslash
)
\p{Ll}+     # now match any sequence of lowercase letters
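Both patterns in this answer only look at what comes immediately before a run of lowercase letters, so some constructs still get uppercased: the category name in \p{Lu} (giving \p{LU}), the name in a \k<name> backreference, and group names containing digits or underscores such as (?<first_name>. Since .NET treats category and group names case-sensitively, the result would likely fail to parse or refer to the wrong group. As an alternative (a different technique from the lookbehind above, not part of the original answer), the sketch below scans the pattern left to right with one alternation: known constructs are matched and copied verbatim, and only a bare run of lowercase letters is uppercased. The class name PatternUppercaserTokens and the list of constructs are assumptions; conditionals such as (?(name)...) and (?#comments) are not handled.

```csharp
using System.Text.RegularExpressions;

// Hypothetical tokenizing variant: copies known constructs verbatim and
// uppercases only bare runs of lowercase letters.
public static class PatternUppercaserTokens
{
    // "\\." consumes a backslash together with the character it escapes,
    // so runs of backslashes are paired off while scanning left to right
    // and no odd/even lookbehind is needed.
    private static readonly Regex Token = new Regex(
        @"  \\[pP]\{[^}]*\}          # \p{Lu}, \P{IsGreek}: keep the category name
          | \\k<[^>]*>               # \k<name> backreference
          | \\k'[^']*'               # \k'name' backreference
          | \\.                      # any other escape: \d, \w, \\, \(, ...
          | \(\?<[^=!>][^>]*>        # (?<name> or (?<a-b>, but not (?<= / (?<!
          | \(\?'[^']*'              # (?'name'
          | \(\?[imnsx-]+[:)]        # inline options (?i) / (?i-s:
          | (?<lower>\p{Ll}+)        # a run of lowercase letters to uppercase",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    // Uppercases the "lower" token and returns every other token unchanged;
    // text matched by no alternative is left untouched by Regex.Replace.
    public static string Uppercase(string pattern) =>
        Token.Replace(pattern, m =>
            m.Groups["lower"].Success ? m.Value.ToUpperInvariant() : m.Value);
}
```

For example, Uppercase(@"\d+\p{Lu}(?<first_name>[a-z]+)\k<first_name>\\w") should give \d+\p{Lu}(?<first_name>[A-Z]+)\k<first_name>\\W, while the contents of a lookbehind are still uppercased: (?<=abc)x becomes (?<=ABC)X.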