How to get rid of duplicates in regex

Imm*_*ith 5 c# regex duplicates

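Say I have a string, "cats cats cats and dogs dogs dogs".

What regular expression would I use to replace that string with "cats and dogs", i.e. remove the duplicates? The expression must, however, only remove duplicates that directly follow one another. For instance:

"cats cats cats and dogs dogs dogs and cats cats and dogs dogs"

would return:

"cats and dogs and cats and dogs"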

Tim*_*ker 9

resultString = Regex.Replace(subjectString, @"\b(\w+)(?:\s+\1\b)+", "$1");

This does all the replacements in a single call.

Explanation:

\b                 # assert that we are at a word boundary
                   # (we only want to match whole words)
(\w+)              # match one word, capture into backreference #1
(?:                # start of non-capturing, repeating group
   \s+             # match at least one space
   \1              # match the same word as previously captured
   \b              # as long as we match it completely
)+                 # do this at least once
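For reference, here is a minimal, self-contained sketch of the single-call approach applied to the longer string from the question. The class name DedupDemo and the helper CollapseRepeatedWords are hypothetical names chosen for illustration; only the pattern and the Regex.Replace call come from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DedupDemo
{
    // Hypothetical helper: wraps the single Regex.Replace call from the answer.
    // Each run of the same whole word separated by whitespace collapses to one copy.
    public static string CollapseRepeatedWords(string input)
    {
        return Regex.Replace(input, @"\b(\w+)(?:\s+\1\b)+", "$1");
    }

    public static void Main()
    {
        string input = "cats cats cats and dogs dogs dogs and cats cats and dogs dogs";
        Console.WriteLine(CollapseRepeatedWords(input));
        // prints: cats and dogs and cats and dogs
    }
}
```

Note that the match is case-sensitive as written, so "Cats cats" would be left alone unless RegexOptions.IgnoreCase is passed.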


Ama*_*osh 2

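Replace (\w+)\s+\1 with $1

Do this in a loop until no more matches are found. Setting the global flag is not enough, because it wouldn't replace the third cats in cats cats cats.

\1 in a regex refers to the contents of the first captured group.

Try: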
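A rough sketch of the loop described above might look like the following. The names LoopDedupDemo and CollapseByLooping are hypothetical, and the \b anchors are an addition that is not part of the pattern given in this answer: without them the pattern could also match inside longer words (for example "the theory" would become "theory").

```csharp
using System;
using System.Text.RegularExpressions;

public static class LoopDedupDemo
{
    // Hypothetical helper: removes one adjacent duplicate per pair on each pass
    // and repeats until a pass leaves the string unchanged.
    public static string CollapseByLooping(string input)
    {
        // \b anchors are an assumption added here so only whole words match.
        const string pattern = @"\b(\w+)\s+\1\b";
        string previous;
        do
        {
            previous = input;
            input = Regex.Replace(input, pattern, "$1");
        } while (input != previous); // every replacement shortens the string, so this ends

        return input;
    }

    public static void Main()
    {
        Console.WriteLine(CollapseByLooping("cats cats cats and dogs dogs dogs"));
        // prints: cats and dogs
    }
}
```

A single pass turns "cats cats cats" into "cats cats", because the search resumes after the first match and only one "cats" is left to examine; the second pass removes the remaining duplicate.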

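string str = "cats cats cats and dogs dogs dogs and cats cats and dogs dogs";
// The replacement appends a space after each collapsed word, so a duplicate run
// at the very end of the input leaves a trailing space; TrimEnd() removes it.
str = Regex.Replace(str, @"(\b\w+\b)\s+(\1(\s+|$))+", "$1 ").TrimEnd();
Console.WriteLine(str);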