I'm trying to write a function in C# that removes the text between two strings. Like this:
string RemoveBetween(string sourceString, string startTag, string endTag)
At first I thought this would be easy, but then I kept running into more and more problems.
So here is the simplest case (all examples use startTag="Start" and endTag="End"):
"Any Text Start remove this End between" => "Any Text StartEnd between"
But it should also handle multiple occurrences without removing the text in between them:
"Any Text Start remove this End between should be still there Start and remove this End multiple" => "Any Text StartEnd between should be still there StartEnd multiple"
It should always remove the shortest possible string:
"So Start followed by Start only remove this End other stuff" => "So Start followed by StartEnd other stuff"
It should also respect the order of the tags:
"the End before Start. Start before End is correct" => "the End before Start. StartEnd is correct"
I tried a regular expression, but it doesn't work (it can't handle multiple occurrences):
public string RemoveBetween(string sourceString, string startTag, string endTag)
{
    Regex regex = new Regex(string.Format("{0}(.*){1}", Regex.Escape(startTag), Regex.Escape(endTag)));
    return regex.Replace(sourceString, string.Empty);
}
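
To see why the attempt above breaks on multiple occurrences, it can help to look at what the pattern actually matches. The following is only a small sketch; the class `GreedyVersusLazy` and the helper `Matches` are hypothetical names introduced for illustration, not part of the original code.

```csharp
using System.Text.RegularExpressions;

public static class GreedyVersusLazy
{
    // Returns every substring matched by the pattern, in match order.
    public static string[] Matches(string input, string pattern)
    {
        MatchCollection found = Regex.Matches(input, pattern);
        string[] result = new string[found.Count];
        for (int i = 0; i < found.Count; i++)
        {
            result[i] = found[i].Value;
        }
        return result;
    }
}
```

For the input "a Start x End b Start y End c", the greedy pattern "Start(.*)End" produces a single match, "Start x End b Start y End", so the text between the two pairs is swallowed as well. The lazy pattern "Start(.*?)End" produces two matches, "Start x End" and "Start y End". Note also that replacing with string.Empty removes the tags themselves, whereas the expected outputs keep "StartEnd".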
Then I tried using IndexOf and Substring, but I couldn't see an end to it. Even if it worked, it wouldn't be the most elegant way to solve this.
Here is one approach, using string.Remove():
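string input = "So Start followed by Start only remove this End other stuff";
// Position just past the last "Start" in the input.
int start = input.LastIndexOf("Start") + "Start".Length;
// First "End" at or after that position; -1 if there is none.
int end = input.IndexOf("End", start);
// Remove only when an "End" was found; otherwise leave the input unchanged.
string result = end >= 0 ? input.Remove(start, end - start) : input;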
I used LastIndexOf() because there can be more than one Start, and you want the last one.
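
The snippet above handles one Start/End pair. If several pairs need to be handled without a regex, one possible approach is a single left-to-right scan that remembers where the most recent start tag ended in the output. This is a different technique from the LastIndexOf() snippet, sketched under the assumption that both tags are non-empty and are compared ordinally; `TagScanner`, `RemoveBetweenAll` and `IsAt` are hypothetical names.

```csharp
using System;
using System.Text;

public static class TagScanner
{
    public static string RemoveBetweenAll(string source, string startTag, string endTag)
    {
        // Assumption: empty tags are rejected, since they would match everywhere.
        if (string.IsNullOrEmpty(startTag) || string.IsNullOrEmpty(endTag))
            throw new ArgumentException("Tags must be non-empty.");

        var output = new StringBuilder(source.Length);
        int keepLength = -1; // output length right after the latest unmatched start tag, or -1
        int i = 0;
        while (i < source.Length)
        {
            if (IsAt(source, i, startTag))
            {
                // A newer start tag replaces the older one, so the shortest span wins.
                output.Append(startTag);
                keepLength = output.Length;
                i += startTag.Length;
            }
            else if (keepLength >= 0 && IsAt(source, i, endTag))
            {
                // Drop everything copied since the latest start tag, then keep the end tag.
                output.Length = keepLength;
                output.Append(endTag);
                keepLength = -1;
                i += endTag.Length;
            }
            else
            {
                // Ordinary character, or an end tag with no start tag before it.
                output.Append(source[i]);
                i++;
            }
        }
        return output.ToString();
    }

    // True when tag occurs in text exactly at index (ordinal comparison).
    private static bool IsAt(string text, int index, string tag)
    {
        return string.CompareOrdinal(text, index, tag, 0, tag.Length) == 0;
    }
}
```

Called with startTag "Start" and endTag "End", this sketch reproduces the four expected results from the question. An end tag that appears before any start tag is copied through unchanged, which is what keeps "the End before Start." intact.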
You have to modify your function slightly so that it works with all of the examples: make the match non-greedy with ? and use RegexOptions.RightToLeft:
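// Requires: using System.Text.RegularExpressions;
public static string RemoveBetween(string sourceString, string startTag, string endTag)
{
    // Lazy (.*?) with RightToLeft: each match is anchored at an endTag and extends left only to the nearest startTag.
    Regex regex = new Regex(string.Format("{0}(.*?){1}", Regex.Escape(startTag), Regex.Escape(endTag)), RegexOptions.RightToLeft);
    // An evaluator keeps any '$' inside the tags from being read as a substitution pattern.
    return regex.Replace(sourceString, match => startTag + endTag);
}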
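
If right-to-left matching feels hard to reason about, a possible left-to-right alternative is a pattern that refuses to step over another start tag. This is not the method from the answer above, just a sketch of a different technique (a negative lookahead applied before each consumed character); `TemperedPattern` and `RemoveBetweenTempered` are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class TemperedPattern
{
    public static string RemoveBetweenTempered(string sourceString, string startTag, string endTag)
    {
        string start = Regex.Escape(startTag);
        string end = Regex.Escape(endTag);
        // (?:(?!start).)*? consumes one character at a time, lazily, and only
        // while the upcoming characters do not begin another start tag.
        string pattern = start + "(?:(?!" + start + ").)*?" + end;
        // The evaluator returns the literal tags, so '$' in a tag is not treated as a substitution.
        return Regex.Replace(sourceString, pattern, match => startTag + endTag);
    }
}
```

With startTag "Start" and endTag "End", this gives the four expected results from the question. For an input such as "Start a End b End" it stops at the first End and returns "StartEnd b End". Because `.` does not match a newline by default, RegexOptions.Singleline would be needed if the removed text can span several lines.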