How do I add a regex match to the match collection only once?

Sai*_*udo 14 c# regex

I have a string that contains several HTML comments. I need to count the unique matches of an expression.

For example, the string might be:

var teststring = "<!--X1-->Hi<!--X1-->there<!--X2-->";

I currently use this to get the matches:

var regex = new Regex("<!--X.-->");
var matches = regex.Matches(teststring);

This results in 3 matches. However, I would like it to produce only 2 matches, because there are only 2 unique comments.

I know I could loop over the resulting MatchCollection and remove the extra Match objects, but I'm hoping there is a more elegant solution.

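Clarification: the sample string is greatly simplified compared to what is actually used. There could easily be X8 or X9, and there could be dozens of comments in the string.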
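For comparison, the loop the question mentions does not actually have to remove anything from the MatchCollection (which is read-only). A minimal sketch, assuming you only need the matched text, is to remember the values already seen in a `HashSet<string>`; `HashSet<T>.Add` returns `false` for a value that is already present. The class and method names below are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class UniqueMatchLoop
{
    // Hypothetical helper: returns each distinct matched value once,
    // in the order in which it first appears in the input.
    public static List<string> UniqueMatches(string input, Regex regex)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (Match m in regex.Matches(input))
        {
            // Add returns false when the value was already seen, so duplicates are skipped.
            if (seen.Add(m.Value))
                result.Add(m.Value);
        }
        return result;
    }

    static void Main()
    {
        var teststring = "<!--X1-->Hi<!--X1-->there<!--X2-->";
        var regex = new Regex("<!--X.-->");
        Console.WriteLine(regex.Matches(teststring).Count);        // 3 raw matches
        Console.WriteLine(UniqueMatches(teststring, regex).Count); // 2 unique values
    }
}
```

Unlike the original Match objects, this keeps only the strings; if you also need positions, you could store the first Match for each value instead.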

Svi*_*ish 24

I would just use the Enumerable.Distinct method, for example:

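using System;
using System.Linq;
using System.Text.RegularExpressions;

string subjectString = "<!--X1-->Hi<!--X1-->there<!--X2--><!--X1-->Hi<!--X1-->there<!--X2-->";
var regex = new Regex(@"<!--X\d-->");   // \d matches a single digit, e.g. X1 through X9
var matches = regex.Matches(subjectString);

// Project each Match to its text, then drop repeated values.
var uniqueMatches = matches
    .OfType<Match>()
    .Select(m => m.Value)
    .Distinct();

uniqueMatches.ToList().ForEach(Console.WriteLine);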

Output:

<!--X1-->  
<!--X2-->
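Since the question asks for a count, the same LINQ chain can simply end in `Count()`. The sketch below wraps that in a hypothetical `CountUnique` helper and, as an optional extra, uses `GroupBy` to show how often each distinct comment occurs; `GroupBy` yields its groups in the order in which each key first appears.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class UniqueMatchCount
{
    // Hypothetical helper: number of distinct matched values in the input.
    public static int CountUnique(string input, Regex regex) =>
        regex.Matches(input)
             .Cast<Match>()
             .Select(m => m.Value)
             .Distinct()
             .Count();

    static void Main()
    {
        string subjectString = "<!--X1-->Hi<!--X1-->there<!--X2--><!--X1-->Hi<!--X1-->there<!--X2-->";
        var regex = new Regex(@"<!--X\d-->");

        Console.WriteLine(CountUnique(subjectString, regex)); // 2

        // Optional: occurrences per distinct comment (X1 appears 4 times, X2 twice).
        foreach (var g in regex.Matches(subjectString).Cast<Match>().GroupBy(m => m.Value))
            Console.WriteLine($"{g.Key}: {g.Count()}");
    }
}
```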

For a pure regex solution, maybe you could use this:

(<!--X\d-->)(?!.*\1.*)

It seems to work on your test string, at least in RegexBuddy =)

// (<!--X\d-->)(?!.*\1.*)
// 
// Options: dot matches newline
// 
// Match the regular expression below and capture its match into backreference number 1 «(<!--X\d-->)»
//    Match the characters “<!--X” literally «<!--X»
//    Match a single digit 0..9 «\d»
//    Match the characters “-->” literally «-->»
// Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!.*\1.*)»
//    Match any single character «.*»
//       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//    Match the same text as most recently matched by capturing group number 1 «\1»
//    Match any single character «.*»
//       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
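To try the lookahead pattern in .NET, the RegexBuddy option "dot matches newline" corresponds to `RegexOptions.Singleline`. A minimal sketch follows; the class and method names are hypothetical. Because the negative lookahead rejects any comment that appears again later, each returned Match is the *last* occurrence of its value, so `Match.Index` points at that last occurrence. Each candidate also scans the rest of the input, so this may be noticeably slower than the `Distinct` approach on large strings.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class LookaheadUnique
{
    // Pattern from the answer, unchanged. Singleline lets "." cross newlines,
    // mirroring RegexBuddy's "dot matches newline" option.
    static readonly Regex LastOccurrence =
        new Regex(@"(<!--X\d-->)(?!.*\1.*)", RegexOptions.Singleline);

    // Hypothetical helper: a comment matches only if it does not occur again later.
    public static string[] UniqueMatches(string input) =>
        LastOccurrence.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();

    static void Main()
    {
        var teststring = "<!--X1-->Hi<!--X1-->there<!--X2-->";
        foreach (var s in UniqueMatches(teststring))
            Console.WriteLine(s);
    }
}
```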