Fuzzy string matching with a percentage threshold in C#

Muh*_*ail 1 c# regex string

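My question: suppose I have a string like

"Quick Brown Fox Jumps over the lazy dog". It has 8 words, and I have some other strings that I have to compare against it. These strings are: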

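  1. This is un-match string with above string.

  2. Quick Brown fox Jumps.

  3. brown fox jumps over the lazy.

  4. quick brown fox over the dog.

  5. fox jumps over the lazy dog.

  6. jumps over the.

  7. lazy dog.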

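For example, the user gives a threshold (the percentage of the string that must match) of 60%, which means

= 8 * 60 / 100 (here 8 is the total number of words in the string and 60 is the threshold)

= 4.8

This means that at least 4 words should match, so the result should be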

  1. Quick Brown fox Jumps.

  2. quick brown fox over the dog.

  3. brown fox jumps over the lazy.

  4. fox jumps over the lazy dog.

How can I do this fuzzy matching in C#? Please help me.

Dmi*_*nko 6

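I'd rather suggest comparing dictionaries, not strings, because of:

  1. Repeated words: what if a sentence contains the same word more than once, e.g. "The fox jumps over the dog"?
  2. Punctuation: full stops, commas, etc.
  3. Case, e.g. "fox", "Fox", "FOX", etc.

So the implementation is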

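// Needs: using System; using System.Collections.Generic;
//        using System.Globalization; using System.Linq;

// Counts each word in the text, ignoring case and surrounding punctuation.
public static Dictionary<String, int> WordsToCounts(String value) {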
  if (String.IsNullOrEmpty(value))
    return new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

  return value
    .Split(' ', '\r', '\n', '\t')
    .Select(item => item.Trim(',', '.', '?', '!', ':', ';', '"'))
    .Where(item => !String.IsNullOrEmpty(item))
    .GroupBy(item => item, StringComparer.OrdinalIgnoreCase)
    .ToDictionary(chunk => chunk.Key, 
                  chunk => chunk.Count(), 
                  StringComparer.OrdinalIgnoreCase);
}

public static Double DictionaryPercentage(
  IDictionary<String, int> left,
  IDictionary<String, int> right) {

  if (null == left)
    if (null == right)
      return 1.0;
    else
      return 0.0;
  else if (null == right)
    return 0.0;

  int all = left.Sum(pair => pair.Value);

  if (all <= 0)
    return 0.0;

  double found = 0.0;

  foreach (var pair in left) {
    int count;

    if (!right.TryGetValue(pair.Key, out count))
      count = 0;

    found += count < pair.Value ? count : pair.Value;
  }

  return found / all;
}

public static Double StringPercentage(String left, String right) {
  return DictionaryPercentage(WordsToCounts(left), WordsToCounts(right));
}

And for the sample you provided:

  String original = "Quick Brown Fox Jumps over the lazy dog";

  String[] extracts = new String[] {
    "This is un-match string with above string.",
    "Quick Brown fox Jumps.",
    "brown fox jumps over the lazy.",
    "quick brown fox over the dog.",
    "fox jumps over the lazy dog.",
    "jumps over the.",
    "lazy dog.",
  };

  var data = extracts
    .Select(item => new {
      text = item,
      perCent = StringPercentage(original, item) * 100.0
    })
    //.Where(item => item.perCent >= 60.0) // uncomment this to apply threshold
    .Select(item => String.Format(CultureInfo.InvariantCulture, 
      "\"{0}\" \t {1:F2}%", 
      item.text, item.perCent));

  String report = String.Join(Environment.NewLine, data);

  Console.Write(report);

The report is

  "This is un-match string with above string."   0.00%
  "Quick Brown fox Jumps."                      50.00%
  "brown fox jumps over the lazy."              75.00%
  "quick brown fox over the dog."               75.00%
  "fox jumps over the lazy dog."                75.00%
  "jumps over the."                             37.50%
  "lazy dog."                                   25.00%
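One thing worth noting: the question rounds 8 * 60 / 100 = 4.8 down to 4 required words, so "Quick Brown fox Jumps." (4 of 8 words, 50%) is expected to pass, whereas a strict `perCent >= 60.0` filter would reject it. If the asker's rounding is what is wanted, the check could be done on word counts instead of percentages. The following is only a minimal sketch under that assumption; `ThresholdSketch`, `Words`, `MatchedWords`, `RequiredWords` and `Passes` are hypothetical names, not part of the answer above, and words are lower-cased here rather than compared with `StringComparer.OrdinalIgnoreCase`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ThresholdSketch {
  // Hypothetical helper: lower-cased words with surrounding punctuation trimmed.
  static List<string> Words(string value) =>
    (value ?? "")
      .Split(new[] { ' ', '\r', '\n', '\t' }, StringSplitOptions.RemoveEmptyEntries)
      .Select(w => w.Trim(',', '.', '?', '!', ':', ';', '"').ToLowerInvariant())
      .Where(w => w.Length > 0)
      .ToList();

  // How many words of 'original' also occur in 'candidate'.
  // Each occurrence in 'candidate' can be used only once, so duplicates are respected.
  public static int MatchedWords(string original, string candidate) {
    var available = Words(candidate)
      .GroupBy(w => w)
      .ToDictionary(g => g.Key, g => g.Count());

    int matched = 0;
    foreach (var word in Words(original)) {
      if (available.TryGetValue(word, out int left) && left > 0) {
        available[word] = left - 1;
        matched++;
      }
    }
    return matched;
  }

  // The asker's rule: totalWords * threshold / 100, rounded down (4.8 -> 4).
  public static int RequiredWords(string original, int thresholdPercent) =>
    Words(original).Count * thresholdPercent / 100; // integer division truncates

  public static bool Passes(string original, string candidate, int thresholdPercent) =>
    MatchedWords(original, candidate) >= RequiredWords(original, thresholdPercent);

  public static void Main() {
    string original = "Quick Brown Fox Jumps over the lazy dog";

    Console.WriteLine(RequiredWords(original, 60));                    // 4
    Console.WriteLine(Passes(original, "Quick Brown fox Jumps.", 60)); // True
    Console.WriteLine(Passes(original, "jumps over the.", 60));        // False
  }
}
```

With this rule the four strings listed in the question pass at a 60% threshold, while the strict percentage filter keeps only the three strings at 75%.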