Split a string by length, breaking only at the nearest space

Asked by Gok*_*l E (score 5) · Tags: c#, linq, string, ienumerable

I have some text like this:

var data = "âô¢¬ôè÷¢ : ªîø¢è¤ô¢ - ã¿ñ¬ô ñèù¢ ªð¼ñ£÷¢ ï¤ôñ¢,«ñø¢è¤ô¢ - ªð¼ñ£÷¢ ñèù¢ ÝÁºèñ¢ ï¤ôî¢¶è¢°ñ¢ ñ¤ì¢ì£ Üò¢òñ¢ ªð¼ñ£ñ¢ðì¢® è¤ó£ñ âô¢¬ôè¢°ñ¢,õìè¢è¤ô¢ - ÝÁºèñ¢ ï¤ôñ¢,è¤öè¢è¤ô¢ - ôì¢²ñ¤ ï¤ôñ¢ ñø¢Áñ¢ 1,22 ªê ï¤ôñ¢ ð£î¢î¤òñ¢";

I have extension methods to split the string:

public static IEnumerable<string> EnumByLength(this string s, int length)
{
    for (int i = 0; i < s.Length; i += length)
    {
        if (i + length <= s.Length)
        {
            yield return s.Substring(i, length);
        }
        else
        {
            yield return s.Substring(i);
        }
    }
}

public static string[] SplitByLength(this string s, int maxLen)
{
    var v = EnumByLength(s, maxLen);
    if (v == null)
        return new string[] { s };
    else
        return s.EnumByLength(maxLen).ToArray();
}

Now my problem is this:

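I want to split this string into chunks with a maximum length of 150, and each split must happen only at the nearest space (before or after position 150), never in the middle of a word.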

How can I do this?
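The requirement can be stated almost literally in code. The following is a minimal sketch, not the asker's or the answerer's code: for each chunk it takes the ideal cut position `from + maxLen`, finds the last space at or before it and the first space after it, and cuts at whichever is closer. `NearestSpaceSplitter` and `SplitAtNearestSpace` are hypothetical names; the sketch assumes that only the ASCII space `' '` counts as a break point and that a chunk may run past `maxLen` when the nearest space lies after the limit, as the question allows.

```csharp
using System;
using System.Collections.Generic;

public static class NearestSpaceSplitter
{
    // Hypothetical helper: yields chunks whose lengths are as close as possible
    // to maxLen, cutting only at ' ' characters (the space itself is dropped).
    // Because this is an iterator, the argument check runs on first enumeration.
    public static IEnumerable<string> SplitAtNearestSpace(this string value, int maxLen)
    {
        if (maxLen <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxLen));
        if (string.IsNullOrEmpty(value))
            yield break;

        int from = 0;
        while (value.Length - from > maxLen)
        {
            int target = from + maxLen;                          // ideal cut position (always < value.Length here)
            int before = value.LastIndexOf(' ', target, maxLen); // last space in [from + 1, target], or -1
            int after = value.IndexOf(' ', target + 1);          // first space after target, or -1

            if (before < 0 && after < 0)
                break;                                           // no space left: emit the rest unchanged

            int split;
            if (before < 0) split = after;
            else if (after < 0) split = before;
            else split = (target - before <= after - target) ? before : after; // ties favour the shorter chunk

            yield return value.Substring(from, split - from);
            from = split + 1;                                    // skip the space we cut at
        }

        if (from < value.Length)
            yield return value.Substring(from);
    }
}
```

Calling `data.SplitAtNearestSpace(150)` then yields chunks that each end at a space. If 150 has to be a hard ceiling rather than a target, a variant could always take `before` when it exists and fall back to `after` only when the chunk contains no space at all.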


Answer by Dmi*_*nko (score 5)

My version:

// Enumerate by nearest space
// Split String value by closest to length spaces
// e.g. for length = 3 
// "abcd efghihjkl m n p qrstsf" -> "abcd", "efghihjkl", "m n", "p", "qrstsf" 
public static IEnumerable<String> EnumByNearestSpace(this String value, int length) {
  if (String.IsNullOrEmpty(value))
    yield break;

  int bestDelta = int.MaxValue;
  int bestSplit = -1;

  int from = 0;

  for (int i = 0; i < value.Length; ++i) {
    var Ch = value[i];

    if (Ch != ' ')
      continue;

    int size = (i - from);
    int delta = (size - length > 0) ? size - length : length - size;

    if ((bestSplit < 0) || (delta < bestDelta)) {
      bestSplit = i;
      bestDelta = delta;
    }
    else {
      yield return value.Substring(from, bestSplit - from);

      i = bestSplit;

      from = i + 1;
      bestSplit = -1;
      bestDelta = int.MaxValue;
    }
  }

  // String's tail
  if (from < value.Length) {
    if (bestSplit >= 0) {
      if (bestDelta < value.Length - from)
        yield return value.Substring(from, bestSplit - from);

      from = bestSplit + 1;
    }

    if (from < value.Length)
      yield return value.Substring(from);
  }
}

...

var list = data.EnumByNearestSpace(150).ToList();

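  • I found a problem in the "String's tail" part: the line `from = bestSplit + 1;` should be inside the `if (bestDelta < value.Length - from)` block above it. For example, `Console.WriteLine(string.Join("#", EnumByNearestSpace("Thank you for shopping with us! We really appreciate you!", 40)));` drops the word `appreciate`. (4 upvotes)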
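Applying the comment's fix means the tail handling only advances `from` when the pending chunk is actually emitted. Below is a sketch of the answer's method with that one change, plus `Math.Abs` in place of the hand-written absolute value (it computes the same distance). `NearestSpaceExtensions` is just a hypothetical container class.

```csharp
using System;
using System.Collections.Generic;

public static class NearestSpaceExtensions {
  // The answer's EnumByNearestSpace with the comment's fix applied:
  // in the tail, `from` moves past bestSplit only if that chunk was yielded.
  public static IEnumerable<String> EnumByNearestSpace(this String value, int length) {
    if (String.IsNullOrEmpty(value))
      yield break;

    int bestDelta = int.MaxValue;
    int bestSplit = -1;
    int from = 0;

    for (int i = 0; i < value.Length; ++i) {
      if (value[i] != ' ')
        continue;

      int size = i - from;
      int delta = Math.Abs(size - length); // distance from the desired chunk length

      if (bestSplit < 0 || delta < bestDelta) {
        // First candidate, or a space closer to the desired length
        bestSplit = i;
        bestDelta = delta;
      }
      else {
        // Distances started growing again: cut at the best space seen so far
        yield return value.Substring(from, bestSplit - from);

        i = bestSplit; // rescan from just after the cut
        from = i + 1;
        bestSplit = -1;
        bestDelta = int.MaxValue;
      }
    }

    // String's tail
    if (from < value.Length) {
      if (bestSplit >= 0 && bestDelta < value.Length - from) {
        yield return value.Substring(from, bestSplit - from);
        from = bestSplit + 1; // moved inside the if, as the comment suggests
      }

      if (from < value.Length)
        yield return value.Substring(from);
    }
  }
}
```

With this version the comment's example prints `Thank you for shopping with us! We really#appreciate you!`, and the answer's own example (`"abcd efghihjkl m n p qrstsf"` with length 3) still gives `"abcd", "efghihjkl", "m n", "p", "qrstsf"`.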