Mat*_*att 103 c# string whitespace
Suppose I have a string such as:
"Hello     how are   you           doing?"
I'd like a function that converts multiple consecutive spaces into a single space.
So I would get:
"Hello how are you doing?"
I know I could use a regular expression, or call
string s = "Hello     how are   you           doing?".Replace("  ", " ");
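but then I would have to call it several times to make sure every run of consecutive spaces is replaced with just one.
Is there already a built-in method for this?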
Tim*_*han 188
string cleanedString = System.Text.RegularExpressions.Regex.Replace(dirtyString,@"\s+"," ");
Jon*_*eet 53
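This isn't as simple as other posters have made it out to be (and as I originally thought it was), because the question isn't quite as precise as it needs to be.
There's a difference between "space" and "whitespace". If you only mean spaces, then you should use the regex " {2,}". If you mean any whitespace, that's a different matter. Should all whitespace be converted to spaces? What should happen to spaces at the start and end?
For the benchmark below, I've assumed that you only care about spaces, and that you don't want to do anything to single spaces, even at the start and end.
Note that correctness is almost always more important than performance. The fact that the Split/Join solution removes any leading/trailing whitespace (even just single spaces) makes it incorrect as far as your stated requirements go (which may, of course, be incomplete).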
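To make the space-versus-whitespace distinction concrete, here is a small sketch that runs the three approaches mentioned in this thread on the same input. The `Normalizers` class and its method names are made up for illustration. Only Split/Join drops the single leading and trailing spaces, and only `\s+` touches the tabs.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class Normalizers
{
    // Spaces only: runs of two or more U+0020 become one space;
    // single spaces, tabs and newlines are left untouched.
    public static string SpacesOnly(string input) =>
        Regex.Replace(input, " {2,}", " ");

    // Any whitespace: every run of whitespace, including a single tab or
    // a single leading/trailing space, becomes exactly one space.
    public static string AnyWhitespace(string input) =>
        Regex.Replace(input, @"\s+", " ");

    // Split/Join on ' ': collapses space runs, but leading and trailing
    // spaces produce empty entries that RemoveEmptyEntries throws away.
    public static string SplitAndJoin(string input) =>
        string.Join(" ", input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
}
```

For the input `" a  b\t\tc "`, `SpacesOnly` gives `" a b\t\tc "`, `AnyWhitespace` gives `" a b c "`, and `SplitAndJoin` gives `"a b\t\tc"`.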
The benchmark uses MiniBench.
using System;
using System.Text.RegularExpressions;
using MiniBench;

internal class Program
{
    public static void Main(string[] args)
    {
        int size = int.Parse(args[0]);
        int gapBetweenExtraSpaces = int.Parse(args[1]);

        char[] chars = new char[size];
        for (int i=0; i < size/2; i += 2)
        {
            // Make sure there actually *is* something to do
            chars[i*2] = (i % gapBetweenExtraSpaces == 1) ? ' ' : 'x';
            chars[i*2 + 1] = ' ';
        }
        // Just to make sure we don't have a \0 at the end
        // for odd sizes
        chars[chars.Length-1] = 'y';

        string bigString = new string(chars);
        // Assume that one form works :)
        string normalized = NormalizeWithSplitAndJoin(bigString);

        var suite = new TestSuite<string, string>("Normalize")
            .Plus(NormalizeWithSplitAndJoin)
            .Plus(NormalizeWithRegex)
            .RunTests(bigString, normalized);

        suite.Display(ResultColumns.All, suite.FindBest());
    }

    private static readonly Regex MultipleSpaces =
        new Regex(@" {2,}", RegexOptions.Compiled);

    static string NormalizeWithRegex(string input)
    {
        return MultipleSpaces.Replace(input, " ");
    }

    // Guessing as the post doesn't specify what to use
    private static readonly char[] Whitespace =
        new char[] { ' ' };

    static string NormalizeWithSplitAndJoin(string input)
    {
        string[] split = input.Split
            (Whitespace, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", split);
    }
}
Some test runs:
c:\Users\Jon\Test>test 1000 50
============ Normalize ============
NormalizeWithSplitAndJoin 1159091 0:30.258 22.93
NormalizeWithRegex 26378882 0:30.025 1.00
c:\Users\Jon\Test>test 1000 5
============ Normalize ============
NormalizeWithSplitAndJoin 947540 0:30.013 1.07
NormalizeWithRegex 1003862 0:29.610 1.00
c:\Users\Jon\Test>test 1000 1001
============ Normalize ============
NormalizeWithSplitAndJoin 1156299 0:29.898 21.99
NormalizeWithRegex 23243802 0:27.335 1.00
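Here the first number is the number of iterations, the second is the time taken, and the third is a scaled score, with 1.0 being the best.
That shows that in at least some cases (including this one) a regex can outperform the Split/Join solution, sometimes by a very significant margin.
However, if you change to an "all whitespace" requirement, then Split/Join does appear to win. As is so often the case, the devil is in the details...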
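Under an "all whitespace" requirement, the Split/Join variant has to split on every whitespace character rather than just `' '`. One way to do that, sketched below with a hypothetical class and method name, is to pass a null separator array: `string.Split` documents a null or empty separator as meaning "split on white-space characters". Like the space-only version, it still drops leading and trailing whitespace.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class WhitespaceSplitJoin
{
    // A null separator makes Split break on every character for which
    // char.IsWhiteSpace is true; the cast selects the char[] overload.
    // Empty entries (from runs and from the ends) are discarded, so the
    // result has single spaces between words and none at the edges.
    public static string Normalize(string input) =>
        string.Join(" ", input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
}
```

For example, `WhitespaceSplitJoin.Normalize("  a\t\tb \n c  ")` returns `"a b c"`.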
Jon*_*eet 18
While the existing answers are fine, I'd like to point out one approach which doesn't work:
public static string DontUseThisToCollapseSpaces(string text)
{
    // Keep replacing double spaces with single ones until none are found
    while (text.IndexOf("  ") != -1)
    {
        text = text.Replace("  ", " ");
    }
    return text;
}
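This can loop forever. Anyone care to guess why? (I only came across this a few years ago, when it was asked as a newsgroup question... someone had actually run into it.)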
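One likely explanation, offered as a hint rather than a definitive diagnosis: `string.IndexOf(string)` performs a culture-sensitive comparison, while `string.Replace(string, string)` performs an ordinal one. If the text contains, between two spaces, a character that the culture-sensitive comparison treats as ignorable, `IndexOf` can keep reporting a double space that `Replace` never finds. The sketch below (method and class names are hypothetical) makes the search ordinal so both calls agree; a single regex or loop pass is still the better tool.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class SpaceCollapser
{
    // Both the search and the replacement are ordinal here, so whenever
    // IndexOf finds "  ", Replace removes at least one character and the
    // string gets strictly shorter: the loop must terminate.
    public static string CollapseSpacesOrdinal(string text)
    {
        while (text.IndexOf("  ", StringComparison.Ordinal) != -1)
        {
            text = text.Replace("  ", " ");
        }
        return text;
    }
}
```

With ordinal comparison, a string such as `"a \u200B b"` contains no adjacent pair of spaces, so it is returned unchanged instead of being searched repeatedly.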
Bra*_*don 17
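A regular expression would be the easiest way. If you write the regex the right way, you won't need multiple calls.
Change it to this (assuming s holds the original text):
string cleaned = System.Text.RegularExpressions.Regex.Replace(s, @"\s{2,}", " ");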
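One call is enough because of the quantifier: `\s{2,}` matches an entire run of two or more whitespace characters as a single match, however long the run is. The sketch below (class and method names are hypothetical) reuses one static `Regex` instance. Note that a lone tab or newline is not a run of two, so it is left as-is; that is one practical difference from `\s+`.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class WhitespaceRunCollapser
{
    // Cached instance so the pattern is parsed once rather than per call.
    private static readonly Regex WhitespaceRuns = new Regex(@"\s{2,}");

    // Each run of two or more whitespace characters becomes one space;
    // a single whitespace character (e.g. one tab) is not matched.
    public static string Collapse(string input) => WhitespaceRuns.Replace(input, " ");
}
```

For example, `Collapse("a     b\tc")` yields `"a b\tc"`, while `Collapse("a \t\n b")` yields `"a b"`.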
This is the solution I use. No RegEx and no String.Split.
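using System.Text;

// Extension methods must be declared in a non-generic static class;
// the class name here is only illustrative.
public static class StringExtensions
{
    // Collapses each run of whitespace to the first character of that run
    // (so a run that starts with a tab becomes a single tab).
    public static string TrimWhiteSpace(this string Value)
    {
        StringBuilder sbOut = new StringBuilder();
        if (!string.IsNullOrEmpty(Value))
        {
            bool IsWhiteSpace = false;
            for (int i = 0; i < Value.Length; i++)
            {
                if (char.IsWhiteSpace(Value[i])) // Comparison with whitespace
                {
                    if (!IsWhiteSpace) // Comparison with previous char
                    {
                        sbOut.Append(Value[i]);
                        IsWhiteSpace = true;
                    }
                }
                else
                {
                    IsWhiteSpace = false;
                    sbOut.Append(Value[i]);
                }
            }
        }
        return sbOut.ToString();
    }
}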
That way you can write:
string cleanedString = dirtyString.TrimWhiteSpace();
Fast extra-whitespace remover by Felipe Machado. (Modified by RW for multiple-space removal.)
static string DuplicateWhiteSpaceRemover(string str)
{
    var len = str.Length;
    var src = str.ToCharArray();
    int dstIdx = 0;
    bool lastWasWS = false; //Added line
    for (int i = 0; i < len; i++)
    {
        var ch = src[i];
        switch (ch)
        {
            case '\u0020': //SPACE
            case '\u00A0': //NO-BREAK SPACE
            case '\u1680': //OGHAM SPACE MARK
            case '\u2000': //EN QUAD
            case '\u2001': //EM QUAD
            case '\u2002': //EN SPACE
            case '\u2003': //EM SPACE
            case '\u2004': //THREE-PER-EM SPACE
            case '\u2005': //FOUR-PER-EM SPACE
            case '\u2006': //SIX-PER-EM SPACE
            case '\u2007': //FIGURE SPACE
            case '\u2008': //PUNCTUATION SPACE
            case '\u2009': //THIN SPACE
            case '\u200A': //HAIR SPACE
            case '\u202F': //NARROW NO-BREAK SPACE
            case '\u205F': //MEDIUM MATHEMATICAL SPACE
            case '\u3000': //IDEOGRAPHIC SPACE
            case '\u2028': //LINE SEPARATOR
            case '\u2029': //PARAGRAPH SEPARATOR
            case '\u0009': //[ASCII Tab]
            case '\u000A': //[ASCII Line Feed]
            case '\u000B': //[ASCII Vertical Tab]
            case '\u000C': //[ASCII Form Feed]
            case '\u000D': //[ASCII Carriage Return]
            case '\u0085': //NEXT LINE
                if (lastWasWS == false) //Added line
                {
                    src[dstIdx++] = ' '; // Updated by Ryan
                    lastWasWS = true; //Added line
                }
                continue;
            default:
                lastWasWS = false; //Added line
                src[dstIdx++] = ch;
                break;
        }
    }
    return new string(src, 0, dstIdx);
}
Benchmarks...
| | Time | TEST 1 | TEST 2 | TEST 3 | TEST 4 | TEST 5 |
| Function Name |(ticks)| dup. spaces | spaces+tabs | spaces+CR/LF| " " -> " " | " " -> " " |
|---------------------------|-------|-------------|-------------|-------------|-------------|-------------|
| SwitchStmtBuildSpaceOnly | 5.2 | PASS | FAIL | FAIL | PASS | PASS |
| InPlaceCharArraySpaceOnly | 5.6 | PASS | FAIL | FAIL | PASS | PASS |
| DuplicateWhiteSpaceRemover| 7.0 | PASS | PASS | PASS | PASS | PASS |
| SingleSpacedTrim | 11.8 | PASS | PASS | PASS | FAIL | FAIL |
| Fubo(StringBuilder) | 13 | PASS | FAIL | FAIL | PASS | PASS |
| User214147 | 19 | PASS | PASS | PASS | FAIL | FAIL |
| RegExWithCompile | 28 | PASS | FAIL | FAIL | PASS | PASS |
| SwitchStmtBuild | 34 | PASS | FAIL | FAIL | PASS | PASS |
| SplitAndJoinOnSpace | 55 | PASS | FAIL | FAIL | FAIL | FAIL |
| RegExNoCompile | 120 | PASS | PASS | PASS | PASS | PASS |
| RegExBrandon | 137 | PASS | FAIL | PASS | PASS | PASS |
Benchmark notes: Release mode, no debugger attached, i7 processor, average of 4 runs, short strings only.
SwitchStmtBuildSpaceOnly by Felipe Machado 2015, modified by Sunsetquest
InPlaceCharArraySpaceOnly by Felipe Machado 2015, modified by Sunsetquest
SwitchStmtBuild by Felipe Machado 2015, modified by Sunsetquest
SwitchStmtBuild2 by Felipe Machado 2015, modified by Sunsetquest
SingleSpacedTrim by David S 2013
Fubo(StringBuilder) by fubo 2014
SplitAndJoinOnSpace by Jon Skeet 2009
RegExWithCompile by Jon Skeet 2009
User214147 by user214147
RegExBrandon by Brandon
RegExNoCompile by Tim Hoolihan