How do I remove non-letter characters from a string[]?

Hai*_*shi 4 c# regex

Here is the code:

StringBuilder sb = new StringBuilder();
Regex rgx = new Regex("[^a-zA-Z0-9 -]");

var words = Regex.Split(textBox1.Text, @"(?=(?<=[^\s])\s+\w)");
for (int i = 0; i < words.Length; i++)
{
    words[i] = rgx.Replace(words[i], "");
}

After Regex.Split(), some of the resulting words still contain extra characters, for example:

Daniel>

or

Hello:

or

\r\nNew

or

hello---------------------------

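I only need the words themselves, without any of the symbols.

So I tried the loop above, but I end up with many entries that are "" and some that contain only ------------------------

I can't use those as strings in my code.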

key*_*rdP 10

You don't need a regex to strip non-letters. This removes every character that is not a Unicode letter.

public string RemoveNonUnicodeLetters(string input)
{
    StringBuilder sb = new StringBuilder();
    foreach(char c in input)
    {
        if(Char.IsLetter(c))
           sb.Append(c);
    }

    return sb.ToString();
}

Or, if you only want to allow unaccented Latin letters (a-z, A-Z), you can use this:

public string RemoveNonLatinLetters(string input)
{
    StringBuilder sb = new StringBuilder();
    foreach(char c in input)
    {
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
           sb.Append(c);
    }

    return sb.ToString();
}
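To tie this back to the string[] in the question, here is a minimal sketch that applies the same letter filter to every word and then drops the entries that end up empty. The names WordCleaner, KeepLetters and CleanWords are made up for this illustration, and splitting on whitespace with String.Split is an assumption that replaces the Regex.Split pattern from the question. Note that, unlike the question's pattern, this keeps letters only (no digits, spaces or hyphens).

```csharp
using System;
using System.Linq;
using System.Text;

public static class WordCleaner
{
    // Same idea as RemoveNonUnicodeLetters: keep only Unicode letters.
    public static string KeepLetters(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (char.IsLetter(c))
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Split on whitespace, clean each word, discard words that became empty.
    public static string[] CleanWords(string text)
    {
        return text
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(KeepLetters)
            .Where(w => w.Length > 0)
            .ToArray();
    }

    public static void Main()
    {
        string[] words = CleanWords("Daniel> Hello:\r\nNew hello--------- ---");
        Console.WriteLine(string.Join("|", words)); // prints Daniel|Hello|New|hello
    }
}
```

The final Where call is what gets rid of the "" entries: a token such as "---" contains no letters, so it cleans down to an empty string and is filtered out.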

Benchmark against a regex:

public static string RemoveNonUnicodeLetters(string input)
{
    StringBuilder sb = new StringBuilder();
    foreach (char c in input)
    {
        if (Char.IsLetter(c))
            sb.Append(c);
    }

    return sb.ToString();
}



static readonly Regex nonUnicodeRx = new Regex("\\P{L}");

public static string RemoveNonUnicodeLetters2(string input)
{
    return nonUnicodeRx.Replace(input, "");
}


static void Main(string[] args)
{

    Stopwatch sw = new Stopwatch();

    StringBuilder sb = new StringBuilder();


    //generate guids as input
    for (int j = 0; j < 1000; j++)
    {
        sb.Append(Guid.NewGuid().ToString());
    }

    string input = sb.ToString();

    sw.Start();

    for (int i = 0; i < 1000; i++)
    {
        RemoveNonUnicodeLetters(input);
    }

    sw.Stop();
    Console.WriteLine("SM: " + sw.ElapsedMilliseconds);

    sw.Restart();
    for (int i = 0; i < 1000; i++)
    {
        RemoveNonUnicodeLetters2(input);
    }

    sw.Stop();
    Console.WriteLine("RX: " + sw.ElapsedMilliseconds);


}

Output in milliseconds (SM = string manipulation, RX = regex):

SM: 581
RX: 9882

SM: 545
RX: 9557

SM: 664
RX: 10196

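  • I bet that when it comes to being descriptive, people prefer a method name over a regex. Regex, IMHO, is for cases where the string parsing would otherwise be too complex or lengthy. (3 upvotes)
  • I'm with you. Regexes are great, but clear code is better. (3 upvotes)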