How to split a string using a regular expression

hui*_*hui 5 c# regex string

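I want to split a string into a list or array.

Input: green,"yellow,green",white,orange,"blue,black"

The split character is a comma (,), but commas inside quotes must be ignored.

The output should be:

  • green
  • yellow,green
  • white
  • orange
  • blue,black

Thanks.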

Fai*_*Dev 12

Actually, this is easy enough to do with Match:

        // Requires: using System; using System.Text.RegularExpressions;
        string subjectString = @"green,""yellow,green"",white,orange,""blue,black""";
        try
        {
            Regex regexObj = new Regex(@"(?<="")\b[a-z,]+\b(?="")|[a-z]+", RegexOptions.IgnoreCase);
            Match matchResults = regexObj.Match(subjectString);
            while (matchResults.Success)
            {
                Console.WriteLine("{0}", matchResults.Value);
                // matched text: matchResults.Value
                // match start: matchResults.Index
                // match length: matchResults.Length
                matchResults = matchResults.NextMatch();
            }
        }
        catch (ArgumentException ex)
        {
            // Thrown by the Regex constructor if the pattern has a syntax error
            Console.Error.WriteLine(ex.Message);
        }

Output:

green
yellow,green
white
orange
blue,black

Explanation:

@"
             # Match either the regular expression below (attempting the next alternative only if this one fails)
   (?<=         # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
      ""            # Match the character “"” literally (written as "" inside the C# verbatim string)
   )
   \b           # Assert position at a word boundary
   [a-z,]       # Match a single character present in the list below
                   # A character in the range between “a” and “z”
                   # The character “,”
      +            # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   \b           # Assert position at a word boundary
   (?=          # Assert that the regex below can be matched, starting at this position (positive lookahead)
      ""            # Match the character “"” literally (written as "" inside the C# verbatim string)
   )
|            # Or match regular expression number 2 below (the entire match attempt fails if this one fails to match)
   [a-z]        # Match a single character in the range between “a” and “z”
      +            # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"
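
Since the question asks for a list or array, the matches can be collected instead of printed. This is only a minimal sketch that reuses the pattern above unchanged; the class and method names (`QuotedCsvRegex`, `SplitFields`) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class QuotedCsvRegex
{
    // Same pattern as above: a quoted run of letters and commas, or a bare run of letters.
    private static readonly Regex FieldPattern =
        new Regex(@"(?<="")\b[a-z,]+\b(?="")|[a-z]+", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns every match as a list instead of printing it.
    public static List<string> SplitFields(string input)
    {
        var fields = new List<string>();
        foreach (Match m in FieldPattern.Matches(input))
        {
            fields.Add(m.Value);
        }
        return fields;
    }
}
```

Calling `QuotedCsvRegex.SplitFields(subjectString).ToArray()` would give a `string[]` instead. Like the pattern itself, this only recognises fields made of ASCII letters, so digits, spaces, or empty fields would need a different pattern.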

  • Haha, my first accepted answer that got downvoted. Is there a badge for that? :D (2 upvotes)

Mar*_*own 5

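What you have there is an irregular language. In other words, the meaning of a character depends on the sequence of characters before or after it. As the name implies, regular expressions are meant for parsing regular languages.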

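What you need is a tokenizer and a parser; a good internet search engine should lead you to examples. In fact, since the tokens are just characters, you probably don't even need the tokenizer.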

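While you can handle this simple case with a regular expression, it is likely to be very slow. It may also cause problems if the quotes are not balanced, because a regular expression will not detect that error, whereas a parser would.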
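
To illustrate the unbalanced-quote point, here is a small sketch that feeds the pattern from the regex answer an input whose closing quote is missing. The broken input string and the class name `UnbalancedQuoteDemo` are invented for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class UnbalancedQuoteDemo
{
    // Returns all regex matches; nothing here checks whether the quotes are balanced.
    public static string[] MatchFields(string input)
    {
        var regex = new Regex(@"(?<="")\b[a-z,]+\b(?="")|[a-z]+", RegexOptions.IgnoreCase);
        return regex.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Run()
    {
        // The closing quote after the second "green" is missing.
        string broken = @"green,""yellow,green";
        Console.WriteLine(string.Join(" | ", MatchFields(broken)));
        // prints: green | yellow | green
    }
}
```

The quoted field is silently split into two values and no error is reported, which is the failure mode described above.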

If you are importing a CSV file, you may want to look at the Microsoft.VisualBasic.FileIO.TextFieldParser class, which parses CSV files (just add a reference to Microsoft.VisualBasic.dll in your C# project).
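
A minimal sketch of how TextFieldParser might be applied to the string from the question, assuming the Microsoft.VisualBasic assembly is referenced. The wrapper class and the `ParseLine` helper are hypothetical; only the TextFieldParser members are part of the library.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

static class TextFieldParserDemo
{
    // Hypothetical helper: parses a single CSV line held in memory.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Commas inside double quotes stay part of the field.
            parser.HasFieldsEnclosedInQuotes = true;
            // ReadFields returns null once there is no more data.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

For the input in the question, `ParseLine` should return the five fields with the surrounding quotes stripped. The class is documented to throw a MalformedLineException for lines it cannot parse, which addresses the error-detection concern raised above.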

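Another approach is to write your own state machine (such as the code below), although this still does not solve the problem of a quote in the middle of a value: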

using System;
using System.Text;

namespace Example
{
    class Program
    {
        static void Main(string[] args)
        {
            string subjectString = @"green,""yellow,green"",white,orange,""blue,black""";

            bool inQuote = false;
            StringBuilder currentResult = new StringBuilder();
            foreach (char c in subjectString)
            {
                switch (c)
                {
                    case '\"':
                        inQuote = !inQuote;
                        break;

                    case ',':
                        if (inQuote)
                        {
                            currentResult.Append(c);
                        }
                        else
                        {
                            Console.WriteLine(currentResult);
                            currentResult.Clear();
                        }
                        break;

                    default:
                        currentResult.Append(c);
                        break;
                }
            }
            if (inQuote)
            {
                throw new FormatException("Input string does not have balanced Quote Characters");
            }
            Console.WriteLine(currentResult);
        }
    }
}
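
One possible way to extend the state machine to cope with a quote inside a value is sketched below. It assumes the common CSV convention that a literal quote inside a quoted field is written as two consecutive quotes (""); that convention is not stated in the question, and the `QuoteAwareSplitter` name is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class QuoteAwareSplitter
{
    // Assumption: inside a quoted field, "" stands for one literal quote character.
    public static List<string> Split(string input)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuote = false;

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '"')
            {
                if (inQuote && i + 1 < input.Length && input[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote: keep one, skip the other
                    i++;
                }
                else
                {
                    inQuote = !inQuote; // opening or closing quote
                }
            }
            else if (c == ',' && !inQuote)
            {
                fields.Add(current.ToString()); // separator outside quotes ends the field
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        if (inQuote)
        {
            throw new FormatException("Input string does not have balanced quote characters");
        }
        fields.Add(current.ToString());
        return fields;
    }
}
```

With this sketch, the input `a,"say ""hi"", ok",b` yields three fields, the second being `say "hi", ok`. It returns a list rather than printing, which also matches what the question asked for.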