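I want to split a string into a list or array.
Input: green,"yellow,green",white,orange,"blue,black"
The split character is a comma (,), but commas inside quotes must be ignored.
The output should be:
green
yellow,green
white
orange
blue,black
Thanks.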
Fai*_*Dev 12
Actually, this is easy to do by matching the values instead of splitting on the separators:
using System;
using System.Text.RegularExpressions;

string subjectString = @"green,""yellow,green"",white,orange,""blue,black""";
try
{
    Regex regexObj = new Regex(@"(?<="")\b[a-z,]+\b(?="")|[a-z]+", RegexOptions.IgnoreCase);
    Match matchResults = regexObj.Match(subjectString);
    while (matchResults.Success)
    {
        Console.WriteLine("{0}", matchResults.Value);
        // matched text: matchResults.Value
        // match start: matchResults.Index
        // match length: matchResults.Length
        matchResults = matchResults.NextMatch();
    }
}
catch (ArgumentException)
{
    // Syntax error in the regular expression
}
Output:
green
yellow,green
white
orange
blue,black
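The same regex can also be consumed with `Regex.Matches` and a `foreach` loop instead of calling `NextMatch()` by hand. This is a minimal sketch; the local function name `SplitWithRegex` is hypothetical, and the pattern is copied unchanged from above, so it still only matches runs of letters (digits, spaces and other characters are not captured).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: collects every match of the answer's pattern into a list.
static List<string> SplitWithRegex(string input)
{
    var pattern = new Regex(@"(?<="")\b[a-z,]+\b(?="")|[a-z]+", RegexOptions.IgnoreCase);
    var fields = new List<string>();
    // Regex.Matches returns a MatchCollection that can be enumerated directly.
    foreach (Match m in pattern.Matches(input))
    {
        fields.Add(m.Value);
    }
    return fields;
}

string subjectString = @"green,""yellow,green"",white,orange,""blue,black""";
foreach (string field in SplitWithRegex(subjectString))
{
    Console.WriteLine(field);
}
```

This prints the same five lines as the `NextMatch()` loop.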
Explanation:
@"
# Match either the regular expression below (attempting the next alternative only if this one fails)
(?<= # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
"" # Match the character “""” literally
)
\b # Assert position at a word boundary
[a-z,] # Match a single character present in the list below
# A character in the range between “a” and “z”
# The character “,”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\b # Assert position at a word boundary
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
"" # Match the character “""” literally
)
| # Or match regular expression number 2 below (the entire match attempt fails if this one fails to match)
[a-z] # Match a single character in the range between “a” and “z”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"
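The commented listing above is the same pattern written in free-spacing form, and .NET can use that form directly. A sketch, assuming you pass `RegexOptions.IgnorePatternWhitespace` so the engine ignores unescaped whitespace and treats `#` as the start of a line comment; the shortened comments and the variable name `commentedPattern` are illustrative, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Free-spacing version of the pattern; IgnorePatternWhitespace makes the
// regex engine ignore unescaped whitespace and treat '#' as a line comment.
string commentedPattern = @"
    (?<="")          # preceded by a double quote (lookbehind)
    \b [a-z,]+ \b    # letters and commas, starting and ending on a word boundary
    (?="")           # followed by a double quote (lookahead)
  |                  # or
    [a-z]+           # a plain run of letters
";

var regex = new Regex(commentedPattern,
    RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

string subjectString = @"green,""yellow,green"",white,orange,""blue,black""";
foreach (Match m in regex.Matches(subjectString))
{
    Console.WriteLine(m.Value);
}
```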
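In your input, the meaning of a character depends on the sequence of characters before or after it: a comma inside quotes is part of a value, while a comma outside quotes is a separator. Input like this is usually handled more robustly by a parser than by a regular expression.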
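What you need is a tokenizer and a parser; a good internet search should lead you to examples. In fact, since the tokens here are just single characters, you may not even need a tokenizer.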
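Although you can use a regular expression for this simple case, it may be slow. It can also cause problems if the quotes are unbalanced, because a regular expression will not detect that error the way a parser would.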
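To illustrate the unbalanced-quote point with the regex from the first answer: the hypothetical input below is missing its closing quote, yet the regex still returns matches without raising any error, and the intended quoted value is silently split apart.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var regex = new Regex(@"(?<="")\b[a-z,]+\b(?="")|[a-z]+", RegexOptions.IgnoreCase);

// Hypothetical malformed input: the quote before "yellow" is never closed.
string malformed = @"green,""yellow,green,white";

var fields = new List<string>();
foreach (Match m in regex.Matches(malformed))
{
    fields.Add(m.Value);
}

// No exception is raised; the lookahead branch never succeeds, so the
// plain-letters branch yields green, yellow, green, white.
Console.WriteLine(string.Join(" | ", fields));
```

A parser, by contrast, reaches the end of the input still inside a quote and can report that as an error.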
If you are importing a CSV file, you may want to look at the Microsoft.VisualBasic.FileIO.TextFieldParser class, which parses CSV files (just add a reference to Microsoft.VisualBasic.dll in your C# project).
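A minimal sketch of using TextFieldParser on the string from the question. The helper name `ParseCsvLine` is hypothetical. On .NET Framework you need the Microsoft.VisualBasic.dll reference mentioned above; on .NET Core 3.0 and later the class ships with the framework, so the extra reference is usually unnecessary (worth checking for your target).

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper: parses a single CSV line with TextFieldParser.
static string[] ParseCsvLine(string line)
{
    // TextFieldParser accepts any TextReader, so a StringReader wraps the
    // in-memory string; for a file you can pass its path to the constructor.
    using var parser = new TextFieldParser(new StringReader(line));
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    parser.HasFieldsEnclosedInQuotes = true; // commas inside quotes are not delimiters
    return parser.ReadFields() ?? Array.Empty<string>();
}

string subjectString = @"green,""yellow,green"",white,orange,""blue,black""";
foreach (string field in ParseCsvLine(subjectString))
{
    Console.WriteLine(field);
}
```

For a whole file, you would loop `while (!parser.EndOfData)` and call `ReadFields()` once per record.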
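Another approach is to write your own state machine (such as the code below), although this still does not handle quotes that appear in the middle of a value: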
using System;
using System.Text;

namespace Example
{
    class Program
    {
        static void Main(string[] args)
        {
            string subjectString = @"green,""yellow,green"",white,orange,""blue,black""";
            bool inQuote = false; // true between an opening and a closing quote
            StringBuilder currentResult = new StringBuilder();
            foreach (char c in subjectString)
            {
                switch (c)
                {
                    case '\"':
                        // A quote toggles the state; the quote itself is not copied.
                        inQuote = !inQuote;
                        break;
                    case ',':
                        if (inQuote)
                        {
                            // Inside quotes a comma is part of the value.
                            currentResult.Append(c);
                        }
                        else
                        {
                            // Outside quotes a comma ends the current value.
                            Console.WriteLine(currentResult);
                            currentResult.Clear();
                        }
                        break;
                    default:
                        currentResult.Append(c);
                        break;
                }
            }
            if (inQuote)
            {
                throw new FormatException("Input string does not have balanced Quote Characters");
            }
            // Emit the last value, which is not followed by a comma.
            Console.WriteLine(currentResult);
        }
    }
}
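If embedded quotes are escaped by doubling them (`""` inside a quoted field, the convention RFC 4180 uses), the state machine can be extended to handle them with one character of lookahead. This is a sketch under that assumption only; the helper name `SplitCsvLine` is hypothetical, it returns the values instead of printing them, and a stray quote in the middle of an unquoted value is still treated as the start of a quoted section, as in the original.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: splits one line on commas, treating "..." as a quoted
// field and "" inside a quoted field as a literal quote character.
static List<string> SplitCsvLine(string line)
{
    var fields = new List<string>();
    var current = new StringBuilder();
    bool inQuote = false;

    for (int i = 0; i < line.Length; i++)
    {
        char c = line[i];
        if (inQuote)
        {
            if (c == '"')
            {
                if (i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote: "" becomes "
                    i++;                 // skip the second quote
                }
                else
                {
                    inQuote = false;     // closing quote
                }
            }
            else
            {
                current.Append(c);       // commas are data inside quotes
            }
        }
        else if (c == '"')
        {
            inQuote = true;              // opening quote
        }
        else if (c == ',')
        {
            fields.Add(current.ToString()); // separator ends the field
            current.Clear();
        }
        else
        {
            current.Append(c);
        }
    }

    if (inQuote)
    {
        throw new FormatException("Input string does not have balanced quote characters");
    }
    fields.Add(current.ToString()); // last field has no trailing comma
    return fields;
}

foreach (string field in SplitCsvLine(@"green,""yellow,green"",white,orange,""blue,black"""))
{
    Console.WriteLine(field);
}
```

For example, the line `a,"say ""hi""",b` yields the three values `a`, `say "hi"` and `b`.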