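I want to use the .NET Regex.Split method to split this input string into an array. It must split on whitespace, except where the whitespace is enclosed in quotes.
Input: Here is "my string" it has "six matches"
Expected output:
What pattern do I need? Do I also need to specify any RegexOptions?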
Bar*_*bat 63
No options are required.
Regex:
\w+|"[\w\s]*"
C#:
Regex regex = new Regex(@"\w+|""[\w\s]*""");
Or, if you need to exclude the " characters:
Regex
    .Matches(input, @"(?<match>\w+)|\""(?<match>[\w\s]*)""")
    .Cast<Match>()
    .Select(m => m.Groups["match"].Value)
    .ToList()
    .ForEach(s => Console.WriteLine(s));
Tim*_*ers 16
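Lieven's solution gets most of the way there, and as he states in his comments, it is just a matter of changing the ending to Bartek's solution. The end result is the following working regex: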
(?<=")\w[\w\s]*(?=")|\w+|"[\w\s]*"
Input: Here is "my string" it has "six matches"
Output:
Unfortunately, it includes the quotes. If you instead use the following:
(("((?<token>.*?)(?<!\\)")|(?<token>[\w]+))(\s)*)
and explicitly capture the "token" matches, as follows:
RegexOptions options = RegexOptions.None;
Regex regex = new Regex( @"((""((?<token>.*?)(?<!\\)"")|(?<token>[\w]+))(\s)*)", options );
string input = @" Here is ""my string"" it has "" six matches"" ";
var result = (from Match m in regex.Matches( input )
              where m.Groups[ "token" ].Success
              select m.Groups[ "token" ].Value).ToList();
for ( int i = 0; i < result.Count; i++ )
{
    Debug.WriteLine( string.Format( "Token[{0}]: '{1}'", i, result[ i ] ) );
}
Debug output:
Token[0]: 'Here'
Token[1]: 'is'
Token[2]: 'my string'
Token[3]: 'it'
Token[4]: 'has'
Token[5]: ' six matches'
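The `(?<!\\)` lookbehind in that pattern means a quoted run only ends at a quote that is not preceded by a backslash, so backslash-escaped quotes can appear inside a token. The following is a minimal sketch of that behaviour rather than part of the original answer; the class name `EscapedQuoteTokens`, the method name `Tokenize`, and the sample input are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper class; only the pattern comes from the answer above.
public static class EscapedQuoteTokens
{
    // A quoted run ends at the first quote NOT preceded by a backslash;
    // anything outside quotes is matched as a run of \w characters.
    private static readonly Regex TokenRegex = new Regex(
        @"((""((?<token>.*?)(?<!\\)"")|(?<token>[\w]+))(\s)*)");

    public static List<string> Tokenize(string input)
    {
        return TokenRegex.Matches(input)
            .Cast<Match>()
            .Where(m => m.Groups["token"].Success)
            .Select(m => m.Groups["token"].Value)
            .ToList();
    }

    public static void Main()
    {
        // The actual string is: say "a \"quoted\" word" done
        string input = @"say ""a \""quoted\"" word"" done";
        foreach (string token in Tokenize(input))
        {
            Console.WriteLine("[" + token + "]");
        }
    }
}
```

The backslashes are kept in the captured token, so the caller would still need to unescape them. A quoted token that ends in a literal backslash is not handled by this lookbehind.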
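The top answer doesn't quite work for me. I was trying to split this sort of string on spaces, but it looks like it splits on the dots ('.') as well.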
"the lib.lib" "another lib".lib
I know the question asks about regexes, but I ended up writing a non-regex function to do this:
// Requires: using System.Collections.Generic; using System.Linq; using System.Text;

/// <summary>
/// Splits the string passed in by the delimiters passed in.
/// Quoted sections are not split, and all tokens have whitespace
/// trimmed from the start and end.
/// </summary>
public static List<string> split(string stringToSplit, params char[] delimiters)
{
    List<string> results = new List<string>();
    bool inQuote = false;
    StringBuilder currentToken = new StringBuilder();
    for (int index = 0; index < stringToSplit.Length; ++index)
    {
        char currentCharacter = stringToSplit[index];
        if (currentCharacter == '"')
        {
            // When we see a ", we toggle the flag that records whether we
            // are inside a quoted section. The quote itself is not kept.
            inQuote = !inQuote;
        }
        else if (delimiters.Contains(currentCharacter) && !inQuote)
        {
            // We've come to the end of a token, so we find the token,
            // trim it and add it to the collection of results...
            string result = currentToken.ToString().Trim();
            if (result != "") results.Add(result);
            // We start a new token...
            currentToken = new StringBuilder();
        }
        else
        {
            // We've got a 'normal' character, so we add it to
            // the current token...
            currentToken.Append(currentCharacter);
        }
    }
    // We've come to the end of the string, so we add the last token...
    string lastResult = currentToken.ToString().Trim();
    if (lastResult != "") results.Add(lastResult);
    return results;
}
I was using Bartek Szabat's answer, but I needed to capture more than just "\w" characters in my tokens. To solve this, I modified his regex slightly, similar to Grzenio's answer:
Regular Expression: (?<match>[^\s"]+)|(?<match>"[^"]*")
C# String: (?<match>[^\\s\"]+)|(?<match>\"[^\"]*\")
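Bartek's code (which returns tokens with the enclosing quotes stripped) becomes the following. Note that this pattern keeps the quotation marks inside the second `match` group, so quoted tokens are returned with their quotes: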
Regex
    .Matches(input, "(?<match>[^\\s\"]+)|(?<match>\"[^\"]*\")")
    .Cast<Match>()
    .Select(m => m.Groups["match"].Value)
    .ToList()
    .ForEach(s => Console.WriteLine(s));
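In the pattern above, the second `match` group spans the quotation marks, so quoted tokens come back with their quotes attached. If the quotes should be dropped, one possible variation is to move them outside the group. This is a sketch under that assumption, not the answer's own code; the class name `QuoteStrippingTokens`, the method name `Tokenize`, and the sample command line are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, introduced only for this illustration.
public static class QuoteStrippingTokens
{
    // Assumed variation: the quotes sit OUTSIDE the named group, so the
    // captured value excludes them. Unquoted tokens may contain any
    // character other than whitespace and the double quote.
    private const string Pattern = @"(?<match>[^\s""]+)|""(?<match>[^""]*)""";

    public static List<string> Tokenize(string input)
    {
        return Regex.Matches(input, Pattern)
            .Cast<Match>()
            .Select(m => m.Groups["match"].Value)
            .ToList();
    }

    public static void Main()
    {
        // The actual string is: copy "C:\My Docs\a.txt" D:\backup --force
        string input = @"copy ""C:\My Docs\a.txt"" D:\backup --force";
        foreach (string token in Tokenize(input))
        {
            Console.WriteLine("[" + token + "]");
        }
        // Prints [copy], [C:\My Docs\a.txt], [D:\backup] and [--force],
        // one per line.
    }
}
```

Because unquoted tokens are no longer limited to `\w`, dots, backslashes and hyphens stay inside a single token instead of acting as split points.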
I found the regex in this answer very useful. To make it work in C#, you have to use the MatchCollection class.
// \s has to be written as \\s in a regular (non-verbatim) C# string
string pattern = "[^\\s\"']+|\"([^\"]*)\"|'([^']*)'";
MatchCollection parsedStrings = Regex.Matches(line, pattern);
for (int i = 0; i < parsedStrings.Count; i++)
{
    // print each parsed string (quoted tokens still include their quotes)
    Console.Write(parsedStrings[i].Value + " ");
}
Console.WriteLine();
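The loop above prints `Value`, which is the whole match and therefore still includes the surrounding quotes. The pattern already has two capture groups, one per quote style, so the unquoted text can be read from whichever group succeeded. The sketch below shows one way to do that; the class name `QuoteAwareMatches`, the method name `Tokenize`, and the sample input are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class; the pattern is the one shown above.
public static class QuoteAwareMatches
{
    private const string Pattern = "[^\\s\"']+|\"([^\"]*)\"|'([^']*)'";

    public static List<string> Tokenize(string line)
    {
        var tokens = new List<string>();
        foreach (Match m in Regex.Matches(line, Pattern))
        {
            if (m.Groups[1].Success)
            {
                tokens.Add(m.Groups[1].Value);   // inside double quotes
            }
            else if (m.Groups[2].Success)
            {
                tokens.Add(m.Groups[2].Value);   // inside single quotes
            }
            else
            {
                tokens.Add(m.Value);             // plain unquoted token
            }
        }
        return tokens;
    }

    public static void Main()
    {
        string line = "say \"hello world\" and 'good bye' now";
        foreach (string token in Tokenize(line))
        {
            Console.WriteLine("[" + token + "]");
        }
        // Prints [say], [hello world], [and], [good bye] and [now],
        // one per line.
    }
}
```

Checking `Success` rather than testing for an empty `Value` keeps empty quoted strings such as `""` as empty tokens instead of falling through to the full match.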