Ana*_*lia 4 c# split string-parsing
I have a string in the following format.
string instance = "{112,This is the first day 23/12/2009},{132,This is the second day 24/12/2009}";
private void parsestring(string input)
{
string[] tokens = input.Split(','); // I thought this would split on the , separating the {}
foreach (string item in tokens) // but that doesn't seem to be what it is doing
{
Console.WriteLine(item);
}
}
My expected output should look like this:
112,This is the first day 23/12/2009
132,This is the second day 24/12/2009
But at the moment, I get the following:
{112
This is the first day 23/12/2009
{132
This is the second day 24/12/2009
I am new to C#, so any help would be appreciated.
Don't fixate on Split() being the solution! This is a simple thing to parse without it. Regex answers are probably also OK, but I imagine in terms of raw efficiency making "a parser" would do the trick.
IEnumerable<string> Parse(string input)
{
var results = new List<string>();
int startIndex = 0;
int currentIndex = 0;
while (currentIndex < input.Length)
{
var currentChar = input[currentIndex];
if (currentChar == '{')
{
startIndex = currentIndex + 1;
}
else if (currentChar == '}')
{
int endIndex = currentIndex - 1;
int length = endIndex - startIndex + 1;
results.Add(input.Substring(startIndex, length));
}
currentIndex++;
}
return results;
}
So it's not short on lines. It iterates once, and only performs one string allocation per "result" (plus the list that holds them). With a little tweaking I could probably make a C# 8 version with Index types that cuts down on allocations? This is probably good enough.
You could spend a whole day figuring out how to understand the regex, but this is as simple as it comes:
If you find {, note that the next character is the start of a result.
If you find }, consider everything from the last noted "start" up to the index before this character as "a result".
This won't catch mismatched brackets, and for strings like "}}{" it quietly returns bogus results instead of complaining. You didn't ask for handling those cases, but it's not too hard to improve this logic to catch it and scream about it or recover.
For example, you could reset startIndex to something like -1 when } is found. From there, if you find { when startIndex != -1, you've found "{{". If you find } when startIndex == -1, you've found "}}". And if you exit the loop with startIndex > -1, that's an opening { with no closing }. That leaves the string "}whoops" as an uncovered case, but it could be handled by initializing startIndex to, say, -2 and checking for that specifically. Do that with a regex, and you'll have a headache.
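The sentinel idea above could be sketched roughly like this. The class and method names (BraceParser, ParseStrict) are made up for illustration, and throwing FormatException is just one possible way to "scream about it". This sketch initializes startIndex to -1 rather than -2, so a leading "}" is reported the same way as "}}":

```csharp
using System;
using System.Collections.Generic;

public static class BraceParser
{
    // Hypothetical helper, not part of the original answer: the same
    // single-pass scan, but startIndex doubles as an "is a { open?" flag.
    public static List<string> ParseStrict(string input)
    {
        const int NotOpen = -1;            // sentinel: no '{' is currently open
        var results = new List<string>();
        int startIndex = NotOpen;

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '{')
            {
                // A second '{' before the first one closed, e.g. "{{"
                if (startIndex != NotOpen)
                    throw new FormatException($"Unexpected '{{' at index {i}: previous '{{' is still open.");
                startIndex = i + 1;
            }
            else if (c == '}')
            {
                // A '}' with nothing open, e.g. "}}" or "}whoops"
                if (startIndex == NotOpen)
                    throw new FormatException($"Unexpected '}}' at index {i}: no matching '{{'.");
                results.Add(input.Substring(startIndex, i - startIndex));
                startIndex = NotOpen;
            }
        }

        // Loop ended while a '{' was still open
        if (startIndex != NotOpen)
            throw new FormatException("Input ended with an unclosed '{'.");

        return results;
    }
}
```

Calling BraceParser.ParseStrict(instance) on the string from the question should give the two expected results, while inputs such as "}}{" or "{abc" raise a FormatException instead of producing garbage.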
The main reason I suggest this is you said "efficiently". icepickle's solution is nice, but Split() makes one allocation per token, then you perform allocations for each TrimX() call. That's not "efficient". That's "n + 2 allocations".
Use Regex for this:
// needs: using System.Linq; using System.Text.RegularExpressions;
string[] tokens = Regex.Split(input, @"}\s*,\s*{")
.Select(i => i.Replace("{", "").Replace("}", ""))
.ToArray();
Pattern explanation:
\s* - matches zero or more whitespace characters
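As a variation on the same idea (not what this answer does), the contents of each pair of braces could be captured directly with Regex.Matches instead of splitting and then stripping braces. This is a minimal sketch; the names RegexTokens and Extract are invented for the example:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexTokens
{
    // Hypothetical helper: returns the text inside each {...} pair.
    public static string[] Extract(string input)
    {
        // \{ and \} match literal braces;
        // ([^{}]*) captures any run of characters that are not braces.
        return Regex.Matches(input, @"\{([^{}]*)\}")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToArray();
    }
}
```

Because only the text between braces is captured, whitespace around the separating comma does not need to be handled in the pattern at all.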
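Well, if you have a method called ParseString, it's a good thing if it returns something (and it arguably wouldn't be a bad idea to call it ParseTokens instead). If you do that, you can arrive at the following code: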
private static IEnumerable<string> ParseTokens(string input)
{
return input
// removes the leading {
.TrimStart('{')
// removes the trailing }
.TrimEnd('}')
// splits on the different token in the middle
.Split( new string[] { "},{" }, StringSplitOptions.None );
}
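The reason it didn't work for you before is that your understanding of how the Split method works was wrong: it splits on every , in your example, including the ones inside the braces.
Now, if you put all of this together, you get something like this in a dotnetfiddle: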
using System;
using System.Collections.Generic;
public class Program
{
private static IEnumerable<string> ParseTokens(string input)
{
return input
// removes the leading {
.TrimStart('{')
// removes the trailing }
.TrimEnd('}')
// splits on the different token in the middle
.Split( new string[] { "},{" }, StringSplitOptions.None );
}
public static void Main()
{
var instance = "{112,This is the first day 23/12/2009},{132,This is the second day 24/12/2009}";
foreach (var item in ParseTokens( instance ) ) {
Console.WriteLine( item );
}
}
}