How can I read a text file using a specific line separator?

Use*_*404 32 .net c# file-handling

I am reading a text file with a StreamReader.

using (StreamReader sr = new StreamReader(FileName, Encoding.Default))
{
     string line = sr.ReadLine();
}

I want to force the line separator to be \n only, not \r. How can I do that?

Pet*_*ete 33

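I would implement something like George's answer, but as an extension method that avoids loading the whole file at once (untested, but something like this):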

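using System;
using System.Collections.Generic;
using System.IO;

static class ExtensionsForTextReader
{
     public static IEnumerable<string> ReadLines (this TextReader reader, char delimiter)
     {
            List<char> chars = new List<char> ();
            while (reader.Peek() >= 0)
            {
                char c = (char)reader.Read ();

                if (c == delimiter) {
                    // End of the current line: hand it to the caller and start a new one.
                    yield return new String(chars.ToArray());
                    chars.Clear ();
                    continue;
                }

                chars.Add(c);
            }

            // Return the trailing text when the input does not end with the delimiter.
            if (chars.Count > 0)
                yield return new String(chars.ToArray());
     }
}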

It can then be used like this:

using (StreamReader sr = new StreamReader(FileName, Encoding.Default))
{
     foreach (var line in sr.ReadLines ('\n'))
           Console.WriteLine (line);
}

  • +1 for using an extension method instead of the memory-hogging general answer. :D (3 upvotes)

Geo*_*ton 22

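string text;
using (StreamReader sr = new StreamReader(FileName, Encoding.Default))
{
    // Loads the entire file into memory in one go.
    text = sr.ReadToEnd();
}
// Split on '\n' only, as the question asks, so a lone '\r' stays inside its line.
string[] lines = text.Split('\n');
foreach (string s in lines)
{
   // Consume
}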

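  • This is simple, but if the file contains a million lines, it could end badly :) (32 upvotes)
  • Yes, and if it is assumed to contain 10, 100, 1,000, or 10,000 lines, it won't. Every answer has a hypothetical downside. ;) (6 upvotes)
  • Yes, I added the comment because, in general, if you are working with a stream you process a few bytes at a time so that you don't have to load the whole file into memory (OK, maybe "in general" here means "in general for me"). I tend to deal with large files, so loading the whole thing into memory can be a problem. (2 upvotes)
  • @pstrjds I understand, but it really depends on his requirements. If this solution doesn't work because of memory constraints, he can easily add some code to stream the data in chunks and split as needed, e.g. with `ReadBlock()`. I'll leave it up. He doesn't need to accept this answer, but it may be helpful to others who don't face the same constraint. :) (2 upvotes)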
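
The chunked approach mentioned in the comments could look roughly like the sketch below. It is only an illustration, not code from any of the answers: the class name `ChunkedLineReader`, the method name `ReadLinesInBlocks`, and the default block size of 4096 characters are all assumptions. The only framework call it relies on for reading is `TextReader.ReadBlock(char[], int, int)`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class ChunkedLineReader // hypothetical name, for illustration only
{
    // Reads the input in fixed-size blocks and yields the text between delimiters,
    // so only one block plus the current line is held in memory at a time.
    public static IEnumerable<string> ReadLinesInBlocks(
        TextReader reader, char delimiter, int blockSize = 4096) // block size is an assumption
    {
        var block = new char[blockSize];
        var current = new StringBuilder();
        int read;

        // ReadBlock returns 0 once the end of the input has been reached.
        while ((read = reader.ReadBlock(block, 0, block.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (block[i] == delimiter)
                {
                    yield return current.ToString();
                    current.Clear();
                }
                else
                {
                    current.Append(block[i]);
                }
            }
        }

        // Whatever follows the last delimiter is the final line.
        if (current.Length > 0)
            yield return current.ToString();
    }
}
```

Called with '\n' as the delimiter, a lone '\r' stays inside the returned line. Unlike `Split`, nothing beyond the current block and the line being built needs to be kept in memory.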

sov*_*emp 7

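I really like the answer @Pete gave. I would just like to offer a small modification. This lets you pass a string delimiter instead of only a single character: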

using System;
using System.IO;
using System.Collections.Generic;
internal static class StreamReaderExtensions
{
    public static IEnumerable<string> ReadUntil(this StreamReader reader, string delimiter)
    {
        List<char> buffer = new List<char>();
        CircularBuffer<char> delim_buffer = new CircularBuffer<char>(delimiter.Length);
        while (reader.Peek() >= 0)
        {
            char c = (char)reader.Read();
            delim_buffer.Enqueue(c);
            if (delim_buffer.ToString() == delimiter || reader.EndOfStream)
            {
                if (buffer.Count > 0)
                {
                    if (!reader.EndOfStream)
                    {
                        yield return new String(buffer.ToArray()).Replace(delimiter.Substring(0, delimiter.Length - 1), string.Empty);
                    }
                    else
                    {
                        buffer.Add(c);
                        yield return new String(buffer.ToArray());
                    }
                    buffer.Clear();
                }
                continue;
            }
            buffer.Add(c);
        }
    }

    private class CircularBuffer<T> : Queue<T>
    {
        private int _capacity;

        public CircularBuffer(int capacity)
            : base(capacity)
        {
            _capacity = capacity;
        }

        new public void Enqueue(T item)
        {
            if (base.Count == _capacity)
            {
                base.Dequeue();
            }
            base.Enqueue(item);
        }

        public override string ToString()
        {
            List<String> items = new List<string>();
            foreach (var x in this)
            {
                items.Add(x.ToString());
            };
            return String.Join("", items);
        }
    }
}


Mar*_*tin 5

According to the documentation:

http://msdn.microsoft.com/en-us/library/system.io.streamreader.readline.aspx

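A line is defined as a sequence of characters followed by a line feed ("\n"), a carriage return ("\r"), or a carriage return immediately followed by a line feed ("\r\n").

So by default, the StreamReader ReadLine method recognizes a line ending at either \n or \r.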
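
To check this behaviour, here is a minimal sketch that runs `ReadLine` over a string with mixed line endings. It uses a `StringReader` instead of a file so that it is self-contained; `StringReader.ReadLine` is documented with the same definition of a line. The class name `ReadLineDemo` and the method name `SplitWithReadLine` are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ReadLineDemo // hypothetical name, for illustration only
{
    // Collects every string that ReadLine returns for the given text.
    public static List<string> SplitWithReadLine(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            // ReadLine returns null once the end of the input is reached.
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }
}
```

`ReadLineDemo.SplitWithReadLine("a\rb\nc\r\nd")` returns the four lines a, b, c, and d. The lone \r counts as a line break, which is exactly the behaviour the question wants to avoid.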


小智 5

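This is an improvement on sovemp's answer. Sorry, I would have preferred to post it as a comment, but my reputation does not allow that. The improvement addresses two issues:

  1. With the delimiter "\r\n", the example sequence "text\rtest\r\n" would also lose its first "\r", which is not intended.
  2. When the last characters in the stream equal the delimiter, the function would incorrectly return a string that includes the delimiter.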

    using System;
    using System.IO;
    using System.Collections.Generic;
    internal static class StreamReaderExtensions
    {
        public static IEnumerable<string> ReadUntil(this StreamReader reader, string delimiter)
        {
            List<char> buffer = new List<char>();
            CircularBuffer<char> delim_buffer = new CircularBuffer<char>(delimiter.Length);
            while (reader.Peek() >= 0)
            {
                char c = (char)reader.Read();
                delim_buffer.Enqueue(c);
                if (delim_buffer.ToString() == delimiter || reader.EndOfStream)
                {
                    if (buffer.Count > 0)
                    {
                        if (!reader.EndOfStream)
                        {
                            buffer.Add(c);
                            yield return new String(buffer.ToArray()).Substring(0, buffer.Count - delimiter.Length);
                        }
                        else
                        {
                            buffer.Add(c);
                            if (delim_buffer.ToString() != delimiter)
                                yield return new String(buffer.ToArray());
                            else
                                yield return new String(buffer.ToArray()).Substring(0, buffer.Count - delimiter.Length);
                        }
                        buffer.Clear();
                    }
                    continue;
                }
                buffer.Add(c);
            }
        }
    
        private class CircularBuffer<T> : Queue<T>
        {
            private int _capacity;
    
            public CircularBuffer(int capacity)
                : base(capacity)
            {
                _capacity = capacity;
            }
    
            new public void Enqueue(T item)
            {
                if (base.Count == _capacity)
                {
                    base.Dequeue();
                }
                base.Enqueue(item);
            }
    
            public override string ToString()
            {
                List<String> items = new List<string>();
                foreach (var x in this)
                {
                    items.Add(x.ToString());
                };
                return String.Join("", items);
            }
        }
    }
    
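
As a point of comparison, the two issues listed above can also be avoided by checking whether the text read so far ends with the delimiter and trimming exactly that many characters. The following is a rough sketch under that assumption, not a drop-in replacement for the code above: `DelimitedReader`, `ReadUntilDelimiter`, and `EndsWith` are hypothetical names, and unlike the versions above it also yields empty strings for consecutive delimiters.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class DelimitedReader // hypothetical name, for illustration only
{
    // Yields the text between occurrences of a string delimiter, one character at a time.
    public static IEnumerable<string> ReadUntilDelimiter(TextReader reader, string delimiter)
    {
        if (string.IsNullOrEmpty(delimiter))
            throw new ArgumentException("Delimiter must not be empty.", nameof(delimiter));

        var current = new StringBuilder();
        int next;
        // Read returns -1 at the end of the input.
        while ((next = reader.Read()) >= 0)
        {
            current.Append((char)next);
            if (EndsWith(current, delimiter))
            {
                // Drop only the delimiter itself; earlier partial matches such as a lone '\r' stay.
                current.Length -= delimiter.Length;
                yield return current.ToString();
                current.Clear();
            }
        }

        // Text after the last delimiter, if any.
        if (current.Length > 0)
            yield return current.ToString();
    }

    // True when the builder's last characters equal the given suffix.
    private static bool EndsWith(StringBuilder sb, string suffix)
    {
        if (sb.Length < suffix.Length) return false;
        int offset = sb.Length - suffix.Length;
        for (int i = 0; i < suffix.Length; i++)
            if (sb[offset + i] != suffix[i]) return false;
        return true;
    }
}
```

With the delimiter "\r\n", the input "text\rtest\r\nlast\r\n" produces "text\rtest" and "last": the lone "\r" is kept, and the trailing delimiter is not returned as part of the last string.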


小智 5

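I needed a solution that reads up to "\r\n" and does not stop at "\n". jp1980's solution worked, but it was extremely slow on large files. So I converted Mike Sackton's solution to read until a specified string is found.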

public static string ReadLine(this StreamReader sr, string lineDelimiter)
    {
        StringBuilder line = new StringBuilder();
        var matchIndex = 0;

        while (sr.Peek() >= 0) // Peek returns -1 at end of stream; 0 is a valid character ('\0')
        {
            var nextChar = (char)sr.Read();
            line.Append(nextChar);

            if (nextChar == lineDelimiter[matchIndex])
            {
                if (matchIndex == lineDelimiter.Length - 1)
                {
                    return line.ToString().Substring(0, line.Length - lineDelimiter.Length);
                }
                matchIndex++;
            }
            else
            {
                matchIndex = 0;
                //did we mistake one of the characters as the delimiter? If so let's restart our search with this character...
                if (nextChar == lineDelimiter[matchIndex])
                {
                    if (matchIndex == lineDelimiter.Length - 1)
                    {
                        return line.ToString().Substring(0, line.Length - lineDelimiter.Length);
                    }
                    matchIndex++;
                }
            }
        }

        return line.Length == 0
            ? null
            : line.ToString();
    }

And it is called like this:

using (StreamReader reader = new StreamReader(file))
{
    string line;
    while((line = reader.ReadLine("\r\n")) != null)
    {
        Console.WriteLine(line);
    }
}

  • Perfect. Using a custom line delimiter, such as Environment.NewLine + "go" + Environment.NewLine; (2 upvotes)