FileHelpers throws OutOfMemoryException when parsing a large CSV file

Bow*_*opa 4 c# csv filehelpers

I'm trying to parse a very large CSV file with FileHelpers (http://www.filehelpers.net/). The file is 1 GB compressed and about 20 GB uncompressed.

        string fileName = @"c:\myfile.csv.gz";
        using (var fileStream = File.OpenRead(fileName))
        {
            using (GZipStream gzipStream = new GZipStream(fileStream, CompressionMode.Decompress, false))
            {
                using (TextReader textReader = new StreamReader(gzipStream))
                {
                    var engine = new FileHelperEngine<CSVItem>();
                    CSVItem[] items = engine.ReadStream(textReader);                        
                }
            }
        }

FileHelpers then throws an OutOfMemoryException.

Test failed: Exception of type 'System.OutOfMemoryException' was thrown.

    System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
       at System.Text.StringBuilder.ExpandByABlock(Int32 minBlockCharCount)
       at System.Text.StringBuilder.Append(Char value, Int32 repeatCount)
       at System.Text.StringBuilder.Append(Char value)
       at FileHelpers.StringHelper.ExtractQuotedString(LineInfo line, Char quoteChar, Boolean allowMultiline)
       at FileHelpers.DelimitedField.ExtractFieldString(LineInfo line)
       at FileHelpers.FieldBase.ExtractValue(LineInfo line)
       at FileHelpers.RecordInfo.StringToRecord(LineInfo line)
       at FileHelpers.FileHelperEngine`1.ReadStream(TextReader reader, Int32 maxRecords, DataTable dt)
       at FileHelpers.FileHelperEngine`1.ReadStream(TextReader reader)

Is it possible to parse a file this large with FileHelpers? If not, can anyone recommend another way to parse it? Thanks.

Mar*_*eli 9

You have to work record by record, like this:

  string fileName = @"c:\myfile.csv.gz";
  using (var fileStream = File.OpenRead(fileName))
  {
      using (GZipStream gzipStream = new GZipStream(fileStream, CompressionMode.Decompress, false))
      {
          using (TextReader textReader = new StreamReader(gzipStream))
          {
            var engine = new FileHelperAsyncEngine<CSVItem>();
            using(engine.BeginReadStream(textReader))
            {
                foreach(var record in engine)
                {
                   // Work with each item
                }
            }
          }
      }
  }

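FileHelperEngine.ReadStream returns every record in a single CSVItem[] array, which cannot fit in memory for a file of about 20 GB. With this async approach, only one record is held in memory at a time, and it will also be faster.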
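
For completeness, here is a minimal self-contained sketch of the same pattern. The question never shows CSVItem, so the two-column layout below (an integer Id and a quoted Name) is purely hypothetical, as are the StreamingExample and CountRecords names; adjust the fields to match the real file.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using FileHelpers;

// Hypothetical record layout: the real CSVItem is not shown in the question.
[DelimitedRecord(",")]
public class CSVItem
{
    public int Id;

    [FieldQuoted('"', QuoteMode.OptionalForBoth)]
    public string Name;
}

public static class StreamingExample
{
    // Reads records one at a time and returns how many were seen.
    // Nothing is accumulated, so memory use does not grow with file size.
    public static long CountRecords(TextReader reader)
    {
        var engine = new FileHelperAsyncEngine<CSVItem>();
        long count = 0;

        using (engine.BeginReadStream(reader))
        {
            foreach (CSVItem record in engine)
            {
                // Work with the record here, then let it go out of scope.
                count++;
            }
        }

        return count;
    }

    public static void Main()
    {
        using (var fileStream = File.OpenRead(@"c:\myfile.csv.gz"))
        using (var gzipStream = new GZipStream(fileStream, CompressionMode.Decompress))
        using (var textReader = new StreamReader(gzipStream))
        {
            Console.WriteLine(CountRecords(textReader));
        }
    }
}
```

Note that collecting the records inside the loop (for example by adding each one to a List<CSVItem>) brings back the original problem, because everything ends up in memory again. Aggregate, write out, or discard each record as you go.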

  • Thanks! FileHelperAsyncEngine is exactly what I was looking for. (2 upvotes)