从XLSX导出大量数据 - OutOfMemoryException

Gia*_*ori 7 .net openxml xlsx closedxml

我正在接近以Excel OpenXML格式(xlsx)导出大量数据(115.000行x 30列).我正在使用一些像DocumentFormat.OpenXML,ClosedXML,NPOI这样的库.

每次抛出OutOfMemoryException都会抛出,因为内存中表单的表示会导致指数内存增加.

每1000个关闭文档文件(并释放内存),下一次加载会导致内存增加.

是否有更高效的方式在xlsx中导出数据而不占用大量内存?

pet*_*ids 21

OpenXML SDK是适合这项工作的合适工具,但您需要小心使用SAX(Simple API for XML)方法而不是DOM方法.来自SAX的链接维基百科文章:

在DOM作为整体对文档进行操作的情况下,SAX解析器按顺序操作XML文档的每个部分

大大减少了处理大型Excel文件时消耗的内存量.

这里有一篇很好的文章 - http://polymathprogrammer.com/2012/08/06/how-to-properly-use-openxmlwriter-to-write-large-excel-files/

改编自该文章,这是一个输出115k行,包含30列的示例:

public static void LargeExport(string filename)
{
    using (SpreadsheetDocument document = SpreadsheetDocument.Create(filename, SpreadsheetDocumentType.Workbook))
    {
        //this list of attributes will be used when writing a start element
        List<OpenXmlAttribute> attributes;
        OpenXmlWriter writer;

        document.AddWorkbookPart();
        WorksheetPart workSheetPart = document.WorkbookPart.AddNewPart<WorksheetPart>();

        writer = OpenXmlWriter.Create(workSheetPart);            
        writer.WriteStartElement(new Worksheet());
        writer.WriteStartElement(new SheetData());

        for (int rowNum = 1; rowNum <= 115000; ++rowNum)
        {
            //create a new list of attributes
            attributes = new List<OpenXmlAttribute>();
            // add the row index attribute to the list
            attributes.Add(new OpenXmlAttribute("r", null, rowNum.ToString()));

            //write the row start element with the row index attribute
            writer.WriteStartElement(new Row(), attributes);

            for (int columnNum = 1; columnNum <= 30; ++columnNum)
            {
                //reset the list of attributes
                attributes = new List<OpenXmlAttribute>();
                // add data type attribute - in this case inline string (you might want to look at the shared strings table)
                attributes.Add(new OpenXmlAttribute("t", null, "str"));
                //add the cell reference attribute
                attributes.Add(new OpenXmlAttribute("r", "", string.Format("{0}{1}", GetColumnName(columnNum), rowNum)));

                //write the cell start element with the type and reference attributes
                writer.WriteStartElement(new Cell(), attributes);
                //write the cell value
                writer.WriteElement(new CellValue(string.Format("This is Row {0}, Cell {1}", rowNum, columnNum)));

                // write the end cell element
                writer.WriteEndElement();
            }

            // write the end row element
            writer.WriteEndElement();
        }

        // write the end SheetData element
        writer.WriteEndElement();
        // write the end Worksheet element
        writer.WriteEndElement();
        writer.Close();

        writer = OpenXmlWriter.Create(document.WorkbookPart);
        writer.WriteStartElement(new Workbook());
        writer.WriteStartElement(new Sheets());

        writer.WriteElement(new Sheet()
        {
            Name = "Large Sheet",
            SheetId = 1,
            Id = document.WorkbookPart.GetIdOfPart(workSheetPart)
        });

        // End Sheets
        writer.WriteEndElement();
        // End Workbook
        writer.WriteEndElement();

        writer.Close();

        document.Close();
    }
}

//A simple helper to get the column name from the column index. This is not well tested!
private static string GetColumnName(int columnIndex)
{
    int dividend = columnIndex;
    string columnName = String.Empty;
    int modifier;

    while (dividend > 0)
    {
        modifier = (dividend - 1) % 26;
        columnName = Convert.ToChar(65 + modifier).ToString() + columnName;
        dividend = (int)((dividend - modifier) / 26);
    }

    return columnName;
}
Run Code Online (Sandbox Code Playgroud)