Error reading a large Excel file (xlsx) with Apache POI

jam*_*esT 7 java out-of-memory xlsx apache-poi

I am trying to read a large xlsx Excel file, around 40-50 MB, with Apache POI, and I get an out-of-memory exception. The current heap size is 3 GB.

I can read smaller Excel files without any problem. I need a way to read large Excel files and then return them as the response through a Spring Excel view.

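import java.util.Map;

import javax.servlet.ServletOutputStream;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.springframework.web.servlet.view.AbstractView;

public class FetchExcel extends AbstractView {

    @Override
    protected void renderMergedOutputModel(
            Map<String, Object> model, HttpServletRequest request, HttpServletResponse response)
            throws Exception {

        String fileName = "SomeExcel.xlsx";

        // Headers must be set before anything is written to the response body.
        response.setContentType("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
        response.setHeader("Content-disposition", "attachment;filename=\"" + fileName + "\"");

        OPCPackage pkg = OPCPackage.open("/someDir/SomeExcel.xlsx");
        try {
            // XSSFWorkbook builds the entire workbook in memory,
            // which is what runs out of heap for 40-50 MB files.
            XSSFWorkbook workbook = new XSSFWorkbook(pkg);

            // Write while the package is still open.
            ServletOutputStream respOut = response.getOutputStream();
            workbook.write(respOut);
            respOut.flush();
        } finally {
            // Close without saving any changes back to the source file.
            pkg.revert();
        }
    }
}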

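I originally used XSSFWorkbook workbook = new XSSFWorkbook(FileInputStream in); but according to the Apache POI API documentation that is costly, so I switched to the OPCPackage approach, with the same result. I don't need to parse or process the file; I only need to read it and return it.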

O.C*_*.C. 15

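Here is an example of reading a large xlsx file with a SAX parser (POI's XSSF event API), which streams each sheet instead of loading the whole workbook into memory.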

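import java.io.File;
import java.io.IOException;
import java.io.InputStream;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.model.StylesTable;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Enclosing class name is illustrative; the answer only showed the two methods.
public class LargeExcelReader {

    public void parseExcel(File file) throws IOException {
        OPCPackage container = null;
        try {
            // Read-only: the source file is never modified.
            container = OPCPackage.open(file.getAbsolutePath(), PackageAccess.READ);
            ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(container);
            XSSFReader xssfReader = new XSSFReader(container);
            StylesTable styles = xssfReader.getStylesTable();
            XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
            while (iter.hasNext()) {
                // Each sheet's XML is streamed through SAX rather than built into a workbook object model.
                InputStream stream = iter.next();
                try {
                    processSheet(styles, strings, stream);
                } finally {
                    stream.close();
                }
            }
        } catch (SAXException | OpenXML4JException e) { // OpenXML4JException also covers InvalidFormatException
            throw new IOException(e);
        } finally {
            if (container != null) {
                container.revert(); // close without saving anything back
            }
        }
    }

    protected void processSheet(StylesTable styles, ReadOnlySharedStringsTable strings,
            InputStream sheetInputStream) throws IOException, SAXException {

        InputSource sheetSource = new InputSource(sheetInputStream);
        SAXParserFactory saxFactory = SAXParserFactory.newInstance();
        // XSSFSheetXMLHandler matches SpreadsheetML element names, so parse namespace-aware.
        saxFactory.setNamespaceAware(true);
        try {
            SAXParser saxParser = saxFactory.newSAXParser();
            XMLReader sheetParser = saxParser.getXMLReader();
            // Callback signatures as in older POI releases; newer ones use
            // endRow(int) and cell(String, String, XSSFComment) instead.
            ContentHandler handler = new XSSFSheetXMLHandler(styles, strings, new SheetContentsHandler() {
                @Override
                public void startRow(int rowNum) {
                    // a new row begins
                }

                @Override
                public void endRow() {
                    // the current row is complete
                }

                @Override
                public void cell(String cellReference, String formattedValue) {
                    // handle one cell, e.g. cellReference "B3" with its formatted value
                }

                @Override
                public void headerFooter(String text, boolean isHeader, String tagName) {
                    // sheet header/footer text, if any
                }
            },
            false // formulasNotResults = false: report cached formula results, not formulas
            );
            sheetParser.setContentHandler(handler);
            sheetParser.parse(sheetSource);
        } catch (ParserConfigurationException e) {
            throw new RuntimeException("SAX parser appears to be broken", e);
        }
    }
}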
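To show what the event model does under the hood, here is a JDK-only sketch (no POI) that pulls cell references and raw values out of a tiny, made-up sheet XML fragment with a SAX DefaultHandler. It is a simplification, not POI's XSSFSheetXMLHandler: it ignores styles and shared strings, so a string cell (t="s") yields the index into the shared-strings table rather than the text. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SheetSaxSketch {

    // Collects "ref=value" pairs from <c r="..."><v>...</v></c> elements,
    // reacting to one SAX event at a time instead of building a DOM.
    static class CellCollector extends DefaultHandler {
        final List<String> cells = new ArrayList<>();
        private String currentRef;
        private StringBuilder value; // non-null only while inside <v>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("c".equals(localName)) {
                currentRef = attrs.getValue("r");
            } else if ("v".equals(localName)) {
                value = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (value != null) {
                value.append(ch, start, length); // text may arrive in several chunks
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("v".equals(localName)) {
                cells.add(currentRef + "=" + value);
                value = null;
            }
        }
    }

    public static List<String> readCells(byte[] sheetXml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so localName is populated
        SAXParser parser = factory.newSAXParser();
        CellCollector collector = new CellCollector();
        parser.parse(new ByteArrayInputStream(sheetXml), collector);
        return collector.cells;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
                + "<sheetData><row r=\"1\"><c r=\"A1\"><v>42</v></c>"
                + "<c r=\"B1\" t=\"s\"><v>0</v></c></row></sheetData></worksheet>";
        System.out.println(readCells(xml.getBytes(StandardCharsets.UTF_8))); // [A1=42, B1=0]
    }
}
```

Here B1's 0 is a shared-string index; XSSFSheetXMLHandler resolves such indexes through the ReadOnlySharedStringsTable passed to it.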


Gre*_*eek 5

You didn't mention whether you need to modify the spreadsheet.

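This may be obvious, but if you don't need to modify the spreadsheet, you don't need to parse it and write it back out. You can simply read the bytes from the file and write them to the response, just as you would for an image or any other binary format.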
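Since the file is already an .xlsx on disk, passing it through needs no POI at all. A minimal sketch, assuming the file path is known; the class name ExcelPassThrough is hypothetical:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public final class ExcelPassThrough {

    private ExcelPassThrough() {}

    /**
     * Copies the file to the output stream as raw bytes through a fixed-size
     * buffer, so memory use does not grow with the file size and the
     * spreadsheet is never parsed. Returns the number of bytes copied.
     */
    public static long streamFile(Path source, OutputStream out) throws IOException {
        long copied = Files.copy(source, out);
        out.flush();
        return copied;
    }
}
```

Inside renderMergedOutputModel you would set the content type and the Content-disposition header first, then call something like ExcelPassThrough.streamFile(Paths.get("/someDir/SomeExcel.xlsx"), response.getOutputStream()).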

If you do need to modify the spreadsheet before sending it to the user, then as far as I know you may need to take a different approach.

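Every library I know of for reading Excel files in Java reads the entire spreadsheet into memory, so you need at least 50 MB of memory for each spreadsheet that might be processed concurrently. As others have pointed out, that means tuning the heap available to the VM.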

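If you need to handle many spreadsheets concurrently and can't allocate enough memory, consider a format that can be streamed instead of read into memory all at once. Excel can open CSV files, and in the past I have had good results by setting the content type to application/vnd.ms-excel and the attachment filename to something ending in ".xls" while actually returning CSV content. I haven't tried this in several years, so YMMV.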
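A minimal sketch of the streaming side of that idea, assuming the rows come from some source you can iterate over one at a time (the CsvStreamer name and its methods are hypothetical). Each row is written straight to the Writer with RFC 4180-style quoting, so memory use does not grow with the number of rows:

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

public final class CsvStreamer {

    private CsvStreamer() {}

    /** Quotes a field if it contains a comma, quote, CR or LF; doubles embedded quotes. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Writes one CSV record directly to the Writer; no rows are accumulated. */
    public static void writeRow(Writer out, List<String> fields) throws IOException {
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                out.write(',');
            }
            out.write(escape(fields.get(i)));
        }
        out.write("\r\n"); // RFC 4180 line ending
    }
}
```

In a view you could wrap response.getOutputStream() in an OutputStreamWriter with an explicit charset and call writeRow once per record, flushing at the end.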