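I have 1,300,000 records. Each record is itself an array. I loop over the records and insert each element (bucket) of a record into a cell of one row of an Excel sheet; at the end, I write that sheet to an Excel file. After about 100k records have been written, it gets slower and slower, and then it breaks at the end. I'm using Apache POI for this. Here is my code; I'm not sure what is causing the writing process to slow down so much. Any hints?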
try {
//save to excel file
FileOutputStream out = new FileOutputStream(new File(path));
XSSFWorkbook resultWorkBook = new XSSFWorkbook();
XSSFSheet sheet = resultWorkBook.createSheet("Comparison_result");
int sizeOfOriginalTermMain = 0;
int sizeOfOriginalTermMatch = 0;
//blue cell style
CellStyle blueStyle = resultWorkBook.createCellStyle();
XSSFFont cellFont = resultWorkBook.createFont();
cellFont.setColor(IndexedColors.BLUE.getIndex());
blueStyle.setFont(cellFont);
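//green background cell style (a fill needs a pattern; for a solid fill the color is the foreground color)
CellStyle GreenStyle = resultWorkBook.createCellStyle();
GreenStyle.setFillForegroundColor(IndexedColors.GREEN.getIndex());
GreenStyle.setFillPattern(FillPatternType.SOLID_FOREGROUND);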
//create heading
Row heading = sheet.createRow(0);
heading.createCell(0).setCellValue("Main List ID");
heading.createCell(1).setCellValue("Match number > 0");
heading.createCell(2).setCellValue("Found Match ID");
heading.createCell(3).setCellValue("Source list: 2");
heading.createCell(4).setCellValue("Matched Terms");
for(int i=0; i<5;i++) {
CellStyle styleRowHeading = resultWorkBook.createCellStyle();
XSSFFont font = resultWorkBook.createFont();
font.setBold(true);
font.setFontName(XSSFFont.DEFAULT_FONT_NAME);
font.setFontHeightInPoints((short)11);
styleRowHeading.setFont(font);
heading.getCell(i).setCellStyle(styleRowHeading);
}
ArrayList<Object> currentList = new ArrayList<Object>();
RecordId mainRecordId = new RecordId();
String mainRecordIdValue = "";
LinkedHashSet<String> commonStrings = new LinkedHashSet<String>();
int numberOfMatch=0;
RecordId matchRecordId = new RecordId();
String matchRecordIdValue = "";
int size = processResult.size();
int matchRecordIdListNumber = 0;
String concatenatedMatchTerms = "";
ArrayList<String> OrininalTemrsInMainList = new ArrayList<String>();
ArrayList<String> OrininalTemrsInMatchList = new ArrayList<String>();
//adding value to each row of the excel sheet
int q= 0;
for (int i = 0; i < size; i++) {
currentList = processResult.get(i);
Row row = sheet.createRow(i+1);
//object ppmsID column
Cell mainIdCell = row.createCell(0);
mainRecordId = (RecordId)(currentList.get(0));
mainRecordIdValue = mainRecordId.getIdValue();
mainIdCell.setCellValue(mainRecordIdValue);
mainIdCell.setCellStyle(blueStyle);
//productDB column
Cell matchNumberCell = row.createCell(1);
commonStrings = (LinkedHashSet<String>)(currentList.get(2));
numberOfMatch = commonStrings.size();
matchNumberCell.setCellValue(Integer.toString(numberOfMatch));
//match record Id
Cell matchIdCell = row.createCell(2);
matchRecordId = (RecordId)(currentList.get(1));
matchRecordIdValue = matchRecordId.getIdValue();
matchRecordIdListNumber = matchRecordId.getListNumber();
matchIdCell.setCellValue(matchRecordIdValue);
Cell sourceListNumber = row.createCell(3);
sourceListNumber.setCellValue(Integer.toString(matchRecordIdListNumber));
//terms of match
Cell matchTerms = row.createCell(4);
concatenatedMatchTerms = getConcatenatedStringFromList(commonStrings);
matchTerms.setCellValue(concatenatedMatchTerms);
OrininalTemrsInMainList = (ArrayList<String>) currentList.get(3);
sizeOfOriginalTermMain = OrininalTemrsInMainList.size();
OrininalTemrsInMatchList = (ArrayList<String>) currentList.get(4);
sizeOfOriginalTermMatch = OrininalTemrsInMatchList.size();
for (int k = 0; k<sizeOfOriginalTermMain;k++) {
Cell newCell = row.createCell(5+k);
newCell.setCellValue(OrininalTemrsInMainList.get(k));
newCell.setCellStyle(blueStyle);
}
Cell emptyCell = row.createCell(5+sizeOfOriginalTermMain);
emptyCell.setCellValue("emptyCell");
emptyCell.setCellStyle(GreenStyle);
for (int n = 0; n<OrininalTemrsInMatchList.size();n++) {
Cell newCell = row.createCell(5+sizeOfOriginalTermMain+1+n);
newCell.setCellValue(OrininalTemrsInMatchList.get(n));
}
}
resultWorkBook.write(out);
out.close();
resultWorkBook.close();
}catch(Exception e) {
System.out.println(e.getMessage());
}
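Don't use XSSF to create a spreadsheet with this many cells.
XSSF relies on objects that consume a lot of memory.
Use SXSSF instead, which is the Streaming Usermodel API.
SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF, meant for producing very large spreadsheets when heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows within a sliding window, whereas XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, because they have been written to disk.
Updating code that uses XSSF so that it uses SXSSF is a piece of cake.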
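As a rough sketch of why the switch is small: if the row-writing code is typed against the common `Workbook`/`Sheet`/`Row`/`Cell` interfaces in `org.apache.poi.ss.usermodel`, only the constructor call changes. The class and method names (`InterfaceBasedWriter`, `writeRows`) and the `List<String[]>` record shape are illustrative assumptions, a simplification of the question's `processResult`; poi-ooxml is assumed to be on the classpath.

```java
import java.util.List;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellStyle;
import org.apache.poi.ss.usermodel.Font;
import org.apache.poi.ss.usermodel.IndexedColors;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

public class InterfaceBasedWriter {

    /**
     * Writes a heading row plus one row per record and returns the last row index.
     * Only ss.usermodel interfaces are used, so the caller chooses between
     * new XSSFWorkbook() and new SXSSFWorkbook(windowSize).
     */
    static int writeRows(Workbook wb, List<String[]> records) {
        Sheet sheet = wb.createSheet("Comparison_result");

        // Styles are created once and reused for every row.
        CellStyle blueStyle = wb.createCellStyle();
        Font blueFont = wb.createFont();
        blueFont.setColor(IndexedColors.BLUE.getIndex());
        blueStyle.setFont(blueFont);

        Row heading = sheet.createRow(0);
        heading.createCell(0).setCellValue("Main List ID");
        heading.createCell(1).setCellValue("Match number > 0");

        for (int i = 0; i < records.size(); i++) {
            Row row = sheet.createRow(i + 1); // row 0 is the heading
            String[] record = records.get(i);
            for (int c = 0; c < record.length; c++) {
                Cell cell = row.createCell(c);
                cell.setCellValue(record[c]);
                if (c == 0) {
                    cell.setCellStyle(blueStyle); // ID column in blue, as in the question
                }
            }
        }
        return sheet.getLastRowNum();
    }
}
```

Calling `writeRows(new XSSFWorkbook(), records)` and `writeRows(new SXSSFWorkbook(100), records)` builds the same sheet; only the memory behaviour differs.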
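Two important things:
Window size (the number of rows accessible in memory): use the default, or configure it explicitly when appropriate.
You can specify the window size when constructing the workbook, via new SXSSFWorkbook(int windowSize), or set it per sheet via SXSSFSheet#setRandomAccessWindowSize(int windowSize).
When a new row is created via createRow() and the total number of unflushed records would exceed the specified window size, the row with the lowest index is flushed and can no longer be accessed via getRow().
The default window size is 100, defined by SXSSFWorkbook.DEFAULT_WINDOW_SIZE.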
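A small sketch of the sliding window, assuming poi-ooxml on the classpath (the class and method names are made up for illustration): once more rows than the window size have been created, the oldest rows have been flushed to the temporary file and `getRow()` returns `null` for them. The second method shows the per-sheet override.

```java
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class SlidingWindowDemo {

    /** Window set for the whole workbook through the constructor. */
    static boolean firstRowReachable(int windowSize, int rowCount) {
        SXSSFWorkbook wb = new SXSSFWorkbook(windowSize);
        try {
            SXSSFSheet sheet = (SXSSFSheet) wb.createSheet("demo");
            return fillAndCheck(sheet, rowCount);
        } finally {
            wb.dispose(); // delete the temporary file holding flushed rows
        }
    }

    /** Workbook uses DEFAULT_WINDOW_SIZE; this one sheet gets its own window. */
    static boolean firstRowReachableWithSheetWindow(int sheetWindowSize, int rowCount) {
        SXSSFWorkbook wb = new SXSSFWorkbook();
        try {
            SXSSFSheet sheet = (SXSSFSheet) wb.createSheet("demo");
            sheet.setRandomAccessWindowSize(sheetWindowSize); // override for this sheet only
            return fillAndCheck(sheet, rowCount);
        } finally {
            wb.dispose();
        }
    }

    private static boolean fillAndCheck(SXSSFSheet sheet, int rowCount) {
        for (int i = 0; i < rowCount; i++) {
            sheet.createRow(i).createCell(0).setCellValue("row " + i);
        }
        // getRow(0) returns null once row 0 has been flushed out of the window
        return sheet.getRow(0) != null;
    }

    public static void main(String[] args) {
        System.out.println(firstRowReachable(100, 50));  // true: row 0 is still in the window
        System.out.println(firstRowReachable(100, 200)); // false: row 0 was flushed to disk
    }
}
```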
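Cleanup requirement
SXSSF allocates temporary files that must always be cleaned up explicitly by calling the dispose method.
It is called on the workbook instance:
wb.dispose(); // wb is your SXSSFWorkbook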
So you should write something like:
SXSSFWorkbook wb = new SXSSFWorkbook(100); // keep 100 rows in memory, exceeding rows will be flushed to disk
// write rows ...
...
// dispose of temporary files backing this workbook on disk
wb.dispose();
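The skeleton above leaves out the actual write and the error handling. One way to fill it in, sketched with placeholder data and a hypothetical class name (`SxssfExport`), is to call `dispose()` in a `finally` block so the temporary files are removed even if `write()` throws. It assumes poi-ooxml on the classpath.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class SxssfExport {

    static void export(Path target, int rowCount) throws IOException {
        SXSSFWorkbook wb = new SXSSFWorkbook(100); // keep 100 rows in memory, older rows go to a temp file
        try {
            Sheet sheet = wb.createSheet("Comparison_result");
            for (int i = 0; i < rowCount; i++) {
                Row row = sheet.createRow(i);
                row.createCell(0).setCellValue("record " + i); // placeholder data
            }
            try (OutputStream out = Files.newOutputStream(target)) {
                wb.write(out);
            }
        } finally {
            wb.dispose(); // delete the temporary files, even if write() failed
            wb.close();
        }
    }

    public static void main(String[] args) throws IOException {
        export(Paths.get("result.xlsx"), 1000);
    }
}
```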
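About the SXSSF limitations:
Because of the streaming nature of the implementation, SXSSF has the following limitations compared to XSSF:
Only a limited number of rows are accessible at any point in time.
Sheet.clone() is not supported.
Formula evaluation is not supported.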
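About your corrupted file:
Given the official SXSSF limitations, if you don't rely on formula evaluation, the corrupted Excel file is probably not caused by the SXSSF model itself.
Before trying anything else, you could update to the latest stable POI version.
Beyond that, it is hard to give specific pointers, but as general advice, isolate things to find out what exactly is happening.
You could start by reducing the number of generated rows and writing only a few specific columns, to see whether that fixes the problem.
If it doesn't, you could also test with the default styles.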
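One more thing worth checking, which the answer does not cover: an .xlsx sheet holds at most 1,048,576 rows (`SpreadsheetVersion.EXCEL2007.getMaxRows()`), and POI throws an `IllegalArgumentException` for a row index beyond that. 1,300,000 records plus a heading row cannot fit in one sheet, so the loop would fail once the row index passes that limit, independent of memory. A hedged sketch that starts a new sheet whenever the current one is full; the class, method and sheet names are illustrative assumptions, and records must be passed in increasing order, since SXSSF cannot go back to rows already flushed to disk.

```java
import org.apache.poi.ss.SpreadsheetVersion;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class SheetRollover {

    // 1,048,576 rows per .xlsx sheet, minus one row reserved for the heading
    static final int DATA_ROWS_PER_SHEET = SpreadsheetVersion.EXCEL2007.getMaxRows() - 1;

    /**
     * Returns a new row for the zero-based recordIndex, adding sheets (each with
     * a heading row) as needed. Records must be passed in increasing order.
     */
    static Row rowForRecord(Workbook wb, int recordIndex, int dataRowsPerSheet) {
        int sheetIndex = recordIndex / dataRowsPerSheet;
        int rowInSheet = recordIndex % dataRowsPerSheet + 1; // +1 skips the heading row
        while (wb.getNumberOfSheets() <= sheetIndex) {
            Sheet sheet = wb.createSheet("Comparison_result_" + (wb.getNumberOfSheets() + 1));
            sheet.createRow(0).createCell(0).setCellValue("Main List ID"); // abbreviated heading
        }
        return wb.getSheetAt(sheetIndex).createRow(rowInSheet);
    }

    public static void main(String[] args) {
        SXSSFWorkbook wb = new SXSSFWorkbook(100);
        try {
            // tiny capacity so the rollover is visible; use DATA_ROWS_PER_SHEET for real data
            for (int i = 0; i < 25; i++) {
                rowForRecord(wb, i, 10).createCell(0).setCellValue("record " + i);
            }
            System.out.println(wb.getNumberOfSheets()); // 3 (records 0-9, 10-19, 20-24)
        } finally {
            wb.dispose();
        }
    }
}
```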