我有大量的Excel工作表,我希望能够使用PHPExcel读入MySQL.
我正在使用最近的补丁,它允许您在不打开整个文件的情况下读取工作表.这样我就可以一次阅读一个工作表.
但是,一个Excel文件大27MB.我可以成功阅读第一个工作表,因为它很小,但第二个工作表是如此之大,以至于22:00启动进程的cron作业在上午8:00没有完成,工作表很简单.
是否有任何方法可以逐行读取工作表,例如:
$inputFileType = 'Excel2007';
$inputFileName = 'big_file.xlsx';
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$worksheetNames = $objReader->listWorksheetNames($inputFileName);
foreach ($worksheetNames as $sheetName) {
//BELOW IS "WISH CODE":
foreach($row = 1; $row <=$max_rows; $row+= 100) {
$dataset = $objReader->getWorksheetWithRows($row, $row+100);
save_dataset_to_database($dataset);
}
}
Run Code Online (Sandbox Code Playgroud)
@mark,我使用您发布的代码创建以下示例:
function readRowsFromWorksheet() {
$file_name = htmlentities($_POST['file_name']);
$file_type = htmlentities($_POST['file_type']);
echo 'Read rows from worksheet:<br />';
debug_log('----------start');
$objReader = PHPExcel_IOFactory::createReader($file_type);
$chunkSize = 20;
$chunkFilter = new ChunkReadFilter();
$objReader->setReadFilter($chunkFilter);
for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
$chunkFilter->setRows($startRow, $chunkSize);
$objPHPExcel = $objReader->load('data/' . $file_name);
debug_log('reading chunk starting at row '.$startRow);
$sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
var_dump($sheetData);
echo '<hr />';
}
debug_log('end');
}
Run Code Online (Sandbox Code Playgroud)
如下面的日志文件所示,它在一个小的8K Excel文件上运行正常,但是当我在一个3 MB的 Excel文件上运行时,它永远不会超过第一个块,是否有任何方法可以优化此代码以提高性能,否则看起来它不足以从大型Excel文件中获取块:
2011-01-12 11:07:15: ----------start
2011-01-12 11:07:15: reading chunk starting at row 2
2011-01-12 11:07:15: reading chunk starting at row 22
2011-01-12 11:07:15: reading chunk starting at row 42
2011-01-12 11:07:15: reading chunk starting at row 62
2011-01-12 11:07:15: reading chunk starting at row 82
2011-01-12 11:07:15: reading chunk starting at row 102
2011-01-12 11:07:15: reading chunk starting at row 122
2011-01-12 11:07:15: reading chunk starting at row 142
2011-01-12 11:07:15: reading chunk starting at row 162
2011-01-12 11:07:15: reading chunk starting at row 182
2011-01-12 11:07:15: reading chunk starting at row 202
2011-01-12 11:07:15: reading chunk starting at row 222
2011-01-12 11:07:15: end
2011-01-12 11:07:52: ----------start
2011-01-12 11:08:01: reading chunk starting at row 2
(...at 11:18, CPU usage at 93% still running...)
Run Code Online (Sandbox Code Playgroud)
当我评论出来时:
//$sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
//var_dump($sheetData);
Run Code Online (Sandbox Code Playgroud)
然后它以可接受的速度(大约每秒2行)进行解析,无论如何都要提高性能toArray()
?
2011-01-12 11:40:51: ----------start
2011-01-12 11:40:59: reading chunk starting at row 2
2011-01-12 11:41:07: reading chunk starting at row 22
2011-01-12 11:41:14: reading chunk starting at row 42
2011-01-12 11:41:22: reading chunk starting at row 62
2011-01-12 11:41:29: reading chunk starting at row 82
2011-01-12 11:41:37: reading chunk starting at row 102
2011-01-12 11:41:45: reading chunk starting at row 122
2011-01-12 11:41:52: reading chunk starting at row 142
2011-01-12 11:42:00: reading chunk starting at row 162
2011-01-12 11:42:07: reading chunk starting at row 182
2011-01-12 11:42:15: reading chunk starting at row 202
2011-01-12 11:42:22: reading chunk starting at row 222
2011-01-12 11:42:22: end
Run Code Online (Sandbox Code Playgroud)
这似乎适用,例如,至少在3 MB文件上:
for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ', $startRow, ' to ', ($startRow + $chunkSize - 1), '<br />';
$chunkFilter->setRows($startRow, $chunkSize);
$objPHPExcel = $objReader->load('data/' . $file_name);
debug_log('reading chunk starting at row ' . $startRow);
foreach ($objPHPExcel->getActiveSheet()->getRowIterator() as $row) {
$cellIterator = $row->getCellIterator();
$cellIterator->setIterateOnlyExistingCells(false);
echo '<tr>';
foreach ($cellIterator as $cell) {
if (!is_null($cell)) {
//$value = $cell->getCalculatedValue();
$rawValue = $cell->getValue();
debug_log($rawValue);
}
}
}
}
Run Code Online (Sandbox Code Playgroud)
可以使用"读取过滤器"以"块"的形式读取工作表,但我无法保证效率.
$inputFileType = 'Excel5';
$inputFileName = './sampleData/example2.xls';
/** Define a Read Filter class implementing PHPExcel_Reader_IReadFilter */
class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
private $_startRow = 0;
private $_endRow = 0;
/** Set the list of rows that we want to read */
public function setRows($startRow, $chunkSize) {
$this->_startRow = $startRow;
$this->_endRow = $startRow + $chunkSize;
}
public function readCell($column, $row, $worksheetName = '') {
// Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow
if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {
return true;
}
return false;
}
}
echo 'Loading file ',pathinfo($inputFileName,PATHINFO_BASENAME),' using IOFactory with a defined reader type of ',$inputFileType,'<br />';
/** Create a new Reader of the type defined in $inputFileType **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
echo '<hr />';
/** Define how many rows we want to read for each "chunk" **/
$chunkSize = 20;
/** Create a new Instance of our Read Filter **/
$chunkFilter = new chunkReadFilter();
/** Tell the Reader that we want to use the Read Filter that we've Instantiated **/
$objReader->setReadFilter($chunkFilter);
/** Loop to read our worksheet in "chunk size" blocks **/
/** $startRow is set to 2 initially because we always read the headings in row #1 **/
for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ',$startRow,' to ',($startRow+$chunkSize-1),'<br />';
/** Tell the Read Filter, the limits on which rows we want to read this iteration **/
$chunkFilter->setRows($startRow,$chunkSize);
/** Load only the rows that match our filter from $inputFileName to a PHPExcel Object **/
$objPHPExcel = $objReader->load($inputFileName);
// Do some processing here
$sheetData = $objPHPExcel->getActiveSheet()->toArray(null,true,true,true);
var_dump($sheetData);
echo '<br /><br />';
}
Run Code Online (Sandbox Code Playgroud)
请注意,此读取过滤器将始终读取工作表的第一行以及块规则定义的行.
使用读取过滤器时,PHPExcel仍会解析整个文件,但只加载与定义的读取过滤器匹配的单元格,因此它仅使用该数量的单元格所需的内存.但是,它会多次解析文件,每个块一次,因此速度会慢一些.此示例一次读取20行:要逐行读取,只需将$ chunkSize设置为1.
如果您的公式引用不同"块"中的单元格,这也会导致问题,因为数据根本不适用于当前"块"之外的单元格.
目前阅读.xlsx
,.csv
最好.ods
的选择是电子表格阅读器(https://github.com/nuovo/spreadsheet-reader),因为它可以读取文件而无需将其全部加载到内存中。对于.xls
扩展来说,它有局限性,因为它使用 PHPExcel 进行读取。
归档时间: |
|
查看次数: |
49267 次 |
最近记录: |