Reading an .xls file via PHPExcel throws Fatal error: Allowed memory size... even with a chunk reader

Lub*_*Suk 6 php memory-leaks phpexcel

I'm using PHPExcel to read .xls files. Fairly quickly I ran into:

Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 730624 bytes) in Excel\PHPExcel\Shared\OLERead.php on line 93

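After some googling, I tried a chunk reader to prevent this (it is even mentioned on the PHPExcel homepage), but I am still stuck with this error.

My idea was that with the chunk reader I would read the file piece by piece and memory would not overflow. But there must be some serious memory leak? Or am I freeing memory incorrectly? I even tried raising the server RAM to 1GB. The file I am trying to read is about 700k, which is not that much (I also read ~20MB pdf, xlsx, docx, doc, etc. files without problems). So I assume I am overlooking something small.

The code looks like this: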

function parseXLS($fileName){
    require_once dirname(__FILE__) . '/sphider_design/include/Excel/PHPExcel/IOFactory.php';
    require_once dirname(__FILE__) . '/sphider_design/include/Excel/PHPExcel/ChunkReadFilter.php';

    $inputFileType = 'Excel5';

    /**  Create a new Reader of the type defined in $inputFileType  **/
    $objReader = PHPExcel_IOFactory::createReader($inputFileType);
    /**  Define how many rows we want to read for each "chunk"  **/ 
    $chunkSize = 20;
    /**  Create a new Instance of our Read Filter  **/ 
    $chunkFilter = new chunkReadFilter(); 
    /**  Tell the Reader that we want to use the Read Filter that we've Instantiated  **/ 
    $objReader->setReadFilter($chunkFilter); 

    /**  Loop to read our worksheet in "chunk size" blocks  **/ 
    /**  $startRow is set to 2 initially because we always read the headings in row #1  **/
    for ($startRow = 2; $startRow <= 65536; $startRow += $chunkSize) { 
        /**  Tell the Read Filter, the limits on which rows we want to read this iteration  **/ 
        $chunkFilter->setRows($startRow,$chunkSize); 
        /**  Load only the rows that match our filter from $inputFileName to a PHPExcel Object  **/ 
        $objPHPExcel = $objReader->load($fileName); 
        //    Do some processing here 

        //    Free up some of the memory 
        $objPHPExcel->disconnectWorksheets(); 
        unset($objPHPExcel); 
    }
}

And here is the code of the chunk reader:

class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $_startRow = 0;
    private $_endRow = 0;

    /**  Set the list of rows that we want to read  */ 
    public function setRows($startRow, $chunkSize) { 
        $this->_startRow    = $startRow; 
        $this->_endRow      = $startRow + $chunkSize;
    } 

    public function readCell($column, $row, $worksheetName = '') {
        //  Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow 
        if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) { 
           return true;
        }
        return false;
    } 
}
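To make the row window concrete, here is a minimal standalone sketch of the same selection logic. The RowWindow class is a hypothetical stand-in that does not implement PHPExcel_Reader_IReadFilter, so it runs without the library. It also counts how many times the loop in parseXLS() would call load(), since the loop bound is the fixed 65536-row limit rather than the real row count.

```php
<?php
// Hypothetical stand-in for chunkReadFilter, kept free of PHPExcel so it runs on its own.
final class RowWindow
{
    private $startRow = 0;
    private $endRow = 0;

    public function setRows($startRow, $chunkSize)
    {
        $this->startRow = $startRow;
        $this->endRow = $startRow + $chunkSize;
    }

    // Same condition as readCell(): the heading row plus the current chunk.
    public function accepts($row)
    {
        return $row == 1 || ($row >= $this->startRow && $row < $this->endRow);
    }
}

$window = new RowWindow();
$window->setRows(2, 20);

$accepted = array();
for ($row = 1; $row <= 30; $row++) {
    if ($window->accepts($row)) {
        $accepted[] = $row;
    }
}
echo implode(',', $accepted), PHP_EOL; // rows 1 through 21

// Count the iterations of the chunk loop from the question.
$loads = 0;
for ($startRow = 2; $startRow <= 65536; $startRow += 20) {
    $loads++;
}
echo $loads, PHP_EOL; // 3277
```

With a chunk size of 20 the loop runs 3277 times no matter how short the sheet is. As far as I understand, the filter only decides which cells are kept, so each load() call still has to open and parse the file again.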

Lub*_*Suk 4

So I found an interesting solution here: How to read large worksheets from large Excel files (27MB+) with PHPExcel? (Addendum 3 in that question).

edit1: Even with this solution I ran into my favourite error message, but I found something about caching, so I implemented this:

$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
$cacheSettings = array('memoryCacheSize' => '8MB');
PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);
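A detail worth double-checking with settings arrays like this one: PHP compares array keys exactly, so a key padded with spaces is a different key from memoryCacheSize, and a lookup for the unpadded name silently falls back to a default. The sketch below uses a hypothetical readCacheSize() helper and a made-up 1MB default purely for illustration; it is not PHPExcel's code.

```php
<?php
// Hypothetical helper: mimics a library reading one option with a fallback value.
function readCacheSize(array $settings)
{
    return isset($settings['memoryCacheSize']) ? $settings['memoryCacheSize'] : '1MB';
}

$padded = array(' memoryCacheSize ' => '8MB'); // key has leading and trailing spaces
$exact  = array('memoryCacheSize' => '8MB');   // key matches exactly

echo readCacheSize($padded), PHP_EOL; // 1MB (the padded key is never found)
echo readCacheSize($exact), PHP_EOL;  // 8MB
```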

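So far I have only tested it on xls files smaller than 10MB, but it seems to work (I also set $objReader->setReadDataOnly(true);), and it looks like a reasonable balance between speed and memory consumption. (I will follow my thorny path further if possible.)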
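For intuition about why this kind of caching trades a little speed for memory: as far as I understand, cache_to_phpTemp keeps serialized cells in a php://temp stream and only an index in memory. The sketch below is not PHPExcel's implementation, just a minimal illustration of that idea with plain PHP streams; the $offsets index and the cell layout are assumptions made for the example.

```php
<?php
// Keep up to 8 MB in memory; beyond that PHP moves the stream to a temporary file.
$maxMemory = 8 * 1024 * 1024;
$handle = fopen('php://temp/maxmemory:' . $maxMemory, 'w+');

// Write 100 small serialized "cells", remembering where each one starts.
$offsets = array();
for ($i = 1; $i <= 100; $i++) {
    $payload = serialize(array('coord' => 'A' . $i, 'value' => $i * 10));
    $offsets['A' . $i] = array(ftell($handle), strlen($payload));
    fwrite($handle, $payload);
}

// Read a single cell back by seeking to its recorded offset.
list($position, $length) = $offsets['A42'];
fseek($handle, $position);
$cell = unserialize(fread($handle, $length));
fclose($handle);

echo $cell['value'], PHP_EOL; // 420
```

Only the small offset index stays in PHP variables; the cell data lives in the stream, which is why lookups cost a seek and an unserialize.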

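edit2: So I did some further research and found that the chunk reader was unnecessary in my case. (It seems to me that the memory problem is the same with the chunk reader as without it.) So my final answer to the question is something like the following, which reads an .xls file (only the data from cells, no formatting, and it even filters out formulas). When I use cache_to_phpTemp I am able to read xls files (tested up to 10MB) with about 10k rows and multiple columns in a matter of seconds and without memory problems.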

function parseXLS($fileName){

/** PHPExcel_IOFactory */
    require_once dirname(__FILE__) . '/sphider_design/include/Excel/PHPExcel/IOFactory.php';
    require_once dirname(__FILE__) . '/sphider_design/include/Excel/PHPExcel/ChunkReadFilter.php';
    require_once dirname(__FILE__) . '/sphider_design/include/Excel/PHPExcel.php';

    $inputFileName = $fileName;
    $fileContent = "";

    //get inputFileType (most of time Excel5)
    $inputFileType = PHPExcel_IOFactory::identify($inputFileName);

    //initialize cache, so the phpExcel will not throw memory overflow
    $cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
    $cacheSettings = array('memoryCacheSize' => '8MB');
    PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);

    //initialize object reader by file type
    $objReader = PHPExcel_IOFactory::createReader($inputFileType);

    //read only data (without formating) for memory and time performance
    $objReader->setReadDataOnly(true);

    //load file into PHPExcel object
    $objPHPExcel = $objReader->load($inputFileName);

    //get worksheetIterator, so we can loop sheets in workbook
    $worksheetIterator = $objPHPExcel->getWorksheetIterator();

    //loop all sheets
    foreach ($worksheetIterator as $worksheet) {    

            //use worksheet rowIterator, to get content of each row
            foreach ($worksheet->getRowIterator() as $row) {
                //use cell iterator, to get content of each cell in row
                $cellIterator = $row->getCellIterator();
                //iterate over every cell in the row, including empty ones
                $cellIterator->setIterateOnlyExistingCells(false);      

                //iterate each cell
                foreach ($cellIterator as $cell) {
                    //check if cell exists
                    if (!is_null($cell)) {
                        //get raw value (without formating, and all unnecessary trash)
                        $rawValue = $cell->getValue();
                        //skip empty cells and formulas (values starting with "="), append everything else
                        if ((trim($rawValue) !== "") && (substr(trim($rawValue), 0, 1) !== "=")) {
                            $fileContent .= $rawValue . " ";                                            
                        }
                    }
                }       
            }       
    }

    return $fileContent;
}
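The filtering rule in the inner loop can be tried without PHPExcel. Below is a minimal sketch that applies the same rule (drop empty values and anything starting with "=") to plain arrays; collectText() and the sample rows are hypothetical and only stand in for the worksheet iterators.

```php
<?php
/**
 * Hypothetical helper: joins the non-empty, non-formula values of a grid of
 * raw cell values, mirroring the condition used inside parseXLS().
 */
function collectText(array $rows)
{
    $content = '';
    foreach ($rows as $row) {
        foreach ($row as $rawValue) {
            // Cast first so null cells become an empty string.
            $text = trim((string) $rawValue);
            if ($text !== '' && substr($text, 0, 1) !== '=') {
                $content .= $rawValue . ' ';
            }
        }
    }
    return $content;
}

$rows = array(
    array('Name', 'Total', null),   // null stands for a missing cell
    array('Widget', 12, '=B2*2'),   // the formula is skipped
    array('   ', 'Gadget', 3.5),    // whitespace-only cells are skipped
);

echo collectText($rows), PHP_EOL; // Name Total Widget 12 Gadget 3.5
```

The cast to string is a small deviation from the loop above: it avoids passing null to trim(), which newer PHP versions report as deprecated.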