如何加快处理庞大的文本文件?

kit*_*ski 1 php sorting text

我有一个800mb文本文件,其中包含18,990,870行(每行是一条记录),我需要选择某些记录,如果有匹配则将它们写入数据库.

它花了一个时间来完成它们,所以我想知道是否有办法更快地完成它?

我的PHP一次读取一行,如下所示:

    $fp2 = fopen('download/pricing20100714/application_price','r');
if (!$fp2) {echo 'ERROR: Unable to open file.'; exit;}
while (!feof($fp2)) {
$line = stream_get_line($fp2,128,$eoldelimiter); //use 2048 if very long lines
if ($line[0] === '#') continue;  //Skip lines that start with # 
    $field = explode ($delimiter, $line);
list($export_date, $application_id, $retail_price, $currency_code, $storefront_id ) = explode($delimiter, $line);
if ($currency_code == 'USD' and $storefront_id == '143441'){
// does application_id exist? 
$application_id = mysql_real_escape_string($application_id); 
$query = "SELECT * FROM jos_mt_links WHERE link_id='$application_id';"; 
$res = mysql_query($query); 
if (mysql_num_rows($res) > 0 ) { 
 echo $application_id . "application id has price of " . $retail_price . "with currency of " . $currency_code. "\n";
} // end if exists in SQL  
} else 
{
// no, application_id doesn't exist 
}  // end check for currency and storefront
} // end while statement
fclose($fp2);
Run Code Online (Sandbox Code Playgroud)

Stu*_*tLC 8

猜测,性能问题是因为它为每个带有USD和店面的application_id发出查询.

如果空间和IO不是问题,您可能只是盲目地将所有19M记录写入新的临时数据库表,添加索引然后与过滤器匹配?

  • @kitenski:你试过吗?我的意思是,它只有大约10行代码.并且执行19M顺序,异步插入将比19M随机选择更快. (3认同)