luq*_*ita 9 php parallel-processing
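I have a function that needs to go through about 20K rows from an array and run an external script on each one. This is a slow process, because PHP waits for the script to finish before moving on to the next row.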
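To make this faster, I was thinking of running the function on different parts of the array at the same time. For example, rows 0 to 2000 in one call, rows 2001 to 4000 in another, and so on. How can I do this in a clean way? I could create separate cron jobs, one per call with different parameters: myFunction(0, 2000), then another cron job with myFunction(2001, 4000), and so on, but that doesn't seem very clean. What is a good way to do this?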
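The only waiting time you suffer is between getting the data and processing the data. Processing the data is completely blocking anyway (you simply have to wait for it). You are unlikely to gain anything by raising the number of processes beyond the number of cores you have. Basically, I think this means the number of processes stays small, so scheduling the execution of 2-8 processes doesn't sound that bad. If you are worried about not being able to process data while retrieving it, you could in theory fetch your data from the database in small blocks and then distribute the processing load between a few processes, one per core.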
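The "one block per core" split described above can be sketched as follows. This is only a minimal illustration: the helper name `splitIntoRanges` and the worker count of 4 are assumptions made for the example, not something from the question.

```php
<?php
/**
 * Split $total rows into at most $workers contiguous [start, end] ranges
 * (zero-based, inclusive). Hypothetical helper, not part of any library.
 */
function splitIntoRanges(int $total, int $workers): array
{
    if ($total <= 0 || $workers <= 0) {
        return [];
    }
    // Rows per worker, rounded up so that no row is left over.
    $size = (int) ceil($total / $workers);
    $ranges = [];
    for ($start = 0; $start < $total; $start += $size) {
        $ranges[] = [$start, min($start + $size, $total) - 1];
    }
    return $ranges;
}

// 20,000 rows split across 4 workers (assumed core count).
foreach (splitIntoRanges(20000, 4) as [$start, $end]) {
    echo "rows $start to $end\n";
}
```

Each range can then be handed to its own process, which is what the cron-job idea in the question does by hand.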
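I lean more towards the forking child processes approach for actually running the processing workers. There is a brilliant demonstration in the comments on the pcntl_fork doc page that shows an implementation of a job daemon class: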
http://php.net/manual/en/function.pcntl-fork.php
<?php
declare(ticks=1);
//A very basic job daemon that you can extend to your needs.
class JobDaemon{

    public $maxProcesses = 25;
    protected $jobsStarted = 0;
    protected $currentJobs = array();
    protected $signalQueue=array();
    protected $parentPID;

    public function __construct(){
        echo "constructed \n";
        $this->parentPID = getmypid();
        pcntl_signal(SIGCHLD, array($this, "childSignalHandler"));
    }

    /**
    * Run the Daemon
    */
    public function run(){
        echo "Running \n";
        for($i=0; $i<10000; $i++){
            $jobID = rand(0,10000000000000);
            while(count($this->currentJobs) >= $this->maxProcesses){
                echo "Maximum children allowed, waiting...\n";
                sleep(1);
            }
            $launched = $this->launchJob($jobID);
        }
        //Wait for child processes to finish before exiting here
        while(count($this->currentJobs)){
            echo "Waiting for current jobs to finish... \n";
            sleep(1);
        }
    }

    /**
    * Launch a job from the job queue
    */
    protected function launchJob($jobID){
        $pid = pcntl_fork();
        if($pid == -1){
            //Problem launching the job
            error_log('Could not launch new job, exiting');
            return false;
        }
        else if ($pid){
            // Parent process
            // Sometimes you can receive a signal to the childSignalHandler function before this code executes if
            // the child script executes quickly enough!
            //
            $this->currentJobs[$pid] = $jobID;
            // In the event that a signal for this pid was caught before we get here, it will be in our signalQueue array
            // So let's go ahead and process it now as if we'd just received the signal
            if(isset($this->signalQueue[$pid])){
                echo "found $pid in the signal queue, processing it now \n";
                $this->childSignalHandler(SIGCHLD, $pid, $this->signalQueue[$pid]);
                unset($this->signalQueue[$pid]);
            }
        }
        else{
            //Forked child, do your deeds....
            $exitStatus = 0; //Error code if you need to or whatever
            echo "Doing something fun in pid ".getmypid()."\n";
            exit($exitStatus);
        }
        return true;
    }

    public function childSignalHandler($signo, $pid=null, $status=null){
        //If no pid is provided, that means we're getting the signal from the system. Let's figure out
        //which child process ended
        if(!$pid){
            $pid = pcntl_waitpid(-1, $status, WNOHANG);
        }
        //Make sure we get all of the exited children
        while($pid > 0){
            if($pid && isset($this->currentJobs[$pid])){
                $exitCode = pcntl_wexitstatus($status);
                if($exitCode != 0){
                    echo "$pid exited with status ".$exitCode."\n";
                }
                unset($this->currentJobs[$pid]);
            }
            else if($pid){
                //Oh no, our job has finished before this parent process could even note that it had been launched!
                //Let's make note of it and handle it when the parent process is ready for it
                echo "..... Adding $pid to the signal queue ..... \n";
                $this->signalQueue[$pid] = $status;
            }
            $pid = pcntl_waitpid(-1, $status, WNOHANG);
        }
        return true;
    }
}
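To tie the forking approach back to the 20K-row problem, here is a stripped-down sketch that forks one child per chunk and waits for all of them, without the signal handling of the daemon class. The function name `processInParallel` and the `$handler` callback (a stand-in for the call to the external script) are hypothetical, and the code assumes the pcntl extension on a Unix-like CLI.

```php
<?php
/**
 * Fork one child per chunk of $rows and run $handler on every row.
 * Returns the number of children that exited with status 0.
 * Hypothetical helper; $handler stands in for the external script call.
 */
function processInParallel(array $rows, int $workers, callable $handler): int
{
    if ($rows === [] || $workers < 1) {
        return 0;
    }
    if (!function_exists('pcntl_fork')) {
        throw new RuntimeException('The pcntl extension is required.');
    }
    $chunkSize = (int) ceil(count($rows) / $workers);
    $children = [];
    foreach (array_chunk($rows, $chunkSize) as $chunk) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            // Fork failed: this chunk is skipped, so log it for a retry.
            error_log('Could not fork a worker');
            continue;
        }
        if ($pid === 0) {
            // Child: work through its own chunk, then exit so it never
            // continues into the parent's code path.
            foreach ($chunk as $row) {
                $handler($row);
            }
            exit(0);
        }
        // Parent: remember the child so it can be waited on below.
        $children[] = $pid;
    }
    $succeeded = 0;
    foreach ($children as $pid) {
        pcntl_waitpid($pid, $status); // blocks until this child exits
        if (pcntl_wifexited($status) && pcntl_wexitstatus($status) === 0) {
            $succeeded++;
        }
    }
    return $succeeded;
}

// Usage sketch: 20 dummy rows, 4 workers; usleep() replaces the real script.
if (function_exists('pcntl_fork')) {
    $ok = processInParallel(range(1, 20), 4, function ($row) {
        usleep(1000);
    });
    echo "$ok workers finished cleanly\n";
}
```

Because each child gets its own copy of the parent's memory, results have to be passed back through something external such as a file or the database; this sketch only reports exit statuses.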