How do I run perl scripts in parallel and capture their output in files?

dc_*_*erd · 2 · Tags: parallel-processing, perl, redirect, background, stdout

I need to run Perl tests in parallel and capture STDOUT and STDERR in a separate file for each test file. I haven't even managed to capture them in a single file. I've been at this for a while with no luck. This is where I started (I'll spare you all the variations). Any help is greatly appreciated. Thanks!

foreach my $file ( @files) {
    next unless $file =~ /\.t$/;
    print "\$file = $file\n";

    $file =~ /^(\w+)\.\w+/;
    
    my $file_pfx = $1;
    my $this_test_file_name = $file_pfx . '.txt';
    
    system("perl $test_dir\\$file > results\\$this_test_file_name &") && die "cmd failed: $!\n";

}

zdi*_*dim 5

Here is a simple demo of using Parallel::ForkManager to spawn separate processes.


In each process, both the STDOUT and STDERR streams are redirected, in two ways for demonstration: STDOUT is redirected to a variable, which can then be passed around as needed (here it is dumped into a file), while STDERR is redirected directly to a file. Alternatively, use a library; an example is given in a separate snippet.


The numbers 1..6 stand for batches of data that each child will pick up to process. Only three processes are started right away; then, as one finishes, another is started in its place.† (Here they exit practically right away, since the "work" is trivial.)

use warnings;
use strict;
use feature 'say';

use Carp qw(carp);
use Path::Tiny qw(path);
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(3);

foreach my $data (1..6) {
    $pm->start and next;     # start a child process
    proc_in_child($data);    # code that runs in the child process
    $pm->finish;             # exit it
}
$pm->wait_all_children;      # reap all child processes

say "\nParent $$ done\n";

sub proc_in_child {
    my ($data) = @_;
    say "Process $$ with data $data";  # still shows on terminal

    # Will dump all that was printed to streams to these files
    my ($outfile, $errfile) =
        map { "proc_data-${data}_" . $_ . ".$$.out" } qw(stdout stderr);

    # Redirect streams
    # One way to do it, redirect to a variable (for STDOUT)...
    open my $fh_stdout, ">", \my $so or carp "Can't open handle to variable: $!";
    my $fh_STDOUT = select $fh_stdout;
    # ...another way to do it, directly to a file (for any stream)
    # (first 'dup' it so it can be restored if needed)
    open my $SAVEERR, ">&STDERR"  or carp "Can't dup STDERR: $!";
    open *STDERR, ">", $errfile or carp "Can't redirect STDERR to $errfile: $!";

    # Prints wind up in a variable (for STDOUT) and a file (for STDERR)
    say  "STDOUT: Child process with pid $$, processing data #$data";
    warn "STDERR: Child process with pid $$, processing data #$data";

    close $fh_stdout;
    # If needed to restore (not in this example which exits right away)
    select $fh_STDOUT;
    open STDERR, '>&', $SAVEERR  or carp "Can't reopen STDERR: $!";

    # Dump all collected STDOUT to a file (or pass it around, it's a variable)
    path( $outfile )->spew($so);

    return 1;
}

While STDOUT is redirected to a variable here, STDERR cannot be redirected that way, so it goes directly to a file. There are, however, ways to capture it in a variable as well.
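One such way, for Perl-level writes (warn, print STDERR), is to reopen STDERR to an in-memory filehandle backed by a scalar. A minimal sketch, with the caveat that this does not catch output from external programs, which write to the real file descriptor:

```perl
use warnings;
use strict;

# Save the real STDERR so it can be restored later
open my $SAVEERR, '>&', \*STDERR or die "Can't dup STDERR: $!";

# Reopen STDERR onto a scalar; warn/print STDERR now land in $captured
open STDERR, '>', \my $captured or die "Can't redirect STDERR: $!";

warn "this goes into the variable\n";

# Restore the original STDERR
open STDERR, '>&', $SAVEERR or die "Can't restore STDERR: $!";

print "captured: $captured";
```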


You can then use the module's facility for returning data from a child process to the parent, which can then work with those variables. For examples, see this post, this post, and this post. (There are more; these are just the ones I know of.) Or indeed just dump them to files, as is done here.
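That facility in Parallel::ForkManager works by having the child pass a reference to finish, which the parent then receives in a run_on_finish callback. A minimal sketch:

```perl
use warnings;
use strict;
use feature 'say';
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(3);

# Collect what each child sends back, keyed by the child's pid
my %results;
$pm->run_on_finish( sub {
    my ($pid, $exit, $ident, $signal, $core, $data_ref) = @_;
    $results{$pid} = $data_ref if defined $data_ref;
});

foreach my $data (1..6) {
    $pm->start and next;
    my $output = "child $$ processed $data";
    # Pass a reference back to the parent (serialized behind the scenes)
    $pm->finish(0, { data => $data, output => $output });
}
$pm->wait_all_children;

say "$_ => $results{$_}{output}" for sort keys %results;
```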


Another way is to use a module that can run code and redirect its output, like Capture::Tiny.

use Capture::Tiny qw(capture);

sub proc_in_child {
    my ($data) = @_;
    say "Process $$ with data $data";  # on terminal

    # Run code and capture all output
    my ($stdout, $stderr, @results) = capture {
          say  "STDOUT: Child process $$, processing data #$data";
          warn "STDERR: Child process $$, processing data #$data";

          # return results perhaps...
          1 .. 4;
    };

    # Do as needed with variables with collected STDOUT and STDERR
    # Return to parent, or dump to file:
    my ($outfile, $errfile) =
        map { "proc_data-${data}_" . $_ . ".$$.out" } qw(stdout stderr);

    path($outfile)->spew($stdout);
    path($errfile)->spew($stderr);

    return 1;
}

† This keeps the same number of processes running at all times. Alternatively, it can be set up to wait for the whole batch to finish and then start another batch. For some operational details see this post.
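The batch-wise alternative can be sketched along these lines (a hypothetical sketch; the data and batch size are illustrative):

```perl
use warnings;
use strict;
use Parallel::ForkManager;

my @data = (1..10);
my $batch_size = 3;
my $pm = Parallel::ForkManager->new($batch_size);

while (@data) {
    # Take the next batch and start a child for each item in it
    my @batch = splice @data, 0, $batch_size;
    for my $item (@batch) {
        $pm->start and next;
        # ... do the work for $item in the child ...
        $pm->finish;
    }
    # Block until this whole batch is done before starting the next one
    $pm->wait_all_children;
}
```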
