I have a Python script that generates a bunch of inputs for an external program. The calls to the external program will go through Slurm.
What I want is for my script to wait until all of the generated calls to the external program have finished (the actual execution of the external program, not the Slurm commands), and then parse the output the external program produced and do something with the data.
I tried a subprocess call, but it only waits for the Slurm submission command. Any suggestions?
solution 1
I would suggest breaking your pipeline up into smaller steps, which can then be automated in a bash script etc. First, generate all the commands that need to be run through Slurm. If you submit them as a Slurm job array (see e.g. the sbatch documentation), you can simultaneously submit the script that parses the output of all these commands. Using Slurm job dependencies, you can make that parsing job start only after the job array has finished.
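A minimal sketch of this approach, assuming `run_one_input.sh` and `parse_output.sh` are hypothetical batch scripts of yours, and that `sbatch` answers with its usual `Submitted batch job <id>` line:

```python
import re
import subprocess

def submit(cmd):
    """Run an sbatch command line and return the job id it prints."""
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    # sbatch normally replies with "Submitted batch job 123456"
    return re.search(r"\d+", out).group(0)

def dependent_cmd(script, array_job_id):
    """Build the sbatch call for a job that may start only after the
    whole array has finished successfully (--dependency=afterok)."""
    return ["sbatch", f"--dependency=afterok:{array_job_id}", script]

# On the cluster this would look like:
# array_id = submit(["sbatch", "--array=0-99", "run_one_input.sh"])
# submit(dependent_cmd("parse_output.sh", array_id))
```

With `afterok`, the parsing job stays pending until every array task has exited with status 0, so it sees the complete output of the external program.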
solution 2
You could do a while loop in your Python script and check the status of the jobs:

import time

t = time.time()
while True:
    # Give up if this takes more than some_limit
    if time.time() - t > some_limit:
        break
    # Check whether the jobs are done. This could be done by
    # grep'ing squeue output for your username and the tags
    # that you gave your jobs when naming them.
    if check_for_completion():
        break
    # Sleep for a while, depending on the estimated completion
    # time of the jobs
    time.sleep(some_time)
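One way to flesh out `check_for_completion` is to ask `squeue` for the names of your still-queued jobs and look for the tag you used when naming them. This is only a sketch; `myjobs` is a hypothetical tag, and it assumes `squeue` is on the path:

```python
import getpass
import subprocess

def still_running(squeue_lines, tag):
    """True if any listed job name contains our tag."""
    return any(tag in line for line in squeue_lines)

def check_for_completion(tag="myjobs"):
    # -h drops the header line, -o "%j" prints only the job-name
    # column, -u restricts the listing to the current user's jobs
    out = subprocess.run(
        ["squeue", "-u", getpass.getuser(), "-h", "-o", "%j"],
        capture_output=True, text=True,
    ).stdout
    return not still_running(out.splitlines(), tag)
```

Note that a job missing from `squeue` only means it has left the queue, not that it succeeded, so you may still want to check the output files themselves before parsing.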
solution 3
Reserve N nodes on Slurm and run your script there. This avoids cluttering the front-end node. I suggest GNU parallel to distribute your jobs across the nodes.