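How do the terms "job", "task", and "step" used in the SLURM documentation relate to each other?
AFAICT, a job may consist of multiple tasks, and it may also consist of multiple steps, but, assuming this is true, it is still not clear to me how tasks and steps are related.
It would be helpful to see an example showing the full complexity of jobs/tasks/steps.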
dam*_*ois 14
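A job consists of one or more steps, each step consists of one or more tasks, and each task uses one or more CPUs.
Jobs are typically created with the sbatch command, and steps are created with the srun command. Tasks are requested (at the job level or at the step level) with --ntasks, and CPUs are requested per task with --cpus-per-task. Note that a job submitted with sbatch has one implicit step: the Bash script itself.
Consider this hypothetical job: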
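As a minimal sketch of how --ntasks and --cpus-per-task combine, the script below requests 4 tasks with 2 CPUs each and launches a single step. The numbers and the use of `hostname` as the program are assumptions chosen for illustration. `SLURM_NTASKS`, `SLURM_CPUS_PER_TASK` and `SLURM_JOB_ID` are environment variables Slurm sets inside a job; the fallback defaults only exist so that the arithmetic can also be tried outside a cluster.

```shell
#!/bin/bash
#SBATCH --ntasks 4
#SBATCH --cpus-per-task 2

# Inside a job Slurm exports these variables; the defaults after ":-"
# are illustrative fallbacks for running the script outside Slurm.
ntasks="${SLURM_NTASKS:-4}"
cpus_per_task="${SLURM_CPUS_PER_TASK:-2}"

# Each task gets cpus_per_task CPUs, so the job holds this many CPUs in total.
total_cpus=$(( ntasks * cpus_per_task ))
echo "tasks=${ntasks} cpus_per_task=${cpus_per_task} total_cpus=${total_cpus}"

# Launch one step only when actually running inside a Slurm allocation.
if command -v srun >/dev/null 2>&1 && [ -n "${SLURM_JOB_ID:-}" ]; then
    srun --ntasks "$ntasks" --cpus-per-task "$cpus_per_task" hostname
fi
```

Submitted with sbatch, this would give one job with the implicit batch step plus one srun step made of 4 tasks.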
#!/bin/bash
#SBATCH --nodes 8
#SBATCH --ntasks-per-node 8
# The job requests 64 CPUs, on 8 nodes.
# First step, with a sub-allocation of 8 tasks (one per node) to create a tmp dir.
# No need for more than one task per node, but it has to run on every node
srun --nodes 8 --ntasks 8 mkdir -p /tmp/$USER/$SLURM_JOBID
# Second step with the full allocation (64 tasks) to run an MPI
# program on some data to produce some output.
srun process.mpi <input.dat >output.txt
# Third step with a sub allocation of 48 tasks (because for instance
# that program does not scale as well) to post-process the output and
# extract meaningful information
srun --ntasks 48 --nodes 6 --exclusive postprocess.mpi <output.txt >result.txt &
# Fourth step with a sub-allocation on a single node (because maybe
# it is a multithreaded program that cannot use CPUs on distinct nodes)
# to compress the raw output. This step runs at the same time as
# the previous one thanks to the ampersand `&`
OMP_NUM_THREADS=12 srun --ntasks 12 --nodes 1 --exclusive compress output.txt &
wait
Four steps are created, so the accounting information for this job will have 5 lines: one per step, plus one for the Bash script itself.
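To look at those accounting lines, something like the following sketch could be used once the job has run. `show_job_steps` is a hypothetical helper name, and the job ID is whatever sbatch printed at submission. The batch script usually shows up as `<jobid>.batch` and the srun steps as `<jobid>.0`, `<jobid>.1`, and so on, although the exact set of rows depends on the site's configuration.

```shell
#!/bin/bash

# Hypothetical helper: print the accounting records of one job,
# one row per step, with the task and CPU counts of each.
show_job_steps() {
    local jobid="$1"
    sacct -j "$jobid" --format=JobID,JobName,NTasks,AllocCPUS,State
}

# Query only when sacct is installed and a job ID was passed as first argument.
if command -v sacct >/dev/null 2>&1 && [ -n "${1:-}" ]; then
    show_job_steps "$1"
fi
```

The NTasks and AllocCPUS columns make it easy to see how each step used a different share of the job's allocation.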