GNU Parallel - which job failed?

Wee*_*Dom 7 parallel-processing bash

I am using GNU Parallel to run a job on several different servers (up to 25).

The shell script that does this currently runs:

parallel --tag --nonall -S $some_list_of_servers "some_command"
state=$?
echo -n "RESULT: "
if [ "$state" -eq "0" ]
then
    echo "All jobs successful"
else
    echo "$state jobs failed"
fi
return $state

where some_list_of_servers is an array and some_command is, for example, git fetch.

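What I want is more information than just the number of failed jobs. I want to know which command failed, and on which server.

I have looked through the man page, Google and SO, but can't find the switch I'm looking for.

Any help is much appreciated.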

WeeDom

Edit in response to answer 1:

I tried that, and something odd happened.

weedom@host1: ~/$ parallel --tag --nonall  -j8 --joblog test.log -S host1,host2 uptime 
host2   10:41:17 up 36 days, 20:45,  1 user,  load average: 0.00, 0.00, 0.00
host1         10:41:17 up 22:34,  3 users,  load average: 0.06, 0.11, 0.04
weedom@host1: ~/$ cat test.log
Seq     Host    Starttime       Runtime Send    Receive Exitval Signal  Command
1       host1        1403689277.067  0.519999980926514       0       0       0      0       uptime

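No matter how many hosts I add to -S, only the last host to finish seems to end up in test.log.

I have posted a follow-up question here: GNU Parallel - --joblog only records the last job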

Jon*_*rsi 6

You want the --joblog option, as described in the documentation. GNU Parallel can even rerun the failed jobs with --resume-failed.
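A minimal sketch of that retry step, assuming the first run already wrote a job log. The function name `retry_failed_jobs` and its argument order are hypothetical, chosen only for illustration. The second invocation should be given the same input and command as the first run.

```shell
#!/bin/bash
# Hypothetical helper: rerun the failed jobs recorded in an existing job log.
# Usage: retry_failed_jobs LOGFILE SERVERS COMMAND... < same-input-as-first-run
retry_failed_jobs() {
    local joblog="$1"
    local servers="$2"
    shift 2
    # --resume-failed reads LOGFILE, reruns the jobs that failed,
    # then continues with any jobs that were never started.
    parallel --resume-failed --joblog "$joblog" -S "$servers" "$@"
}
```

It could be called as `seq 1 10 | retry_failed_jobs out.log "srv01,srv02" ./some_job`, where `./some_job` stands in for whatever command the first run used. Because the reruns are appended to the same log, the log then holds both the old and the new entries.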

For example, running this script:

#!/bin/bash
# Exit with status 1 when the argument is a multiple of 3, otherwise 0.
jobmod=$(( $1 % 3 ))
if [ "$jobmod" -eq 0 ]
then
    exit 1
else
    exit 0
fi

on several hosts like this:

$ seq 1 10 | parallel --joblog out.log -S "srv01,srv02,srv03,srv04" ./failjob 

$ more out.log
Seq Host    Starttime   Runtime Send    Receive Exitval Signal  Command
1   srv01   1403542514.713  0.267   0   0   0   0   ./failjob 1
3   srv02   1403542514.717  0.266   0   0   1   0   ./failjob 3
4   srv03   1403542514.719  0.266   0   0   0   0   ./failjob 4
2   srv04   1403542514.715  0.397   0   0   0   0   ./failjob 2
5   srv01   1403542514.983  0.231   0   0   0   0   ./failjob 5
6   srv02   1403542514.986  0.368   0   0   1   0   ./failjob 6
7   srv03   1403542514.988  0.388   0   0   0   0   ./failjob 7
8   srv04   1403542515.121  0.437   0   0   0   0   ./failjob 8
9   srv01   1403542515.221  0.343   0   0   1   0   ./failjob 9
10  srv02   1403542515.356  0.388   0   0   0   0   ./failjob 10
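To turn that log into a direct answer to "which command on which server failed", the failed rows can be filtered out with awk. This is only a sketch: the function name `list_failed_jobs` is hypothetical, and it assumes the tab-separated column layout shown above (Host in column 2, Exitval in column 7, Signal in column 8, Command in column 9) and that the command itself contains no tabs.

```shell
#!/bin/bash
# Hypothetical helper: print "host<TAB>command" for every failed job in a joblog.
# Assumes the tab-separated layout written by --joblog:
# Seq Host Starttime Runtime Send Receive Exitval Signal Command
list_failed_jobs() {
    # Skip the header line, then keep rows whose exit value or signal is non-zero.
    awk -F'\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $2 "\t" $9 }' "$1"
}
```

For the log above, `list_failed_jobs out.log` should list `./failjob 3` and `./failjob 6` on srv02 and `./failjob 9` on srv01.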