WeeDom · 7 · parallel-processing, bash
I'm using GNU parallel to run a job on several different servers (up to 25).
The shell script that does this currently runs:
parallel --tag --nonall -S $some_list_of_servers "some_command"
state=$?
echo -n "RESULT: "
if [ "$state" -eq "0" ]
then
echo "All jobs successful"
else
echo "$state jobs failed"
fi
return $state
where some_list_of_servers is an array and some_command is, for example, git fetch.
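I'd like more information than just the number of failed jobs: I want to know which command failed and on which server.
I've been through the man page, Google and SO, but can't find the switch I'm looking for.
Any help much appreciated.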
WeeDom
Edit in response to answer 1:
I tried it, and something odd happened.
weedom@host1: ~/$ parallel --tag --nonall -j8 --joblog test.log -S host1,host2 uptime
host2 10:41:17 up 36 days, 20:45, 1 user, load average: 0.00, 0.00, 0.00
host1 10:41:17 up 22:34, 3 users, load average: 0.06, 0.11, 0.04
weedom@host1: ~/$ cat test.log
Seq Host Starttime Runtime Send Receive Exitval Signal Command
1 host1 1403689277.067 0.519999980926514 0 0 0 0 uptime
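No matter how many hosts I pass to -S, only the host that finishes last seems to end up in test.log.
I've added a follow-up question here: GNU Parallel - --joblog only logs the last job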
You want the --joblog option, as shown in the documentation. GNU parallel even lets you re-run failed jobs with --resume-failed.
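A minimal sketch of how the question's script could use --joblog to name the failing server and command. It assumes passwordless ssh to every server and a simple command string without tricky quoting. Instead of -S with --nonall (which, per the edit above, logged only one host in the asker's version), it swaps in a different technique: one local ssh job per server, so every server gets its own joblog row. Because those jobs run locally, the joblog's Host column shows ":" and the server name appears in the Command column instead. run_everywhere and report_failures are hypothetical names, not part of GNU parallel.

```bash
#!/bin/bash
# Sketch only: run_everywhere and report_failures are hypothetical helper names.

# Print "exitval<TAB>signal<TAB>command" for every failed job in a GNU parallel
# joblog. The joblog is TAB-separated with a header row; column 7 is Exitval,
# column 8 is Signal and column 9 is Command.
report_failures() {
    awk -F'\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $7 "\t" $8 "\t" $9 }' "$1"
}

# Usage: run_everywhere "git fetch" host1 host2 ...
# Assumes passwordless ssh to each host and a command string without special quoting.
run_everywhere() {
    local cmd=$1 log state
    shift                                  # the remaining arguments are host names
    log=$(mktemp) || return 1
    # One local "ssh <host> <cmd>" job per host; -j0 runs them all at once and
    # --tag prefixes each output line with the host name.
    parallel -j0 --tag --joblog "$log" ssh {} "$cmd" ::: "$@"
    state=$?
    echo -n "RESULT: "
    if [ "$state" -eq 0 ]; then
        echo "All jobs successful"
    else
        echo "$state jobs failed:"
        report_failures "$log"             # the Command column includes the host
    fi
    rm -f "$log"
    return "$state"
}
```

Called as run_everywhere "git fetch" "${some_list_of_servers[@]}", each failure line would read something like "1<TAB>0<TAB>ssh host2 git fetch", which names both the server and the command.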
For example, running this script (saved as failjob):
#!/bin/bash
# failjob: exit 1 (failure) when the argument is a multiple of 3, else exit 0.
jobmod=$(( $1 % 3 ))
if [ "$jobmod" -eq 0 ]
then
    exit 1
else
    exit 0
fi
on several hosts like this:
$ seq 1 10 | parallel --joblog out.log -S "srv01,srv02,srv03,srv04" ./failjob
produces:
$ more out.log
Seq Host Starttime Runtime Send Receive Exitval Signal Command
1 srv01 1403542514.713 0.267 0 0 0 0 ./failjob 1
3 srv02 1403542514.717 0.266 0 0 1 0 ./failjob 3
4 srv03 1403542514.719 0.266 0 0 0 0 ./failjob 4
2 srv04 1403542514.715 0.397 0 0 0 0 ./failjob 2
5 srv01 1403542514.983 0.231 0 0 0 0 ./failjob 5
6 srv02 1403542514.986 0.368 0 0 1 0 ./failjob 6
7 srv03 1403542514.988 0.388 0 0 0 0 ./failjob 7
8 srv04 1403542515.121 0.437 0 0 0 0 ./failjob 8
9 srv01 1403542515.221 0.343 0 0 1 0 ./failjob 9
10 srv02 1403542515.356 0.388 0 0 0 0 ./failjob 10
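To answer "which server failed" directly from a joblog like the one above, a short awk filter can count failed jobs per host. This is a sketch: failures_per_host is a hypothetical name, and it relies on GNU parallel writing the joblog TAB-separated with a header row (the listing above shows the columns space-aligned). A job is counted as failed when its Exitval or Signal is non-zero.

```bash
#!/bin/bash
# Sketch: failures_per_host is a hypothetical helper name.
# Prints "host<TAB>number-of-failed-jobs" for each host that had at least one
# failure, reading a TAB-separated GNU parallel joblog (column 2 = Host,
# 7 = Exitval, 8 = Signal), sorted by host name.
failures_per_host() {
    awk -F'\t' '
        NR > 1 && ($7 != 0 || $8 != 0) { failed[$2]++ }
        END { for (h in failed) print h "\t" failed[h] }
    ' "$1" | sort
}
```

For the out.log shown above, failures_per_host out.log would report srv01 with 1 failure (job 9) and srv02 with 2 (jobs 3 and 6).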