I have a Nagios server and a monitored server. On the monitored server:
[root@Monitored ~]# netstat -an |grep :5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN
[root@Monitored ~]# locate check_kvm
/usr/lib64/nagios/plugins/check_kvm
[root@Monitored ~]# /usr/lib64/nagios/plugins/check_kvm -H localhost
hosts:3 OK:3 WARN:0 CRIT:0 - ab2c7:running alpweb5:running istaweb5:running
[root@Monitored ~]# /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_kvm
NRPE: Unable to read output
[root@Monitored ~]# /usr/lib64/nagios/plugins/check_nrpe -H localhost
NRPE v2.14
[root@Monitored ~]# ps -ef |grep nrpe
nagios 21178 1 0 16:11 ? 00:00:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
[root@Monitored ~]#
On the Nagios server:
[root@Nagios ~]# /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.159 -c check_kvm
NRPE: Unable to read output
[root@Nagios ~]# /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.159
NRPE v2.14
[root@Nagios ~]#
When I use the same command to check another server on the network, it works:
[root@Nagios ~]# /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.80 -c check_kvm
hosts:4 OK:4 WARN:0 CRIT:0 - karmisoft:running ab2c4:running kidumim1:running travel2gether1:running
[root@Nagios ~]#
Running the check locally as the nagios user:
[root@Monitored ~]# su - nagios
-bash-4.1$ /usr/lib64/nagios/plugins/check_kvm
hosts:3 OK:3 WARN:0 CRIT:0 - ab2c7:running alpweb5:running istaweb5:running
-bash-4.1$
Running the check remotely from the Nagios server as the nagios user:
-bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.159 -c check_kvm
NRPE: Unable to read output
-bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.159
NRPE v2.14
-bash-4.1$
Running the same check_kvm against a different server on the network as the nagios user:
-bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.80 -c check_kvm
hosts:4 OK:4 WARN:0 CRIT:0 - karmisoft:running ab2c4:running kidumim1:running travel2gether1:running
-bash-4.1$
Permissions:
-rwxr-xr-x. 1 root root 4684 2013-10-14 17:14 nrpe.cfg (aka /etc/nagios/nrpe.cfg)
drwxrwxr-x. 3 nagios nagios 4096 2013-10-15 03:38 plugins (aka /usr/lib64/nagios/plugins)
/etc/sudoers:
[root@Monitored ~]# grep -i requiretty /etc/sudoers
#Defaults requiretty
iptables/selinux:
[root@Monitored xinetd.d]# service iptables status
iptables: Firewall is not running.
[root@Monitored xinetd.d]# service ip6tables status
ip6tables: Firewall is not running.
[root@Monitored xinetd.d]# grep disable /etc/selinux/config
# disabled - No SELinux policy is loaded.
SELINUX=disabled
[root@Monitored xinetd.d]#
The command in /etc/nagios/nrpe.cfg is:
[root@Monitored ~]# grep kvm /etc/nagios/nrpe.cfg
command[check_kvm]=sudo /usr/lib64/nagios/plugins/check_kvm
And the nagios user has been added to /etc/sudoers:
nagios ALL=(ALL) NOPASSWD:/usr/lib64/nagios/plugins/check_kvm
nagios ALL=(ALL) NOPASSWD:/usr/lib64/nagios/plugins/check_nrpe
check_kvm is a shell script that looks like this:
#!/bin/sh
LIST=$(virsh list --all | sed '1,2d' | sed '/^$/d'| awk '{print $2":"$3}')
if [ ! "$LIST" ]; then
    EXITVAL=3 #Status 3 = UNKNOWN (orange)
    echo "Unknown guests"
    exit $EXITVAL
fi
OK=0
WARN=0
CRIT=0
NUM=0
for host in $(echo $LIST)
do
    name=$(echo $host | awk -F: '{print $1}')
    state=$(echo $host | awk -F: '{print $2}')
    NUM=$(expr $NUM + 1)
    case "$state" in
        running|blocked) OK=$(expr $OK + 1) ;;
        paused) WARN=$(expr $WARN + 1) ;;
        shutdown|shut*|crashed) CRIT=$(expr $CRIT + 1) ;;
        *) CRIT=$(expr $CRIT + 1) ;;
    esac
done
if [ "$NUM" -eq "$OK" ]; then
    EXITVAL=0 #Status 0 = OK (green)
fi
if [ "$WARN" -gt 0 ]; then
    EXITVAL=1 #Status 1 = WARNING (yellow)
fi
if [ "$CRIT" -gt 0 ]; then
    EXITVAL=2 #Status 2 = CRITICAL (red)
fi
echo hosts:$NUM OK:$OK WARN:$WARN CRIT:$CRIT - $LIST
exit $EXITVAL
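The "Unknown guests" branch in the script is reached whenever the virsh pipeline yields an empty list. As a rough illustration, the sketch below wraps the same sed/awk pipeline in a helper that reads from stdin, so it can be exercised without libvirt. The function name parse_guests and the sample input are assumptions made for this example, not part of the original plugin.

```shell
#!/bin/sh
# Hypothetical helper: the same pipeline the plugin applies to
# `virsh list --all`, reading from stdin instead of calling virsh.
parse_guests() {
    sed '1,2d' | sed '/^$/d' | awk '{print $2":"$3}'
}

# Sample input shaped like `virsh list --all` output (illustrative only).
sample=' Id    Name                           State
----------------------------------------------------
 1     ab2c7                          running
 2     alpweb5                        running
'

# Prints one name:state pair per guest.
printf '%s' "$sample" | parse_guests

# With empty input (for example, virsh failing and writing only to stderr)
# the pipeline prints nothing, which is the branch that yields "Unknown guests".
empty=$(printf '' | parse_guests)
[ -z "$empty" ] && echo "empty list -> plugin would print: Unknown guests"
```

If this reading is right, "Unknown guests" under NRPE would mean that virsh itself returned nothing usable in that context, rather than that the parsing is wrong.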
Edit (October 22, 2013): After all of this, I now get some response from the script:
[root@Monitored ~]# /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_kvm
Unknown guests
[root@Monitored ~]# /usr/lib64/nagios/plugins/check_nrpe -H localhost
NRPE v2.14
[root@Monitored ~]# /usr/lib64/nagios/plugins/check_kvm
hosts:3 OK:3 WARN:0 CRIT:0 - ab2c7:running alpweb5:running istaweb5:running
[root@Monitored ~]# su - nagios
-bash-4.1$ /usr/lib64/nagios/plugins/check_kvm
hosts:3 OK:3 WARN:0 CRIT:0 - ab2c7:running alpweb5:running istaweb5:running
-bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_kvm
Unknown guests
-bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H localhost
NRPE v2.14
The problem seems to be related to the check_nrpe command, or to something in the NRPE installation on the server.
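Since the plugin works from a login shell but prints "Unknown guests" through NRPE, one possible way to narrow this down is to run commands with a stripped environment, which roughly approximates what a daemon-launched plugin sees. This is only a sketch: the helper name run_bare is hypothetical, and the assumption that the environment is the culprit is unverified.

```shell
#!/bin/sh
# Hypothetical helper: run a command line with an empty environment,
# so that no exported variables are inherited from the calling shell.
run_bare() {
    env -i /bin/sh -c "$1"
}

# Compare what the current shell and a bare environment report.
echo "current PATH: $PATH"
run_bare 'echo "bare PATH: $PATH"'

# On the monitored host (assumption: run as the nagios user), the virsh call
# from the plugin could be checked the same way, keeping stderr visible:
#   run_bare 'virsh list --all 2>&1'
```

If virsh prints an error or nothing at all in the bare environment, that would point at the environment or at permissions rather than at check_nrpe itself.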
Edit 12/2/13: Other checks against the problematic server work:
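Very well-detailed question, Itai! Have you tried lowering the complexity of the configuration to see if it works?
For starters, I would change the line in nrpe.cfg to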
command[check_kvm]=/usr/lib64/nagios/plugins/check_kvm
and temporarily change the /usr/lib64/nagios/plugins/check_kvm script to something very simple, such as:
#!/bin/sh
echo Hi
exit 0
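If that works, you can start raising the complexity again. Perhaps the nagios user needs access to the virsh command itself rather than sudo access to the script, in which case you could leave out the sudo part of the command line in nrpe.cfg.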
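While raising the complexity step by step, it can help to see what the plugin writes to stderr and whether it prints anything at all, since "NRPE: Unable to read output" generally indicates that the command produced nothing on stdout. The wrapper below is a minimal sketch of that idea; the function name run_plugin_debug is hypothetical, and the stand-in commands replace the real plugin so the example runs anywhere.

```shell
#!/bin/sh
# Hypothetical debug wrapper: run a plugin command, fold stderr into stdout,
# and print a placeholder line when the plugin produced no output at all.
run_plugin_debug() {
    output=$("$@" 2>&1)
    status=$?
    if [ -z "$output" ]; then
        echo "no output from: $* (exit status $status)"
    else
        echo "$output"
    fi
    return "$status"
}

# Stand-in commands; on the monitored host the argument would be
# /usr/lib64/nagios/plugins/check_kvm.
run_plugin_debug sh -c 'echo Hi; exit 0'
run_plugin_debug sh -c 'echo "some error" >&2; exit 3' || echo "returned $?"
run_plugin_debug true
```

The wrapper passes the plugin's exit status through, so Nagios would still see the original state while the message reveals what went wrong.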