Running smartctl on all disks in a server

Question

Zer*_*ive 7 raid logs shell-script hard-disk smartctl

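My question is simple: I want to run smartctl -i -A on every disk a server has.
I have many servers, with different numbers of disks and different RAID controllers, so I need to scan all the drives for diagnostics.
I am thinking of running smartctl --scan | awk '{print $1}' >> test.log, so that test.log would contain the device name of every drive.
After that, I would need some if or do construct to run smartctl against each of those drives.
I don't know whether this is the best way to do it, because I also need to identify the RAID controllers.
Am I heading in the right direction?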
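The scan-then-loop idea described above can be sketched as follows. This is a minimal sketch, not a definitive implementation: it assumes each line of smartctl --scan output has the form "/dev/sda -d scsi # comment" and reuses the device and -d type from that line. The function names scan_line_to_args and run_on_all_disks and the SMARTCTL override variable are hypothetical helpers introduced for illustration. Whether disks behind a RAID controller appear in the scan depends on the smartmontools version and the controller.

```bash
#!/bin/bash
# scan_line_to_args (hypothetical helper): turn one line of scan output, e.g.
#   /dev/sda -d scsi # /dev/sda, SCSI device
# into just the smartctl arguments: "/dev/sda -d scsi".
scan_line_to_args() {
    # Drop the trailing "# ..." comment; word splitting trims the whitespace.
    set -- ${1%%#*}
    echo "$*"
}

# run_on_all_disks (hypothetical helper): run "smartctl -i -A" once for
# every device that the scan reports.
# SMARTCTL is an assumed override for the binary; it defaults to smartctl.
run_on_all_disks() {
    smartctl_bin=${SMARTCTL:-smartctl}
    "$smartctl_bin" --scan | while IFS= read -r line; do
        args=$(scan_line_to_args "$line")
        [ -n "$args" ] || continue
        echo "== smartctl -i -A $args"
        # $args is deliberately unquoted so "-d megaraid,0" splits into words.
        "$smartctl_bin" -i -A $args < /dev/null
    done
}
```

Calling run_on_all_disks >> Troubleshoot.log as root would then collect the output for every reported device in one file. Keeping the -d type from the scan line, instead of only the first column, is what lets the same loop cover controller-attached disks when the scan lists them.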

Edit:

I usually use these commands for troubleshooting:

Without a RAID controller

for i in {c..d}; do
    echo "Disk sd$i" $SN $MD
    # Show attributes 5, 197 and 198, any failing attribute, and the serial number.
    smartctl -i -A /dev/sd$i | grep -E '^  5|^197|^198|FAILING_NOW|Serial'
done

PERC controller

for i in {0..12}; do
    echo "$i" $SN $MD
    # Query physical disk $i behind the controller.
    smartctl -i -A -T permissive /dev/sda -d megaraid,$i | grep -E '^  5|^197|^198|FAILING_NOW|Serial'
done
/usr/sbin/megastatus --physical
/usr/sbin/megastatus --logical

3ware controller

for i in {0..10}; do
    echo "Disk $i" $SN $MD
    # Query the disk on port $i of the 3ware controller.
    smartctl -i -A /dev/twa0 -d 3ware,$i | grep -E '^  5|^197|^198|FAILING_NOW|Serial'
done

SmartArray and MegaRAID controllers:

smartctl -a -d cciss,0 /dev/cciss/c0d0
/opt/3ware/9500/tw_cli show
cd /tmp

DD (overwrite disk blocks (DESTROYS DATA)):

dd if=/dev/zero of=/dev/HD* bs=4M
HD*: sda, sdb…

Burn-in (stress test (DESTROYS DATA)):

/opt/systems/bin/vs-burnin --destructive --time=<hours> /tmp/burninlog.txt

Dmesg and kernerrors:

tail /var/log/kernerrors
dmesg | grep -i -E "ata|fault|error"

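So what I want to do is automate these commands.
I want the script to detect every disk the host has and run the appropriate smartctl command for each case.
Something like a menu with a few options, letting me choose whether to run smartctl or one of the destructive commands.
If I choose smartctl, the script scans all disks and runs the commands that match the host configuration (with or without a RAID controller).
If I choose a destructive command, the script asks me which disk I want to run it on.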
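The menu described above could be built from two small pieces, sketched here under stated assumptions: choose_action and is_listed_disk are hypothetical names, the menu wording is invented, and nothing destructive is executed by the sketch itself. The second helper exists so that a destructive command is only ever pointed at a disk that was actually found on the host.

```bash
#!/bin/bash
# choose_action (hypothetical helper): show the menu on stderr, read one
# choice from stdin, and print the selected action name on stdout.
choose_action() {
    {
        echo "1) Run smartctl on all disks"
        echo "2) Run a destructive command on one disk"
        printf 'Choice: '
    } >&2
    read -r choice
    case $choice in
        1) echo smart ;;
        2) echo destructive ;;
        *) echo invalid ;;
    esac
}

# is_listed_disk (hypothetical helper): succeed only if the first argument
# is one of the remaining arguments, i.e. a disk that was really detected.
is_listed_disk() {
    wanted=$1
    shift
    for d in "$@"; do
        [ "$d" = "$wanted" ] && return 0
    done
    return 1
}
```

A caller would store the result with action=$(choose_action), and for the destructive branch read a device name and refuse to continue unless is_listed_disk accepts it against the list of detected devices.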

Edit 2:

I solved my problem with the following script:

#!/bin/bash
# Troubleshoot.sh
# Run smartctl -i -A on every disk that smartctl --scan reports.

# Start from clean log files.
rm -f Troubleshoot.log HDs.log

# Collect the device names, plus any RAID controllers that lspci reports.
smartctl --scan | awk '{print $1}' >> HDs.log
lspci | grep -i raid >> HDs.log

# Read the file named in $1 line by line into the global array "array".
getArray ()
{
    i=0
    while read -r line # Read a line
    do
        array[i]=$line # Put it into the array
        i=$((i + 1))
    done < "$1"
}

getArray "HDs.log"

for e in "${array[@]}"
do
    # Only /dev/sdX and /dev/hdX entries are handled; lspci lines are skipped.
    if [[ $e =~ ^/dev/[sh]d ]]
    then
        echo "smartctl -i -A $e" >> Troubleshoot.log
        smartctl -i -A "$e" >> Troubleshoot.log # Run smartctl on every disk the host has
    fi
done
exit $?   # Exit with the status of the loop's last command.

I don't know whether this solution is correct or the best one, but it works for me!
Thanks for all the help!

Answer 1

小智 1

"So what I want to do is automate these commands."

This already exists, in the form of smartd.

You normally configure the behavior you want in /etc/smartd.conf.

Example:

# DEVICESCAN: tells smartd to scan for all ATA and SCSI devices
# Alternative setting to report more useful raw temperature in syslog.
DEVICESCAN -I 194 -I 231 -I 9

You can also list disks explicitly, for example:

/dev/sdc -d 3ware,0 -a -s L/../../7/01

With the -m directive, smartd sends you an e-mail if it finds an error:

/dev/hdc -a -I 194 -W 4,45,55 -R 5 -m admin@example.com

There are many other options and switches; read the smartd.conf man page for the details.
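After editing the configuration, one way to check it is to let smartd register the devices and run a single check in the foreground. The sketch below wraps that in a small function. The name check_smartd_conf and the SMARTD override variable are hypothetical, and the call normally has to run as root.

```bash
#!/bin/bash
# check_smartd_conf (hypothetical helper): ask smartd to register the devices
# from the given config file, check them once, and exit.
# SMARTD is an assumed override for the binary; it defaults to smartd.
check_smartd_conf() {
    conf=${1:-/etc/smartd.conf}
    # -q onecheck: register devices, check each once, then exit.
    # -c FILE: read this configuration file.
    "${SMARTD:-smartd}" -q onecheck -c "$conf"
}
```

Running check_smartd_conf /etc/smartd.conf prints what smartd detected for each configured device, and a non-zero exit status indicates that registration or the check failed.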

Asked: 11 years, 8 months ago
Viewed: 28,652 times
Last active: 2 years, 7 months ago