dma*_*man 0 ram drivers edac 18.04
Bionic LTS Server
I have a Ryzen processor and an AsRock motherboard, both of which run ECC without problems.
The problem I have is that in syslog I see "Not enabling Memory Error Detection and Correction since EDAC_DRIVER is not set":
root@localhost:/home/one# dmesg | grep edac
[ 4.858773] EDAC MC0: Giving out device to module amd64_edac controller F17h: DEV 0000:00:18.3 (INTERRUPT)
[ 4.858781] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
root@localhost:/home/one# cat /var/log/syslog | grep -i edac
Oct 15 20:50:34 localhost systemd-modules-load[502]: Module 'edac_core' is builtin
Oct 15 20:50:34 localhost systemd[1]: Starting LSB: Initialize EDAC...
Oct 15 20:50:34 localhost edac[832]: * Not enabling Memory Error Detection and Correction since EDAC_DRIVER is not set
Oct 15 20:50:34 localhost edac[832]: ...done.
Oct 15 20:50:34 localhost edac[832]: * Loading DIMM labels for Memory Error Detection and Correction edac
Oct 15 20:50:34 localhost kernel: [ 0.156551] EDAC MC: Ver: 3.0.0
Oct 15 20:50:34 localhost kernel: [ 4.858684] EDAC amd64: Node 0: DRAM ECC enabled.
Oct 15 20:50:34 localhost kernel: [ 4.858685] EDAC amd64: F17h detected (node 0).
Oct 15 20:50:34 localhost kernel: [ 4.858719] EDAC MC: UMC0 chip selects:
Oct 15 20:50:34 localhost kernel: [ 4.858720] EDAC amd64: MC: 0: 0MB 1: 0MB
Oct 15 20:50:34 localhost kernel: [ 4.858720] EDAC amd64: MC: 2: 0MB 3: 0MB
Oct 15 20:50:34 localhost kernel: [ 4.858721] EDAC amd64: MC: 4: 0MB 5: 0MB
Oct 15 20:50:34 localhost kernel: [ 4.858721] EDAC amd64: MC: 6: 0MB 7: 0MB
Oct 15 20:50:34 localhost kernel: [ 4.858723] EDAC MC: UMC1 chip selects:
Oct 15 20:50:34 localhost kernel: [ 4.858723] EDAC amd64: MC: 0: 0MB 1: 0MB
Oct 15 20:50:34 localhost kernel: [ 4.858724] EDAC amd64: MC: 2: 16383MB 3: 16383MB
Oct 15 20:50:34 localhost kernel: [ 4.858725] EDAC amd64: MC: 4: 0MB 5: 0MB
Oct 15 20:50:34 localhost kernel: [ 4.858725] EDAC amd64: MC: 6: 0MB 7: 0MB
Oct 15 20:50:34 localhost kernel: [ 4.858725] EDAC amd64: using x8 syndromes.
Oct 15 20:50:34 localhost kernel: [ 4.858726] EDAC amd64: MCT channel count: 1
Oct 15 20:50:34 localhost kernel: [ 4.858773] EDAC MC0: Giving out device to module amd64_edac controller F17h: DEV 0000:00:18.3 (INTERRUPT)
Oct 15 20:50:34 localhost kernel: [ 4.858781] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
Oct 15 20:50:34 localhost kernel: [ 4.858781] AMD64 EDAC driver v3.5.0
Oct 15 20:50:34 localhost edac[832]: ...done.
Oct 15 20:50:34 localhost systemd[1]: Started LSB: Initialize EDAC.
In /etc/modules I placed edac_core. I also see that ECC is enabled in the kernel:
root@localhost:/home/one# cat /usr/src/linux-headers-4.15.0-29-generic/.config | grep -i edac
CONFIG_EDAC_ATOMIC_SCRUB=y
CONFIG_EDAC_SUPPORT=y
CONFIG_EDAC=y
# CONFIG_EDAC_LEGACY_SYSFS is not set
# CONFIG_EDAC_DEBUG is not set
CONFIG_EDAC_DECODE_MCE=m
CONFIG_EDAC_GHES=y
CONFIG_EDAC_AMD64=m
# CONFIG_EDAC_AMD64_ERROR_INJECTION is not set
CONFIG_EDAC_E752X=m
CONFIG_EDAC_I82975X=m
CONFIG_EDAC_I3000=m
CONFIG_EDAC_I3200=m
CONFIG_EDAC_IE31200=m
CONFIG_EDAC_X38=m
CONFIG_EDAC_I5400=m
CONFIG_EDAC_I7CORE=m
CONFIG_EDAC_I5000=m
CONFIG_EDAC_I5100=m
CONFIG_EDAC_I7300=m
CONFIG_EDAC_SBRIDGE=m
CONFIG_EDAC_SKX=m
CONFIG_EDAC_PND2=m
root@localhost:/home/one# cat /usr/src/linux-headers-4.15.0-29-generic/.config | grep -i ecc
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_MTD_NAND_ECC=m
# CONFIG_MTD_NAND_ECC_SMC is not set
CONFIG_MTD_NAND_ECC_BCH=y
CONFIG_AMD_XGBE_HAVE_ECC=y
CONFIG_MTD_SPINAND_ONDIEECC=y
What is causing "Not enabling Memory Error Detection and Correction since EDAC_DRIVER is not set", and how do I fix it?
Update: output from edac-utils
root@localhost:/home/one# edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
edac-util: No errors to report.
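The message
* Not enabling Memory Error Detection and Correction since EDAC_DRIVER is not set
is an unnecessarily scary message from the edac init script (part of the edac-utils package). All it tells you is that the script did not manually load a specific edac kernel module, because the variable $EDAC_DRIVER is not set in /etc/default/edac.conf. You can see this from the relevant part of the init script: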
if [ -n "$EDAC_DRIVER" ]; then
    log_daemon_msg "Enabling ${DESC}" "$SERVICE"
    modprobe $EDAC_DRIVER
    STATUS=$?
    case $STATUS in
        0) log_end_msg 0 ;;
        5) log_failure_msg "EDAC is not supported on this hardware"; log_end_msg 1 ;;
        *) log_failure_msg "failed with exit code $STATUS"; log_end_msg 1 ;;
    esac
else
    log_daemon_msg "Not enabling ${DESC} since EDAC_DRIVER is not set"
    log_end_msg 0
fi
log_daemon_msg "Loading DIMM labels for ${DESC}" "$SERVICE"
$edac_ctl --register-labels --quiet
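To see how the check in that excerpt behaves, here is a minimal sketch that mimics it outside the init script. The function name edac_branch is hypothetical (it is not part of edac-utils), and the defaults file is passed as an argument because the exact path may differ on your system. The value amd64_edac in the usage note is the module name that appears in the dmesg output above.

```shell
# Hypothetical helper: report which branch the init script's
# EDAC_DRIVER check would take for a given defaults file.
edac_branch() {
    defaults_file="$1"
    (
        # Start from a clean state so only the file decides the outcome.
        unset EDAC_DRIVER
        # Source the defaults file if it exists and is readable.
        [ -r "$defaults_file" ] && . "$defaults_file"
        if [ -n "$EDAC_DRIVER" ]; then
            echo "would run: modprobe $EDAC_DRIVER"
        else
            echo "EDAC_DRIVER is not set"
        fi
    )
}
```

Calling edac_branch with the path of the defaults file shows which message to expect. If the message bothers you, adding a line such as EDAC_DRIVER=amd64_edac to that file should make the script take the first branch, although this is an assumption based on the excerpt rather than something tested here.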
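Given that the kernel automatically determines which edac driver to apply, and that the $edac_ctl command (which comes right after the if-then-else block that checks whether $EDAC_DRIVER is set) successfully registers the DIMM labels, it looks to me like everything is working fine here (but, full disclosure, I know nothing about EDAC).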
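One way to double-check that the driver really is active is to read the error counters the kernel exposes through sysfs. The sketch below assumes the usual location /sys/devices/system/edac/mc with per-controller ce_count and ue_count files. The function name edac_counts and its optional directory argument are hypothetical, added only so it can be pointed at another directory.

```shell
# Hypothetical helper: print corrected (ce) and uncorrected (ue) error
# counts for every EDAC memory controller found under the given directory.
edac_counts() {
    base="${1:-/sys/devices/system/edac/mc}"
    found=0
    for mc in "$base"/mc*; do
        # Skip entries that do not expose a readable counter
        # (this also covers the case where the glob matched nothing).
        [ -r "$mc/ce_count" ] || continue
        found=1
        printf '%s: ce=%s ue=%s\n' "${mc##*/}" \
            "$(cat "$mc/ce_count")" "$(cat "$mc/ue_count")"
    done
    if [ "$found" -ne 1 ]; then
        echo "no EDAC memory controllers found under $base" >&2
        return 1
    fi
}
```

Running edac_counts with no argument on the machine in question should list mc0 if the amd64_edac driver registered a controller, which is what the dmesg output above suggests.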