In /var/log/kern.log:
kernel: [13291329.657499] EDAC MC0: 48 CE error on CPU#0Channel#2_DIMM#0 (channel:2 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
This is an EDAC log entry showing that one of the memory modules is reporting CE (corrected) errors.
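The fields in this line can be pulled apart programmatically. Below is a minimal Python sketch that assumes the `CPU#<n>Channel#<n>_DIMM#<n>` wording seen in this particular log. Other EDAC drivers and kernel versions may phrase the message differently. `parse_edac_ce` and `EDAC_CE_RE` are hypothetical names, not part of any EDAC tooling.

```python
import re

# Pattern for the EDAC corrected-error line quoted above. ASSUMPTION: the
# "CPU#<n>Channel#<n>_DIMM#<n>" label format seen in this log; other EDAC
# drivers and kernel versions may word the message differently.
EDAC_CE_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) CE error on "
    r"CPU#(?P<cpu>\d+)Channel#(?P<channel>\d+)_DIMM#(?P<dimm>\d+)"
)


def parse_edac_ce(line):
    """Return the controller number, the error count and the CPU/channel/DIMM
    indices (0-based, as EDAC prints them), or None if the line does not match."""
    match = EDAC_CE_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}
```

For the kern.log entry above, `parse_edac_ce` returns `{'mc': 0, 'count': 48, 'cpu': 0, 'channel': 2, 'dimm': 0}`; lines that are not EDAC CE reports return `None`, so it can be applied to every line of the log.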
I have read the EDAC documentation:
Dual channels allows for 128 bit data transfers to the CPU from memory.
Some newer chipsets allow for more than 2 channels, like Fully Buffered DIMMs
(FB-DIMMs). The following example will assume 2 channels:
          Channel 0    Channel 1
         ==========================
csrow0   |  DIMM_A0   |  DIMM_B0  |
csrow1   |  DIMM_A0   |  DIMM_B0  |
         ==========================

         ==========================
csrow2   |  DIMM_A1   |  DIMM_B1  |
csrow3   |  DIMM_A1   |  DIMM_B1  |
         ==========================
and found the channel with the errors:
$ grep "[0-9]" /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count
/sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow0/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow0/ch2_ce_count:144648966
/sys/devices/system/edac/mc/mc0/csrow1/ch0_ce_count:0
/sys/devices/system/edac/mc/mc1/csrow0/ch0_ce_count:0
/sys/devices/system/edac/mc/mc1/csrow0/ch1_ce_count:0
/sys/devices/system/edac/mc/mc1/csrow0/ch2_ce_count:0
/sys/devices/system/edac/mc/mc1/csrow1/ch0_ce_count:0
/sys/devices/system/edac/mc/mc1/csrow1/ch1_ce_count:0
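Rather than scanning every counter by eye, the same sysfs files can be filtered down to the non-zero ones. A minimal sketch, assuming the legacy `csrow*/ch*_ce_count` layout shown above (newer kernels may also expose counters under a different layout); `nonzero_ce_counts` and `read_ce_counts` are hypothetical helper names:

```python
import glob
import re

# The same files the grep above reads (legacy csrow-based EDAC layout).
CE_COUNT_GLOB = "/sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count"
# Extracts the mc / csrow / channel numbers from a counter's path.
PATH_RE = re.compile(r"mc(\d+)/csrow(\d+)/ch(\d+)_ce_count$")


def nonzero_ce_counts(lines):
    """Parse 'path:count' lines (grep's output format) and return
    {(mc, csrow, channel): count} for every counter above zero."""
    result = {}
    for line in lines:
        path, _, value = line.strip().rpartition(":")
        match = PATH_RE.search(path)
        if match and value.isdigit() and int(value) > 0:
            mc, csrow, channel = (int(g) for g in match.groups())
            result[(mc, csrow, channel)] = int(value)
    return result


def read_ce_counts(pattern=CE_COUNT_GLOB):
    """Read the sysfs counters directly, yielding lines in the same
    'path:count' format that grep prints."""
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            yield f"{path}:{handle.read().strip()}"
```

Fed the grep output shown above, `nonzero_ce_counts` returns `{(0, 0, 2): 144648966}`. On the affected machine itself, `nonzero_ce_counts(read_ce_counts())` reads the files directly instead.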
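So it should be mc0/csrow0/ch2. Going by the documentation, the DIMM should be DIMM_C0, and it should be possible to find it with dmidecode.
But I can't find this DIMM, so I don't know which memory module is faulty: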
$ dmidecode -t memory | grep 'Locator: PROC'
Locator: PROC 1 DIMM 2A
Locator: PROC 1 DIMM 1D
Locator: PROC 1 DIMM 4B
Locator: PROC 1 DIMM 3E
Locator: PROC 1 DIMM 6C
Locator: PROC 1 DIMM 5F
Locator: PROC 2 DIMM 2A
Locator: PROC 2 DIMM 1D
Locator: PROC 2 DIMM 4B
Locator: PROC 2 DIMM 3E
Locator: PROC 2 DIMM 6C
Locator: PROC 2 DIMM 5F
There are 12 slots, 9 of which are populated.
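The grep above lists every slot's locator but not which slots actually hold a module. A minimal sketch for listing only the populated ones from the full `dmidecode -t memory` output, assuming empty slots are reported as `Size: No Module Installed` (the usual dmidecode wording, but check your own output); `populated_locators` is a hypothetical helper:

```python
def populated_locators(dmidecode_text):
    """Return the Locator of every 'Memory Device' section whose Size is not
    'No Module Installed' in `dmidecode -t memory` output (ASSUMED wording)."""
    devices = []
    current = None
    for raw in dmidecode_text.splitlines():
        if not raw[:1].isspace():
            # Unindented line: a 'Handle ...' line, a section title or a blank
            # line. Only a 'Memory Device' title opens a new record.
            current = {} if raw.strip() == "Memory Device" else None
            if current is not None:
                devices.append(current)
            continue
        if current is not None:
            # Indented 'Key: Value' line belonging to the current device.
            key, sep, value = raw.strip().partition(":")
            if sep:
                current[key] = value.strip()
    return [
        device["Locator"]
        for device in devices
        if "Locator" in device
        and device.get("Size", "No Module Installed") != "No Module Installed"
    ]
```

For example, save the output with `dmidecode -t memory > memory.txt` and pass the file's contents to `populated_locators`; on this server it should return nine locators. Note that the exact key `Locator` is matched, so `Bank Locator` lines are kept separate.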
So how can I tell which module is faulty?
Addendum:
System Information
Manufacturer: HP
Product Name: ProLiant DL180 G6
Your problem DIMM is most likely Locator: PROC 1 DIMM 5F.
CPU#0Channel#2_DIMM#0 means:
CPU#0 = PROC 1
Channel 0 = 1D, 2A
Channel 1 = 3E, 4B
Channel 2 = 5F, 6C
DIMM 0 = 5F
DIMM 1 = 6C
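The mapping above can be written down as a lookup table. This sketch applies only to the ProLiant DL180 G6 slot layout described in this answer. `hp_dimm_locator` and `DL180_G6_CHANNEL_SLOTS` are hypothetical names, and the DIMM 0/DIMM 1 order for channels 0 and 1 is an assumption extrapolated from the channel 2 pair (5F = DIMM 0, 6C = DIMM 1):

```python
# Slot pairs per memory channel for one processor on a ProLiant DL180 G6,
# taken from the answer above. The DIMM order is stated only for channel 2
# (5F = DIMM 0, 6C = DIMM 1); channels 0 and 1 are ASSUMED to follow the
# same pattern (odd-numbered slot = DIMM 0).
DL180_G6_CHANNEL_SLOTS = {
    0: ("1D", "2A"),
    1: ("3E", "4B"),
    2: ("5F", "6C"),
}


def hp_dimm_locator(cpu, channel, dimm):
    """Translate EDAC's 0-based CPU#/Channel#/DIMM# into the 1-based
    'PROC n DIMM xx' locator string that dmidecode prints on this model."""
    slot = DL180_G6_CHANNEL_SLOTS[channel][dimm]
    return f"PROC {cpu + 1} DIMM {slot}"
```

`hp_dimm_locator(0, 2, 0)` returns `'PROC 1 DIMM 5F'`, the locator suggested above for CPU#0Channel#2_DIMM#0.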
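Edit:
When asking a question, more information is always better... Knowing the server's manufacturer and model makes this much easier:
Here is the memory diagram from the HP ProLiant DL180 G6 QuickSpecs:
So my pick of the DIMM on CPU socket #1 is correct... but this is HP hardware. You shouldn't have to guess!
You should be using HP's management agents, since they can send alerts and provide platform-specific details about hardware health and status...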
[root@veloce ~]# hpasmcli
HP management CLI for Linux (v2.0)
Copyright 2008 Hewlett-Packard Development Group, L.P.
--------------------------------------------------------------------------
This server ProLiant DL180 G6 , is a Proliant 100 Series Server.
NOTE: Some hpasmcli commands may not be supported on 100 series servers.
Type 'help' to get a list of all top level commands.
--------------------------------------------------------------------------
hpasmcli> show dimm
Cartridge #: 0
Processor #: 1
Module #: 2
Present: Yes
Form Factor: fh
Memory Type: 5h
Size: 4096 MB
Speed: 1333 MHz
Status: N/A
Cartridge #: 0
Processor #: 1
Module #: 1
Present: Yes
Form Factor: fh
Memory Type: 5h
Size: 4096 MB
Speed: 1333 MHz
Status: N/A
Cartridge #: 0
Processor #: 1
Module #: 4
Present: Yes
Form Factor: fh
Memory Type: 5h
Size: 4096 MB
Speed: 1333 MHz
Status: N/A
Cartridge #: 0
Processor #: 1
Module #: 6
Present: Yes
Form Factor: fh
Memory Type: 5h
Size: 4096 MB
Speed: 1333 MHz
Status: N/A
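To pick the relevant fields out of a long show dimm listing, the output can be split into per-module records. A minimal sketch, assuming the `Key: Value` layout shown above with each module starting at a `Cartridge #:` line; `parse_show_dimm` and `present_modules` are hypothetical helpers, not part of HP's tools:

```python
def parse_show_dimm(output):
    """Split hpasmcli 'show dimm' output into one dict per module.

    Each 'Cartridge #:' line starts a new record; the following
    'Key: Value' lines are stored in it. Lines seen before the first
    record (banner, prompt, notes) and lines without a colon are ignored."""
    modules = []
    for raw in output.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "Cartridge #":
            modules.append({})
        if modules:
            modules[-1][key] = value.strip()
    return modules


def present_modules(modules):
    """Return (processor, module) pairs for modules reported as present."""
    return [
        (int(m["Processor #"]), int(m["Module #"]))
        for m in modules
        if m.get("Present") == "Yes"
    ]
```

The text can be captured non-interactively with hpasmcli -s "show dimm" (assuming your hpasmcli version supports the -s option) and passed to `parse_show_dimm`.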