Chr*_*ill 25 fan overheating power-management temperature intel-cpu
我每小时收到几次以下错误消息:
08.03.18 21:27 kernel CPU0: Core temperature above threshold, cpu clock throttled (total events = 2234)
08.03.18 21:27 kernel CPU2: Core temperature above threshold, cpu clock throttled (total events = 2234)
08.03.18 21:27 kernel CPU1: Package temperature above threshold, cpu clock throttled (total events = 2695)
08.03.18 21:27 kernel CPU3: Package temperature above threshold, cpu clock throttled (total events = 2695)
08.03.18 21:27 kernel CPU2: Package temperature above threshold, cpu clock throttled (total events = 2695)
08.03.18 21:27 kernel CPU0: Package temperature above threshold, cpu clock throttled (total events = 2695)
08.03.18 21:27 kernel CPU2: Core temperature/speed normal
08.03.18 21:27 kernel CPU0: Core temperature/speed normal
08.03.18 21:27 kernel CPU3: Package temperature/speed normal
08.03.18 21:27 kernel CPU1: Package temperature/speed normal
08.03.18 21:27 kernel CPU0: Package temperature/speed normal
08.03.18 21:27 kernel CPU2: Package temperature/speed normal
Run Code Online (Sandbox Code Playgroud)
硬件规格:
ThinkPad X1 Yoga 2nd
N1NET33W (1.20 )
Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
Production date 2017.11
Run Code Online (Sandbox Code Playgroud)
软件:
Distributor ID: Ubuntu
Description: Ubuntu 17.10
Release: 17.10
Codename: artful
Linux 4.13.0-36-generic #40-Ubuntu SMP Fri Feb 16 20:07:48 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Run Code Online (Sandbox Code Playgroud)
生物:
我在 BIOS 设置中将电池和交流电都设置为性能,BIOS 是最新的。
问题是什么
问题是 CPU 温度的阈值太早达到了,它发生在 75°C 左右,即使允许CPU达到 100°C。温度永远不会超过 85°C。所以 CPU 的功率会被限制得如此之快。
我不知道我的制造商硬件问题是否与导热膏不足有关,或者是否与软件有关。在我将它发送给联想之前,我想确保它不是一个自制的问题。
统计数据
当我进行压力测试时
stress -c 4 -t 300
Run Code Online (Sandbox Code Playgroud)
错误消息几乎立即发生。
i7z 给出以下输出:
Cpu speed from cpuinfo 2903.00Mhz
cpuinfo might be wrong if cpufreq is enabled. To guess correctly try estimating via tsc
Linux's inbuilt cpu_khz code emulated now
True Frequency (without accounting Turbo) 2903 MHz
CPU Multiplier 29x || Bus clock frequency (BCLK) 100.10 MHz
Socket [0] - [physical cores=2, logical cores=4, max online cores ever=2]
TURBO ENABLED on 2 Cores, Hyper Threading ON
Max Frequency without considering Turbo 3003.10 MHz (100.10 x [30])
Max TURBO Multiplier (if Enabled) with 1/2/3/4 Cores is 39x/39x/39x/39x
Real Current Frequency 3187.97 MHz [100.10 x 31.85] (Max of below)
Core [core-id] :Actual Freq (Mult.) C0% Halt(C1)% C3 % C6 % Temp VCore
Core 1 [0]: 3187.97 (31.85x) 99.9 0 0 0 85 1.0037
Core 2 [1]: 3187.97 (31.85x) 99.9 0 0 0 84 1.0037
C0 = Processor running without halting
C1 = Processor running with halts (States >C0 are power saver modes with cores idling)
C3 = Cores running with PLL turned off and core cache turned off
C6, C7 = Everything in C3 + core state saved to last level cache, C7 is deeper than C6
Above values in table are in percentage over the last 1 sec
[core-id] refers to core-id number in /proc/cpuinfo
'Garbage Values' message printed when garbage values are read
Ctrl+C to exit
Run Code Online (Sandbox Code Playgroud)
如前所述,它永远不会超过 85 Temp,但 CPU 会受到限制。
传感器显示以下输出
iwlwifi-virtual-0
Adapter: Virtual device
temp1: +30.0°C
pch_skylake-virtual-0
Adapter: Virtual device
temp1: +52.0°C
acpitz-virtual-0
Adapter: Virtual device
temp1: +56.0°C (crit = +98.0°C)
thinkpad-isa-0000
Adapter: ISA adapter
fan1: 5859 RPM
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +59.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +59.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +58.0°C (high = +100.0°C, crit = +100.0°C)
Run Code Online (Sandbox Code Playgroud)
但是高设置可能没有效果。
热敏电阻
所以我当时就玩弄了 Thermald。
这是我在这里找到的调整后的配置:
<?xml version="1.0" encoding="UTF-8"?>
<ThermalConfiguration>
<Platform>
<Name>Use Fan control first then CPU throttle</Name>
<ProductName>*</ProductName>
<Preference>QUIET</Preference>
<ThermalZones>
<ThermalZone>
<Type>x86_pkg_temp</Type>
<TripPoints>
<TripPoint>
<SensorType>x86_pkg_temp</SensorType>
<Temperature>90000</Temperature>
<type>passive</type>
<ControlType>SEQUENTIAL</ControlType>
<CoolingDevice>
<type>_fan_</type>
</CoolingDevice>
</TripPoint>
</TripPoints>
</ThermalZone>
</ThermalZones>
<CoolingDevices>
<CoolingDevice>
<Type>_fan_</Type>
<Path>/sys/bus/platform/devices/thinkpad_hwmon/pwm1</Path>
<MinState>100</MinState>
<MaxState>255</MaxState>
<IncDecStep>50</IncDecStep>
<DebouncePeriod>10</DebouncePeriod>
</CoolingDevice>
</CoolingDevices>
</Platform>
</ThermalConfiguration>
Run Code Online (Sandbox Code Playgroud)
这没有真正的区别,但我至少可以看到阈值设置(最后几行):
?? sudo thermald --no-daemon --loglevel=info
NO RAPL sysfs present
22 CPUID levels; family:model:stepping 0x6:8e:9 (6:142:9)
Running on a vanilla kernel
Polling mode is enabled: 4
sensor_update: type x86_pkg_temp
sensor_update: type pch_skylake
sensor_update: type iwlwifi
sensor_update: type acpitz
thd_read_default_thermal_sensors loaded 4 sensors
dts /sys/devices/platform/coretemp.0/name doesn't exist
dts /sys/class/hwmon/hwmon3/name doesn't exist
failed to open /dev/acpi_thermal_rel
failed to open /dev/acpi_thermal_rel
TRT/ART read failed
Dumping parsed XML Data
*** Index 0 ***
Name: UseFancontrolfirstthenCPUthrottle
UUID:
type: 0
Zone 0
Name: x86_pkg_temp
Trip Point 0
temp 90000
trip type 2
hyst id 0
sensor type x86_pkg_temp
cdev index 0
type _fan_
influence 0
SamplingPeriod 0
Cooling Dev 0
Type: _fan_
Path: /sys/bus/platform/devices/thinkpad_hwmon/pwm1
Min: 100
Max: 255
Step: 50
AutoDownControl: 0
Product Name matched [wildcard]
sensor index:3 x86_pkg_temp /sys/class/thermal/thermal_zone3/ Async:1
sensor index:1 pch_skylake /sys/class/thermal/thermal_zone1/ Async:0
sensor index:2 iwlwifi /sys/class/thermal/thermal_zone2/ Async:0
sensor index:0 acpitz /sys/class/thermal/thermal_zone0/ Async:0
sensor index:4 hwmon /sys/class/hwmon/hwmon1/temp1_input Async:0
sensor index:5 hwmon /sys/class/hwmon/hwmon1/temp2_input Async:0
sensor index:6 hwmon /sys/class/hwmon/hwmon1/temp3_input Async:0
thd_read_default_cooling devices loaded 4 cdevs
powercap RAPL no long term time window
Use Default pstate drv settings
Product Name matched [wildcard]
3: Processor, C:0 MN: 0 MX:10 ST:1 pt:/sys/class/thermal/ rd_bk 0
1: Processor, C:0 MN: 0 MX:10 ST:1 pt:/sys/class/thermal/ rd_bk 0
2: Processor, C:0 MN: 0 MX:10 ST:1 pt:/sys/class/thermal/ rd_bk 0
0: Processor, C:0 MN: 0 MX:10 ST:1 pt:/sys/class/thermal/ rd_bk 0
4: intel_pstate, C:0 MN: 0 MX:10 ST:1 pt:/sys/devices/system/cpu/intel_pstate/ rd_bk 1
5: _fan_, C:255 MN: 100 MX:255 ST:50 pt:/sys/bus/platform/devices/thinkpad_hwmon/pwm1 rd_bk 1
6: LCD, C:0 MN: 0 MX:1060 ST:106 pt:/sys/class/backlight/intel_backlight/ rd_bk 1
Sorted trip dump zone index:1 type:pch_skylake:
index 0: type:critical temp:115000 hyst:1 zone id:1 sensor id:1 cdev size:0
trip type: 0 temp: 115000
Sorted trip dump zone index:0 type:acpitz:
index 0: type:critical temp:98000 hyst:1 zone id:0 sensor id:0 cdev size:0
trip type: 0 temp: 98000
thd_read_default_thermal_zones loaded 2 zones
zone cpu will be created
dts zone /sys/devices/platform/coretemp.0/name doesn't exist
/sys/class/hwmon/hwmon4/name->iwlwifi
/sys/class/hwmon/hwmon2/name->pch_skylake
/sys/class/hwmon/hwmon0/name->acpitz
dts zone /sys/class/hwmon/hwmon3/name doesn't exist
/sys/class/hwmon/hwmon1/name->coretemp
Buggy max temp: to close to critical 90000
Core temp DTS :critical 100000, max 90000, psv 95000
node type: Element, name: CoolingDevice value: rapl_controller
node type: Element, name: CoolingDevice value: intel_pstate
node type: Element, name: CoolingDevice value: intel_powerclamp
node type: Element, name: CoolingDevice value: cpufreq
node type: Element, name: CoolingDevice value: Processor
CDEVS order specified in thermal-cpu-cdev-order.xml
Sorted trip dump zone index:4 type:cpu:
index 0: type:passive temp:95000 hyst:0 zone id:4 sensor id:65535 cdev size:2
cdev[0] intel_pstate
cdev[1] Processor
trip type: 2 temp: 95000
Product Name matched [wildcard]
zone x86_pkg_temp bounded
Sorted trip dump zone index:5 type:x86_pkg_temp:
index 0: type:passive temp:90000 hyst:0 zone id:5 sensor id:3 cdev size:1
cdev[0] _fan_
trip type: 2 temp: 90000
Zone 1: pch_skylake, Active:0 Bind:0 Sensor_cnt:1
..sensors..
sensor index:1 pch_skylake /sys/class/thermal/thermal_zone1/ Async:0
..trips..
index 0: type:critical temp:115000 hyst:1 zone id:1 sensor id:1 cdev size:0
Zone 0: acpitz, Active:0 Bind:0 Sensor_cnt:1
..sensors..
sensor index:0 acpitz /sys/class/thermal/thermal_zone0/ Async:0
..trips..
index 0: type:critical temp:98000 hyst:1 zone id:0 sensor id:0 cdev size:0
Zone 4: cpu, Active:1 Bind:0 Sensor_cnt:1
..sensors..
sensor index:3 x86_pkg_temp /sys/class/thermal/thermal_zone3/ Async:1
..trips..
index 0: type:passive temp:95000 hyst:0 zone id:4 sensor id:65535 cdev size:2
cdev[0] intel_pstate
cdev[1] Processor
index 1: type:polling temp:90000 hyst:0 zone id:4 sensor id:3 cdev size:0
Zone 5: x86_pkg_temp, Active:1 Bind:1 Sensor_cnt:1
..sensors..
sensor index:3 x86_pkg_temp /sys/class/thermal/thermal_zone3/ Async:1
..trips..
index 0: type:passive temp:90000 hyst:0 zone id:5 sensor id:3 cdev size:1
cdev[0] _fan_
index 1: type:polling temp:85000 hyst:0 zone id:5 sensor id:3 cdev size:0
FD = 7
Current user preference is 0
thd_engine_thread begin
Set : threshold:90000, temperature:53000, cdev:5(_fan_), curr_state:205, max_state:255
Set : threshold:90000, temperature:57000, cdev:5(_fan_), curr_state:155, max_state:255
Set : threshold:90000, temperature:85000, cdev:5(_fan_), curr_state:105, max_state:255
Set : threshold:90000, temperature:85000, cdev:5(_fan_), curr_state:100, max_state:255
Run Code Online (Sandbox Code Playgroud)
最后的想法/问题
更新 #1
在深入研究这个话题并阅读了几篇关于英特尔 CPU 节流的文章和其他在其他操作系统和内核上面临相同(或只是略有不同)问题的帖子后,我得出的结论是,我的笔记本电脑可能不会像我想象的那样行为不端.
尽管内核消息仍然很奇怪,但原因可能是内核打印级别错误或其他任何原因。当我的 CPU 封装温度约为 +52.0°C 且 CPU 频率仅为 1200MHz 时,我也会检索这些消息。这根本没有任何意义。
在使用压力测试测试我的笔记本电脑时,我可以看到错误消息,但实际上 CPU 并没有受到限制。如果我只测试 1 个核心,我会得到 3,900MHz 的全涡轮增压速度。测试所有 4 个内核将最大频率降低到 ~3,300MHz。这是预期的行为。
所以我会把这个问题放在一边——除非这里有人可以提供更多的内部信息。
更新 #2
更新系统后无变化:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04 LTS
Release: 18.04
Codename: bionic
Linux4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:16:15 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Run Code Online (Sandbox Code Playgroud)
更新 #3
更新系统后无变化:
Distributor ID: Ubuntu
Description: Ubuntu 18.10
Release: 18.10
Codename: cosmic
Linux x1 4.18.0-13-generic #14-Ubuntu SMP Wed Dec 5 09:04:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Run Code Online (Sandbox Code Playgroud)
所以我的最终假设是日志很可能是伪造的,或者日志级别配置错误。因为我的笔记本电脑工作正常,也没有节流,也没有过热。
尽管如此,如果有人有预感如何解决这个问题,请随时回答:-)
更新 #4
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 19.04
Release: 19.04
Codename: disco
Linux cw-x1 5.0.0-13-generic #14-Ubuntu SMP Mon Apr 15 14:59:14 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Run Code Online (Sandbox Code Playgroud)
我仍然收到阈值限制消息。
更新 #5
全新 19.10 安装的结果相同:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 19.10
Release: 19.10
Codename: eoan
Run Code Online (Sandbox Code Playgroud)
我刚刚发现一个帖子说这个问题甚至没有用全新的X1 Extreme 2nd Gen 解决。
为了那些正在考虑购买我的笔记本的人,这里有一些我面临的问题:
所以,我想我现在会投降并转向惠普或戴尔。对于大约 2500 欧元的笔记本电脑,我真的不想遇到这些问题:-(
更新 #6
有趣的事实:我昨天刚收到我的戴尔 Precision 5540 和英特尔酷睿 i9-9980HK ......你猜怎么着......
11.12.19 22:11 kernel mce: CPU9: Package temperature above threshold, cpu clock throttled (total events = 412597)
11.12.19 22:11 kernel mce: CPU5: Package temperature above threshold, cpu clock throttled (total events = 412165)
11.12.19 22:11 kernel mce: CPU13: Package temperature above threshold, cpu clock throttled (total events = 412647)
11.12.19 22:11 kernel mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 412648)
11.12.19 22:11 kernel mce: CPU15: Package temperature above threshold, cpu clock throttled (total events = 412378)
11.12.19 22:11 kernel mce: CPU7: Package temperature above threshold, cpu clock throttled (total events = 412669)
11.12.19 22:11 kernel mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 412669)
11.12.19 22:11 kernel mce: CPU8: Package temperature above threshold, cpu clock throttled (total events = 412625)
11.12.19 22:11 kernel mce: CPU11: Package temperature above threshold, cpu clock throttled (total events = 412668)
11.12.19 22:11 kernel mce: CPU4: Package temperature above threshold, cpu clock throttled (total events = 412102)
11.12.19 22:11 kernel mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 412669)
11.12.19 22:11 kernel mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 412669)
11.12.19 22:11 kernel mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 412208)
11.12.19 22:11 kernel mce: CPU14: Package temperature above threshold, cpu clock throttled (total events = 412661)
11.12.19 22:11 kernel mce: CPU12: Package temperature above threshold, cpu clock throttled (total events = 411001)
11.12.19 22:11 kernel mce: CPU10: Package temperature above threshold, cpu clock throttled (total events = 412663)
11.12.19 22:11 kernel mce: CPU9: Package temperature/speed normal
11.12.19 22:11 kernel mce: CPU5: Package temperature/speed normal
11.12.19 22:11 kernel mce: CPU2: Package temperature/speed normal
11.12.19 22:11 kernel mce: CPU15: Package temperature/speed normal
11.12.19 22:11 kernel mce: CPU1: Package temperature/speed normal
11.12.19 22:11 kernel mce: CPU10: Package temperature/speed normal
11.12.19 22:11 kernel mce: CPU7: Package temperature/speed normal
11.12.19 22:11 kernel mce: CPU13: Package temperature/speed normal
11.12.19 22:11 kernel mce: CPU8: Package temperature/speed normal
11.12.19 22:11 kernel mce: CPU11: Package temperature/speed normal
11.12.19 22:11 kernel mce: CPU0: Package temperature/speed normal
11.12.19 22:11 kernel mce: CPU4: Package temperature/speed normal
11.12.19 22:11 kernel mce: CPU3: Package temperature/speed normal
11.12.19 22:11 kernel mce: CPU12: Package temperature/speed normal
11.12.19 22:11 kernel mce: CPU14: Package temperature/speed normal
11.12.19 22:11 kernel mce: CPU6: Package temperature/speed normal
Run Code Online (Sandbox Code Playgroud)
我现在既无语又无精打采。我想我不会再检查我的日志了 :-( 案例已关闭。
有一个错误报告针对thermald:
负载下 CPU 频率控制的不稳定行为
人们会遇到与您报告的相同的错误:
Oct 14 22:30:59 p5520 kernel: [ 9481.033687] CPU3: Package temperature above threshold, cpu clock throttled (total events = 5845)
Oct 14 22:30:59 p5520 kernel: [ 9481.033688] CPU7: Package temperature above threshold, cpu clock throttled (total events = 5845)
Oct 14 22:30:59 p5520 kernel: [ 9481.033718] CPU1: Package temperature above threshold, cpu clock throttled (total events = 5845)
Oct 14 22:30:59 p5520 kernel: [ 9481.033719] CPU5: Package temperature above threshold, cpu clock throttled (total events = 5845)
Oct 14 22:30:59 p5520 kernel: [ 9481.033720] CPU0: Package temperature above threshold, cpu clock throttled (total events = 5845)
Oct 14 22:30:59 p5520 kernel: [ 9481.033720] CPU4: Package temperature above threshold, cpu clock throttled (total events = 5845)
Oct 14 22:30:59 p5520 kernel: [ 9481.033722] CPU6: Package temperature above threshold, cpu clock throttled (total events = 5845)
Oct 14 22:30:59 p5520 kernel: [ 9481.033722] CPU2: Package temperature above threshold, cpu clock throttled (total events = 5845)
Oct 14 22:30:59 p5520 kernel: [ 9481.034709] CPU3: Package temperature/speed normal
Oct 14 22:30:59 p5520 kernel: [ 9481.034710] CPU0: Package temperature/speed normal
Oct 14 22:30:59 p5520 kernel: [ 9481.034711] CPU4: Package temperature/speed normal
Oct 14 22:30:59 p5520 kernel: [ 9481.034711] CPU7: Package temperature/speed normal
Oct 14 22:30:59 p5520 kernel: [ 9481.034738] CPU2: Package temperature/speed normal
Oct 14 22:30:59 p5520 kernel: [ 9481.034738] CPU6: Package temperature/speed normal
Oct 14 22:30:59 p5520 kernel: [ 9481.034739] CPU1: Package temperature/speed normal
Oct 14 22:30:59 p5520 kernel: [ 9481.034740] CPU5: Package temperature/speed normal
Run Code Online (Sandbox Code Playgroud)
评论 #18 说:
尝试从命令行在窗口中运行 Thermald。
Run Code Online (Sandbox Code Playgroud)systemctl stop thermald #thermald --no-daemon --loglevel=info然后执行触发此操作的操作,并附加上述命令的输出。
如果您认为错误报告适合您的情况,您可以订阅电子邮件通知。
在我的机器上,我无法复制thermald和tlp加载问题。我打开了五个终端并在每个终端中输入:
while true ; do : ; done
Run Code Online (Sandbox Code Playgroud)
所发生的只是五个核心以 100% 和 3100 MHz 运行。没有发生节流,但确实使两个笔记本电脑风扇低速运行。通常在 Linux 中它们是关闭的(或者至少我听不到它们)。系统温度为 88 摄氏度,键盘仍能响应输入此答案。客厅确实感觉温暖了一点……
@WinEunuuchs2Unix
\n\n我创建了一个答案,以便讨论您从 erpalma 提出的节流工具。
\n\n我已经使用过这个工具了一些 - 包括几个系统冻结:)\n不幸的是,我还找不到适合我的系统的正确配置。\n但是这个工具肯定对我的系统有影响。我第一次看到一些真正改变节流/温度行为的东西。
\n\n如果没有这个工具,我的 CPU 温度永远不会高于 85\xc2\xb0C。安装该工具并启动服务后,我可以看到压力下温度升至 98\xc2\xb0C,核心频率约为 3300MHz(而不是 3187MHz)。
\n\nerpalma 推荐工具 s-tui,我也绝对可以推荐它。
\n\n\n\n我将监视我的系统几天,然后会报告。\n现在谢谢!
\n