Setting the CPU temperature throttling threshold on Ubuntu/ThinkPad

Chr*_*ill 25 fan overheating power-management temperature intel-cpu

I get the following error messages several times an hour:

08.03.18 21:27  kernel  CPU0: Core temperature above threshold, cpu clock throttled (total events = 2234)
08.03.18 21:27  kernel  CPU2: Core temperature above threshold, cpu clock throttled (total events = 2234)
08.03.18 21:27  kernel  CPU1: Package temperature above threshold, cpu clock throttled (total events = 2695)
08.03.18 21:27  kernel  CPU3: Package temperature above threshold, cpu clock throttled (total events = 2695)
08.03.18 21:27  kernel  CPU2: Package temperature above threshold, cpu clock throttled (total events = 2695)
08.03.18 21:27  kernel  CPU0: Package temperature above threshold, cpu clock throttled (total events = 2695)
08.03.18 21:27  kernel  CPU2: Core temperature/speed normal
08.03.18 21:27  kernel  CPU0: Core temperature/speed normal
08.03.18 21:27  kernel  CPU3: Package temperature/speed normal
08.03.18 21:27  kernel  CPU1: Package temperature/speed normal
08.03.18 21:27  kernel  CPU0: Package temperature/speed normal
08.03.18 21:27  kernel  CPU2: Package temperature/speed normal
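To see which CPUs report the message and how the `total events` counter moves, the log lines above can be tallied with a small script. This is only a sketch: the regular expression is written against the two message formats quoted in this thread (with and without the `mce: ` prefix), and the function name `summarize_throttle_log` is made up for illustration.

```python
import re
from collections import Counter

# Matches the "above threshold" kernel lines quoted in this thread, e.g.
#   "... kernel  CPU0: Core temperature above threshold, cpu clock throttled (total events = 2234)"
# The optional "mce: " prefix seen on newer kernels is simply skipped by search().
THROTTLE_RE = re.compile(
    r"CPU(?P<cpu>\d+): (?P<scope>Core|Package) temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<total>\d+)\)"
)


def summarize_throttle_log(lines):
    """Return (message count, highest 'total events') keyed by (cpu, scope)."""
    seen = Counter()
    highest_total = {}
    for line in lines:
        match = THROTTLE_RE.search(line)
        if match is None:
            continue  # "temperature/speed normal" and unrelated lines are ignored
        key = (int(match["cpu"]), match["scope"])
        seen[key] += 1
        highest_total[key] = max(highest_total.get(key, 0), int(match["total"]))
    return seen, highest_total


sample = [
    "08.03.18 21:27  kernel  CPU0: Core temperature above threshold, cpu clock throttled (total events = 2234)",
    "08.03.18 21:27  kernel  CPU1: Package temperature above threshold, cpu clock throttled (total events = 2695)",
    "08.03.18 21:27  kernel  CPU0: Core temperature/speed normal",
]
seen, highest_total = summarize_throttle_log(sample)
for (cpu, scope), count in sorted(seen.items()):
    print(f"CPU{cpu} {scope}: {count} message(s), total events = {highest_total[(cpu, scope)]}")
```

Feeding it the output of `journalctl -k` or `dmesg` line by line should work the same way, as long as the message text matches the format above.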

Hardware specs:

ThinkPad X1 Yoga 2nd
N1NET33W (1.20 )
Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
Production date 2017.11

Software:

Distributor ID: Ubuntu
Description:    Ubuntu 17.10
Release:        17.10
Codename:       artful
Linux 4.13.0-36-generic #40-Ubuntu SMP Fri Feb 16 20:07:48 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

BIOS:

In the BIOS settings I set both the battery and AC power modes to performance, and the BIOS is up to date.

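What the problem is

The problem is that the CPU temperature threshold is reached too early: it triggers at around 75°C, even though the CPU is allowed to reach 100°C. The temperature never exceeds 85°C. So the CPU gets throttled far too soon.

I don't know whether this is a manufacturer hardware problem, such as insufficient thermal paste, or whether it is software related. Before I send the laptop to Lenovo, I want to make sure it is not a self-inflicted problem.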

Statistics

When I run a stress test

stress -c 4 -t 300

the error messages appear almost immediately.

i7z gives the following output:

Cpu speed from cpuinfo 2903.00Mhz
cpuinfo might be wrong if cpufreq is enabled. To guess correctly try estimating via tsc
Linux's inbuilt cpu_khz code emulated now
True Frequency (without accounting Turbo) 2903 MHz
  CPU Multiplier 29x || Bus clock frequency (BCLK) 100.10 MHz

Socket [0] - [physical cores=2, logical cores=4, max online cores ever=2]
  TURBO ENABLED on 2 Cores, Hyper Threading ON
  Max Frequency without considering Turbo 3003.10 MHz (100.10 x [30])
  Max TURBO Multiplier (if Enabled) with 1/2/3/4 Cores is  39x/39x/39x/39x
  Real Current Frequency 3187.97 MHz [100.10 x 31.85] (Max of below)
        Core [core-id]  :Actual Freq (Mult.)      C0%   Halt(C1)%  C3 %   C6 %  Temp      VCore
        Core 1 [0]:       3187.97 (31.85x)      99.9       0       0       0    85      1.0037
        Core 2 [1]:       3187.97 (31.85x)      99.9       0       0       0    84      1.0037           


C0 = Processor running without halting
C1 = Processor running with halts (States >C0 are power saver modes with cores idling)
C3 = Cores running with PLL turned off and core cache turned off
C6, C7 = Everything in C3 + core state saved to last level cache, C7 is deeper than C6
  Above values in table are in percentage over the last 1 sec
[core-id] refers to core-id number in /proc/cpuinfo
'Garbage Values' message printed when garbage values are read
  Ctrl+C to exit

As mentioned, the temperature never goes above 85, yet the CPU gets throttled.

sensors shows the following output:

iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +30.0°C  

pch_skylake-virtual-0
Adapter: Virtual device
temp1:        +52.0°C  

acpitz-virtual-0
Adapter: Virtual device
temp1:        +56.0°C  (crit = +98.0°C)

thinkpad-isa-0000
Adapter: ISA adapter
fan1:        5859 RPM

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +59.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +59.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +58.0°C  (high = +100.0°C, crit = +100.0°C)

But the high setting probably has no effect.
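The `high` and `crit` values that `sensors` prints for coretemp come from hwmon files in sysfs, so they can also be read directly. The sketch below assumes the usual hwmon layout (`name`, `tempN_label`, `tempN_input`, `tempN_max`, `tempN_crit`, all in millidegrees Celsius) under `/sys/class/hwmon`; the function names are made up for illustration, and this only reads the values, it does not change them.

```python
from pathlib import Path


def read_millidegrees(path):
    """Return the value of a sysfs millidegree file in °C, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip()) / 1000.0
    except (OSError, ValueError):
        return None


def read_coretemp(hwmon_root="/sys/class/hwmon"):
    """Collect label, current, high and crit values for every coretemp sensor."""
    readings = []
    for hwmon in sorted(Path(hwmon_root).glob("hwmon*")):
        name_file = hwmon / "name"
        if not name_file.is_file() or name_file.read_text().strip() != "coretemp":
            continue  # skip acpitz, iwlwifi, pch_skylake and other chips
        for input_file in sorted(hwmon.glob("temp*_input")):
            prefix = input_file.name[: -len("_input")]  # e.g. "temp1"
            label_file = hwmon / f"{prefix}_label"
            label = label_file.read_text().strip() if label_file.is_file() else prefix
            readings.append({
                "label": label,
                "current": read_millidegrees(input_file),
                "high": read_millidegrees(hwmon / f"{prefix}_max"),
                "crit": read_millidegrees(hwmon / f"{prefix}_crit"),
            })
    return readings


if __name__ == "__main__":
    for reading in read_coretemp():
        print(reading)
```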

thermald

So I played around with thermald for a while.

This is an adjusted configuration that I found here:

<?xml version="1.0" encoding="UTF-8"?>
<ThermalConfiguration>
   <Platform>
      <Name>Use Fan control first then CPU throttle</Name>
      <ProductName>*</ProductName>
      <Preference>QUIET</Preference>
      <ThermalZones>
         <ThermalZone>
            <Type>x86_pkg_temp</Type>
            <TripPoints>
               <TripPoint>
                  <SensorType>x86_pkg_temp</SensorType>
                  <Temperature>90000</Temperature>
                  <type>passive</type>
                  <ControlType>SEQUENTIAL</ControlType>
                  <CoolingDevice>
                     <type>_fan_</type>
                  </CoolingDevice>
               </TripPoint>
            </TripPoints>
         </ThermalZone>
      </ThermalZones>
      <CoolingDevices>
         <CoolingDevice>
            <Type>_fan_</Type>
            <Path>/sys/bus/platform/devices/thinkpad_hwmon/pwm1</Path>
            <MinState>100</MinState>
            <MaxState>255</MaxState>
            <IncDecStep>50</IncDecStep>
            <DebouncePeriod>10</DebouncePeriod>
         </CoolingDevice>
      </CoolingDevices>
   </Platform>
</ThermalConfiguration>

This made no real difference, but at least I can see the threshold being set (last few lines):

$ sudo thermald --no-daemon --loglevel=info

NO RAPL sysfs present 
22 CPUID levels; family:model:stepping 0x6:8e:9 (6:142:9)
Running on a vanilla kernel
Polling mode is enabled: 4
sensor_update: type x86_pkg_temp
sensor_update: type pch_skylake
sensor_update: type iwlwifi
sensor_update: type acpitz
thd_read_default_thermal_sensors loaded 4 sensors 
dts /sys/devices/platform/coretemp.0/name doesn't exist
dts /sys/class/hwmon/hwmon3/name doesn't exist
failed to open /dev/acpi_thermal_rel 
failed to open /dev/acpi_thermal_rel 
TRT/ART read failed
 Dumping parsed XML Data
 *** Index 0 ***
Name: UseFancontrolfirstthenCPUthrottle
UUID: 
type: 0
        Zone 0 
         Name: x86_pkg_temp
                 Trip Point 0 
                  temp 90000 
                  trip type 2 
                  hyst id 0 
                  sensor type x86_pkg_temp 
                  cdev index 0 
                          type _fan_ 
                          influence 0 
                          SamplingPeriod 0 
        Cooling Dev 0 
                Type: _fan_
                Path: /sys/bus/platform/devices/thinkpad_hwmon/pwm1
                Min: 100
                Max: 255
                Step: 50
                AutoDownControl: 0
Product Name matched [wildcard]
sensor index:3 x86_pkg_temp /sys/class/thermal/thermal_zone3/ Async:1 
sensor index:1 pch_skylake /sys/class/thermal/thermal_zone1/ Async:0 
sensor index:2 iwlwifi /sys/class/thermal/thermal_zone2/ Async:0 
sensor index:0 acpitz /sys/class/thermal/thermal_zone0/ Async:0 
sensor index:4 hwmon /sys/class/hwmon/hwmon1/temp1_input Async:0 
sensor index:5 hwmon /sys/class/hwmon/hwmon1/temp2_input Async:0 
sensor index:6 hwmon /sys/class/hwmon/hwmon1/temp3_input Async:0 
thd_read_default_cooling devices loaded 4 cdevs 
powercap RAPL no long term time window
Use Default pstate drv settings
Product Name matched [wildcard]
3: Processor, C:0 MN: 0 MX:10 ST:1 pt:/sys/class/thermal/ rd_bk 0 
1: Processor, C:0 MN: 0 MX:10 ST:1 pt:/sys/class/thermal/ rd_bk 0 
2: Processor, C:0 MN: 0 MX:10 ST:1 pt:/sys/class/thermal/ rd_bk 0 
0: Processor, C:0 MN: 0 MX:10 ST:1 pt:/sys/class/thermal/ rd_bk 0 
4: intel_pstate, C:0 MN: 0 MX:10 ST:1 pt:/sys/devices/system/cpu/intel_pstate/ rd_bk 1 
5: _fan_, C:255 MN: 100 MX:255 ST:50 pt:/sys/bus/platform/devices/thinkpad_hwmon/pwm1 rd_bk 1 
6: LCD, C:0 MN: 0 MX:1060 ST:106 pt:/sys/class/backlight/intel_backlight/ rd_bk 1 
Sorted trip dump zone index:1 type:pch_skylake:
index 0: type:critical temp:115000 hyst:1 zone id:1 sensor id:1 cdev size:0
trip type: 0 temp: 115000 
Sorted trip dump zone index:0 type:acpitz:
index 0: type:critical temp:98000 hyst:1 zone id:0 sensor id:0 cdev size:0
trip type: 0 temp: 98000 
thd_read_default_thermal_zones loaded 2 zones 
zone cpu will be created 
dts zone /sys/devices/platform/coretemp.0/name doesn't exist
/sys/class/hwmon/hwmon4/name->iwlwifi
/sys/class/hwmon/hwmon2/name->pch_skylake
/sys/class/hwmon/hwmon0/name->acpitz
dts zone /sys/class/hwmon/hwmon3/name doesn't exist
/sys/class/hwmon/hwmon1/name->coretemp
Buggy max temp: to close to critical 90000
Core temp DTS :critical 100000, max 90000, psv 95000
node type: Element, name: CoolingDevice value: rapl_controller
node type: Element, name: CoolingDevice value: intel_pstate
node type: Element, name: CoolingDevice value: intel_powerclamp
node type: Element, name: CoolingDevice value: cpufreq
node type: Element, name: CoolingDevice value: Processor
CDEVS order specified in thermal-cpu-cdev-order.xml
Sorted trip dump zone index:4 type:cpu:
index 0: type:passive temp:95000 hyst:0 zone id:4 sensor id:65535 cdev size:2
cdev[0] intel_pstate
cdev[1] Processor
trip type: 2 temp: 95000 
Product Name matched [wildcard]
zone x86_pkg_temp bounded 
Sorted trip dump zone index:5 type:x86_pkg_temp:
index 0: type:passive temp:90000 hyst:0 zone id:5 sensor id:3 cdev size:1
cdev[0] _fan_
trip type: 2 temp: 90000 
Zone 1: pch_skylake, Active:0 Bind:0 Sensor_cnt:1
..sensors.. 
sensor index:1 pch_skylake /sys/class/thermal/thermal_zone1/ Async:0 
..trips.. 
index 0: type:critical temp:115000 hyst:1 zone id:1 sensor id:1 cdev size:0
Zone 0: acpitz, Active:0 Bind:0 Sensor_cnt:1
..sensors.. 
sensor index:0 acpitz /sys/class/thermal/thermal_zone0/ Async:0 
..trips.. 
index 0: type:critical temp:98000 hyst:1 zone id:0 sensor id:0 cdev size:0
Zone 4: cpu, Active:1 Bind:0 Sensor_cnt:1
..sensors.. 
sensor index:3 x86_pkg_temp /sys/class/thermal/thermal_zone3/ Async:1 
..trips.. 
index 0: type:passive temp:95000 hyst:0 zone id:4 sensor id:65535 cdev size:2
cdev[0] intel_pstate
cdev[1] Processor
index 1: type:polling temp:90000 hyst:0 zone id:4 sensor id:3 cdev size:0
Zone 5: x86_pkg_temp, Active:1 Bind:1 Sensor_cnt:1
..sensors.. 
sensor index:3 x86_pkg_temp /sys/class/thermal/thermal_zone3/ Async:1 
..trips.. 
index 0: type:passive temp:90000 hyst:0 zone id:5 sensor id:3 cdev size:1
cdev[0] _fan_
index 1: type:polling temp:85000 hyst:0 zone id:5 sensor id:3 cdev size:0
FD = 7
Current user preference is 0
thd_engine_thread begin
Set : threshold:90000, temperature:53000, cdev:5(_fan_), curr_state:205, max_state:255
Set : threshold:90000, temperature:57000, cdev:5(_fan_), curr_state:155, max_state:255
Set : threshold:90000, temperature:85000, cdev:5(_fan_), curr_state:105, max_state:255
Set : threshold:90000, temperature:85000, cdev:5(_fan_), curr_state:100, max_state:255
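The `Set :` lines at the end of the thermald output are the easiest part to follow over time. A short parser like the one below turns them into records showing how far the reported temperature is from the configured threshold. It is a sketch written against the exact line format in the log above; the function name `parse_set_lines` and the `margin_c` field are made up for illustration.

```python
import re

# Format taken from the thermald output above, e.g.
#   "Set : threshold:90000, temperature:85000, cdev:5(_fan_), curr_state:105, max_state:255"
SET_RE = re.compile(
    r"Set : threshold:(?P<threshold>\d+), temperature:(?P<temperature>\d+), "
    r"cdev:(?P<index>\d+)\((?P<cdev>[^)]*)\), "
    r"curr_state:(?P<curr_state>\d+), max_state:(?P<max_state>\d+)"
)


def parse_set_lines(lines):
    """Extract threshold, temperature and cooling-device state from 'Set :' lines."""
    events = []
    for line in lines:
        match = SET_RE.search(line)
        if match is None:
            continue
        threshold = int(match["threshold"])      # millidegrees Celsius
        temperature = int(match["temperature"])  # millidegrees Celsius
        events.append({
            "threshold_c": threshold / 1000,
            "temperature_c": temperature / 1000,
            "margin_c": (threshold - temperature) / 1000,  # distance below the trip point
            "cdev": match["cdev"],
            "curr_state": int(match["curr_state"]),
            "max_state": int(match["max_state"]),
        })
    return events


sample = [
    "Set : threshold:90000, temperature:53000, cdev:5(_fan_), curr_state:205, max_state:255",
    "Set : threshold:90000, temperature:85000, cdev:5(_fan_), curr_state:105, max_state:255",
    "thd_engine_thread begin",
]
for event in parse_set_lines(sample):
    print(event)
```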

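Final thoughts/questions

  • Is it possible to set the CPU temperature threshold?
  • Is it meant to be set at all, or is it tied to the BIOS/hardware?
  • Could my hardware (thermal paste) be defective?
  • Or am I perhaps analyzing the completely wrong thing?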

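Update #1

After digging deeper into this topic and reading several articles about Intel CPU throttling, as well as posts from others facing the same (or only slightly different) problem on other operating systems and kernels, I have come to the conclusion that my laptop may not be misbehaving the way I thought.

The kernel messages are still odd, though; the cause could be a wrong kernel print level or something else. I also get these messages when my CPU package temperature is around +52.0°C and the CPU frequency is only 1200MHz. That makes no sense at all.

When I stress test my laptop I can see the error messages, but the CPU is not actually being throttled. If I stress only 1 core, I get the full turbo speed of 3,900MHz. Stressing all 4 cores lowers the maximum frequency to ~3,300MHz. This is the expected behavior.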
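One way to check frequencies like these while a stress test runs is to read the per-CPU cpufreq files in sysfs. The sketch below assumes the standard `scaling_cur_freq` file (value in kHz) under `/sys/devices/system/cpu/cpuN/cpufreq/`; the function name is made up for illustration, and with intel_pstate the reported value is an estimate rather than an exact clock.

```python
from pathlib import Path


def read_cpu_frequencies_mhz(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu index: current frequency in MHz} from scaling_cur_freq."""
    frequencies = {}
    for freq_file in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        cpu_index = int(freq_file.parent.parent.name[3:])  # "cpu12" -> 12
        try:
            frequencies[cpu_index] = int(freq_file.read_text().strip()) / 1000.0  # kHz -> MHz
        except (OSError, ValueError):
            continue  # offline CPU or unreadable file
    return dict(sorted(frequencies.items()))


if __name__ == "__main__":
    for cpu, mhz in read_cpu_frequencies_mhz().items():
        print(f"CPU{cpu}: {mhz:.0f} MHz")
```

Running it once per second during `stress -c 4` and again during a single-core run would show whether the clocks really drop when the kernel message appears.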

So I will put this question aside, unless someone here can offer more inside information.

Update #2

No change after updating the system:

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04 LTS
Release:        18.04
Codename:       bionic
Linux 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:16:15 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Update #3

No change after updating the system:

Distributor ID: Ubuntu
Description:    Ubuntu 18.10
Release:        18.10
Codename:       cosmic
Linux x1 4.18.0-13-generic #14-Ubuntu SMP Wed Dec 5 09:04:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

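So my final assumption is that the log messages are most likely bogus, or that the log level is misconfigured, because my laptop works fine, with neither throttling nor overheating.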
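The `total events` number in the kernel message can also be compared with the throttle counters the kernel exposes in sysfs, to see whether they grow during a given workload. The sketch below assumes the `thermal_throttle/core_throttle_count` and `thermal_throttle/package_throttle_count` files that Intel systems normally provide under `/sys/devices/system/cpu/cpuN/`; the function names are made up for illustration.

```python
from pathlib import Path

COUNTERS = ("core_throttle_count", "package_throttle_count")


def read_throttle_counts(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu index: {counter name: value or None}} from thermal_throttle."""
    counts = {}
    for directory in Path(cpu_root).glob("cpu[0-9]*/thermal_throttle"):
        cpu_index = int(directory.parent.name[3:])  # "cpu3" -> 3
        entry = {}
        for name in COUNTERS:
            try:
                entry[name] = int((directory / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None  # counter missing or unreadable
        counts[cpu_index] = entry
    return dict(sorted(counts.items()))


def throttle_delta(before, after):
    """Return how much each counter grew between two snapshots."""
    delta = {}
    for cpu, after_entry in after.items():
        before_entry = before.get(cpu, {})
        delta[cpu] = {}
        for name in COUNTERS:
            old, new = before_entry.get(name), after_entry.get(name)
            delta[cpu][name] = None if old is None or new is None else new - old
    return delta


if __name__ == "__main__":
    for cpu, entry in read_throttle_counts().items():
        print(f"CPU{cpu}: {entry}")
```

Taking one snapshot with `read_throttle_counts()` before a stress run and one after, then passing both to `throttle_delta()`, shows how many events were counted in that window. It does not say how long or how deeply the clock was reduced.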

Still, if anyone has a hunch about how to solve this, feel free to answer :-)

Update #4

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 19.04
Release:        19.04
Codename:       disco

Linux cw-x1 5.0.0-13-generic #14-Ubuntu SMP Mon Apr 15 14:59:14 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

I still get the threshold throttling messages.

Update #5

Same result on a fresh 19.10 install:

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 19.10
Release:        19.10
Codename:       eoan
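
I just found a post saying that this problem is not even solved on the brand-new X1 Extreme 2nd Gen.

For those who are considering buying my notebook, here are some of the problems I have faced:

  • The touchscreen does not work properly
  • The fingerprint reader does not work
  • Hibernate only succeeds occasionally
  • Switching between different work environments (office and home office) usually does not work properly because of graphics issues
  • Multiple monitors cause a lot of problems in general
  • Swapping to the hard drive happens even when plenty of free RAM is available
  • General and assorted problems with kscreen and xrandr

So I guess I will give up now and switch to HP or Dell. For a laptop costing around 2500 euros, I really don't want to have these problems :-(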

Update #6

Fun fact: I just received my Dell Precision 5540 with an Intel Core i9-9980HK yesterday ... and guess what ...

11.12.19 22:11  kernel  mce: CPU9: Package temperature above threshold, cpu clock throttled (total events = 412597)
11.12.19 22:11  kernel  mce: CPU5: Package temperature above threshold, cpu clock throttled (total events = 412165)
11.12.19 22:11  kernel  mce: CPU13: Package temperature above threshold, cpu clock throttled (total events = 412647)
11.12.19 22:11  kernel  mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 412648)
11.12.19 22:11  kernel  mce: CPU15: Package temperature above threshold, cpu clock throttled (total events = 412378)
11.12.19 22:11  kernel  mce: CPU7: Package temperature above threshold, cpu clock throttled (total events = 412669)
11.12.19 22:11  kernel  mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 412669)
11.12.19 22:11  kernel  mce: CPU8: Package temperature above threshold, cpu clock throttled (total events = 412625)
11.12.19 22:11  kernel  mce: CPU11: Package temperature above threshold, cpu clock throttled (total events = 412668)
11.12.19 22:11  kernel  mce: CPU4: Package temperature above threshold, cpu clock throttled (total events = 412102)
11.12.19 22:11  kernel  mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 412669)
11.12.19 22:11  kernel  mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 412669)
11.12.19 22:11  kernel  mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 412208)
11.12.19 22:11  kernel  mce: CPU14: Package temperature above threshold, cpu clock throttled (total events = 412661)
11.12.19 22:11  kernel  mce: CPU12: Package temperature above threshold, cpu clock throttled (total events = 411001)
11.12.19 22:11  kernel  mce: CPU10: Package temperature above threshold, cpu clock throttled (total events = 412663)
11.12.19 22:11  kernel  mce: CPU9: Package temperature/speed normal
11.12.19 22:11  kernel  mce: CPU5: Package temperature/speed normal
11.12.19 22:11  kernel  mce: CPU2: Package temperature/speed normal
11.12.19 22:11  kernel  mce: CPU15: Package temperature/speed normal
11.12.19 22:11  kernel  mce: CPU1: Package temperature/speed normal
11.12.19 22:11  kernel  mce: CPU10: Package temperature/speed normal
11.12.19 22:11  kernel  mce: CPU7: Package temperature/speed normal
11.12.19 22:11  kernel  mce: CPU13: Package temperature/speed normal
11.12.19 22:11  kernel  mce: CPU8: Package temperature/speed normal
11.12.19 22:11  kernel  mce: CPU11: Package temperature/speed normal
11.12.19 22:11  kernel  mce: CPU0: Package temperature/speed normal
11.12.19 22:11  kernel  mce: CPU4: Package temperature/speed normal
11.12.19 22:11  kernel  mce: CPU3: Package temperature/speed normal
11.12.19 22:11  kernel  mce: CPU12: Package temperature/speed normal
11.12.19 22:11  kernel  mce: CPU14: Package temperature/speed normal
11.12.19 22:11  kernel  mce: CPU6: Package temperature/speed normal

I am now both speechless and listless. I guess I will not check my logs anymore :-( Case closed.

Win*_*nix 2

There is a bug report filed against thermald:

Erratic behavior of CPU frequency control under load

People there run into the same errors you reported:

Oct 14 22:30:59 p5520 kernel: [ 9481.033687] CPU3: Package temperature above threshold, cpu clock throttled (total events = 5845)
Oct 14 22:30:59 p5520 kernel: [ 9481.033688] CPU7: Package temperature above threshold, cpu clock throttled (total events = 5845)
Oct 14 22:30:59 p5520 kernel: [ 9481.033718] CPU1: Package temperature above threshold, cpu clock throttled (total events = 5845)
Oct 14 22:30:59 p5520 kernel: [ 9481.033719] CPU5: Package temperature above threshold, cpu clock throttled (total events = 5845)
Oct 14 22:30:59 p5520 kernel: [ 9481.033720] CPU0: Package temperature above threshold, cpu clock throttled (total events = 5845)
Oct 14 22:30:59 p5520 kernel: [ 9481.033720] CPU4: Package temperature above threshold, cpu clock throttled (total events = 5845)
Oct 14 22:30:59 p5520 kernel: [ 9481.033722] CPU6: Package temperature above threshold, cpu clock throttled (total events = 5845)
Oct 14 22:30:59 p5520 kernel: [ 9481.033722] CPU2: Package temperature above threshold, cpu clock throttled (total events = 5845)
Oct 14 22:30:59 p5520 kernel: [ 9481.034709] CPU3: Package temperature/speed normal
Oct 14 22:30:59 p5520 kernel: [ 9481.034710] CPU0: Package temperature/speed normal
Oct 14 22:30:59 p5520 kernel: [ 9481.034711] CPU4: Package temperature/speed normal
Oct 14 22:30:59 p5520 kernel: [ 9481.034711] CPU7: Package temperature/speed normal
Oct 14 22:30:59 p5520 kernel: [ 9481.034738] CPU2: Package temperature/speed normal
Oct 14 22:30:59 p5520 kernel: [ 9481.034738] CPU6: Package temperature/speed normal
Oct 14 22:30:59 p5520 kernel: [ 9481.034739] CPU1: Package temperature/speed normal
Oct 14 22:30:59 p5520 kernel: [ 9481.034740] CPU5: Package temperature/speed normal

Comment #18 says:

Try running thermald in a window from the command line.

systemctl stop thermald
#thermald --no-daemon --loglevel=info

Then do whatever triggers this and attach the output of the above command.

If you think the bug report applies to your situation, you can subscribe to its email notifications.


On my machine I cannot reproduce the thermald/tlp load problem. I opened five terminals and typed into each:

while true ; do : ; done
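
All that happened was five cores running at 100% and 3100 MHz. No throttling occurred, but it did make the two laptop fans run at low speed. Normally in Linux they are off (or at least I can't hear them). The system temperature was 88 degrees Celsius, and the keyboard was still responsive enough to type this answer. The living room does feel a bit warmer, though…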


Chr*_*ill 2

@WinEunuuchs2Unix

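I created an answer so that we can discuss the throttling tool from erpalma that you suggested.

I have already played with this tool a bit, including several system freezes :) Unfortunately, I have not yet found the right configuration for my system. But this tool definitely has an effect on my system. It is the first time I have seen something that really changes the throttling/temperature behavior.

Without this tool, my CPU temperature never goes above 85°C. After installing the tool and starting the service, I can see the temperature rise to 98°C under stress, with a core frequency of about 3300MHz (instead of 3187MHz).

erpalma recommends the tool s-tui, which I can absolutely recommend as well.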

[image]

  1. Before installing the throttling service mentioned above
  2. After activating the service

I will monitor my system for a few days and then report back. Thanks for now!