Ubuntu 18.04 LTS hangs every time with AMD GPU

Aji*_*man 8 18.04

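I recently installed Ubuntu 18.04 LTS on my laptop, and I run into this problem every day: after a few hours of use the laptop hangs and nothing responds, not even the mouse or keyboard. I have already run dist-upgrade and installed the graphics drivers, with no effect.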

Need help.

EDIT

As suggested by @ElderGeek, I installed lm-sensors. The temperatures I have seen are between 43 and 48 °C.

Also, here is my system information:

ajit-soman@ajitsoman-X542BA:~$ sudo lshw -short
[sudo] password for ajit-soman: 
H/W path      Device      Class       Description
=================================================
                          system      X542BA
/0                        bus         X542BA
/0/0                      memory      64KiB BIOS
/0/4                      memory      160KiB L1 cache
/0/5                      memory      1MiB L2 cache
/0/28                     memory      8GiB System Memory
/0/28/0                   memory      4GiB SODIMM DDR4 Synchronous Unbuffered (U
/0/28/1                   memory      4GiB SODIMM DDR4 Synchronous Unbuffered (U
/0/30                     processor   AMD A9-9420 RADEON R5, 5 COMPUTE CORES 2C+
/0/100                    bridge      Family 15h (Models 60h-6fh) Processor Root
/0/100/0.2                generic     Family 15h (Models 60h-6fh) I/O Memory Man
/0/100/1                  display     Stoney [Radeon R2/R3/R4/R5 Graphics]
/0/100/1.1                multimedia  Advanced Micro Devices, Inc. [AMD/ATI]
/0/100/2.2                bridge      Family 15h (Models 60h-6fh) Processor Root
/0/100/2.2/0  wlp1s0      network     QCA9565 / AR9565 Wireless Network Adapter
/0/100/2.3                bridge      Family 15h (Models 60h-6fh) Processor Root
/0/100/2.3/0  enp2s0      network     RTL8111/8168/8411 PCI Express Gigabit Ethe
/0/100/2.4                bridge      Family 15h (Models 60h-6fh) Processor Root
/0/100/2.4/0              storage     ASM1062 Serial ATA Controller
/0/100/8                  generic     Advanced Micro Devices, Inc. [AMD]
/0/100/9.2                multimedia  Family 15h (Models 60h-6fh) Audio Controll
/0/100/10                 bus         FCH USB XHCI Controller
/0/100/11                 storage     FCH SATA Controller [AHCI mode]
/0/100/12                 bus         FCH USB EHCI Controller
/0/100/14                 bus         FCH SMBus Controller
/0/100/14.3               bridge      FCH LPC Bridge
/0/100/14.7               generic     FCH SD Flash Controller
/0/101                    bridge      Family 15h (Models 60h-6fh) Host Bridge
/0/102                    bridge      Family 15h (Models 60h-6fh) Host Bridge
/0/103                    bridge      Advanced Micro Devices, Inc. [AMD]
/0/104                    bridge      Advanced Micro Devices, Inc. [AMD]
/0/105                    bridge      Advanced Micro Devices, Inc. [AMD]
/0/106                    bridge      Advanced Micro Devices, Inc. [AMD]
/0/107                    bridge      Advanced Micro Devices, Inc. [AMD]
/0/108                    bridge      Advanced Micro Devices, Inc. [AMD]
/0/109                    bridge      Advanced Micro Devices, Inc. [AMD]
/0/1          scsi0       storage     
/0/1/0.0.0    /dev/sda    disk        1TB ST1000LM035-1RK1
/0/1/0.0.0/1              volume      511MiB Windows FAT volume
/0/1/0.0.0/2  /dev/sda2   volume      931GiB EXT4 volume
/0/2          scsi1       storage     
/0/2/0.0.0    /dev/cdrom  disk        DVDRAM GUE1N
ajit-soman@ajitsoman-X542BA:~$ 

Here is the output of uname -a:

ajit-soman@ajitsoman-X542BA:~$ uname -a
Linux ajitsoman-X542BA 4.15.0-22-generic #24-Ubuntu SMP Wed May 16 12:15:17 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
ajit-soman@ajitsoman-X542BA:~$ 

EDIT

As suggested by @WinEunuuchs2Unix, I ran journalctl -b-1 and found these lines highlighted in red. I have copied and pasted them one by one below:

Jun 12 22:10:23 ajitsoman-X542BA kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
Jun 12 22:10:23 ajitsoman-X542BA kernel: ata2: ACPI event

Jun 12 22:22:47 ajitsoman-X542BA kernel: ACPI Error: [^^^PB2_.VGA_.AFN7] Namespace lookup failure, AE_NOT_FOUND (20170831/psargs-364)

Jun 12 22:22:47 ajitsoman-X542BA kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.VGA.LCDD._BCM, AE_NOT_FOUND (20170831/psparse-550
Jun 12 22:22:47 ajitsoman-X542BA kernel: ACPI Error: Evaluating _BCM failed (20170831/video-364)

Jun 12 22:22:47 ajitsoman-X542BA kernel: [drm:hwss_wait_for_blank_complete [amdgpu]] *ERROR* DC: failed to blank crtc!


Jun 12 22:23:09 ajitsoman-X542BA bluetoothd[781]: Failed to set mode: Blocked through rfkill (0x12)


Jun 12 23:39:54 ajitsoman-X542BA kernel: [Firmware Bug]: cpu 0, invalid threshold interrupt offset 1 for bank 4, block 0 (MSR00000413=0xd00000


Jun 12 23:39:54 ajitsoman-X542BA rtkit-daemon[973]: The canary thread is apparently starving. Taking action.
Jun 12 23:39:54 ajitsoman-X542BA kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
Jun 12 23:39:54 ajitsoman-X542BA kernel: ata2.00: ACPI event
Jun 12 23:39:54 ajitsoman-X542BA kernel: ata2.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 10 pio 16392 in
                                                  Get event status notification 4a 01 00 00 10 00 00 00 08 00res 50/00:03:00:00:00/00:00:00:00
Jun 12 23:39:54 ajitsoman-X542BA kernel: ata2.00: status: { DRDY }
Jun 12 23:39:54 ajitsoman-X542BA kernel: ata2: hard resetting link


Jun 13 00:01:53 ajitsoman-X542BA gdm3[840]: GLib: g_variant_new_string: assertion 'string != NULL' failed

Jun 13 00:01:53 ajitsoman-X542BA gdm3[840]: GLib: g_hash_table_find: assertion 'version == hash_table->version' failed

Win*_*nix 3

Update June 14, 2018


According to this ArchLinux forum post, it seems you need to add:

amdgpu.dc=0

to the GRUB_CMDLINE_LINUX_DEFAULT line in your /etc/default/grub, right after quiet splash. Then run sudo update-grub and reboot.
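A minimal sketch of that edit as a shell function, assuming the stock Ubuntu line GRUB_CMDLINE_LINUX_DEFAULT="quiet splash". The function name add_kernel_param is hypothetical. It appends the parameter just inside the closing quote and leaves the file alone if the parameter is already there. Back up /etc/default/grub before trying it.

```shell
# Hypothetical helper: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT
# in a grub defaults file. Assumes the value is one double-quoted string,
# as in Ubuntu's stock GRUB_CMDLINE_LINUX_DEFAULT="quiet splash". Needs GNU sed.
add_kernel_param() {
  local file="$1" param="$2"
  # Do nothing if the parameter already appears on that line
  if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*${param}" "$file"; then
    return 0
  fi
  # Insert " $param" just before the closing quote of the line
  sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${param}\"/" "$file"
}

# Example, after pasting the function into a root shell (e.g. sudo -i):
#   cp /etc/default/grub /etc/default/grub.bak
#   add_kernel_param /etc/default/grub amdgpu.dc=0
#   update-grub
# then reboot so the new kernel command line takes effect.
```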


As a fresh Ubuntu 18.04 installer, you are one of the lucky ones who can use journalctl to view the previous boot (the one that locked up). Use:

journalctl -b-1

Then press End to jump to EOF (end of file). On my last successful boot, it showed:

Jun 10 16:18:51 alien systemd[1]: Unmounting /mnt/d...
Jun 10 16:18:51 alien systemd[1]: Unmounted /run/user/1000.
Jun 10 16:18:51 alien systemd[1]: Unmounted /media/rick/Ubuntu 18.04 LTS amd64.
Jun 10 16:18:51 alien systemd[1]: Unmounted /boot/efi.
Jun 10 16:18:51 alien ntfs-3g[648]: Unmounting /dev/nvme0n1p8 (Shared_WSL+Linux)
Jun 10 16:18:51 alien ntfs-3g[648]: Permissions cache : 21 writes, 4033288 reads, 99.9% hits
Jun 10 16:18:51 alien systemd[1]: Unmounted /media/rick/casper-rw.
Jun 10 16:18:51 alien systemd[1]: Unmounted /mnt/e.
Jun 10 16:18:51 alien ntfs-3g[736]: Unmounting /dev/sda3 (HGST_Win10)
Jun 10 16:18:51 alien ntfs-3g[736]: Permissions cache : 754 writes, 4108560 reads, 99.9% hits
Jun 10 16:18:51 alien ntfs-3g[637]: Unmounting /dev/nvme0n1p4 (NVMe_Win10)
Jun 10 16:18:51 alien ntfs-3g[637]: Permissions cache : 987 writes, 4983239 reads, 99.9% hits
Jun 10 16:18:51 alien systemd[1]: Unmounted /mnt/d.
Jun 10 16:18:51 alien systemd[1]: Unmounted /mnt/c.
Jun 10 16:18:51 alien systemd[1]: Reached target Unmount All Filesystems.
Jun 10 16:18:51 alien systemd[1]: Stopped target Local File Systems (Pre).
Jun 10 16:18:51 alien systemd[1]: Stopped Remount Root and Kernel File Systems.
Jun 10 16:18:51 alien systemd[1]: Stopped Create Static Device Nodes in /dev.
Jun 10 16:18:51 alien systemd[1]: Reached target Shutdown.
Jun 10 16:18:51 alien systemd[1]: Reached target Final Step.
Jun 10 16:18:51 alien systemd[1]: dev-disk-by\x2dpartlabel-Basic\x5cx20data\x5cx20partition.device: Dev dev-
Jun 10 16:18:51 alien systemd[1]: Received SIGRTMIN+20 from PID 18665 (plymouthd).
Jun 10 16:18:51 alien systemd[1]: Started Show Plymouth Reboot Screen.
Jun 10 16:18:51 alien systemd[1]: Starting Reboot...
Jun 10 16:18:51 alien systemd[1]: Shutting down.
Jun 10 16:18:51 alien kernel: systemd-shutdow: 36 output lines suppressed due to ratelimiting
Jun 10 16:18:51 alien systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Jun 10 16:18:51 alien dnsmasq[1393]: exiting on receipt of SIGTERM
Jun 10 16:18:51 alien systemd-journald[288]: Journal stopped
lines 46804-46832/46832 (END)

On your system, look for error messages.


You may have to use the Page Up key to see them.


When you find what you are looking for (or give up looking), press Q to exit.
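To narrow the search without scrolling, you can save the previous boot's journal to a file and grep it. This is a sketch that assumes the relevant lines are the kinds posted in the question (amdgpu/DRM messages, the ata2 link resets, ACPI errors). The function name filter_hang_suspects and its pattern list are hypothetical choices, not a complete list of things that can hang a machine.

```shell
# Hypothetical filter: keep only journal lines that mention the GPU driver,
# DRM, libata ports (ata2:, ata2.00:) or ACPI errors. Reads files or stdin.
filter_hang_suspects() {
  grep -E 'amdgpu|drm:|ata[0-9]+(\.[0-9]+)?:|ACPI Error' "$@"
}

# Example:
#   journalctl -b -1 --no-pager > prev-boot.log   # previous boot, no pager
#   filter_hang_suspects prev-boot.log
# journalctl can also filter by priority itself (err and more severe):
#   journalctl -b -1 -p err --no-pager
```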


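If overheating is what causes the shutdowns, you can install Intel Powerclamp: Stop CPU overheating. (Note that the intel_powerclamp driver only supports Intel CPUs, while this laptop has an AMD A9-9420.)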


Besides lm-sensors, you can get temperature readings for all thermal zones directly from the command line with this one-liner:

$ paste <(cat /sys/class/thermal/thermal_zone*/type) <(cat /sys/class/thermal/thermal_zone*/temp) | column -s $'\t' -t | sed 's/\(.\)..$/.\1°C/'

INT3400 Thermal  20.0°C
SEN1             44.0°C
SEN2             52.0°C
SEN3             64.0°C
SEN4             59.0°C
B0D4             73.0°C
pch_skylake      76.5°C
x86_pkg_temp     73.0°C

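The results are reported in degrees Celsius. The raw readings are in millidegrees, so the sed expression inserts a decimal point before the last three digits and keeps only the first of them (44000 becomes 44.0°C).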
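The same readout can also be done as a loop that reads each zone's type and temp together and divides the millidegree value by 1000, so the two columns cannot get out of step. This is a sketch, not a replacement for lm-sensors. The function name thermal_report and its optional base-directory argument (default /sys/class/thermal) are hypothetical, and the argument exists only so the function can be pointed at a test tree.

```shell
# Hypothetical helper: print "<zone type> <temp>°C" for every thermal zone.
# $1 = base directory (defaults to /sys/class/thermal).
thermal_report() {
  local base="${1:-/sys/class/thermal}" zone type temp
  for zone in "$base"/thermal_zone*; do
    # Skip anything that is not a readable zone (also covers an unmatched glob)
    [ -r "$zone/type" ] && [ -r "$zone/temp" ] || continue
    type=$(cat "$zone/type")
    temp=$(cat "$zone/temp")   # kernel reports millidegrees Celsius
    awk -v t="$type" -v m="$temp" 'BEGIN { printf "%-16s %.1f°C\n", t, m / 1000 }'
  done
}

# Example:
#   thermal_report
```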
