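I have been running TensorFlow code on an AWS EC2 instance with a Tesla K80 GPU. I have CUDA 9.0 and cuDNN 7.1.4 installed and I am using TF 1.12, all on Ubuntu 16.04.
Everything worked fine until yesterday, but today the NVIDIA driver appears to have stopped working for some reason: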
ubuntu@ip-10-0-0-13:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
I checked the installed driver packages:
ubuntu@ip-10-0-0-13:~$ dpkg -l | grep nvidia
rc nvidia-367 367.48-0ubuntu1 amd64 NVIDIA binary driver - version 367.48
ii nvidia-396 396.37-0ubuntu1 amd64 NVIDIA binary driver - version 396.37
ii nvidia-396-dev 396.37-0ubuntu1 amd64 NVIDIA binary Xorg driver development files
ii nvidia-machine-learning-repo-ubuntu1604 1.0.0-1 amd64 nvidia-machine-learning repository configuration files
ii nvidia-modprobe …
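For anyone reading the listing above: the first column of `dpkg -l` is the package state, where `ii` means the package is installed and `rc` means it was removed but its configuration files remain. So nvidia-367 is no longer installed and nvidia-396 is. Below is a minimal sketch of a filter that keeps only the installed NVIDIA packages. The function name `installed_nvidia` is my own hypothetical helper, not part of dpkg, and it only reports package state; it says nothing about whether the kernel module is actually loaded.

```shell
#!/bin/sh
# Hypothetical helper (not part of dpkg): reads `dpkg -l` output on stdin
# and prints "name version" for packages whose state column is "ii"
# (installed) and whose name contains "nvidia".
installed_nvidia() {
  awk '$1 == "ii" && $2 ~ /nvidia/ { print $2, $3 }'
}
```

It would be used as `dpkg -l | installed_nvidia`.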