aba*_*haw 5 python gpu nvidia docker tensorflow
我一直在努力获取依赖TensorFlow的应用程序,以与作为Docker容器一起使用nvidia-docker。我已经在tensorflow/tensorflow:latest-gpu-py3图像之上编译了我的应用程序。我使用以下命令运行Docker容器:
sudo nvidia-docker run -d -p 9090:9090 -v /src/weights:/weights myname/myrepo:mylabel
通过portainer查看日志时,我看到以下内容:
2017-05-16 03:41:47.715682: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.715896: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.715948: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.715978: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.716002: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.718076: E tensorflow/stream_executor/cuda/cuda_driver.cc:405] failed call to cuInit: CUDA_ERROR_UNKNOWN
2017-05-16 03:41:47.718177: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: 1e22bdaf82f1
2017-05-16 03:41:47.718216: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: 1e22bdaf82f1
2017-05-16 03:41:47.718298: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 367.57.0
2017-05-16 03:41:47.718398: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.57 Mon Oct 3 20:37:01 PDT 2016
GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
"""
2017-05-16 03:41:47.718455: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 367.57.0
2017-05-16 03:41:47.718484: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 367.57.0
Run Code Online (Sandbox Code Playgroud)
该容器似乎确实可以正常启动,并且我的应用程序确实正在运行。当我向其发送用于预测的请求时,预测会正确返回-但是在CPU上进行推理时,我希望以低速运行,因此我认为很明显,由于某种原因未使用GPU。我还尝试nvidia-smi从同一容器中运行,以确保它可以看到我的GPU,而这些是结果:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57 Driver Version: 367.57 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GRID K1 Off | 0000:00:07.0 Off | N/A |
| N/A 28C P8 7W / 31W | 25MiB / 4036MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Run Code Online (Sandbox Code Playgroud)
我当然不是这方面的专家-但确实可以从容器内部看到GPU。关于如何使用TensorFlow进行操作的任何想法?
我在我的 ubuntu16.04 桌面上运行tensorflow。
几天前我用 GPU 运行代码效果很好。但今天我找不到带有以下代码的 GPU 设备
import tensorflow as tf
from tensorflow.python.client import device_lib as _device_lib
with tf.Session() as sess:
local_device_protos = _device_lib.list_local_devices()
print(local_device_protos)
[print(x.name) for x in local_device_protos]
当我跑步时,我意识到以下问题tf.Session()
cuda_driver.cc:406] 调用 cuInit 失败:CUDA_ERROR_UNKNOWN
我在系统详细信息中检查我的 Nvidia 驱动程序,然后nvcc -V检查nvida-smi驱动程序、cuda 和 cudnn。一切看起来都很好。
然后我去附加驱动程序查看驱动程序详细信息,在那里我发现有很多版本的NVIDIA驱动程序,并且选择了最新版本。但是当我第一次安装驱动程序时只有一个。
然后我运行tf.Session()问题也在这里。我想我应该重新启动计算机,重新启动后,这个问题就消失了。
sess = tf.Session()
2018-07-01 12:02:41.336648: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-07-01 12:02:41.464166: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-07-01 12:02:41.464482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.8225
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.27GiB
2018-07-01 12:02:41.464494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-01 12:02:42.308689: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-01 12:02:42.308721: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-07-01 12:02:42.308729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-07-01 12:02:42.309686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7022 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability:
| 归档时间: |
|
| 查看次数: |
5435 次 |
| 最近记录: |