Posts by Bri*_*n R

Unable to run CUDA code that queries NVML - error regarding libnvidia-ml.so

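Recently a colleague needed to use NVML to query device information, so I downloaded the Tesla Development Kit 3.304.5 and copied the file nvml.h to /usr/include. To test it, I compiled the example code in tdk_3.304.5/nvml/example, and it worked fine.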

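Over the weekend something changed on the system (I cannot determine what, and I am not the only person with access to the machine), and now any code that uses nvml.h, including the example code, fails with the following error: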

Failed to initialize NVML:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING:

You should always run with libnvidia-ml.so that is installed with your NVIDIA Display Driver. By default it's installed in /usr/lib and /usr/lib64. libnvidia-ml.so in TDK package is a stub library that is attached only for build purposes (e.g. machine that you build your application doesn't have to have Display Driver installed).
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

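However, I can still run nvidia-smi and read status information about my K20m, and as far as I know nvidia-smi is just a set of calls to nvml.h. The error message is somewhat vague, but I believe it is telling me that the nvidia-ml.so file needs to match the Tesla driver installed on my system. To make sure everything was correct, I re-downloaded CUDA 5.0 and installed the driver, the CUDA runtime, and the test files. I am certain that the nvidia-ml.so file matches the driver (both are 304.54), so I am at a loss as to what could be going wrong. I can compile and run the test code with nvcc, and I can run my own CUDA code as long as it does not include nvml.h.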
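To check what the dynamic loader resolves at run time, independently of the compiled example, a minimal sketch along the following lines may help. It assumes Python with `ctypes` is available and that the driver's library is reachable under the soname `libnvidia-ml.so.1`. The function name `query_driver_version` is hypothetical. The NVML entry points it calls (`nvmlInit`, `nvmlSystemGetDriverVersion`, `nvmlShutdown`) are the ones declared in nvml.h.

```python
import ctypes


def query_driver_version(lib_name="libnvidia-ml.so.1"):
    """Try to initialise NVML through the given shared library.

    Returns (ok, message). ok is True only when NVML initialised and
    reported a driver version; otherwise message describes the failure.
    The default library name is an assumption, not taken from the post.
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError as exc:
        return False, f"could not load {lib_name}: {exc}"

    try:
        rc = lib.nvmlInit()
        if rc != 0:  # NVML_SUCCESS is 0
            return False, f"nvmlInit failed with code {rc}"

        # nvml.h documents an 80-byte buffer as sufficient for the version.
        buf = ctypes.create_string_buffer(80)
        rc = lib.nvmlSystemGetDriverVersion(buf, ctypes.c_uint(len(buf)))
        lib.nvmlShutdown()
    except AttributeError as exc:
        # The library loaded but does not export the expected symbol.
        return False, f"missing NVML symbol in {lib_name}: {exc}"

    if rc != 0:
        return False, f"nvmlSystemGetDriverVersion failed with code {rc}"
    return True, buf.value.decode()
```

Calling `query_driver_version()` with no arguments and then again with an explicit path such as `/usr/lib64/libnvidia-ml.so.1` would show whether the two resolve to libraries that behave differently.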

Has anyone encountered this error, or does anyone have ideas for fixing it?

$ ls -la /usr/lib/libnvidia-ml*
lrwxrwxrwx. 1 root root     17 Jul …
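Because the warning distinguishes the TDK stub from the library installed with the driver, it may be worth seeing which real file sits behind each libnvidia-ml name. The sketch below is one way to do that, assuming the libraries live under /usr/lib or /usr/lib64 as the warning states. The function name `describe_libraries` and the default glob pattern are hypothetical.

```python
import glob
import os


def describe_libraries(pattern="/usr/lib*/libnvidia-ml*"):
    """List each matching path, the file it finally resolves to, and its size.

    Size is None when a symlink is dangling. The default pattern is an
    assumption based on the directories named in the NVML warning.
    """
    rows = []
    for path in sorted(glob.glob(pattern)):
        real = os.path.realpath(path)  # follow the whole symlink chain
        size = os.path.getsize(real) if os.path.exists(real) else None
        rows.append((path, real, size))
    return rows


if __name__ == "__main__":
    for path, real, size in describe_libraries():
        print(f"{path} -> {real} ({size} bytes)")
```

A name that resolves to an unexpected location, or to a file much smaller than the driver's library, could point to a stub copy being picked up first, although that is only a guess from the wording of the warning.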

cuda nvcc tesla nvml

Score: 5
Answers: 1
Views: 20k
