bou*_*bon 2 metrics cuda nvprof
Running the nvprof --metrics command on Windows fails with an error:

    ==6580== NVPROF is profiling process 6580, command: Project1.exe
    ==6580== Error: Internal profiling error 4292:1.
    ======== Error: CUDA profiling error.

If I run nvprof on its own, without --metrics, no error is reported:

    F:\vstest\Project1\x64\Release>nvprof Project1.exe
    ==384== NVPROF is profiling process 384, command: Project1.exe
    sumMatrixOnGPU2D <<<(512,512), (32,32)>>> elapsed 22 ms
    ==384== Profiling application: Project1.exe
    ==384== Profiling result:
                Type  Time(%)      Time     Calls       Avg       Min       Max  Name
     GPU activities:   61.28%  538.11ms         2  269.06ms  260.98ms  277.13ms  [CUDA memcpy HtoD]
                       36.29%  318.68ms         1  318.68ms  318.68ms  318.68ms  [CUDA memcpy DtoH]
                        2.43%  21.364ms         1  21.364ms  21.364ms  21.364ms  sumMatrixOnGPU2D(float*, float*, float*, int, int)
          API calls:   56.77%  1.29771s         3  432.57ms  47.895ms  1.19911s  cudaMalloc
                       37.53%  857.94ms         3  285.98ms  261.20ms  319.19ms  cudaMemcpy
                        2.56%  58.617ms         1  58.617ms  58.617ms  58.617ms  cudaDeviceReset
                        2.13%  48.594ms         3  16.198ms  14.312ms  17.671ms  cudaFree
                        0.95%  21.732ms         2  10.866ms  275.60us  21.456ms  cudaDeviceSynchronize
                        0.02%  512.70us         1  512.70us  512.70us  512.70us  cudaLaunchKernel
                        0.02%  359.30us        96  3.7420us     100ns  204.60us  cuDeviceGetAttribute
                        0.02%  347.80us         1  347.80us  347.80us  347.80us  cudaGetDeviceProperties
                        0.01%  180.60us         1  180.60us  180.60us  180.60us  cuDeviceGetPCIBusId
                        0.00%  32.100us         1  32.100us  32.100us  32.100us  cuDeviceTotalMem
                        0.00%  13.400us         1  13.400us  13.400us  13.400us  cudaSetDevice
                        0.00%  4.0000us         3  1.3330us     200ns  3.5000us  cuDeviceGetCount
                        0.00%  3.9000us         1  3.9000us  3.9000us  3.9000us  cudaGetLastError
                        0.00%  1.1000us         2     550ns     200ns     900ns  cuDeviceGet
                        0.00%  1.0000us         1  1.0000us  1.0000us  1.0000us  cuDeviceGetName
                        0.00%     300ns         1     300ns     300ns     300ns  cuDeviceGetUuid
                        0.00%     300ns         1     300ns     300ns     300ns  cuDeviceGetLuid

What is causing this error, and how can I get the nvprof --metrics command to work?

I found the answer.
I am adding the solution that worked for me as a reference for others.
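Open the NVIDIA Control Panel (right-click the desktop and select it), then choose Desktop (in the top menu) -> Enable Developer Settings. Next, select Developer (in the tree on the left) -> Manage GPU Performance Counters -> Allow access to the GPU performance counters to all users.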
NVIDIA used to document this in detail; that documentation is now hard to find.
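
To check whether the setting took effect, it can help to rerun the failing command and look for the same error text. The following is a minimal Python sketch, not part of the original answer. The helper names (`build_nvprof_command`, `looks_like_counter_access_error`, `profile_with_metrics`) are hypothetical, and `achieved_occupancy` is just one example metric. The check only matches the message quoted in the question, so a match does not prove that counter permissions are the cause.

```python
import re
import subprocess

# Matches the failure quoted in the question, e.g.
# "==6580== Error: Internal profiling error 4292:1."
PROFILING_ERROR = re.compile(r"Error: Internal profiling error \d+:\d+")


def build_nvprof_command(exe, metrics):
    """Build the argument list for `nvprof --metrics m1,m2 exe`."""
    return ["nvprof", "--metrics", ",".join(metrics), exe]


def looks_like_counter_access_error(output):
    """Return True if the output contains the internal profiling error."""
    return PROFILING_ERROR.search(output) is not None


def profile_with_metrics(exe, metrics=("achieved_occupancy",)):
    """Run nvprof with --metrics; return (combined output, error seen?)."""
    result = subprocess.run(
        build_nvprof_command(exe, metrics),
        capture_output=True,
        text=True,
    )
    # Combine both streams so the check does not depend on
    # which one nvprof writes its messages to.
    output = result.stdout + result.stderr
    return output, looks_like_counter_access_error(output)
```

Calling `profile_with_metrics("Project1.exe")` from the directory that contains the executable returns the profiler output together with a flag. If the flag is still true after changing the setting, the counters are probably still blocked for the current user, or the error has a different cause.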