nvprof output: what does "No kernels were profiled" mean, and how do I fix it?

Question


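I recently installed CUDA on my Arch Linux machine through the system's package manager, and I have been trying to test whether it works by running a simple vector addition program.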

I simply copy-pasted the code from this tutorial (the versions using one or several kernels) into a file named cuda_test.cu and compiled it with:

> nvcc cuda_test.cu -o cuda_test

In either case the program runs without errors (it does not crash, and the output shows "Max error: 0"). But when I try to run the CUDA profiler on the program:

> sudo nvprof ./cuda_test

I get this result:

==3201== NVPROF is profiling process 3201, command: ./cuda_test
Max error: 0
==3201== Profiling application: ./cuda_test
==3201== Profiling result:
No kernels were profiled.
No API activities were profiled.
==3201== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.

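The final warning is not my main problem or the subject of this question; my problem is the message saying that no kernels and no API activities were profiled.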

Does this mean the program ran entirely on my CPU? Or is this a bug in nvprof?
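One way to answer the CPU-versus-GPU question directly is to check the return codes of the runtime calls. Note that the host code in the tutorial never adds x and y itself, so "Max error: 0" already implies that the kernel executed on the GPU; if the launch had failed, y would still hold 2.0f and the reported error would be 1. The sketch below is a hypothetical variant of the tutorial program, not the tutorial's own code, and the CUDA_CHECK macro is an illustrative helper rather than part of the CUDA API. It reports which device the runtime sees and surfaces launch and execution errors through cudaGetLastError() and cudaDeviceSynchronize().

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                             \
  do {                                                               \
    cudaError_t err_ = (call);                                       \
    if (err_ != cudaSuccess) {                                       \
      std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,       \
                   __LINE__, #call, cudaGetErrorString(err_));       \
      std::exit(EXIT_FAILURE);                                       \
    }                                                                \
  } while (0)

// Same kernel as in the question.
__global__ void add(int n, float *x, float *y) {
  int index = threadIdx.x;
  int stride = blockDim.x;
  for (int i = index; i < n; i += stride)
    y[i] = x[i] + y[i];
}

int main() {
  // Confirm that the runtime actually sees a GPU and report which one is used.
  int deviceCount = 0;
  CUDA_CHECK(cudaGetDeviceCount(&deviceCount));
  cudaDeviceProp prop;
  CUDA_CHECK(cudaGetDeviceProperties(&prop, 0));
  std::printf("Found %d CUDA device(s); using device 0: %s\n", deviceCount, prop.name);

  const int N = 1 << 20;
  float *x = nullptr, *y = nullptr;
  CUDA_CHECK(cudaMallocManaged(&x, N * sizeof(float)));
  CUDA_CHECK(cudaMallocManaged(&y, N * sizeof(float)));
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  add<<<1, 1>>>(N, x, y);
  CUDA_CHECK(cudaGetLastError());       // errors in the launch itself
  CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

  // The host never adds x and y, so y[0] == 3.0f can only come from the kernel.
  std::printf("y[0] = %f\n", y[0]);

  CUDA_CHECK(cudaFree(x));
  CUDA_CHECK(cudaFree(y));
  return 0;
}
```

Compiled with nvcc as in the question, it should print y[0] = 3.000000 when the kernel ran; a failed launch would instead abort with the runtime's error string.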

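I found a discussion of roughly the same error here, but the answer there was that the wrong version of CUDA was installed. In my case, the installed version is the latest one available through the system package manager (version 10.1.243-1).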

Is there any way to make nvprof show the expected output?

Edit

Trying to follow the final warning does not solve the problem:

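Adding calls to cudaProfilerStop() (or cuProfilerStop()) and cudaDeviceReset(); at the end as suggested, including the corresponding header (cuda_profiler_api.h or cudaProfiler.h), linking the appropriate library, and compiling with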

> nvcc cuda_test.cu -o cuda_test -lcuda

produces a program that still runs, but when it is run under nvprof, it returns:

==12558== NVPROF is profiling process 12558, command: ./cuda_test
Max error: 0
==12558== Profiling application: ./cuda_test
==12558== Profiling result:
No kernels were profiled.
No API activities were profiled.
==12558== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.
======== Error: Application received signal 139

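This did not solve the original problem and instead introduced a new error. The same happens when cudaProfilerStop() is used on its own, or together with cuProfilerStop() and cudaDeviceReset().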

Code

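As mentioned above, the code was copied from a tutorial to check that CUDA works, except that I also added calls to cudaProfilerStop() and cudaDeviceReset(). It is included here for clarity: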

#include <iostream>
#include <math.h>
#include <cuda_profiler_api.h>

// Kernel function to add the elements of two arrays
__global__
void add(int n, float *x, float *y)
{
  int index = threadIdx.x;
  int stride = blockDim.x;
  for (int i = index; i < n; i += stride)
      y[i] = x[i] + y[i];
}

int main(void)
{
  int N = 1<<20;
  float *x, *y;

  cudaProfilerStart();

  // Allocate Unified Memory – accessible from CPU or GPU
  cudaMallocManaged(&x, N*sizeof(float));
  cudaMallocManaged(&y, N*sizeof(float));

  // initialize x and y arrays on the host
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Run kernel on 1M elements on the GPU
  add<<<1, 1>>>(N, x, y);

  // Wait for GPU to finish before accessing on host
  cudaDeviceSynchronize();

  // Check for errors (all values should be 3.0f)
  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = fmax(maxError, fabs(y[i]-3.0f));
  std::cout << "Max error: " << maxError << std::endl;

  // Free memory
  cudaFree(x);
  cudaFree(y);

  cudaDeviceReset();
  cudaProfilerStop();

  return 0;
}
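The code above calls cudaDeviceReset() before cudaProfilerStop(). The nvprof warning itself asks for cudaProfilerStop() or cuProfilerStop() to be called before the application exits so that profile data is flushed. A minimal sketch of one plausible ordering follows: wait for the GPU, stop the profiler, and only then reset the device. Placing the stop call ahead of cudaDeviceReset(), so it runs while the context being profiled still exists, is an assumption here, and according to the answer it is not what resolves the "No kernels were profiled" message. Also note that cudaProfilerStart() only narrows the profiled range when nvprof is launched with --profile-from-start off.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>

// Same kernel as in the question.
__global__ void add(int n, float *x, float *y) {
  int index = threadIdx.x;
  int stride = blockDim.x;
  for (int i = index; i < n; i += stride)
    y[i] = x[i] + y[i];
}

int main() {
  const int N = 1 << 20;
  float *x = nullptr, *y = nullptr;
  cudaMallocManaged(&x, N * sizeof(float));
  cudaMallocManaged(&y, N * sizeof(float));
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  cudaProfilerStart();      // start of the region of interest
  add<<<1, 256>>>(N, x, y);
  cudaDeviceSynchronize();  // make sure all GPU work has finished...
  cudaProfilerStop();       // ...then stop profiling while the context still exists

  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = std::fmax(maxError, std::fabs(y[i] - 3.0f));
  std::printf("Max error: %g\n", maxError);

  cudaFree(x);
  cudaFree(y);
  cudaDeviceReset();        // tear down the context last
  return 0;
}
```

To limit profiling to the bracketed region, it would be run as sudo nvprof --profile-from-start off ./cuda_test.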

Answer 1

Nik*_*laj (score: 18)

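This problem is apparently fairly well known. After some searching, I found this thread about the error code from the edited version of my question; the solution discussed there is to call nvprof with the flag --unified-memory-profiling off: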

> sudo nvprof --unified-memory-profiling off ./cuda_test

This makes nvprof work as expected, even without calling cudaProfilerStop().
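The flag turns off nvprof's profiling of unified (managed) memory. If the crash is indeed tied to that feature, another option, untested here, might be to avoid managed memory entirely and copy data explicitly. The sketch below is the same vector addition rewritten with cudaMalloc and cudaMemcpy; the check() helper is hypothetical, for illustration only, and whether this variant profiles cleanly on the asker's setup without the flag is an assumption, not something the answer confirms.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with the runtime's message if a call failed.
static void check(cudaError_t err, const char *what) {
  if (err != cudaSuccess) {
    std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    std::exit(EXIT_FAILURE);
  }
}

__global__ void add(int n, const float *x, float *y) {
  int index = threadIdx.x;
  int stride = blockDim.x;
  for (int i = index; i < n; i += stride)
    y[i] = x[i] + y[i];
}

int main() {
  const int N = 1 << 20;
  const size_t bytes = N * sizeof(float);
  std::vector<float> hx(N, 1.0f), hy(N, 2.0f);  // ordinary host buffers

  float *dx = nullptr, *dy = nullptr;           // device-only buffers, no managed memory
  check(cudaMalloc(&dx, bytes), "cudaMalloc(dx)");
  check(cudaMalloc(&dy, bytes), "cudaMalloc(dy)");
  check(cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice), "copy x");
  check(cudaMemcpy(dy, hy.data(), bytes, cudaMemcpyHostToDevice), "copy y");

  add<<<1, 256>>>(N, dx, dy);
  check(cudaGetLastError(), "kernel launch");

  // Ordered after the kernel on the default stream; blocks the host until done.
  check(cudaMemcpy(hy.data(), dy, bytes, cudaMemcpyDeviceToHost), "copy back");

  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = std::fmax(maxError, std::fabs(hy[i] - 3.0f));
  std::printf("Max error: %g\n", maxError);

  cudaFree(dx);
  cudaFree(dy);
  return 0;
}
```

Because no managed allocations are made, nvprof has no unified-memory activity to record for this program; the trade-off is the explicit host/device copies that the tutorial's managed-memory version avoids.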

Archived: 6 years ago
Views: 3688
Last activity: 4 years, 2 months ago