CUDA __device__ unresolved extern function

max*_*max · 11 votes · tags: c, c++, cuda, linker-errors

I am trying to understand how to split CUDA __device__ code out into separate header and source files.

I have three files.

File 1: int2.cuh

#ifndef INT2_H_
#define INT2_H_

#include "cuda.h"
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

__global__ void kernel();
__device__ int k2(int k);

int launchKernel(int dim);

#endif /* INT2_H_ */

File 2: int2.cu

#include "int2.cuh"
#include <cstdio>

__global__ void kernel() {
    int tid = threadIdx.x;
    printf("%d\n", k2(tid));
}

__device__ int k2(int i) {
    return i * i;
}

int launchKernel(int dim) {
    kernel<<<1, dim>>>();
    cudaDeviceReset();
    return 0;
}

File 3: CUDASample.cu

#include <stdio.h>
#include <stdlib.h>
#include "int2.cuh"
#include <iostream>

using namespace std;

static const int WORK_SIZE = 256;

__global__ void sampleCuda() {
    int tid = threadIdx.x;
//    printf("%d\n", k2(tid)); // Cannot call k2
    printf("%d\n", tid * tid);
}

int main(void) {

    int var;
    var = launchKernel(16);

    kernel<<<1, 16>>>();
    cudaDeviceReset();

    sampleCuda<<<1, 16>>>();
    cudaDeviceReset();

    return 0;
}

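The code works fine. I can launch the sampleCuda() kernel (defined in the same file), call the C function launchKernel() (defined in another file), and launch kernel() directly (also defined in another file).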

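The problem appears when I call the __device__ function k2() from the sampleCuda() kernel: the build then fails with the error below. Yet the same function can be called from kernel() without any problem.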

10:58:11 **** Incremental Build of configuration Debug for project CUDASample ****
make all 
Building file: ../src/CUDASample.cu
Invoking: NVCC Compiler
/Developer/NVIDIA/CUDA-6.5/bin/nvcc -G -g -O0 -gencode arch=compute_20,code=sm_20  -odir "src" -M -o "src/CUDASample.d" "../src/CUDASample.cu"
/Developer/NVIDIA/CUDA-6.5/bin/nvcc -G -g -O0 --compile --relocatable-device-code=false -gencode arch=compute_20,code=compute_20 -gencode arch=compute_20,code=sm_20  -x cu -o  "src/CUDASample.o" "../src/CUDASample.cu"
../src/CUDASample.cu(18): warning: variable "var" was set but never used

../src/CUDASample.cu(8): warning: variable "WORK_SIZE" was declared but never referenced

../src/CUDASample.cu(18): warning: variable "var" was set but never used

../src/CUDASample.cu(8): warning: variable "WORK_SIZE" was declared but never referenced

ptxas fatal   : Unresolved extern function '_Z2k2i'
make: *** [src/CUDASample.o] Error 255

10:58:14 Build Finished (took 2s.388ms)

Answer by Grz*_*ski · 10 votes

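The problem is that you defined the __device__ function k2() in a different compilation unit (int2.cu) from the __global__ function that calls it (sampleCuda() in CUDASample.cu). Your build passes --relocatable-device-code=false, so the device code of each file must be complete on its own, and ptxas cannot resolve _Z2k2i, the mangled name of k2(int). kernel() works only because it is defined in the same file as k2(). You need to either explicitly enable relocatable device code mode by adding the -dc flag (and then link through nvcc so the device linker runs), or move the definition into the same unit as its caller.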

From the nvcc documentation:

--device-c | -dc: Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file that contains relocatable device code. It is equivalent to --relocatable-device-code=true --compile.

For details, see "Separate Compilation and Linking of CUDA C++ Device Code".

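  • Nsight EE has an option for choosing the "Separate Compilation" project type when you create a project. Creating the project that way may be easier than modifying the makefile scripts directly. (2 upvotes)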