I am following the tutorial here: http://opencl.codeplex.com/wikipage?title=OpenCL%20Tutorials%20-%201
The kernel they list is the one below. It computes the sum of two numbers and stores it in the output variable:
__kernel void vector_add_gpu (__global const float* src_a,
                              __global const float* src_b,
                              __global float* res,
                              const int num)
{
    /* get_global_id(0) returns the ID of the thread in execution.
       As many threads are launched at the same time, executing the same kernel,
       each one will receive a different ID, and consequently perform a different computation.*/
    const int idx = get_global_id(0);
    /* Now each work-item asks itself: "is my ID inside the vector's range?"
       If …
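The listing above is cut off, but its comments describe the idea: every work-item gets its own ID and first checks whether that ID falls inside the vector. A minimal CPU sketch of that idea in plain C++ might look like the following. The function names `vector_add_work_item` and `vector_add_cpu` are hypothetical, and the bounds check plus the element-wise addition are assumptions drawn from the kernel's comments and the description of the kernel, not the tutorial's actual code.

```cpp
#include <vector>

// Hypothetical CPU stand-in for a single work-item of vector_add_gpu.
// `idx` plays the role of get_global_id(0). The range check and the
// addition are assumptions, since the original listing is truncated.
void vector_add_work_item(const float* src_a, const float* src_b,
                          float* res, int num, int idx)
{
    if (idx < num) {
        res[idx] = src_a[idx] + src_b[idx];
    }
}

// Emulates launching `global_size` work-items one after another.
// On a real device they would run in parallel. Assumes `a` and `b`
// have the same length.
std::vector<float> vector_add_cpu(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  int global_size)
{
    const int num = static_cast<int>(a.size());
    std::vector<float> res(a.size(), 0.0f);
    for (int idx = 0; idx < global_size; ++idx) {
        vector_add_work_item(a.data(), b.data(), res.data(), num, idx);
    }
    return res;
}
```

In this sketch, `global_size` may be larger than the vector length, which is the situation the range check is meant to handle: the extra work-items simply do nothing.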
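I am developing a CUDA 4.0 application that runs on a Fermi card. According to the specs, Fermi has Compute Capability 2.0 and should therefore support non-inlined function calls.
I compile each of my classes into a separate object file with nvcc 4.0. Then I link them all together with g++-4.4.
Consider the following code:
[File A.cuh]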
#include <cuda_runtime.h>
struct A
{
    __device__ __host__ void functionA();
};
[File B.cuh]
#include <cuda_runtime.h>
struct B
{
    __device__ __host__ void functionB();
};
[File A.cu]
#include "A.cuh"
#include "B.cuh"
void A::functionA()
{
    B b;
    b.functionB();
}
Trying to compile A.cu with nvcc -o A.o -c A.cu -arch=sm_20
outputs Error: External calls are not supported (found non-inlined call to _ZN1B9functionBEv).
I must be doing something wrong, but what?
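One reading of the error, offered as an assumption rather than a confirmed diagnosis: nvcc 4.0 compiles device code one translation unit at a time and has no device-side linker, so a device call to B::functionB cannot be resolved when its body lives in another .cu file. Separate compilation of device code (the `-rdc=true` option) only arrived with CUDA 5.0. Under that assumption, a sketch of one workaround is to make the body of functionB visible in the header. The `HOST_DEVICE` macro and the `int` return values below are hypothetical additions so that the sketch also builds and can be checked as plain C++. The functions in the question return void.

```cpp
// Qualifier macro (a common idiom, not from the original post): expands to
// the CUDA qualifiers under nvcc and to nothing under a plain C++ compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// B.cuh (sketch): functionB is defined inside the class, so every
// translation unit that includes this header sees the full definition.
struct B
{
    // Hypothetical return value, added only so the call is observable.
    HOST_DEVICE int functionB() { return 1; }
};

// A.cuh (sketch)
struct A
{
    HOST_DEVICE int functionA();
};

// A.cu (sketch): the qualifiers are repeated on the definition, and the
// call to functionB can now be inlined because its body is visible here.
HOST_DEVICE inline int A::functionA()
{
    B b;
    return b.functionB();
}
```

The trade-off of this approach is that device function bodies end up in headers. With a newer toolkit, keeping them in separate .cu files and compiling with relocatable device code would be the alternative to investigate.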