CUDA kernel call in a simple example

Question

For example, this is the first parallel code in CUDA.

Can anyone explain the kernel call to me: <<<N, 1>>>

Here is the important code:

#define N   10

__global__ void add( int *a, int *b, int *c ) {
    int tid = blockIdx.x;    // one thread per block, so the block index selects the element
    if (tid < N)
        c[tid] = a[tid] + b[tid];
}

int main( void ) {
    int a[N], b[N], c[N];
    int *dev_a, *dev_b, *dev_c;

    // allocate the memory on the GPU
    // fill the arrays 'a' and 'b' on the CPU
    // copy the arrays 'a' and 'b' to the GPU

    add<<<N,1>>>( dev_a, dev_b, dev_c );

    // copy the array 'c' back from the GPU to the CPU
    // display the results
    // free the memory allocated on the GPU

    return 0;
}

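Why use <<<N, 1>>>? Does it mean we launch N blocks with 1 thread in each block? We could instead write <<<1, N>>>, which uses 1 block with N threads in that block and would be better optimized.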

Answer 1

kro*_*eml 5

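For this small example there is no particular reason (as Bart already told you in the comments). For larger, more realistic examples, though, you should always keep in mind that the number of threads per block is limited (1024 on current GPUs, 512 on the oldest ones). That is, with N = 10000 you could not use <<<1,N>>>, but <<<N,1>>> would still work.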
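In practice the usual middle ground is to use several blocks with many threads each. The following is only a minimal sketch of that idea, not code from the question: the names `blocksFor`, `addMany` and `addOnGpu` are hypothetical, the 256 threads per block is an assumed value, and error handling is kept to a bare minimum.

```cuda
#include <cuda_runtime.h>

// Ceiling division: smallest block count with blocks * threadsPerBlock >= n.
inline int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Each thread handles one element; n is passed as an argument instead of a #define.
__global__ void addMany(const int *a, const int *b, int *c, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
    if (tid < n)                                      // the last block may have idle threads
        c[tid] = a[tid] + b[tid];
}

// Hypothetical host wrapper: copy inputs to the GPU, launch, copy the result back.
cudaError_t addOnGpu(const int *a, const int *b, int *c, int n) {
    const int threadsPerBlock = 256;  // assumed value; must not exceed the device limit
    size_t bytes = (size_t)n * sizeof(int);
    int *dev_a = nullptr, *dev_b = nullptr, *dev_c = nullptr;

    cudaMalloc((void **)&dev_a, bytes);
    cudaMalloc((void **)&dev_b, bytes);
    cudaMalloc((void **)&dev_c, bytes);
    cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice);

    addMany<<<blocksFor(n, threadsPerBlock), threadsPerBlock>>>(dev_a, dev_b, dev_c, n);

    cudaError_t err = cudaGetLastError();  // reports earlier failures and bad launch configurations
    if (err == cudaSuccess)
        err = cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return err;
}
```

With N = 10000 this launches 40 blocks of 256 threads, and the `tid < n` check keeps the 240 surplus threads in the last block from writing out of bounds.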

The grid size is limited as well. (2 upvotes)
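One common way to stay inside the grid limit is to cap the number of blocks and let each thread loop over several elements (a grid-stride loop). This is a hedged sketch rather than anything from the thread: `cappedBlocks`, `addStrided` and `maxGridX` are hypothetical names, and the actual limit should be read from the device rather than assumed.

```cuda
#include <cuda_runtime.h>

// Blocks to launch: enough to cover n elements, but never more than maxBlocks.
inline int cappedBlocks(long long n, int threadsPerBlock, int maxBlocks) {
    long long needed = (n + threadsPerBlock - 1) / threadsPerBlock;
    return needed < maxBlocks ? (int)needed : maxBlocks;
}

// Grid-stride loop: a thread handles elements i, i + stride, i + 2 * stride, ...
__global__ void addStrided(const int *a, const int *b, int *c, long long n) {
    long long stride = (long long)gridDim.x * blockDim.x;  // total threads in the grid
    for (long long i = (long long)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        c[i] = a[i] + b[i];
}

// Hypothetical helper: the x-dimension grid limit of a device, or -1 on error.
inline int maxGridX(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return -1;
    return prop.maxGridSize[0];
}
```

A launch would then look like `addStrided<<<cappedBlocks(n, 256, maxGridX(0)), 256>>>(dev_a, dev_b, dev_c, n)`, assuming device 0 and device pointers allocated as in the question.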
