Related solutions (0)

Sum reduction with CUDA: what is N?

According to NVIDIA, this is the fastest reduction kernel:

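// Final stage of the reduction: called only by threads tid < 32 (one warp).
// volatile forces every read and write to go through shared memory, so each
// thread sees the partial sums written by the others. This relies on the warp
// executing in lockstep, which is not guaranteed on Volta and later GPUs;
// there, __syncwarp() is needed between the steps.
template <unsigned int blockSize>
__device__ void warpReduce(volatile int *sdata, unsigned int tid) {
    if (blockSize >= 64) sdata[tid] += sdata[tid + 32];
    if (blockSize >= 32) sdata[tid] += sdata[tid + 16];
    if (blockSize >= 16) sdata[tid] += sdata[tid +  8];
    if (blockSize >=  8) sdata[tid] += sdata[tid +  4];
    if (blockSize >=  4) sdata[tid] += sdata[tid +  2];
    if (blockSize >=  2) sdata[tid] += sdata[tid +  1];
}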
template <unsigned int blockSize>
__global__ void reduce6(int *g_idata, int …
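The kernel signature above is cut off at `…`, so the meaning of its size parameter cannot be read from this page. The sketch below assumes the common form of this kernel. In that form, `n` is the total number of input elements, and each block first accumulates a grid-stride sum and only then runs the shared-memory tree.

The sketch also swaps the lockstep-dependent `warpReduce` for a variant that calls `__syncwarp()` (CUDA 9 and later). Threads of a warp are not guaranteed to execute in lockstep on Volta and newer GPUs. The names `warpReduceSafe`, `reduceSketch`, `check` and `gpuSum` are hypothetical. The block size is fixed at 256 for simplicity.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Sketch-level error handling: abort on any CUDA runtime error.
static void check(cudaError_t e) {
    if (e != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e));
        std::exit(1);
    }
}

// Hypothetical variant of warpReduce that does not assume lockstep execution:
// every step reads, synchronizes the warp, writes, and synchronizes again.
template <unsigned int blockSize>
__device__ void warpReduceSafe(volatile int *sdata, unsigned int tid) {
#pragma unroll
    for (unsigned int s = 32; s > 0; s >>= 1) {
        if (blockSize >= 2 * s) {
            int v = sdata[tid] + sdata[tid + s];
            __syncwarp();
            sdata[tid] = v;
            __syncwarp();
        }
    }
}

// Hypothetical reduce6-style kernel. Assumption: n is the number of input
// elements; each block writes one partial sum to g_odata[blockIdx.x].
template <unsigned int blockSize>
__global__ void reduceSketch(const int *g_idata, int *g_odata, unsigned int n) {
    static_assert(blockSize >= 64 && blockSize <= 512 &&
                  (blockSize & (blockSize - 1)) == 0,
                  "sketch assumes a power-of-two blockSize in [64, 512]");
    __shared__ int sdata[blockSize];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * (blockSize * 2) + tid;
    unsigned int gridSize = blockSize * 2 * gridDim.x;

    // Grid-stride loop: each thread adds up to two elements per pass. The
    // bounds checks make any n work, not only multiples of 2 * blockSize.
    int sum = 0;
    while (i < n) {
        sum += g_idata[i];
        if (i + blockSize < n) sum += g_idata[i + blockSize];
        i += gridSize;
    }
    sdata[tid] = sum;
    __syncthreads();

    // Shared-memory tree down to 64 values, then one warp finishes.
    if (blockSize >= 512) { if (tid < 256) sdata[tid] += sdata[tid + 256]; __syncthreads(); }
    if (blockSize >= 256) { if (tid < 128) sdata[tid] += sdata[tid + 128]; __syncthreads(); }
    if (blockSize >= 128) { if (tid < 64) sdata[tid] += sdata[tid + 64]; __syncthreads(); }
    if (tid < 32) warpReduceSafe<blockSize>(sdata, tid);
    if (tid == 0) g_odata[blockIdx.x] = sdata[0];
}

// Hypothetical host helper: one kernel pass produces per-block partial sums,
// which are then added on the host.
long long gpuSum(const int *h, unsigned int n) {
    assert(n == 0 || h != nullptr);
    if (n == 0) return 0;
    constexpr unsigned int kBlock = 256;
    unsigned int blocks = (n + kBlock * 2 - 1) / (kBlock * 2);
    if (blocks > 1024) blocks = 1024;  // the grid-stride loop covers the rest

    int *d_in = nullptr, *d_partial = nullptr;
    check(cudaMalloc(&d_in, n * sizeof(int)));
    check(cudaMalloc(&d_partial, blocks * sizeof(int)));
    check(cudaMemcpy(d_in, h, n * sizeof(int), cudaMemcpyHostToDevice));
    reduceSketch<kBlock><<<blocks, kBlock>>>(d_in, d_partial, n);
    check(cudaGetLastError());

    std::vector<int> partial(blocks);
    check(cudaMemcpy(partial.data(), d_partial, blocks * sizeof(int),
                     cudaMemcpyDeviceToHost));
    check(cudaFree(d_in));
    check(cudaFree(d_partial));

    long long total = 0;
    for (int p : partial) total += p;
    return total;
}
```

For example, with 1000 elements the helper launches two 256-thread blocks, since each block consumes 2 × 256 = 512 elements per pass. The host then adds the two partial sums. Larger inputs reuse at most 1024 blocks, and the grid-stride loop walks the remainder.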

parallel-processing cuda sum

Score: 7
Solutions: 1
Views: 7205
