Related solutions (0)

CUDA 9 shfl vs. shfl_sync

Since CUDA 9, the __shfl instruction is deprecated and should be replaced by __shfl_sync.

But how can I replace one with the other when the two behave differently?

Code example:

__global__
static void shflTest(){
    int tid = threadIdx.x;
    float value = tid + 0.1f;
    int* ivalue = reinterpret_cast<int*>(&value);

    // use the integer shuffle; note that __shfl_sync is called here with the
    // same argument list as __shfl, so ivalue[0] is interpreted as the mask
    int ix = __shfl(ivalue[0],5,32);
    int iy = __shfl_sync(ivalue[0],5,32);

    float x = reinterpret_cast<float*>(&ix)[0];
    float y = reinterpret_cast<float*>(&iy)[0];

    if(tid == 0){
        printf("shfl tmp %d %d\n",ix,iy);
        printf("shfl final %f %f\n",x,y);
    }
}

int main()
{
    shflTest<<<1,32>>>();
    cudaDeviceSynchronize();
    return 0;
}

Output:

shfl tmp 1084437299 5
shfl final 5.100000 0.000000
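
The discrepancy comes from the argument list: the CUDA 9 sync variants prepend a member mask as a new first argument, `__shfl_sync(mask, var, srcLane, width)`. In the code above, `ivalue[0]` is therefore consumed as the mask, `5` as the value, and `32` as the source lane (which wraps to lane 0), producing `5` instead of the shuffled bits. A minimal sketch of the intended replacement, assuming all 32 lanes of the warp are active (the kernel name is illustrative):

```cuda
__global__
static void shflSyncTest(){
    int tid = threadIdx.x;
    float value = tid + 0.1f;
    int* ivalue = reinterpret_cast<int*>(&value);

    // 0xFFFFFFFF names all 32 lanes of the warp as participants;
    // the remaining arguments match the old __shfl call exactly
    int iy = __shfl_sync(0xFFFFFFFF, ivalue[0], 5, 32);

    float y = reinterpret_cast<float*>(&iy)[0];
    if(tid == 0)
        printf("shfl_sync final %f\n", y);  // expected: 5.100000, as with __shfl
}
```

The mask must cover every lane that actually executes the call; for a fully active warp, 0xFFFFFFFF is the usual choice.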

cuda

5 votes · 1 answer · 4074 views

Thread synchronization with __syncwarp

In addition to __syncthreads(), which synchronizes the threads within a thread block, there is another function called __syncwarp(). What exactly does this function do?

The CUDA Programming Guide says:

will cause the executing thread to wait until all warp lanes named in mask have executed a __syncwarp() (with the same mask) before resuming execution. All non-exited threads named in mask must execute a corresponding __syncwarp() with the same mask, or the result is undefined.

Executing __syncwarp() guarantees memory ordering among threads participating in the barrier. Thus, threads within a warp that wish to communicate via memory can store to memory, …
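
The store-then-communicate pattern the guide describes can be sketched as a warp-level tree reduction through shared memory. This is an illustrative example, not from the question; it assumes a single 32-thread block, and the kernel and variable names are made up:

```cuda
__global__ void warpReduce(const float* data, float* result){
    __shared__ float s[32];
    int lane = threadIdx.x % 32;

    s[lane] = data[lane];
    __syncwarp();                       // make every lane's store visible

    // halve the active range each step; a __syncwarp() separates each
    // load from the store that follows it, as the guide requires
    for (int offset = 16; offset > 0; offset /= 2) {
        float v = 0.0f;
        if (lane < offset) v = s[lane + offset];
        __syncwarp();
        if (lane < offset) s[lane] += v;
        __syncwarp();
    }
    if (lane == 0) *result = s[0];      // lane 0 holds the warp's sum
}
```

Unlike __syncthreads(), this barrier only involves the lanes named in the mask (all 32 by default), which matters on Volta and later where warp lanes may no longer execute in lockstep.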

cuda

5 votes · 1 answer · 1946 views

Tag statistics

cuda ×2