Thread synchronization using __syncwarp()

BAd*_*dhi 5 cuda

Apart from __syncthreads(), which synchronizes the warps within a thread block, there is another function called __syncwarp(). What exactly does this function do?

The CUDA Programming Guide says:

will cause the executing thread to wait until all warp lanes named in mask have executed a __syncwarp() (with the same mask) before resuming execution. All non-exited threads named in mask must execute a corresponding __syncwarp() with the same mask, or the result is undefined.

Executing __syncwarp() guarantees memory ordering among threads participating in the barrier. Thus, threads within a warp that wish to communicate via memory can store to memory, execute __syncwarp(), and then safely read values stored by other threads in the warp.

So does this mean that this function ensures synchronization among the threads within a warp that are included in the mask? If so, do we need such synchronization among threads of the same warp, since they are all guaranteed to execute in lockstep?
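
To make the quoted store → __syncwarp() → read pattern concrete, here is a minimal sketch (my own illustration, not from the guide); the kernel name and the neighbour exchange are arbitrary, and it assumes a single-warp launch such as warpExchange<<<1, 32>>>(d_in, d_out):

```cuda
// Minimal sketch of the pattern quoted above: store, __syncwarp(), then read.
// Assumes the block consists of exactly one warp (32 threads).
__global__ void warpExchange(const int *in, int *out)
{
    __shared__ int buf[32];            // one slot per lane of the single warp
    unsigned lane = threadIdx.x % 32;  // lane index within the warp

    buf[lane] = in[threadIdx.x];       // each lane stores its own value
    __syncwarp();                      // full-mask barrier: orders the stores
                                       // before any lane reads them

    // After the barrier each lane can safely read what another lane wrote.
    out[threadIdx.x] = buf[(lane + 1) % 32];
}
```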

Mo *_*ani 7

This function is available from CUDA 9, and yes, it synchronizes all threads within a warp and is useful for divergent warps. This is especially relevant on the Volta architecture, where threads within a warp can be scheduled individually.
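
To illustrate why this matters under independent thread scheduling, below is a hedged sketch of the final, single-warp stage of a shared-memory sum reduction (the kernel name and the block size of 64 are assumptions made for brevity). Pre-Volta code often omitted the __syncwarp() calls and relied on implicit lockstep execution; with independent scheduling the explicit barriers are needed so that no lane reads and another writes the same shared-memory slot in the same step:

```cuda
// Sketch: final single-warp stage of a shared-memory sum reduction, written to
// be correct under Volta's independent thread scheduling. Assumes blockDim.x == 64.
__global__ void reduce64(const float *in, float *out)
{
    __shared__ float s[64];
    unsigned tid = threadIdx.x;
    s[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();                     // whole block has loaded its values

    if (tid < 32) {                      // only warp 0 continues; all 32 of its lanes
                                         // enter, so the default full-warp mask is valid
        float v = s[tid] + s[tid + 32]; __syncwarp();
        s[tid] = v;                     __syncwarp();
        v += s[tid + 16];               __syncwarp();  // read, barrier,
        s[tid] = v;                     __syncwarp();  // write, barrier:
        v += s[tid + 8];                __syncwarp();  // reads and writes of the
        s[tid] = v;                     __syncwarp();  // same slot never overlap
        v += s[tid + 4];                __syncwarp();
        s[tid] = v;                     __syncwarp();
        v += s[tid + 2];                __syncwarp();
        s[tid] = v;                     __syncwarp();
        v += s[tid + 1];
        if (tid == 0) out[blockIdx.x] = v;
    }
}
```

As a side note, on CUDA 9 and later the same exchange can also be done entirely in registers with __shfl_down_sync(), which takes the participation mask as an explicit argument.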