我想通过BS_x读取(BS_X + 1)*(BS_Y + 1)全局存储器位置*BS_Y线程将内容移动到共享存储器,我开发了以下代码.
int i = threadIdx.x;
int j = threadIdx.y;
int idx = blockIdx.x*BLOCK_SIZE_X + threadIdx.x;
int idy = blockIdx.y*BLOCK_SIZE_Y + threadIdx.y;
int index1 = j*BLOCK_SIZE_Y+i;
int i1 = (index1)%(BLOCK_SIZE_X+1);
int j1 = (index1)/(BLOCK_SIZE_Y+1);
int i2 = (BLOCK_SIZE_X*BLOCK_SIZE_Y+index1)%(BLOCK_SIZE_X+1);
int j2 = (BLOCK_SIZE_X*BLOCK_SIZE_Y+index1)/(BLOCK_SIZE_Y+1);
__shared__ double Ezx_h_shared_ext[BLOCK_SIZE_X+1][BLOCK_SIZE_Y+1];
Ezx_h_shared_ext[i1][j1]=Ezx_h[(blockIdx.y*BLOCK_SIZE_Y+j1)*xdim+(blockIdx.x*BLOCK_SIZE_X+i1)];
if ((i2<(BLOCK_SIZE_X+1))&&(j2<(BLOCK_SIZE_Y+1)))
Ezx_h_shared_ext[i2][j2]=Ezx_h[(blockIdx.y*BLOCK_SIZE_Y+j2)*xdim+(blockIdx.x*BLOCK_SIZE_X+i2)];
Run Code Online (Sandbox Code Playgroud)
根据我的理解,合并是顺序处理的连续内存读取的并行等价物.现在如何检测全局内存访问是否已合并?我注意到从(i1,j1)到(i2,j2)的索引跳转.提前致谢.
我目前正在尝试将 2D 卷积代码从这个问题调整为 3D,但无法理解我的错误在哪里。
我的二维码如下所示:
#include <iostream>
#define MASK_WIDTH 3
#define MASK_RADIUS MASK_WIDTH / 2
#define TILE_WIDTH 8
#define W (TILE_WIDTH + MASK_WIDTH - 1)
/**
* GPU 2D Convolution using shared memory
*/
__global__ void convolution(float *I, float* M, float *P, int width, int height)
{
/***** WRITE TO SHARED MEMORY *****/
__shared__ float N_ds[W][W];
// First batch loading
int dest = threadIdx.x + (threadIdx.y * TILE_WIDTH);
int destY = dest / W;
int destX = dest …Run Code Online (Sandbox Code Playgroud)