Related troubleshooting solutions (0)

Analyzing memory access coalescing of my CUDA kernel

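I would like to read (BS_X + 1)*(BS_Y + 1) global memory locations with BS_X*BS_Y threads, moving the contents to shared memory, and I have developed the following code.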

int i       = threadIdx.x;
int j       = threadIdx.y;
int idx     = blockIdx.x*BLOCK_SIZE_X + threadIdx.x;
int idy     = blockIdx.y*BLOCK_SIZE_Y + threadIdx.y;

// Linear index of this thread within the block (a row holds BLOCK_SIZE_X threads)
int index1  = j*BLOCK_SIZE_X+i;

// First batch: map the linear index onto the (BLOCK_SIZE_X+1)-wide extended tile
int i1      = (index1)%(BLOCK_SIZE_X+1);
int j1      = (index1)/(BLOCK_SIZE_X+1);

// Second batch: the elements that remain after the first BLOCK_SIZE_X*BLOCK_SIZE_Y
int i2      = (BLOCK_SIZE_X*BLOCK_SIZE_Y+index1)%(BLOCK_SIZE_X+1);
int j2      = (BLOCK_SIZE_X*BLOCK_SIZE_Y+index1)/(BLOCK_SIZE_X+1);

__shared__ double Ezx_h_shared_ext[BLOCK_SIZE_X+1][BLOCK_SIZE_Y+1];

Ezx_h_shared_ext[i1][j1]=Ezx_h[(blockIdx.y*BLOCK_SIZE_Y+j1)*xdim+(blockIdx.x*BLOCK_SIZE_X+i1)];

// Only threads whose second element still lies inside the extended tile load again
if ((i2<(BLOCK_SIZE_X+1))&&(j2<(BLOCK_SIZE_Y+1)))
    Ezx_h_shared_ext[i2][j2]=Ezx_h[(blockIdx.y*BLOCK_SIZE_Y+j2)*xdim+(blockIdx.x*BLOCK_SIZE_X+i2)];

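As I understand it, coalescing is the parallel equivalent of the consecutive memory reads of sequential processing. How can I now detect whether the global memory accesses are coalesced? I notice that there is an index jump from (i1,j1) to (i2,j2). Thanks in advance.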
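One rough way to reason about the question is to replay the index arithmetic on the host and count how many 128-byte segments each warp touches. The sketch below does that for the first load only. It is an illustration under stated assumptions, not a substitute for profiling: the 16x16 block size, the `xdim` value passed in, the 128-byte segment size, and a 128-byte-aligned `Ezx_h` base pointer are all assumed, and the helper names `firstLoadOffset`, `segmentsTouched` and `idealSegments` are hypothetical.

```cpp
#include <cassert>
#include <set>

// Assumed values for illustration; the question does not state them.
constexpr int BLOCK_SIZE_X  = 16;
constexpr int BLOCK_SIZE_Y  = 16;
constexpr int WARP_SIZE     = 32;
constexpr int SEGMENT_BYTES = 128;  // assumed memory transaction granularity

// Element offset into Ezx_h read by the first load of the thread whose
// linear index inside block (bx, by) is tid = threadIdx.y*BLOCK_SIZE_X + threadIdx.x.
long firstLoadOffset(int tid, int bx, int by, int xdim) {
    int i1 = tid % (BLOCK_SIZE_X + 1);
    int j1 = tid / (BLOCK_SIZE_X + 1);
    return static_cast<long>(by * BLOCK_SIZE_Y + j1) * xdim + (bx * BLOCK_SIZE_X + i1);
}

// Number of distinct 128-byte segments touched by one warp during the first
// load, assuming the base pointer itself is 128-byte aligned.
int segmentsTouched(int warp, int bx, int by, int xdim) {
    assert(xdim > 0);
    std::set<long> segments;
    for (int lane = 0; lane < WARP_SIZE; ++lane) {
        int tid = warp * WARP_SIZE + lane;
        long byteAddr = firstLoadOffset(tid, bx, by, xdim) * static_cast<long>(sizeof(double));
        segments.insert(byteAddr / SEGMENT_BYTES);
    }
    return static_cast<int>(segments.size());
}

// Segments a warp would touch if its 32 doubles were contiguous and aligned.
int idealSegments() {
    return WARP_SIZE * static_cast<int>(sizeof(double)) / SEGMENT_BYTES;
}
```

Calling `segmentsTouched(0, 0, 0, 512)` gives 3, against an ideal of `idealSegments()` = 2: the warp's 32 threads straddle two rows of the 17-wide tile, so the first row spills into a second segment and the next row starts a third. The closer the per-warp count is to the ideal, the better the access is coalesced under these assumptions.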

shared cuda

Votes: 6 · Answers: 1 · Views: 1679

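3D convolution with CUDA using shared memory

I am currently trying to adapt the 2D convolution code from this question to 3D, but I cannot work out where my error is.

My 2D code looks like this: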

#include <iostream>

#define     MASK_WIDTH      3
#define     MASK_RADIUS     (MASK_WIDTH / 2)
#define     TILE_WIDTH      8
#define         W           (TILE_WIDTH + MASK_WIDTH - 1)

/**
 * GPU 2D Convolution using shared memory
 */
__global__ void convolution(float *I, float* M, float *P, int width, int height)
{
    /***** WRITE TO SHARED MEMORY *****/
    __shared__ float N_ds[W][W];

    // First batch loading
    int dest = threadIdx.x + (threadIdx.y * TILE_WIDTH);
    int destY = dest / W;
    int destX = dest …
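The listing above is truncated, so the following is not a reconstruction of it. It is a small host-side check of the general batch-loading idea the visible lines suggest: `TILE_WIDTH*TILE_WIDTH` threads fill a `W*W` shared tile in several passes by flattening the thread index and unflattening it against `W`. The `dest % W` column formula, the `destY < W` guard, and the helper names `tileWriteCounts` and `coversTileExactlyOnce` are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

constexpr int MASK_WIDTH = 3;
constexpr int TILE_WIDTH = 8;
constexpr int W          = TILE_WIDTH + MASK_WIDTH - 1;  // tile plus halo

// Simulate `batches` loading passes by a TILE_WIDTH x TILE_WIDTH thread block
// and count how many times each cell of the W x W shared tile gets written.
std::vector<int> tileWriteCounts(int batches) {
    assert(batches >= 0);
    std::vector<int> counts(W * W, 0);
    const int threads = TILE_WIDTH * TILE_WIDTH;
    for (int b = 0; b < batches; ++b) {
        for (int ty = 0; ty < TILE_WIDTH; ++ty) {
            for (int tx = 0; tx < TILE_WIDTH; ++tx) {
                int dest  = tx + ty * TILE_WIDTH + b * threads;  // flattened index
                int destY = dest / W;
                int destX = dest % W;   // assumed: the source line is cut off here
                if (destY < W)          // assumed guard: skip indices past the tile
                    ++counts[destY * W + destX];
            }
        }
    }
    return counts;
}

// True when every shared-memory cell is written exactly once.
bool coversTileExactlyOnce(int batches) {
    for (int c : tileWriteCounts(batches))
        if (c != 1) return false;
    return true;
}
```

With these constants one pass writes only 64 of the 100 cells, while two passes cover every cell exactly once. The same kind of counting check can be extended with a third index when moving to a `W*W*W` tile, to confirm that enough passes are issued.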

c++ cuda gpu-shared-memory

Votes: 5 · Answers: 1 · Views: 5927

Tag statistics

cuda ×2

c++ ×1

gpu-shared-memory ×1

shared ×1