How do I use 2D arrays in CUDA?

San*_*eep 14 cuda

I am new to CUDA. How do I allocate a 2D array of size M×N? How do I traverse that array in CUDA? Please give me some sample code.

Hi, thanks for your reply. I used your code in the program below, but I am not getting the correct result.

__global__ void test(int A[BLOCK_SIZE][BLOCK_SIZE], int B[BLOCK_SIZE][BLOCK_SIZE],int C[BLOCK_SIZE][BLOCK_SIZE])
{

    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;

    if (i < BLOCK_SIZE && j < BLOCK_SIZE)
        C[i][j] = A[i][j] + B[i][j];

}

int main()
{

    int d_A[BLOCK_SIZE][BLOCK_SIZE];
    int d_B[BLOCK_SIZE][BLOCK_SIZE];
    int d_C[BLOCK_SIZE][BLOCK_SIZE];

    int C[BLOCK_SIZE][BLOCK_SIZE];

    for(int i=0;i<BLOCK_SIZE;i++)
      for(int j=0;j<BLOCK_SIZE;j++)
      {
        d_A[i][j]=i+j;
        d_B[i][j]=i+j;
      }


    dim3 dimBlock(BLOCK_SIZE, BLOCK_SIZE); 
    dim3 dimGrid(GRID_SIZE, GRID_SIZE); 

    test<<<dimGrid, dimBlock>>>(d_A,d_B,d_C); 

    cudaMemcpy(C,d_C,BLOCK_SIZE*BLOCK_SIZE , cudaMemcpyDeviceToHost);

    for(int i=0;i<BLOCK_SIZE;i++)
      for(int j=0;j<BLOCK_SIZE;j++)
      {
        printf("%d\n",C[i][j]);

      }
}

Please help me.

ard*_*u07 18

How to allocate the 2D array:

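#define BLOCK_SIZE 16
#define GRID_SIZE 1

int main(){
    // Host-side staging arrays (h_A and h_B are illustrative names)
    int h_A[BLOCK_SIZE][BLOCK_SIZE];
    int h_B[BLOCK_SIZE][BLOCK_SIZE];

    /* d_A initialization: fill h_A here, then allocate d_A and copy h_A into it */
    int (*d_A)[BLOCK_SIZE]; // pointer to rows of BLOCK_SIZE ints, matches the kernel parameter type
    int (*d_B)[BLOCK_SIZE];
    cudaMalloc((void**)&d_A, sizeof(h_A));
    cudaMalloc((void**)&d_B, sizeof(h_B));
    cudaMemcpy(d_A, h_A, sizeof(h_A), cudaMemcpyHostToDevice);

    dim3 dimBlock(BLOCK_SIZE, BLOCK_SIZE); // so your threads are BLOCK_SIZE*BLOCK_SIZE, 256 in this case
    dim3 dimGrid(GRID_SIZE, GRID_SIZE); // 1*1 blocks in a grid

    YourKernel<<<dimGrid, dimBlock>>>(d_A, d_B); // Kernel invocation

    cudaMemcpy(h_B, d_B, sizeof(h_B), cudaMemcpyDeviceToHost); // fetch the result
    cudaFree(d_A);
    cudaFree(d_B);
}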

How to traverse the array:

__global__ void YourKernel(int d_A[BLOCK_SIZE][BLOCK_SIZE], int d_B[BLOCK_SIZE][BLOCK_SIZE]){
    // Global row and column handled by this thread
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    // Skip threads that fall outside the BLOCK_SIZE x BLOCK_SIZE array
    if (row >= BLOCK_SIZE || col >= BLOCK_SIZE) return;
    /* whatever you wanna do with d_A[][] and d_B[][] */
}

I hope this helps.

You can also refer to the matrix multiplication section on page 22 of the CUDA Programming Guide.
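
For reference, here is a minimal sketch of how the program from the question might look once device memory is handled explicitly. The program in the question passes host arrays straight to the kernel and copies only `BLOCK_SIZE*BLOCK_SIZE` bytes back (the `sizeof(int)` factor is missing), which would explain the wrong output. The names `h_A`, `h_B`, `h_C` and the helper `run_add` are assumptions made for this illustration and do not come from the thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK_SIZE 16
#define GRID_SIZE 1

// Each thread adds one element; the bounds check guards threads outside the array.
__global__ void test(int A[BLOCK_SIZE][BLOCK_SIZE],
                     int B[BLOCK_SIZE][BLOCK_SIZE],
                     int C[BLOCK_SIZE][BLOCK_SIZE])
{
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column
    if (i < BLOCK_SIZE && j < BLOCK_SIZE)
        C[i][j] = A[i][j] + B[i][j];
}

// Hypothetical helper: returns the number of wrong elements, or -1 on a CUDA error.
int run_add()
{
    int h_A[BLOCK_SIZE][BLOCK_SIZE], h_B[BLOCK_SIZE][BLOCK_SIZE], h_C[BLOCK_SIZE][BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; i++)
        for (int j = 0; j < BLOCK_SIZE; j++)
            h_A[i][j] = h_B[i][j] = i + j;

    const size_t bytes = sizeof(h_A);  // BLOCK_SIZE * BLOCK_SIZE * sizeof(int)

    // Device pointers typed as "pointer to a row of BLOCK_SIZE ints",
    // which is what the kernel's array parameters decay to.
    int (*d_A)[BLOCK_SIZE] = nullptr;
    int (*d_B)[BLOCK_SIZE] = nullptr;
    int (*d_C)[BLOCK_SIZE] = nullptr;

    cudaError_t err = cudaMalloc((void**)&d_A, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&d_B, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&d_C, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_B, h_B, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 dimBlock(BLOCK_SIZE, BLOCK_SIZE);
        dim3 dimGrid(GRID_SIZE, GRID_SIZE);
        test<<<dimGrid, dimBlock>>>(d_A, d_B, d_C);
        err = cudaGetLastError();  // reports launch failures
    }
    // cudaMemcpy waits for the kernel to finish before copying.
    if (err == cudaSuccess) err = cudaMemcpy(h_C, d_C, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }

    int wrong = 0;
    for (int i = 0; i < BLOCK_SIZE; i++)
        for (int j = 0; j < BLOCK_SIZE; j++)
            if (h_C[i][j] != 2 * (i + j)) wrong++;
    return wrong;
}

int main()
{
    printf("mismatches: %d\n", run_add());
    return 0;
}
```

The kernel itself is unchanged from the question; only the host side differs, so the fix is in where the data lives rather than in how it is indexed.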

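  • The actual content of /* d_A initialization */ is also an important part of the answer. Can you provide it? (5 upvotes)
  • @user621508 While this will work, it just creates one huge linear array in device memory. You can also use [cudaMalloc3D](http://developer.download.nvidia.com/compute/cuda/3_2/toolkit/docs/online/group__CUDART__MEMORY_g04a7553c90322aef32f8544d5c356a10.html#g04a7553c90322aef32f8544d5c356a10) to allocate two-dimensional arrays optimized for 2D data access. I don't know whether you only want 2D-array indexing or performance. (3 upvotes)
  • @username_4567, that is what /* d_A initialization */ stands for. However, the memory is never freed. (2 upvotes)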
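
As a rough illustration of the layout-aware allocation mentioned in the comments, the sketch below uses `cudaMallocPitch` and `cudaMemcpy2D`, the 2D counterparts of the `cudaMalloc3D` call linked above (a substitution made here for brevity, not something the commenter wrote). The kernel `fill`, the helper `run_pitched`, and the test values are assumptions for this example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Writes row * width + col into each element of a pitched 2D allocation.
__global__ void fill(int *data, size_t pitch, int width, int height)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < height && col < width) {
        // Rows are pitch bytes apart, so step in bytes before indexing the column.
        int *rowPtr = (int *)((char *)data + row * pitch);
        rowPtr[col] = row * width + col;
    }
}

// Hypothetical helper: returns the number of wrong elements, or -1 on error.
int run_pitched(int width, int height)
{
    int *h_data = (int *)malloc((size_t)width * height * sizeof(int));
    if (!h_data) return -1;

    int *d_data = nullptr;
    size_t pitch = 0;  // row stride in bytes, chosen by the runtime
    cudaError_t err = cudaMallocPitch((void **)&d_data, &pitch,
                                      width * sizeof(int), height);
    if (err == cudaSuccess) {
        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x,
                  (height + block.y - 1) / block.y);
        fill<<<grid, block>>>(d_data, pitch, width, height);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        // Copy row by row from the pitched device layout into a dense host array.
        err = cudaMemcpy2D(h_data, width * sizeof(int), d_data, pitch,
                           width * sizeof(int), height, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_data);

    int wrong = 0;
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        wrong = -1;
    } else {
        for (int r = 0; r < height; r++)
            for (int c = 0; c < width; c++)
                if (h_data[r * width + c] != r * width + c) wrong++;
    }
    free(h_data);
    return wrong;
}

int main()
{
    printf("mismatches: %d\n", run_pitched(100, 37));
    return 0;
}
```

Whether the padded layout pays off depends on the access pattern and the hardware; for small arrays such as the 16×16 case in the question, a plain `cudaMalloc` is likely sufficient.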

小智 6

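The best way is to store the 2D array A in vector (flattened 1D) form. For example, suppose you have a matrix A of size n×m. In pointer-to-pointer notation, its (i,j) element is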

A[i][j] (with i=0..n-1 and j=0..m-1). 

In vector form, with rows stored one after another (m elements per row), you can write

A[i*m+j] (with i=0..n-1 and j=0..m-1).

Using a 1D array in this case simplifies the copy, which is straightforward:

double *A, *dev_A; // A - host pointer, dev_A - device pointer
A = (double*)malloc(n*m*sizeof(double));
cudaMalloc((void**)&dev_A, n*m*sizeof(double));
// Pass the pointers themselves, not their addresses
cudaMemcpy(dev_A, A, n*m*sizeof(double), cudaMemcpyHostToDevice); // assuming A holds doubles
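
To show how the flattened layout might be traversed on the device, here is a minimal sketch under the row-major convention (row i occupies m consecutive elements). The kernel `scale`, the helper `run_flat`, and the 16×16 block shape are assumptions for this illustration and are not part of the answer.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Multiplies every element of an n x m row-major matrix by factor.
__global__ void scale(double *A, int n, int m, double factor)
{
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row, 0..n-1
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column, 0..m-1
    if (i < n && j < m)
        A[i * m + j] *= factor;
}

// Hypothetical helper: returns the number of wrong elements, or -1 on error.
int run_flat(int n, int m)
{
    const size_t bytes = (size_t)n * m * sizeof(double);
    double *A = (double *)malloc(bytes);  // host pointer
    double *dev_A = nullptr;              // device pointer
    if (!A) return -1;

    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            A[i * m + j] = (double)(i * m + j);

    cudaError_t err = cudaMalloc((void **)&dev_A, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dev_A, A, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(16, 16);
        // Round up so the grid covers matrices whose sizes are not multiples of 16.
        dim3 grid((m + block.x - 1) / block.x, (n + block.y - 1) / block.y);
        scale<<<grid, block>>>(dev_A, n, m, 2.0);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(A, dev_A, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev_A);

    int wrong = 0;
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        wrong = -1;
    } else {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                if (A[i * m + j] != 2.0 * (i * m + j)) wrong++;
    }
    free(A);
    return wrong;
}

int main()
{
    printf("mismatches: %d\n", run_flat(5, 7));
    return 0;
}
```

Because the sizes are passed as kernel arguments, the same code handles any n and m at run time, which the fixed-size `[BLOCK_SIZE][BLOCK_SIZE]` parameters cannot do.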