Copying a struct containing pointers to a CUDA device

Tho*_*sen 26 struct pointers host cuda device

I'm working on a project where I need my CUDA device to perform computations on a struct that contains pointers.

typedef struct StructA {
    int* arr;
} StructA;

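When I allocate memory for the struct and copy it to the device, only the struct itself is copied, not the data the pointer refers to. Right now I work around this by allocating the device array first and then setting the host struct to use that new pointer (which resides on the GPU). The following code sample describes this approach using the struct above: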

#define N 10

int main() {

    int h_arr[N] = {1,2,3,4,5,6,7,8,9,10};
    StructA *h_a = (StructA*)malloc(sizeof(StructA));
    StructA *d_a;
    int *d_arr;

    // 1. Allocate device struct.
    cudaMalloc((void**) &d_a, sizeof(StructA));

    // 2. Allocate device pointer.
    cudaMalloc((void**) &(d_arr), sizeof(int)*N);

    // 3. Copy pointer content from host to device.
    cudaMemcpy(d_arr, h_arr, sizeof(int)*N, cudaMemcpyHostToDevice);

    // 4. Point to device pointer in host struct.
    h_a->arr = d_arr;

    // 5. Copy struct from host to device.
    cudaMemcpy(d_a, h_a, sizeof(StructA), cudaMemcpyHostToDevice);

    // 6. Call kernel.
    kernel<<<N,1>>>(d_a);

    // 7. Copy struct from device to host.
    cudaMemcpy(h_a, d_a, sizeof(StructA), cudaMemcpyDeviceToHost);

    // 8. Copy pointer from device to host.
    cudaMemcpy(h_arr, d_arr, sizeof(int)*N, cudaMemcpyDeviceToHost);

    // 9. Point to host pointer in host struct.
    h_a->arr = h_arr;
}

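My question is: is this the right way to do it?

It seems like an awful lot of work, and keep in mind that this is a very simple struct. If my struct contained many pointers, or nested structs with pointers of their own, the allocation and copy code would become long and confusing.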

har*_*ism 24

Edit: CUDA 6 introduced Unified Memory, which makes this "deep copy" problem much easier. See this post for details.
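
To illustrate the Unified Memory route, here is a minimal sketch of how the struct from the question might be handled with `cudaMallocManaged`. It assumes a device and driver that support managed memory. The names `doubleElements` and `doubleWithManagedMemory` are hypothetical and chosen only for this example. Both the struct and its array live in managed allocations, so no explicit `cudaMemcpy` or pointer fix-up is needed.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define N 10

typedef struct StructA {
    int* arr;
} StructA;

// Each thread doubles one element. Indexed by threadIdx.x,
// so this expects a launch with one block of N threads.
__global__ void doubleElements(StructA* a)
{
    a->arr[threadIdx.x] *= 2;
}

// Hypothetical helper: fills the array with 1..N, doubles it on the GPU,
// and copies the result into out. Returns the first CUDA error seen.
cudaError_t doubleWithManagedMemory(int out[N])
{
    assert(out != NULL);

    // Managed allocations are reachable from host and device
    // through the same pointer value.
    StructA* a = NULL;
    cudaError_t err = cudaMallocManaged((void**)&a, sizeof(StructA));
    if (err != cudaSuccess) return err;

    err = cudaMallocManaged((void**)&a->arr, sizeof(int) * N);
    if (err != cudaSuccess) { cudaFree(a); return err; }

    // Initialize directly from the host; no cudaMemcpy required.
    for (int i = 0; i < N; ++i) a->arr[i] = i + 1;

    doubleElements<<<1, N>>>(a);
    err = cudaGetLastError();

    // The host must wait for the kernel before touching managed memory again.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    if (err == cudaSuccess)
        for (int i = 0; i < N; ++i) out[i] = a->arr[i];

    cudaFree(a->arr);
    cudaFree(a);
    return err;
}
```

Calling `doubleWithManagedMemory(out)` from your own `main` and checking that it returns `cudaSuccess` is enough to try it out. The trade-off is that data migration is left to the runtime, which may or may not perform as well as hand-written copies for a given access pattern.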


Don't forget that you can pass structs to kernels by value. This code works:

// pass struct by value (may not be efficient for complex structures)
__global__ void kernel2(StructA in)
{
    in.arr[threadIdx.x] *= 2;
}

Doing so means you only have to copy the array to the device, not the struct:

int h_arr[N] = {1,2,3,4,5,6,7,8,9,10};
StructA h_a;
int *d_arr;

// 1. Allocate device array.
cudaMalloc((void**) &(d_arr), sizeof(int)*N);

// 2. Copy array contents from host to device.
cudaMemcpy(d_arr, h_arr, sizeof(int)*N, cudaMemcpyHostToDevice);

// 3. Point to device pointer in host struct.
h_a.arr = d_arr;

// 4. Call kernel with host struct as argument.
//    kernel2 indexes by threadIdx.x, so launch one block of N threads.
kernel2<<<1,N>>>(h_a);

// 5. Copy array contents from device to host.
cudaMemcpy(h_arr, d_arr, sizeof(int)*N, cudaMemcpyDeviceToHost);

// 6. Point to host pointer in host struct 
//    (or do something else with it if this is not needed)
h_a.arr = h_arr;
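
For reference, the pass-by-value steps can be gathered into one function with basic error checking and cleanup. This is only a sketch: the helper name `doubleByValue` is hypothetical, and the kernel is launched as one block of N threads because `kernel2` indexes by `threadIdx.x`.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define N 10

typedef struct StructA {
    int* arr;
} StructA;

// Pass struct by value: the pointer inside it must be a device pointer.
__global__ void kernel2(StructA in)
{
    in.arr[threadIdx.x] *= 2;
}

// Hypothetical helper: doubles the N elements of h_arr in place using the GPU.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t doubleByValue(int h_arr[N])
{
    assert(h_arr != NULL);

    StructA h_a;
    int* d_arr = NULL;

    // Allocate the device array and copy the host data into it.
    cudaError_t err = cudaMalloc((void**)&d_arr, sizeof(int) * N);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(d_arr, h_arr, sizeof(int) * N, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        // The host-side struct carries the device pointer into the kernel.
        h_a.arr = d_arr;
        kernel2<<<1, N>>>(h_a);
        err = cudaGetLastError();  // reports launch failures
    }

    // This blocking copy also waits for the kernel to finish.
    if (err == cudaSuccess)
        err = cudaMemcpy(h_arr, d_arr, sizeof(int) * N, cudaMemcpyDeviceToHost);

    cudaFree(d_arr);
    return err;
}
```

Because the struct is copied into the kernel's argument space at launch, the device never needs its own `StructA` allocation. Kernel arguments are limited in size, so this suits small structs like this one better than large ones.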