Is it possible to send data from JCuda to GPU memory defined as a union?

Question

I have defined a new data type like this on the GPU side (CUDA):

typedef union {
    int i;
    double d;
    long l;
    char s[16];
} data_unit;

data_unit *d_array;

On the Java side, we have an array of one of the types available in that union. Normally, if we have an array of type int, we can do the following in Java (JCuda):

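import static jcuda.driver.JCudaDriver.*;
import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.driver.CUdeviceptr;

int data_size = 1024;  // example value: the number of elements
CUdeviceptr d_array = new CUdeviceptr();  // must be instantiated before cuMemAlloc
int[] h_array = new int[data_size];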

cuMemAlloc(d_array, data_size * Sizeof.INT);
cuMemcpyHtoD(d_array, Pointer.to(h_array), data_size * Sizeof.INT);

But what should we do if the array on the device has our union type? (Assume h_array is still of type int.)

int data_size;
CUdeviceptr d_array;
int[] h_array = new int[data_size];

cuMemAlloc(d_array, data_size * Sizeof.?);
// Here we should have some type of alignment (?)
cuMemcpyHtoD(d_array, Pointer.to(h_array), data_size * Sizeof.?);

Answer 1

whn*_*whn 5

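I believe there is a fundamental misunderstanding here about what a union is.

Let's think about it. What makes a union different from a struct? It can store different types of data at different times.

How does it accomplish this feat? One could use some kind of separate variable to specify dynamically which type is stored or how much memory it takes up, but a union does not do this. It relies on the programmer knowing exactly which type they want to retrieve, and when. So, if only the programmer really knows the type at any given point in time, the only option is to make sure enough space is allocated for the union variable that it can always be used for any of its types.

Indeed, this is what a union does, see here (yes, I know it is C/C++, but this also applies to CUDA). What does this mean for you? It means the size of your union array should be the size of its largest member times the number of elements, because the size of a union is the size of its largest member (rounded up, if necessary, to satisfy the alignment of its members).

Let's look at your union and see how to work this out.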

typedef union {
    int i;
    double d;
    long l;
    char s[16];
} data_unit;

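Your union has:

int i, which we will assume is 4 bytes
double d, which is 8 bytes
long l, which is confusing because it can be 4 or 8 bytes depending on the compiler/platform; we will assume 8 bytes for now
char s[16], which is easy: 16 bytes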
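As a rough cross-check of these numbers, here is a minimal Java sketch that combines the member sizes into a union size. The class `UnionLayout` and its methods are hypothetical helpers written for illustration, not part of JCuda, and the sizes and alignments passed in are the assumptions listed above. The authoritative value is whatever `sizeof(data_unit)` reports under the device compiler.

```java
final class UnionLayout {
    /** Rounds size up to the next multiple of alignment. */
    static int alignUp(int size, int alignment) {
        return ((size + alignment - 1) / alignment) * alignment;
    }

    /** Union size: the largest member size, padded to the strictest member alignment. */
    static int unionSize(int[] memberSizes, int[] memberAlignments) {
        int maxSize = 0;
        int maxAlign = 1;
        for (int s : memberSizes) {
            maxSize = Math.max(maxSize, s);
        }
        for (int a : memberAlignments) {
            maxAlign = Math.max(maxAlign, a);
        }
        return alignUp(maxSize, maxAlign);
    }

    public static void main(String[] args) {
        // Assumed sizes and alignments for: int i, double d, long l, char s[16]
        int[] sizes = {4, 8, 8, 16};
        int[] aligns = {4, 8, 8, 1};
        System.out.println(unionSize(sizes, aligns)); // prints 16
    }
}
```

With these assumptions the padding step changes nothing, since 16 is already a multiple of 8. It would matter for a union such as one holding a `double` and a `char[9]`, which would come out as 16 bytes rather than 9.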

So the largest number of bytes occupied by any member is 16, for your char s[16] variable. This means you need to change your code to:

int data_size = 1024;  // example value: the number of unions
int union_size = 16;   // sizeof(data_unit), given the member sizes assumed above
CUdeviceptr d_array = new CUdeviceptr();
// Copying a plain int array will not give the result you expect without over-allocating:
// ints occupy 4 bytes each, so they fill less space than the same number of unions.
// We need a "stride" here if we want to copy real data from host to device.
// union_size / Sizeof.INT = 4, so there are 4 times as many ints, 4 for each union.
int[] h_array = new int[data_size * (union_size / Sizeof.INT)];


// Allocate by the size of our union, not the size of an int.
cuMemAlloc(d_array, data_size * union_size);
// Again, we copy data_size * union_size bytes.
cuMemcpyHtoD(d_array, Pointer.to(h_array), data_size * union_size);

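Note

If you want to copy ints, this basically means you need to assign the actual int you want for each index to every 4th int of the host array.

int 0 is h_array[0], int 1 is h_array[4], int 2 is h_array[8], int n is h_array[n * 4], and so on.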
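The index mapping above can be sketched as a small packing helper. This is an illustration only: `UnionPacking` and `packInts` are hypothetical names, and the 16-byte union size is the assumption made earlier in this answer. The sketch uses plain Java so it runs without a GPU.

```java
import java.util.Arrays;

final class UnionPacking {
    /** Spreads values so that values[n] lands at index n * (unionSizeBytes / 4). */
    static int[] packInts(int[] values, int unionSizeBytes) {
        int intsPerUnion = unionSizeBytes / Integer.BYTES; // 16 / 4 = 4
        int[] packed = new int[values.length * intsPerUnion];
        for (int n = 0; n < values.length; n++) {
            // The first 4 bytes of each union hold member i; the rest stay zero.
            packed[n * intsPerUnion] = values[n];
        }
        return packed;
    }

    public static void main(String[] args) {
        int[] packed = packInts(new int[] {10, 20, 30}, 16);
        System.out.println(Arrays.toString(packed));
        // prints [10, 0, 0, 0, 20, 0, 0, 0, 30, 0, 0, 0]
    }
}
```

The packed array would then take the place of h_array in the cuMemcpyHtoD call, with a byte count of the number of values times the union size.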
