Fra*_*ter 1 c++ cuda pointer-arithmetic gpu-shared-memory
I don't understand what exactly is happening in the following lines:
unsigned char *membershipChanged = (unsigned char *)sharedMemory;
float *clusters = (float *)(sharedMemory + blockDim.x);
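I assume that in #1 sharedMemory is effectively renamed to membershipChanged, but why is blockDim.x added to the sharedMemory pointer? Where does that address point?
sharedMemory is declared with extern __shared__ char sharedMemory[];
I found this code in a CUDA kmeans implementation.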
void find_nearest_cluster(int numCoords,
                          int numObjs,
                          int numClusters,
                          float *objects,           //  [numCoords][numObjs]
                          float *deviceClusters,    //  [numCoords][numClusters]
                          int *membership,          //  [numObjs]
                          int *intermediates)
{
    extern __shared__ char sharedMemory[];

    //  The type chosen for membershipChanged must be large enough to support
    //  reductions! There are blockDim.x elements, one for each thread in the
    //  block.
    unsigned char *membershipChanged = (unsigned char *)sharedMemory;
    float *clusters = (float *)(sharedMemory + blockDim.x);

    membershipChanged[threadIdx.x] = 0;

    //  BEWARE: We can overrun our shared memory here if there are too many
    //  clusters or too many coordinates!
    for (int i = threadIdx.x; i < numClusters; i += blockDim.x) {
        for (int j = 0; j < numCoords; j++) {
            clusters[numClusters * j + i] = deviceClusters[numClusters * j + i];
        }
    }
    .....
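sharedMemory + blockDim.x points blockDim.x bytes past the base of the shared memory region. The offset is in bytes because sharedMemory is declared as an array of char.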
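To make the byte offset concrete, here is a small host-side C++ sketch (plain C++, not CUDA). The helper name `byteOffset` is hypothetical and only illustrates that pointer arithmetic is scaled by the size of the pointed-to type.

```cpp
#include <cstddef>

// Returns how many bytes lie between `base` and `base + n`.
// Pointer arithmetic on T* advances in units of sizeof(T), so the
// result is n * sizeof(T). `base + n` must stay within the array
// (or one past its end) for the arithmetic to be valid.
template <typename T>
std::ptrdiff_t byteOffset(T *base, std::ptrdiff_t n) {
    return reinterpret_cast<const char *>(base + n)
         - reinterpret_cast<const char *>(base);
}
```

For a char array, `byteOffset(buf, n)` is simply `n`, which is the situation in the kernel. For a float array the same call gives `n * sizeof(float)`, which is why the kernel does the addition on the char pointer first and only then casts to `float *`.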
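The reason you might do something like this is to sub-allocate within shared memory. The launch site of the find_nearest_cluster kernel dynamically allocates some amount of shared storage for the kernel. The code implies that two logically distinct arrays reside in the shared storage pointed to by sharedMemory: membershipChanged and clusters. The pointer arithmetic is just a way of obtaining a pointer to the second array.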
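The sub-allocation can be sketched on the host in plain C++, with an ordinary byte buffer standing in for the dynamically allocated shared memory. The names `SharedLayout`, `sharedBytesNeeded`, `carve`, and `threadsPerBlock` (playing the role of blockDim.x) are hypothetical, and the sizing formula is an assumption based on the comments in the kernel; the actual launch code may compute it differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Two logically distinct arrays that share one raw byte buffer.
struct SharedLayout {
    unsigned char *membershipChanged;  // threadsPerBlock elements, one byte each
    float *clusters;                   // numCoords * numClusters floats
};

// Total bytes the buffer must provide for both arrays (assumed sizing).
std::size_t sharedBytesNeeded(std::size_t threadsPerBlock,
                              std::size_t numCoords,
                              std::size_t numClusters) {
    return threadsPerBlock * sizeof(unsigned char)
         + numCoords * numClusters * sizeof(float);
}

// Carve the buffer: the first array starts at the base, the second
// starts threadsPerBlock bytes later, mirroring
// (float *)(sharedMemory + blockDim.x) in the kernel.
SharedLayout carve(char *buffer, std::size_t threadsPerBlock) {
    SharedLayout layout;
    layout.membershipChanged = reinterpret_cast<unsigned char *>(buffer);

    char *second = buffer + threadsPerBlock;
    // The second array holds floats, so its start must be float-aligned.
    assert(reinterpret_cast<std::uintptr_t>(second) % alignof(float) == 0);
    layout.clusters = reinterpret_cast<float *>(second);
    return layout;
}
```

In CUDA, the byte count would be passed as the third kernel launch parameter inside `<<< >>>`, which is what gives `extern __shared__ char sharedMemory[]` its size. Note that the float pointer is only correctly aligned when the offset is a multiple of `sizeof(float)`, assuming the base of the buffer is itself suitably aligned; block sizes that are multiples of 4 satisfy this.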