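2D textures are a useful CUDA feature for image processing applications. To bind pitch linear memory to a 2D texture, the memory has to be aligned. cudaMallocPitch is a good option for aligned memory allocation. On my device, the pitch returned by cudaMallocPitch is a multiple of 512, i.e. the memory is 512-byte aligned.
The actual alignment requirement for the device is given by cudaDeviceProp::texturePitchAlignment, which is 32 bytes on my device.
My question is:
If the actual alignment requirement for 2D textures is 32 bytes, why does cudaMallocPitch return 512-byte aligned memory?
Isn't that a waste of memory? For example, if I create an 8-bit image of size 513 x 100, it will occupy 1024 x 100 bytes.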
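To make that arithmetic concrete, here is a minimal host-only sketch in plain C++ (no CUDA calls, so it also compiles as host code under nvcc). The helper names roundUpToMultiple and paddedImageBytes are hypothetical, and the sketch assumes the pitch is simply the row width rounded up to the next multiple of the alignment, which matches the numbers above but is not a documented guarantee of cudaMallocPitch.

```cpp
#include <cassert>
#include <cstddef>

// Round widthBytes up to the next multiple of alignment.
// Assumption: this mirrors how the pitch is chosen; the real runtime may differ.
std::size_t roundUpToMultiple(std::size_t widthBytes, std::size_t alignment)
{
    assert(alignment > 0);
    return ((widthBytes + alignment - 1) / alignment) * alignment;
}

// Total bytes occupied by a pitched 2D allocation of the given height.
std::size_t paddedImageBytes(std::size_t widthBytes, std::size_t height,
                             std::size_t alignment)
{
    return roundUpToMultiple(widthBytes, alignment) * height;
}
```

For the 513 x 100 image, paddedImageBytes(513, 100, 512) gives 102400 bytes (pitch 1024), while paddedImageBytes(513, 100, 32) would give 54400 bytes (pitch 544).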
I see this behavior on the following systems:
1: Asus G53JW + Windows 8 x64 + GeForce GTX 460M + CUDA 5 + Core i7 740QM + 4GB RAM
2: Dell Inspiron N5110 + Windows 7 x64 + GeForce GT525M + CUDA 4.2 + Core i7 2630QM + 6GB RAM
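This is a slightly speculative answer, but keep in mind that the pitch of an allocation has to satisfy two alignment properties for textures, one for the texture pointer and one for the texture rows. I suspect that cudaMallocPitch is honouring the former, which is defined by cudaDeviceProp::textureAlignment. For example: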
    #include <cstdio>
    #include <cuda_runtime.h>

    int main(void)
    {
        const int ncases = 12;
        // Requested row widths, in bytes
        const size_t widths[ncases] = { 5, 10, 20, 50, 70, 90, 100,
                                        200, 500, 700, 900, 1000 };
        const size_t height = 10; // rows per allocation

        float *vals[ncases];
        size_t pitches[ncases];

        // Query the texture base-address alignment of device 0
        struct cudaDeviceProp p;
        cudaGetDeviceProperties(&p, 0);
        fprintf(stdout, "Texture alignment = %zu bytes\n",
                p.textureAlignment);

        cudaSetDevice(0);
        cudaFree(0); // establish context

        // Allocate each case and print the pitch the runtime chose
        for(int i=0; i<ncases; i++) {
            cudaMallocPitch((void **)&vals[i], &pitches[i],
                            widths[i], height);
            fprintf(stdout, "width = %zu <=> pitch = %zu \n",
                    widths[i], pitches[i]);
        }

        return 0;
    }
which gives the following on a GT320M:
Texture alignment = 256 bytes
width = 5 <=> pitch = 256
width = 10 <=> pitch = 256
width = 20 <=> pitch = 256
width = 50 <=> pitch = 256
width = 70 <=> pitch = 256
width = 90 <=> pitch = 256
width = 100 <=> pitch = 256
width = 200 <=> pitch = 256
width = 500 <=> pitch = 512
width = 700 <=> pitch = 768
width = 900 <=> pitch = 1024
width = 1000 <=> pitch = 1024
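As a quick sanity check on that reading of the output, the host-only C++ sketch below tests whether every pitch listed above equals the requested width rounded up to a multiple of a candidate alignment. The names Sample, kObserved, roundUp and pitchesMatchAlignment are hypothetical, and the width and pitch values are copied from the GT320M output above. Agreement with 256 is consistent with the guess about textureAlignment, but it does not prove how cudaMallocPitch chooses the pitch.

```cpp
#include <cassert>
#include <cstddef>

// One (requested width, returned pitch) pair from the GT320M run above.
struct Sample { std::size_t width; std::size_t pitch; };

static const Sample kObserved[] = {
    {5, 256},   {10, 256},  {20, 256},  {50, 256},
    {70, 256},  {90, 256},  {100, 256}, {200, 256},
    {500, 512}, {700, 768}, {900, 1024}, {1000, 1024}
};

// Round value up to the next multiple of alignment.
std::size_t roundUp(std::size_t value, std::size_t alignment)
{
    assert(alignment > 0);
    return ((value + alignment - 1) / alignment) * alignment;
}

// True if every observed pitch equals the width rounded up to alignment.
bool pitchesMatchAlignment(std::size_t alignment)
{
    for (const Sample &s : kObserved) {
        if (roundUp(s.width, alignment) != s.pitch) return false;
    }
    return true;
}
```

Here pitchesMatchAlignment(256) holds for all twelve rows, whereas a candidate such as 32 or 512 already fails on the first row (width 5, pitch 256).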
I am guessing that cudaDeviceProp::texturePitchAlignment applies to CUDA arrays.