CUDA and OpenGL Interop

Question

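I've been reading through the CUDA documentation, and it seems to me that every buffer that needs to interface with OpenGL has to be created as a GL buffer object.

According to the NVIDIA programming guide, it has to be done like this: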

GLuint positionsVBO;
struct cudaGraphicsResource* positionsVBO_CUDA;

int main() {

    // Explicitly set device
    cudaGLSetGLDevice(0);
    // Initialize OpenGL and GLUT
    ...
    glutDisplayFunc(display);
    // Create buffer object and register it with CUDA
    glGenBuffers(1, &positionsVBO);
    glBindBuffer(GL_ARRAY_BUFFER, positionsVBO);
    unsigned int size = width * height * 4 * sizeof(float);
    glBufferData(GL_ARRAY_BUFFER, size, 0, GL_DYNAMIC_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    cudaGraphicsGLRegisterBuffer(&positionsVBO_CUDA, positionsVBO, cudaGraphicsMapFlagsWriteDiscard);

    // Launch rendering loop
    glutMainLoop();
}
void display() {
    // Map buffer object for writing from CUDA
    float4* positions;
    cudaGraphicsMapResources(1, &positionsVBO_CUDA, 0);
    size_t num_bytes;
    cudaGraphicsResourceGetMappedPointer((void**)&positions, &num_bytes, positionsVBO_CUDA);
    // Execute kernel
    dim3 dimBlock(16, 16, 1);
    dim3 dimGrid(width / dimBlock.x, height / dimBlock.y, 1);
    createVertices<<<dimGrid, dimBlock>>>(positions, time, width, height);
    // Unmap buffer object
    cudaGraphicsUnmapResources(1, &positionsVBO_CUDA, 0);
    // Render from buffer object
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glBindBuffer(GL_ARRAY_BUFFER, positionsVBO);
    glVertexPointer(4, GL_FLOAT, 0, 0);
    glEnableClientState(GL_VERTEX_ARRAY);
    glDrawArrays(GL_POINTS, 0, width * height);
    glDisableClientState(GL_VERTEX_ARRAY);
    // Swap buffers
    glutSwapBuffers();
    glutPostRedisplay();
}
void deleteVBO() {
    cudaGraphicsUnregisterResource(positionsVBO_CUDA);
    glDeleteBuffers(1, &positionsVBO);
}

__global__ void createVertices(float4* positions, float time, unsigned int width, unsigned int height) { 
    // [....]
}

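Is there a way to hand memory allocated with cudaMalloc directly to OpenGL? I already have code written in CUDA, and I want to put my float4 array directly into OpenGL.

Say I already have code like this: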

float4 *cd;
cudaMalloc((void**)&cd, elements * sizeof(float4));
do_something<<<16,1>>>(cd);

I want to display the output of do_something through OpenGL.

Side note: why is the cudaGraphicsResourceGetMappedPointer function run on every timestep?

Answer 1

har*_*ism 11

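As of CUDA 4.0, OpenGL interop is one-way. That means that to do what you want (run a CUDA kernel that writes data to a GL buffer or texture image), you have to map the buffer to a device pointer and pass that pointer to your kernel, as shown in your example.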
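A kernel only sees a raw device address, so the same kernel can write to a cudaMalloc allocation or to the pointer returned by cudaGraphicsResourceGetMappedPointer. The sketch below illustrates this under stated assumptions: the body of do_something is hypothetical (the question does not show one), launchOn is a hypothetical helper, and a cudaMalloc buffer would stand in for the mapped pointer when no OpenGL context is available.

```cuda
#include <cuda_runtime.h>

// Hypothetical body for do_something: writes one float4 per element.
// The kernel has no knowledge of whether `out` came from cudaMalloc
// or from a mapped OpenGL buffer.
__global__ void do_something(float4* out, unsigned int n) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = make_float4((float)i, 0.0f, 0.0f, 1.0f);
    }
}

// Hypothetical helper: launches the kernel on any valid device pointer
// holding at least n float4 elements, then waits for completion.
cudaError_t launchOn(float4* devPtr, unsigned int n) {
    const unsigned int block = 128;
    const unsigned int grid = (n + block - 1) / block;  // round up
    do_something<<<grid, block>>>(devPtr, n);
    cudaError_t err = cudaGetLastError();               // launch errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();                     // execution errors
}
```

In the rendering loop, the pointer obtained from the mapped VBO would be passed to launchOn in place of the cudaMalloc pointer, between the map and unmap calls.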

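As for your side note: cudaGraphicsResourceGetMappedPointer is called every time display() is called because cudaGraphicsMapResources is called every frame. Any time you re-map a resource, you should re-fetch the mapped pointer, because it may have changed. Why re-map every frame? For performance reasons, OpenGL sometimes moves buffer objects around in memory (especially in memory-intensive GL applications). If you leave the resource mapped all the time, it can't do this, and performance may suffer. I believe GL's ability and need to virtualize memory objects is also one of the reasons the current GL interop API is one-way (GL is not allowed to move CUDA allocations around, so you can't map a CUDA-allocated device pointer into a GL buffer object).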
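If the data already lives in a cudaMalloc allocation, one common workaround is to keep the GL buffer as the render target and copy into it once per frame, device to device. The following is a minimal sketch and not the answer's own code: copyToVBO is a hypothetical helper, and it assumes the VBO was registered with cudaGraphicsGLRegisterBuffer as in the question and that the GL context is current. It maps the resource, re-fetches the pointer, copies, and unmaps on every call.

```cuda
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

// Hypothetical helper: copies `bytes` bytes from a cudaMalloc'd array into a
// GL buffer previously registered with cudaGraphicsGLRegisterBuffer.
cudaError_t copyToVBO(cudaGraphicsResource* vboResource,
                      const float4* src, size_t bytes) {
    // Map the GL buffer so CUDA may access it; GL must not touch it meanwhile.
    cudaError_t err = cudaGraphicsMapResources(1, &vboResource, 0);
    if (err != cudaSuccess) return err;

    // Re-fetch the pointer after every map: the address may have changed.
    float4* mapped = nullptr;
    size_t mappedBytes = 0;
    err = cudaGraphicsResourceGetMappedPointer((void**)&mapped, &mappedBytes,
                                               vboResource);
    if (err == cudaSuccess) {
        if (bytes > mappedBytes) {
            err = cudaErrorInvalidValue;  // source does not fit in the VBO
        } else {
            // Both pointers are device addresses, so no host round trip.
            err = cudaMemcpy(mapped, src, bytes, cudaMemcpyDeviceToDevice);
        }
    }

    // Always unmap, even on failure, so GL can use the buffer again.
    cudaError_t unmapErr = cudaGraphicsUnmapResources(1, &vboResource, 0);
    return (err != cudaSuccess) ? err : unmapErr;
}
```

Called from display() as copyToVBO(positionsVBO_CUDA, cd, elements * sizeof(float4)) before the draw calls, this costs one extra device-side copy per frame. Writing directly into the mapped pointer from the kernel avoids that copy when the existing code can be changed.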

Archived: 14 years, 8 months ago
Views: 12,601
Last activity: 8 years, 9 months ago