
Copying from GPU to CPU is slower than copying from CPU to GPU

I started learning CUDA a while ago, and I have the following problem.

Here is what I am doing:

Copy to the GPU:

int* B;
// ...
int *dev_B;    
//initialize B=0

cudaMalloc((void**)&dev_B, Nel*Nface*sizeof(int));
cudaMemcpy(dev_B, B, Nel*Nface*sizeof(int),cudaMemcpyHostToDevice);
//...

//Execute on GPU the following function which is supposed to fill in 
//the dev_B matrix with integers


findNeiborElem <<< Nblocks, Nthreads >>>(dev_B, dev_MSH, dev_Nel, dev_Npel, dev_Nface, dev_FC);
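A `<<<...>>>` launch is asynchronous with respect to the host: it queues the kernel and returns right away, before the kernel has done any work. The sketch below, assuming a CUDA-capable device and nvcc, makes that visible by timing the launch statement and then timing until `cudaDeviceSynchronize()` returns. `spinKernel`, `LaunchTiming` and `timeLaunch` are hypothetical names; `spinKernel` only busy-waits to stand in for the real `findNeiborElem` work.

```cuda
#include <cassert>
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical busy kernel: each thread spins for roughly `cycles` GPU clock
// ticks, standing in for the real findNeiborElem work.
__global__ void spinKernel(long long cycles, int *out)
{
    long long start = clock64();
    while (clock64() - start < cycles) {
        // busy-wait
    }
    out[threadIdx.x + blockIdx.x * blockDim.x] = 1;
}

// Host-side elapsed times in milliseconds, both measured from the same start.
struct LaunchTiming {
    double launchMs;  // time for the <<<...>>> statement to return
    double totalMs;   // time until cudaDeviceSynchronize() returned
    bool ok;          // true if no CUDA error was reported
};

LaunchTiming timeLaunch(long long cycles)
{
    using Clock = std::chrono::steady_clock;
    LaunchTiming t{0.0, 0.0, false};
    int *dev_out = nullptr;
    // The first runtime call also creates the context, so it is kept
    // outside the timed region.
    if (cudaMalloc((void **)&dev_out, 256 * sizeof(int)) != cudaSuccess) return t;

    auto t0 = Clock::now();
    spinKernel<<<1, 256>>>(cycles, dev_out);
    auto t1 = Clock::now();                     // launch returned; kernel may still be running
    cudaError_t err = cudaDeviceSynchronize();  // blocks until the kernel has finished
    auto t2 = Clock::now();

    t.launchMs = std::chrono::duration<double, std::milli>(t1 - t0).count();
    t.totalMs  = std::chrono::duration<double, std::milli>(t2 - t0).count();
    t.ok = (err == cudaSuccess) && (cudaGetLastError() == cudaSuccess);
    cudaFree(dev_out);
    return t;
}
```

With a large `cycles` value, `launchMs` is typically a tiny fraction of `totalMs`: the host moves on to the next statement while the GPU is still busy, and whichever later call has to wait for the kernel absorbs its run time.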

Copy back to the CPU:

cudaMemcpy(B, dev_B, Nel*Nface*sizeof(int),cudaMemcpyDeviceToHost);
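The device-to-host `cudaMemcpy` is issued to the same (legacy default) stream as the kernel, so it cannot start until `findNeiborElem` has finished, and it blocks the host until the copy completes. A host-side timer around this copy therefore also counts the kernel's run time. A minimal sketch, assuming a CUDA-capable device, that separates the two phases with CUDA events recorded in the stream; `fillKernel`, `PhaseTiming` and `timePhases` are hypothetical names, and `fillKernel` simply spins per element to simulate heavy work:

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for findNeiborElem: writes i % 7 into dev_B[i] with a
// grid-stride loop, spinning per element to simulate expensive work.
__global__ void fillKernel(int *dev_B, int n, long long cyclesPerElem)
{
    for (int i = threadIdx.x + blockIdx.x * blockDim.x; i < n; i += blockDim.x * gridDim.x) {
        long long start = clock64();
        while (clock64() - start < cyclesPerElem) { }
        dev_B[i] = i % 7;
    }
}

struct PhaseTiming {
    float kernelMs;  // GPU time of the kernel alone
    float copyMs;    // GPU time of the device-to-host copy alone
    bool ok;         // results correct and no CUDA error reported
};

PhaseTiming timePhases(int n, long long cyclesPerElem)
{
    PhaseTiming r{0.f, 0.f, false};
    std::vector<int> B(n, -1);
    int *dev_B = nullptr;
    if (cudaMalloc((void **)&dev_B, n * sizeof(int)) != cudaSuccess) return r;

    cudaEvent_t e0, e1, e2;
    cudaEventCreate(&e0); cudaEventCreate(&e1); cudaEventCreate(&e2);

    cudaEventRecord(e0);                 // before the kernel, in the default stream
    fillKernel<<<64, 256>>>(dev_B, n, cyclesPerElem);
    cudaEventRecord(e1);                 // reached only after the kernel has finished
    cudaMemcpy(B.data(), dev_B, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaEventRecord(e2);                 // reached after the copy has finished
    cudaEventSynchronize(e2);

    cudaEventElapsedTime(&r.kernelMs, e0, e1);
    cudaEventElapsedTime(&r.copyMs, e1, e2);
    r.ok = cudaGetLastError() == cudaSuccess;
    for (int i = 0; i < n && r.ok; ++i) r.ok = (B[i] == i % 7);

    cudaEventDestroy(e0); cudaEventDestroy(e1); cudaEventDestroy(e2);
    cudaFree(dev_B);
    return r;
}
```

If the slow "copy" in the question follows this pattern, `kernelMs` should account for most of the time that a wall-clock timer around `cudaMemcpy(..., cudaMemcpyDeviceToHost)` reports, while `copyMs` stays comparable to the host-to-device copy of the same size.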
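  1. Copying array B to dev_B takes only a fraction of a second, but copying dev_B back to B takes forever.
  2. The findNeiborElem function involves a loop in each thread; for example, it looks like this: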

    __global__ void findNeiborElem(int *dev_B, int *dev_MSH, int *dev_Nel, int *dev_Npel, int *dev_Nface, int *dev_FC){
    
        int tid=threadIdx.x + blockIdx.x * blockDim.x;
        while (tid<dev_Nel[0]){
            for (int j=1;j<=Nel;j++){
                 // do some …
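The truncated kernel does not show how `tid` advances. If the `while` loop is meant as a grid-stride loop, `tid` has to grow by `blockDim.x * gridDim.x` each iteration, otherwise a thread with `tid < Nel` never leaves the loop. Also, `Nel` in the inner `for` is a host variable unless it is a macro or compile-time constant, so device code cannot read it directly. The sketch below assumes the intent is to pass `Nel` by value; `countSameKey` and `runCountSameKey` are hypothetical names, and the body (counting elements with the same key) is invented only to give each thread O(Nel) work like the original inner loop. It also checks for errors after the launch.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel with the same shape as findNeiborElem: each element tid
// scans all Nel elements (O(Nel) work per thread, O(Nel^2) in total).
// Nel is passed by value instead of through a device pointer.
__global__ void countSameKey(const int *dev_key, int *dev_count, int Nel)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    while (tid < Nel) {
        int c = 0;
        for (int j = 0; j < Nel; j++) {     // 0-based, unlike j = 1..Nel
            if (dev_key[j] == dev_key[tid]) c++;
        }
        dev_count[tid] = c;
        tid += blockDim.x * gridDim.x;      // grid stride; without it the loop never ends
    }
}

// Host helper: copies keys to the GPU, runs the kernel, copies counts back.
// On any CUDA error the returned counts stay 0.
std::vector<int> runCountSameKey(const std::vector<int> &key)
{
    int Nel = static_cast<int>(key.size());
    std::vector<int> count(Nel, 0);
    int *dev_key = nullptr, *dev_count = nullptr;
    cudaMalloc((void **)&dev_key, Nel * sizeof(int));
    cudaMalloc((void **)&dev_count, Nel * sizeof(int));
    cudaMemcpy(dev_key, key.data(), Nel * sizeof(int), cudaMemcpyHostToDevice);

    countSameKey<<<32, 128>>>(dev_key, dev_count, Nel);
    cudaError_t err = cudaGetLastError();                   // launch-configuration errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // errors raised while running
    if (err == cudaSuccess)
        cudaMemcpy(count.data(), dev_count, Nel * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dev_key);
    cudaFree(dev_count);
    return count;
}
```

Because every thread scans all `Nel` elements, total work grows roughly as Nel², which is one plausible reason the kernel, rather than the copy that waits for it, dominates the measured time.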

c++ cuda gpu

