Posts by use*_*744

Multi-GPU CUDA Thrust

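I have CUDA C++ code that uses Thrust and currently works correctly on a single GPU. I now want to modify it to run on multiple GPUs. I have a host function containing many Thrust calls that sort, copy, compute differences, and so on, on device arrays. I want each GPU to run this same sequence of Thrust calls simultaneously on its own (independent) arrays. I have read that Thrust functions which return a value are synchronous, but can I use OpenMP so that each host thread calls a function (made of Thrust calls) that runs on a separate GPU?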

For example (typed straight into the browser):

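#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <thrust/gather.h>
#include <thrust/copy.h>

struct Particles { double *pos; int *list; };  // hypothetical declaration: device pointers
extern Particles particle[];                    // particle[dev] owns arrays on GPU dev
extern int length;                              // elements per GPU

// Runs the Thrust sequence on GPU dev's arrays; the calling host thread
// must already have called cudaSetDevice(dev).
void runthrustfunctions(int dev){
  /*lots of Thrust functions running on device arrays stored on corresponding GPU*/
  //for example this is just a few of the lines
  thrust::device_ptr<double> pos_ptr = thrust::device_pointer_cast(particle[dev].pos);
  thrust::device_ptr<int> list_ptr = thrust::device_pointer_cast(particle[dev].list);
  thrust::sequence(list_ptr, list_ptr + length);              // list = 0, 1, ..., length-1
  thrust::sort_by_key(pos_ptr, pos_ptr + length, list_ptr);   // sort pos, permute list with it
  thrust::device_vector<double> temp(length);                 // allocated on the current device
  thrust::gather(list_ptr, list_ptr + length, pos_ptr, temp.begin());  // temp[i] = pos[list[i]]
  thrust::copy(temp.begin(), temp.end(), pos_ptr);
}

void runAllDevices(int Ndev){  // hypothetical wrapper; a bare for loop cannot sit at file scope
  #pragma omp parallel for
  for (int dev = 0; dev < Ndev; dev++){
    cudaSetDevice(dev);        // the current device is tracked per host thread
    runthrustfunctions(dev);
  }
}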
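A self-contained sketch of the same pattern, assuming a CUDA toolkit with Thrust (compiled as a `.cu` file with nvcc) and an OpenMP-enabled host compiler. Each OpenMP thread calls `cudaSetDevice` before touching Thrust, so the `device_vector` it creates and the `thrust::sort` it launches both run on that thread's GPU. The helpers `sort_on_device` and `run_all_devices` are hypothetical names, and the single sort stands in for the longer Thrust sequence.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>

// Hypothetical helper: sorts a descending sequence on GPU `dev` and reports
// whether the result came back in ascending order.
bool sort_on_device(int dev, int length) {
  try {
    if (cudaSetDevice(dev) != cudaSuccess) return false;  // bind this host thread to `dev`
    thrust::device_vector<double> pos(length);            // allocated on `dev`
    thrust::sequence(pos.begin(), pos.end(), double(length), -1.0);  // length, length-1, ..., 1
    thrust::sort(pos.begin(), pos.end());                 // executes on `dev`
    thrust::host_vector<double> h = pos;                  // blocking copy back to the host
    for (int i = 1; i < length; ++i)
      if (h[i - 1] > h[i]) return false;
    return true;
  } catch (...) {
    return false;  // exceptions must not propagate out of an OpenMP parallel region
  }
}

// Hypothetical driver: one OpenMP host thread per GPU, each on its own device.
std::vector<int> run_all_devices(int ndev, int length) {
  std::vector<int> ok(ndev, 0);
  if (ndev <= 0) return ok;  // num_threads requires a positive value
  #pragma omp parallel for num_threads(ndev)
  for (int dev = 0; dev < ndev; ++dev)
    ok[dev] = sort_on_device(dev, length) ? 1 : 0;
  return ok;
}
```

The results are stored in a `std::vector<int>` rather than `std::vector<bool>` because several threads write different elements at the same time, which is safe for `int` but not for the bit-packed `bool` specialization. `num_threads(ndev)` asks for one thread per GPU so the devices actually work concurrently; without it the loop is still correct, since every iteration selects its own device.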


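I think I would also need to store the struct "particle[0]" on GPU 0, particle[1] on GPU 1, and so on, which I'm guessing is not possible. One option might be to use a "switch" with separate code for each GPU case.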
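One possible layout, as a minimal sketch assuming the CUDA runtime API: the structs themselves can stay in ordinary host memory and only hold device pointers. `cudaMalloc` allocates on whichever device is current for the calling thread, so allocating each struct's arrays right after `cudaSetDevice(dev)` places them on GPU `dev` without any per-GPU `switch`. The struct `ParticleArrays` and the functions `allocate_per_device` and `free_per_device` are hypothetical names.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical layout: the struct lives in host memory; only the
// pointers it holds refer to memory on a particular GPU.
struct ParticleArrays {
  int     device;  // GPU that owns pos and list
  double *pos;     // device pointer, allocated on `device`
  int    *list;    // device pointer, allocated on `device`
};

// Allocates `length` elements per array on each of GPUs 0..ndev-1.
// Returns false as soon as any CUDA call fails.
bool allocate_per_device(std::vector<ParticleArrays> &particle, int ndev, std::size_t length) {
  particle.assign(ndev, ParticleArrays{0, nullptr, nullptr});
  for (int dev = 0; dev < ndev; ++dev) {
    if (cudaSetDevice(dev) != cudaSuccess) return false;  // following cudaMalloc calls target `dev`
    particle[dev].device = dev;
    if (cudaMalloc(&particle[dev].pos, length * sizeof(double)) != cudaSuccess) return false;
    if (cudaMalloc(&particle[dev].list, length * sizeof(int)) != cudaSuccess) return false;
  }
  return true;
}

// Frees each array with its owning device selected.
void free_per_device(std::vector<ParticleArrays> &particle) {
  for (auto &p : particle) {
    cudaSetDevice(p.device);
    cudaFree(p.pos);   // cudaFree on a null pointer does nothing
    cudaFree(p.list);
  }
  particle.clear();
}
```

With this layout, `particle[dev]` can be passed to the per-thread Thrust function unchanged; wrapping `particle[dev].pos` in `thrust::device_pointer_cast` works because that thread has already selected GPU `dev`.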

I'd like to know whether this is a correct approach, or whether there is a better way. Thanks.

cuda openmp thrust

use*_*744

lucky-day

Score: 2

Answers: 1

Views: 1810