I have code like this:
for(int i =0; i<2; i++)
{
//initialization of memory and some variables
........
........
RunDll(input image, output image); //function that calls kernel
}
Each iteration of the loop above is independent, and I want to run them concurrently. So I tried this:
for(int i =0; i<num_devices; i++)
{
cudaSetDevice(i);
//initialization of memory and some variables
........
........
RunDll(input image, output image); // inside RunDll:
{
RunBasicFBP_CUDA(parameters); //function that calls kernel 1
xSegmentMetal(parameters); //CPU function
RunBasicFP_CUDA(parameters); //function that uses output of kernel 1 as input for kernel 2
for (int idx_view = 0; idx_view < param.fbp.num_view; idx_view++)
{
for (int idx_bin …