查看有关CUDA问题的答案和评论,以及CUDA标记维基,我发现通常建议每个API调用的返回状态都应该检查错误.API文档包括像功能cudaGetLastError,cudaPeekAtLastError以及cudaGetErrorString,但什么是把这些结合在一起,以可靠地捕捉和无需大量额外的代码报告错误的最好方法?
我是CUDA的新手,我一直致力于"减少算法".
该算法适用于任何小于1 << 24的数组大小.
当我使用大小为1 << 25的数组时,程序在"总和"中返回0,这是错误的.总和应该是2 ^ 25
编辑 cuda-memcheck compiled_code
========= CUDA-MEMCHECK
@@STARTING@@
========= Program hit cudaErrorInvalidValue (error 11) due to "invalid argument" on CUDA API call to cudaLaunch.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib64/libcuda.so.1 [0x2f2d83]
========= Host Frame:test [0x3b37e]
========= Host Frame:test [0x2b71]
========= Host Frame:test [0x2a18]
========= Host Frame:test [0x2a4c]
========= Host Frame:test [0x2600]
========= Host Frame:test [0x2904]
========= Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xfd) [0x1ed5d]
========= Host Frame:test …Run Code Online (Sandbox Code Playgroud)