I am running this program on a GPU with 1 GB of global memory. It gives the following error:
Fatal error: cudaMemcpy1 error (unspecified launch failure at CheckDevice.cu:27)
*** FAILED - ABORTING
========= Out-of-range Shared or Local Address
========= at 0x000006a8 in grid::SetSubgridMarker(grid*, grid*)
========= by thread (0,0,0) in block (0,0,0)
========= Device Frame:SetAllFlags_dev(param_t*, grid*) (SetAllFlags_dev(param_t*, grid*) : 0x108)
========= Device Frame:SetAllFlags(param_t*, grid*) (SetAllFlags(param_t*, grid*) : 0x38)
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x3dc) [0xc9edc]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.5.0 [0xa18a]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.5.0 (cudaLaunch + 0x17f) [0x2f4cf]
========= Host Frame:Transport [0xd395]
========= Host Frame:Transport [0xd7bd]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xed) [0x2176d]
========= Host Frame:Transport [0x17bd]
=========
========= Program hit error 4 on CUDA API call to cudaMemcpy
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/libcuda.so [0x26a180]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.5.0 (cudaMemcpy + 0x271) [0x348e1]
========= Host Frame:Transport [0x2cea]
========= Host Frame:Transport [0x3769]
========= Host Frame:Transport [0xd7ee]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xed) [0x2176d]
========= Host Frame:Transport [0x17bd]
=========
========= Program hit error 4 on CUDA API call to cudaGetLastError
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/libcuda.so [0x26a180]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.5.0 (cudaGetLastError + 0x1e6) [0x2a046]
========= Host Frame:Transport [0x2cef]
========= Host Frame:Transport [0x3769]
========= Host Frame:Transport [0xd7ee]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xed) [0x2176d]
========= Host Frame:Transport [0x17bd]
=========
========= ERROR SUMMARY: 3 errors
For the unspecified launch failure error, the relevant line of code is the cudaMemcpy operation:
cudaMemcpy(CurrentGrid, Grid_dev, sizeof(grid), cudaMemcpyDeviceToHost);
cudaCheckErrors("cudaMemcpy1 error");
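The question does not show how cudaCheckErrors is defined. Judging only from the message format in the output above ("Fatal error: ... (... at CheckDevice.cu:27)" followed by "*** FAILED - ABORTING"), it is probably a macro along the lines of this hypothetical reconstruction. A macro that calls cudaGetLastError() right after cudaMemcpy would also explain why cuda-memcheck reports error 4 on both of those calls.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical reconstruction of the cudaCheckErrors macro used in the question.
// It fetches the last CUDA runtime error and, if there is one, prints it in the
// format seen above ("Fatal error: <msg> (<error string> at <file>:<line>)")
// and terminates the program.
#define cudaCheckErrors(msg)                                            \
    do {                                                                \
        cudaError_t err_ = cudaGetLastError();                          \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n",          \
                    msg, cudaGetErrorString(err_), __FILE__, __LINE__); \
            fprintf(stderr, "*** FAILED - ABORTING\n");                 \
            exit(1);                                                    \
        }                                                               \
    } while (0)
```

Because the macro only inspects the most recent error, it reports whatever failed last, which is not necessarily the call on the line just above it.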
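Then, as the error message shows, it reports Out-of-range Shared or Local Address at 0x000006a8 in grid::SetSubgridMarker(grid*, grid*). Is this because the device has run out of global memory? Is there a way to query memory usage on the device?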
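In the source code, checkDevice.cu runs after grid::SetSubgridMarker, and checkDevice does not use much memory on the device, so I suspect (though without much confidence) that grid::SetSubgridMarker exhausts the memory, leaving no room to perform the cudaMemcpy operation. Any suggestions? Thanks a lot!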
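The unspecified launch failure is not caused by the cudaMemcpy operation. It is a "leftover" error from the kernel launch immediately preceding that operation.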
The kernel launch most likely failed because of the out-of-bounds memory access that cuda-memcheck reports when you run your code.
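One way to confirm where the error comes from is to check the kernel itself before the copy. The sketch below assumes a hypothetical kernel setFlags standing in for SetAllFlags (whose code is not shown) and a hypothetical helper launchAndCheck. cudaGetLastError() catches launch-configuration errors, and cudaDeviceSynchronize() waits for the kernel and returns any error raised while it ran, such as "unspecified launch failure".

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for SetAllFlags: sets one flag per thread.
__global__ void setFlags(int *flags, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // guard: threads past the end do nothing
        flags[i] = 1;
}

// Launch the kernel and check it in two steps:
//  1) cudaGetLastError() reports errors in the launch itself;
//  2) cudaDeviceSynchronize() waits for completion and reports errors
//     that occurred while the kernel was executing.
// Without step 2, an execution error only shows up at the next runtime
// call, which makes it look as if that call (e.g. cudaMemcpy) failed.
cudaError_t launchAndCheck(int *d_flags, int n)
{
    setFlags<<<(n + 255) / 256, 256>>>(d_flags, n);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch error: %s\n", cudaGetErrorString(err));
        return err;
    }
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel execution error: %s\n", cudaGetErrorString(err));
    return err;
}
```

An unspecified launch failure is a sticky error: once it occurs, later runtime calls in the same process return it as well, so checking immediately after the launch is what ties it to the kernel rather than to cudaMemcpy.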
You should inspect your kernel code, SetSubgridMarker, for out-of-bounds accesses to shared or local memory.
None of this means that your device is out of global memory.
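As for querying memory usage, the CUDA runtime function cudaMemGetInfo() reports the free and total global memory on the current device, in bytes. A minimal sketch follows; the wrapper name queryDeviceMemory is hypothetical. It only covers global memory, so it would not reveal a shared- or local-memory fault like the one reported here, and it should be called before the failing kernel, because after a sticky launch failure it returns that same error.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: query free and total global memory (bytes) on the
// current device and print them in MiB. Says nothing about per-thread
// local memory or per-block shared memory.
cudaError_t queryDeviceMemory(size_t *freeBytes, size_t *totalBytes)
{
    cudaError_t err = cudaMemGetInfo(freeBytes, totalBytes);
    if (err != cudaSuccess) {
        // After a sticky error such as "unspecified launch failure",
        // this call fails with that same error.
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return err;
    }
    printf("global memory: %zu MiB free of %zu MiB total\n",
           *freeBytes >> 20, *totalBytes >> 20);
    return cudaSuccess;
}
```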
If I have an array in C like this:
int C[5];
and then I try to access an element like this:
int temp = C[6];
that is an out-of-range access: I am reading past the end of the storage defined for the variable. It does not mean I am "out of memory".
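Something similar is happening in your SetSubgridMarker code. You need to find out what it is and fix it. cuda-memcheck also gives you a clue by telling you that thread (0,0,0) in block (0,0,0) is making the illegal access. By looking carefully at how this thread indexes data stored in local or shared memory, you should be able to spot the error.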
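To carry the C array example over to the device, here is a hedged sketch. The kernel markerKernel, the helper readMarker and the constant N_LOCAL are all hypothetical, since the real SetSubgridMarker code is not shown. Each thread indexes a small per-thread array with a computed index; without the range check, an index of 6 would be exactly the kind of out-of-range local-memory access cuda-memcheck reported.

```cuda
#include <cuda_runtime.h>

#define N_LOCAL 5  // hypothetical size of the per-thread array, mirroring int C[5]

// Hypothetical device helper: read c[idx] only if idx is inside the array.
// Returning a sentinel instead of reading c[6] avoids the out-of-range access.
__device__ int readMarker(const int *c, int idx)
{
    if (idx < 0 || idx >= N_LOCAL)
        return -1;            // sentinel meaning "index out of range"
    return c[idx];
}

// Each thread looks up its own index from 'offsets' in a per-thread array.
// The array is indexed dynamically, so it is typically placed in local memory.
__global__ void markerKernel(const int *offsets, int *out, int n)
{
    int c[N_LOCAL] = {10, 11, 12, 13, 14};
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = readMarker(c, offsets[i]);
}
```

Adding a similar temporary check in SetSubgridMarker, and printing the offending index for thread (0,0,0), is one way to narrow down which access is out of range.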
You can also use the method described here to have cuda-memcheck identify the specific line of kernel code that generates the fault.