CUDA: too many resources requested for launch

Question

I ran into a problem when running my code on a GTX 480 (Compute Capability 2.0).

Whenever I launch the kernel with 1024 threads per block, I get the following error:

========= CUDA-MEMCHECK
========= Program hit cudaErrorLaunchOutOfResources (error 7) due to "too many resources requested for launch" on CUDA API call to cudaLaunch.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2ef613]
=========     Host Frame:/usr/local/cuda-6.5/lib64/libcudart.so.6.5 (cudaLaunch + 0x17e) [0x3686e]
=========     Host Frame:./bin/myProgram [0x3a50]
=========     Host Frame:./bin/myProgram [0x388a]
=========     Host Frame:./bin/myProgram [0x38e3]
=========     Host Frame:./bin/myProgram [0x2a99]
=========     Host Frame:./bin/myProgram [0x1410]
=========     Host Frame:./bin/myProgram [0x1da0]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xed) [0x2176d]
=========     Host Frame:./bin/myProgram [0x1139]
=========

I ran the program several times with different block and thread counts:

5 Blocks, 512 Threads per Block => Works
5 Blocks, 1024 Threads per Block => Error
10 Blocks, 512 Threads per Block => Works
10 Blocks, 1024 Threads per Block => Error
15 Blocks, 512 Threads per Block => Works
15 Blocks, 1024 Threads per Block => Error

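I checked the register usage and it looks fine. "function4", which uses 28 registers, is the kernel launched with this many threads. All other kernels are launched with <<<1, 32>>> on every call.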

ptxas info    : 0 bytes gmem
ptxas info    : Function properties for _Z7function1Py
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Compiling entry function '_Z13function2PyS_i' for 'sm_20'
ptxas info    : Function properties for _Z13function2PyS_i
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 22 registers, 52 bytes cmem[0]
ptxas info    : Compiling entry function '_Z6function3PyiS_' for 'sm_20'
ptxas info    : Function properties for _Z6function3PyiS_
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 22 registers, 56 bytes cmem[0]
ptxas info    : Compiling entry function '_Z17function4PyiiS_Phji' for 'sm_20'
ptxas info    : Function properties for _Z17function4PyiiS_Phji
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 28 registers, 72 bytes cmem[0]

I also ran the program on my GTX 660 (CC 3.0), and there 1024 threads per block work. I don't know where the problem comes from. Does anyone have an idea?

Answer 1

Fra*_*des 8

I had the same error.

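Thanks to http://cuda-programming.blogspot.fr/2013/01/handling-cuda-error-messages.html, I understood the error. They say:

"Too many resources requested for launch - This error means that the number of registers available on the multiprocessor is being exceeded. Reduce the number of threads per block to solve the problem."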

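Basically, I could previously use a certain number of threads per block (8x8x16 = 1024 for a 3D kernel). But if you nest kernel calls, the number of available registers is reduced further.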
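To make the register explanation concrete, here is a minimal host-side sketch of the arithmetic. It is an illustration, not part of the original answer. The helper names (`registersPerBlock`, `fitsRegisterBudget`, `assertLaunchFits`) are hypothetical and not CUDA APIs. The limits are the per-block register counts listed for compute capability 2.0 (32 K) and 3.0 (64 K). The product is only a lower bound, since the hardware allocates registers in coarser units.

```cpp
#include <cassert>

// Registers available to one thread block, per the compute capability tables.
constexpr long kRegsPerBlockCC20 = 32 * 1024;  // CC 2.0, e.g. GTX 480
constexpr long kRegsPerBlockCC30 = 64 * 1024;  // CC 3.0, e.g. GTX 660

// Naive lower bound on the registers one block needs. The hardware allocates
// registers in coarser units, so the real requirement can be higher.
constexpr long registersPerBlock(int regsPerThread, int threadsPerBlock) {
    return static_cast<long>(regsPerThread) * threadsPerBlock;
}

// Hypothetical helper (not a CUDA API): does the naive estimate fit the budget?
constexpr bool fitsRegisterBudget(int regsPerThread, int threadsPerBlock,
                                  long regsPerBlockLimit) {
    return registersPerBlock(regsPerThread, threadsPerBlock) <= regsPerBlockLimit;
}

// Hypothetical debug guard that host code could call before a kernel launch.
inline void assertLaunchFits(int regsPerThread, int threadsPerBlock,
                             long regsPerBlockLimit) {
    assert(fitsRegisterBudget(regsPerThread, threadsPerBlock, regsPerBlockLimit));
}

// 28 registers per thread (the ptxas figure for function4) fits on CC 2.0.
static_assert(fitsRegisterBudget(28, 1024, kRegsPerBlockCC20),
              "28 * 1024 = 28672 <= 32768");
// 33 registers per thread no longer fits at 1024 threads on CC 2.0 ...
static_assert(!fitsRegisterBudget(33, 1024, kRegsPerBlockCC20),
              "33 * 1024 = 33792 > 32768");
// ... but fits at 512 threads on CC 2.0, and at 1024 threads on CC 3.0.
static_assert(fitsRegisterBudget(33, 512, kRegsPerBlockCC20),
              "33 * 512 = 16896 <= 32768");
static_assert(fitsRegisterBudget(33, 1024, kRegsPerBlockCC30),
              "33 * 1024 = 33792 <= 65536");
```

By this estimate, any kernel using more than 32 registers per thread fails at 1024 threads per block on CC 2.0 but launches at 512 threads, and also at 1024 threads on CC 3.0. That matches the pattern reported in the question. The 28 registers in the ptxas listing would fit, so the binary that actually fails may be using more registers than the listing shows. This is an inference the thread does not confirm. The figures for the running binary can be read at runtime from `cudaFuncGetAttributes` (field `numRegs`) and `cudaGetDeviceProperties` (field `regsPerBlock`).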
