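I am currently working on a GPU server with 4 Tesla T10 GPUs. Because I keep testing kernels and frequently have to kill processes with Ctrl-C, I added a few lines to the end of a simple device query program. The code is as follows: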
#include <stdio.h>
#include <cuda_runtime.h>

// Print device properties
void printDevProp(cudaDeviceProp devProp)
{
    printf("Major revision number: %d\n", devProp.major);
    printf("Minor revision number: %d\n", devProp.minor);
    printf("Name: %s\n", devProp.name);
    // size_t fields are printed with %zu
    printf("Total global memory: %zu\n", devProp.totalGlobalMem);
    printf("Total shared memory per block: %zu\n", devProp.sharedMemPerBlock);
    printf("Total registers per block: %d\n", devProp.regsPerBlock);
    printf("Warp size: %d\n", devProp.warpSize);
    printf("Maximum memory pitch: %zu\n", devProp.memPitch);
    printf("Maximum threads per block: %d\n", devProp.maxThreadsPerBlock);
    for (int i = 0; i < 3; ++i)
        printf("Maximum dimension %d of block: %d\n", i, devProp.maxThreadsDim[i]);
    for (int i = 0; i < 3; ++i)
        printf("Maximum dimension %d of grid: %d\n", i, devProp.maxGridSize[i]);
    printf("Clock rate: %d\n", devProp.clockRate);
    printf("Total constant memory: %zu\n", devProp.totalConstMem);
    printf("Texture alignment: %zu\n", devProp.textureAlignment);
    printf("Concurrent copy and execution: %s\n", (devProp.deviceOverlap ? "Yes" : "No"));
    printf("Number of multiprocessors: %d\n", devProp.multiProcessorCount);
    printf("Kernel execution timeout: %s\n", (devProp.kernelExecTimeoutEnabled ? "Yes" : "No"));
    return;
}
int main()
{
    // Number of CUDA devices
    int devCount;
    cudaGetDeviceCount(&devCount);
    printf("CUDA Device Query...\n");
    printf("There are %d CUDA devices.\n", devCount);
    // Iterate through devices
    for (int i = 0; i < devCount; ++i)
    {
        // Get device properties
        printf("\nCUDA Device #%d\n", i);
        cudaDeviceProp devProp;
        cudaGetDeviceProperties(&devProp, i);
        printDevProp(devProp);
    }
    printf("\nPress any key to exit...");
    char c;
    scanf("%c", &c);
    // The loop in question: select each device in turn and reset it
    for (int i = 0; i < devCount; i++) {
        cudaSetDevice(i);
        cudaDeviceReset();
    }
    return 0;
}
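My question concerns the for loop just before the end of main(), where I select each device in turn and then call cudaDeviceReset(). I have a strange feeling that this code, although it produces no errors, does not actually reset all of the devices. Instead, the program seems to reset only the default device, device 0, every time. Can anyone tell me what I should do to reset each of the 4 devices?
Thanks.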
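It looks like you can add a function to your GPU program that catches the Ctrl+C signal (SIGINT) and calls cudaDeviceReset() for each device the program uses.
Example code for calling a function when SIGINT is caught can be found here:
Including code like this in every GPU program you write seems like good practice, and I will be doing the same :-)
I don't have time to write a full, detailed answer, so please read the other answers and their comments as well.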
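A minimal sketch of the SIGINT-catching idea described above, written as plain C++ so that it builds without the CUDA toolkit. The names g_stop_requested, on_sigint, reset_all_devices and run_until_interrupted are made up for illustration, and reset_all_devices only counts devices: in a real program its loop body would call cudaSetDevice(dev) followed by cudaDeviceReset(). The handler only sets a flag, and the cleanup runs in ordinary code once the loop has exited, which is one way to keep the handler itself trivial.

```cpp
#include <csignal>

// Set by the handler and polled by the main loop. volatile std::sig_atomic_t
// is the type the standard allows a signal handler to write to.
static volatile std::sig_atomic_t g_stop_requested = 0;

// Handler: record the request and return immediately.
extern "C" void on_sigint(int) { g_stop_requested = 1; }

// Hypothetical stand-in for the per-device cleanup. Returns how many devices
// were visited so the sketch can be checked without a GPU.
int reset_all_devices(int device_count) {
    int visited = 0;
    for (int dev = 0; dev < device_count; ++dev) {
        // In a CUDA build: cudaSetDevice(dev); cudaDeviceReset();
        ++visited;
    }
    return visited;
}

// Runs up to max_steps iterations and stops early once SIGINT has been seen.
// interrupt_at_step is a demo knob: the signal is raised programmatically at
// that step to simulate Ctrl+C (pass -1 to never raise it).
// Returns the number of completed steps.
int run_until_interrupted(int max_steps, int interrupt_at_step) {
    g_stop_requested = 0;
    std::signal(SIGINT, on_sigint);
    int steps_done = 0;
    for (int step = 0; step < max_steps && !g_stop_requested; ++step) {
        // ... launch kernels, copy results, etc.
        ++steps_done;
        if (step == interrupt_at_step) {
            std::raise(SIGINT);  // simulated Ctrl+C
        }
    }
    return steps_done;
}
```

Calling run_until_interrupted(10, 3) raises SIGINT during the fourth iteration, so the loop stops after four steps; the caller would then run reset_all_devices(devCount) before returning from main().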
This may be too late, but if you write a signal handler function you can eliminate memory leaks and reset the devices in a deterministic way:
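#include <csignal>
#include <cstdlib>
#include <iostream>
#include <cuda_runtime.h>

// State variables shared between the signal handler and the main loop
extern volatile std::sig_atomic_t no_sigint;
volatile std::sig_atomic_t no_sigint = 1;
extern volatile std::sig_atomic_t interrupts;
volatile std::sig_atomic_t interrupts = 0;

// User-provided: frees dynamically allocated memory
void free_mem ();

/* Catches signal interrupts from Ctrl+c.
   If 1 signal is detected the simulation finishes the current frame and
   exits in a clean state. If Ctrl+c is pressed again it terminates the
   application without completing writes to files or calculations but
   deallocates all memory anyway.
   Note: iostream output and exit() are not async-signal-safe, so doing
   this work inside the handler is a convenience, not a guarantee. */
void
sigint_handler (int sig)
{
  if (sig == SIGINT)
    {
      interrupts = interrupts + 1;
      std::cout << std::endl
                << "Aborting loop.. finishing frame."
                << std::endl;
      no_sigint = 0;
      if (interrupts >= 2)
        {
          std::cerr << std::endl
                    << "Multiple Interrupts issued: "
                    << "Clearing memory and Forcing immediate shutdown!"
                    << std::endl;
          // Free dynamically allocated memory
          free_mem ();
          // Reset every CUDA device, not only the current one
          int devCount;
          cudaGetDeviceCount (&devCount);
          for (int i = 0; i < devCount; ++i)
            {
              cudaSetDevice (i);
              cudaDeviceReset ();
            }
          exit (9);
        }
    }
}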
// ...
int main ()
{
  // ...
  std::signal (SIGINT, sigint_handler);  // install the handler before the loop starts
  for (int simulation_step = 1; simulation_step < SIM_STEPS && no_sigint; ++simulation_step)
    {
      // ... simulation code
    }
  free_mem ();
  // ... CUDA device resets
  return 0;
}
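If you use this code (you can even put the first snippet in an external header), it just works. You get two levels of control over Ctrl+C: the first press stops the simulation and exits normally, but the application first finishes the step it is rendering, which is nice for stopping gracefully with correct results. If you press Ctrl+C again, it shuts the application down and frees all memory.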
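The two-level Ctrl+C behaviour described above can also be sketched with a handler that only counts interrupts and leaves every decision to the main loop. This is an illustrative variant rather than the answer's own code: Shutdown, RunResult, run_frames and the demo_sigints parameter are hypothetical, the key presses are simulated with std::raise, and the points where a real program would call free_mem() and reset the devices are only marked with comments.

```cpp
#include <csignal>

// Number of SIGINTs seen so far; the handler does nothing but count.
static volatile std::sig_atomic_t g_interrupts = 0;

extern "C" void count_sigint(int sig) {
    g_interrupts = g_interrupts + 1;
    // Re-install the handler, since some platforms restore the default
    // action after the first delivery.
    std::signal(sig, count_sigint);
}

enum class Shutdown { Completed, Graceful, Forced };

struct RunResult {
    int frames_completed;
    Shutdown how;
};

// Each frame is split into chunks so that a second interrupt can be noticed
// mid-frame. demo_sigints is a demo knob: that many SIGINTs are raised at the
// start of the first frame to simulate the user pressing Ctrl+C.
RunResult run_frames(int total_frames, int chunks_per_frame, int demo_sigints) {
    g_interrupts = 0;
    std::signal(SIGINT, count_sigint);
    int frames_completed = 0;
    for (int frame = 0; frame < total_frames; ++frame) {
        for (int chunk = 0; chunk < chunks_per_frame; ++chunk) {
            if (frame == 0 && chunk == 0) {
                for (int i = 0; i < demo_sigints; ++i) {
                    std::raise(SIGINT);  // simulated Ctrl+C
                }
            }
            if (g_interrupts >= 2) {
                // Second press: abandon the frame. A real program would
                // free memory and reset every device here, then exit.
                return {frames_completed, Shutdown::Forced};
            }
            // ... one chunk of simulation work
        }
        ++frames_completed;
        if (g_interrupts >= 1) {
            // First press: the current frame is finished, so stop cleanly.
            return {frames_completed, Shutdown::Graceful};
        }
    }
    return {frames_completed, Shutdown::Completed};
}
```

With one simulated interrupt the first frame still completes before the function returns; with two, the function returns before any frame is finished. Keeping the cleanup out of the handler is a design choice of this sketch that avoids calling non-signal-safe functions from signal context.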