I am running cuda-memcheck to debug my code, and the output is as follows:
========= Program hit cudaErrorCudartUnloading (error 29) due to "driver shutting down" on CUDA API call to cudaFree.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2e40d3]
========= Host Frame:./nmt [0x53526]
========= Host Frame:./nmt [0xfbd9]
terminate called after throwing an instance of '========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 [0x3c259]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 [0x3c2a5]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfc) [0x21ecc]
thrust::system::system_error'
========= Host Frame:./nmt [0x530a]
=========
what(): driver shutting down
========= Error: process didn't terminate …

I have a struct that looks like this:
#include <thrust/device_vector.h>

// real_t, lstmSize and GPU_Random_Vector are defined elsewhere in the program.
struct LstmLayer {
    int deviceId;                         // GPU that should own W and gradW
    thrust::device_vector<real_t> W;      // (4*lstmSize) * (2*lstmSize) weights
    thrust::device_vector<real_t> gradW;  // gradients, same size as W
    LstmLayer() : deviceId(0) {}
    LstmLayer(int id) : deviceId(id) {}
    void setDevice(int id) { deviceId = id; }
    void init(bool initParams) {
        W.resize(4*lstmSize * 2*lstmSize);
        gradW.resize(4*lstmSize * 2*lstmSize);
        if (initParams) GPU_Random_Vector(W);
    }
};
Now I want to initialize an array of LstmLayer, with each element on a different GPU device. I do it as follows:
struct LstmLayer lstmLayers[MAX_NUM_LSTM_LAYERS];
for (int i = 0; i < numLstmLayers; ++i) {
    CUDA_SAFE_CALL(cudaSetDevice(i));  // make GPU i current before allocating
    lstmLayers[i].setDevice(i);
    lstmLayers[i].init(true);          // W and gradW are allocated on the current GPU
}
Running this program gives the following error:
terminate called after throwing an instance of 'thrust::system::system_error'
what(): driver shutting down
Please tell me what is wrong with my code and how to do this correctly. Thanks in advance.
Using nvprof, I found that the following kernel is the bottleneck of my CUDA application:
__global__ void extractColumn_kernel(real_t *tgt, real_t *src, int *indices, int numRows, int len) {
    // Grid-stride loop: each thread handles entries tid, tid + stride, tid + 2*stride, ...
    int stride = gridDim.x * blockDim.x;
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    for (int j = tid; j < len; j += stride) {
        int colId = j / numRows;  // column of tgt that entry j belongs to
        int rowId = j % numRows;  // row within that column
        tgt[j] = src[indices[colId]*numRows + rowId];
    }
}
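It is meant to extract the columns of matrix src listed in indices into matrix tgt. Note that src and tgt both have numRows rows and are stored in column-major order. Furthermore, len = length(indices)*numRows is the total number of entries in tgt. …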
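Because the kernel's job is fully described by the paragraph above, a sequential host version is a handy baseline to check any faster rewrite against. The sketch below assumes real_t is float (its definition is not shown in the question), and the function name extractColumnsRef is hypothetical. It uses the same decomposition as the kernel: entry j of tgt is row j % numRows of column j / numRows, which is copied from column indices[j / numRows] of src.

```cpp
#include <cassert>
#include <vector>

// Assumption: real_t is float; the question does not show its typedef.
using real_t = float;

// Hypothetical host reference for extractColumn_kernel.
// src is column-major with numRows rows; tgt receives the columns of src
// listed in indices, also column-major with numRows rows.
std::vector<real_t> extractColumnsRef(const std::vector<real_t>& src,
                                      const std::vector<int>& indices,
                                      int numRows) {
    assert(numRows > 0 && src.size() % numRows == 0);
    // len = length(indices) * numRows, as in the question.
    const int len = static_cast<int>(indices.size()) * numRows;
    std::vector<real_t> tgt(len);
    for (int j = 0; j < len; ++j) {
        const int colId = j / numRows;  // which extracted column entry j is in
        const int rowId = j % numRows;  // row inside that column
        tgt[j] = src[indices[colId] * numRows + rowId];
    }
    return tgt;
}
```

Running the kernel and this function on the same small random inputs and comparing tgt element by element gives a quick correctness check for any reworked version of the kernel.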
cuda ×3
c ×2
c++ ×1
extraction ×1
gpu ×1
matrix ×1
memcheck ×1
memory ×1
multiple-gpu ×1
thrust ×1