我试图在我的玩具示例中使用cublas函数cublasSgemmBatched.在这个例子中,我首先分配2D数组:h_AA, h_BBsize [ 6] [ 5]和h_CCsize [ 6] [ 1].之后,我将其复制到设备,执行cublasSgemmBatched并尝试将阵列复制d_CC回主机阵列h_CC.但是,我收到了一个错误(cudaErrorLaunchFailure)设备主机复制,我不确定我是否正确地将数组复制到设备中:
int main(){
cublasHandle_t handle;
cudaError_t cudaerr;
cudaEvent_t start, stop;
cublasStatus_t stat;
const float alpha = 1.0f;
const float beta = 0.0f;
float **h_AA, **h_BB, **h_CC;
h_AA = new float*[6];
h_BB = new float*[6];
h_CC = new float*[6];
for (int i = 0; i < 6; i++){
h_AA[i] = new float[5]; …Run Code Online (Sandbox Code Playgroud)