Posts by wha*_*gto

Why is cuBLAS on a GTX Titan slower than single-threaded CPU code?

I am testing the Nvidia cuBLAS library on my GTX Titan. I have the following code:

#include "cublas.h"
#include <stdlib.h>
#include <conio.h>
#include <Windows.h>
#include <iostream>
#include <iomanip>

/* Vector size */
#define N (1024 * 1024 * 32)

/* Main */
int main(int argc, char** argv)
{
  LARGE_INTEGER frequency;
  LARGE_INTEGER t1, t2;

  float* h_A;
  float* h_B;
  float* d_A = 0;
  float* d_B = 0;

  /* Initialize CUBLAS */
  cublasInit();

  /* Allocate host memory for the vectors */
  h_A = (float*)malloc(N * sizeof(h_A[0]));
  h_B = (float*)malloc(N * sizeof(h_B[0]));

  /* Fill the vectors …


c++ performance cuda gpgpu cublas

wha*_*gto

lucky-day

4 votes

2 answers

732 views