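I'm teaching myself CUDA. My end goal is to apply it to Fortran, but because most courses and videos are based on C/C++, I often end up doing the same exercise in both languages (which is a good thing). Right now I'm trying to run a basic exercise that computes a(i) = b(i) + c(i) on the GPU. For completeness, I'm posting both versions of the code for comparison: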
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "cuda.h"
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "cuda_common.cuh"
#include "common.h"
// assume grid is 1D and block is 1D, then nx = size
__global__ void sum_arrays_1Dgrid_1Dblock(float* a, float* b, float* c, int nx)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < nx)
        c[gid] = a[gid] + b[gid];
}
void run_sum_array_1d(int argc, char** argv)
{
    printf("Running 1D grid\n");
    int …
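The listing above is cut off partway through the host function. As a rough sketch only, and not the original code, a complete 1D launch of this kernel might look like the following. The `CHECK` macro, the `count_mismatches` driver, the test data, and the block size of 128 are all assumptions made for illustration; the course headers `cuda_common.cuh` and `common.h` are deliberately not used, since their contents are not shown.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper (not from the post): abort on any CUDA runtime error.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",             \
                    cudaGetErrorString(err_), __FILE__, __LINE__);   \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

// Same kernel as in the post: one thread per element, guarded against overrun.
__global__ void sum_arrays_1Dgrid_1Dblock(float* a, float* b, float* c, int nx)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < nx)
        c[gid] = a[gid] + b[gid];
}

// Hypothetical driver: adds two arrays of nx floats on the GPU and returns
// how many elements differ from the same sum computed on the CPU.
int count_mismatches(int nx, int block_size)
{
    size_t bytes = (size_t)nx * sizeof(float);
    float* h_a = (float*)malloc(bytes);
    float* h_b = (float*)malloc(bytes);
    float* h_c = (float*)malloc(bytes);
    if (!h_a || !h_b || !h_c) {
        fprintf(stderr, "host allocation failed\n");
        exit(EXIT_FAILURE);
    }

    // Small integer-valued inputs, so the float sums are exact.
    for (int i = 0; i < nx; ++i) {
        h_a[i] = (float)(i % 100);
        h_b[i] = 2.0f * (float)(i % 50);
    }

    float *d_a, *d_b, *d_c;
    CHECK(cudaMalloc(&d_a, bytes));
    CHECK(cudaMalloc(&d_b, bytes));
    CHECK(cudaMalloc(&d_c, bytes));
    CHECK(cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice));

    // Round the grid size up so every element gets a thread.
    int grid_size = (nx + block_size - 1) / block_size;
    sum_arrays_1Dgrid_1Dblock<<<grid_size, block_size>>>(d_a, d_b, d_c, nx);
    CHECK(cudaGetLastError());       // catches launch-configuration errors
    CHECK(cudaDeviceSynchronize());  // catches errors raised during execution

    CHECK(cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost));

    int mismatches = 0;
    for (int i = 0; i < nx; ++i)
        if (h_c[i] != h_a[i] + h_b[i])
            ++mismatches;

    CHECK(cudaFree(d_a));
    CHECK(cudaFree(d_b));
    CHECK(cudaFree(d_c));
    free(h_a);
    free(h_b);
    free(h_c);
    return mismatches;
}

int main(void)
{
    int nx = 1 << 20;  // assumed problem size
    printf("mismatches: %d\n", count_mismatches(nx, 128));
    return 0;
}
```

Assuming the file is saved as `sum_1d.cu`, it should build with `nvcc sum_1d.cu -o sum_1d`. Checking `cudaGetLastError()` right after the launch and again after `cudaDeviceSynchronize()` is one common way to tell a bad launch configuration apart from a failure inside the kernel.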