CUDA can't see shared memory values in Nsight debugging

Question

Iam*_*Iam 5 debugging cuda shared-memory nsight

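I have been struggling for a while with a problem I can't seem to solve. When I debug my CUDA code with Nvidia Nsight under Visual Studio 2008, I get strange results whenever shared memory is involved.

My code is: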

template<typename T>
__device__
T integrate()
{
   extern __shared__ T s_test[]; // Dynamically allocated shared memory
   /**** Breakpoint (1) here ****/
   int index = threadIdx.x + threadIdx.y * blockDim.x; // Local index in block. Column major ordering
   if(index < 64 && blockIdx.x==0) { // Only work on a few values. Just testing
      s_test[index] = (T)index;
      /* Some other irrelevant code here */
   }
   return v;
}

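When I reach breakpoint 1 and inspect the shared memory in the Visual Studio Watch window, only the first 8 values of the array change; the others remain null. I would expect all of the first 64 to change.

(Screenshot: Watch window in Visual Studio)

I thought this might be because not all warps execute simultaneously, so I tried synchronising them. I added this piece of code inside integrate():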

template<typename T>
__device__
T integrate()
{
   /* Old code is still here */

   __syncthreads();
   /**** Breakpoint (2) here ****/
   if(index < 64 && blockIdx.x==0) {
      T tmp = s_test[index]; // Write to tmp variable so I can inspect it inside Nsight Watch window
      v = tmp + index; // Use `tmp` and `index` somehow so that the compiler doesn't optimize it out of existence
   }
   return v;
}

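But the problem is still there. Moreover, the rest of the values read into tmp are 0, unlike what the VS Watch window indicates.

(Screenshot: Nsight's Watch window)

I should mention that it takes a lot of steps to step over __syncthreads(), so when I reach it I just jump to breakpoint 2. What is going on?!

EDIT: Information about the system and launch configuration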

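System

Name                       Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz
Architecture               x86
Frequency                  2,666 MHz
Number of cores            2
Page size                  4,096
Total physical memory      3,582.00 MB
Available physical memory  1,983.00 MB
Version name               Windows 7 Ultimate
Version number             6.1.7600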

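Device: GeForce 9500 GT

Driver version                     301.42
Driver model                       WDDM
CUDA device index                  0
GPU family                         G96
Compute capability                 1.1
Number of SMs                      4
Frame buffer physical size (MB)    512
Frame buffer bandwidth (GB/s)      16
Frame buffer bus width (bits)      128
Frame buffer location              Dedicated
Graphics clock (MHz)               812
Memory clock (MHz)                 500
Processor clock (MHz)              1625
Memory type                        DDR2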

IDE

Microsoft Visual Studio Team System 2008
NVIDIA Nsight Visual Studio Edition, Version 2.2 Build 2.2.0.12255

Corresponding compiler command

1>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin\nvcc.exe" -G -gencode=arch=compute_10,code=\"sm_10,compute_10\" --machine 32 -ccbin "C:\Program Files\Microsoft Visual Studio 9.0\VC\bin" -D_NEXUS_DEBUG -g -D_DEBUG -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd" -I"inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\include" -maxrregcount=0 --compile -o "Debug/process_f2f.cu.obj" process_f2f.cu

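Launch configuration. The size of the shared memory doesn't seem to matter; I have tried several values. The configuration I have worked with most is:

Shared memory: 2048 bytes
Grid/block size: {101, 101, 1}, {16, 16, 1}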
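
For reference, the launch configuration above could be reproduced with a minimal sketch like the one below. It is not the original program: the kernel name fill_shared, the output buffer, and the host helper run_check are hypothetical, and float stands in for the template parameter T. The kernel copies the shared values to global memory after the barrier so they can be checked on the host, independently of what the Watch window displays.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical test kernel (not from the original post). It mirrors the
// shared-memory write in integrate() and then copies the values to global
// memory so the host can verify them.
__global__ void fill_shared(float *out)
{
    extern __shared__ float s_test[];  // sized by the third launch parameter
    int index = threadIdx.x + threadIdx.y * blockDim.x;

    // The question tests only blockIdx.x == 0, which selects 101 blocks of a
    // 101 x 101 grid. This sketch also tests blockIdx.y to pick a single block.
    bool active = (index < 64 && blockIdx.x == 0 && blockIdx.y == 0);

    if (active)
        s_test[index] = (float)index;

    __syncthreads();                   // outside the branch: every thread reaches it

    if (active)
        out[index] = s_test[index];
}

// Returns true when all 64 values come back from the device as expected.
bool run_check()
{
    const int n = 64;
    float host[n];
    float *dev = 0;
    if (cudaMalloc((void **)&dev, n * sizeof(float)) != cudaSuccess)
        return false;
    cudaMemset(dev, 0, n * sizeof(float));

    dim3 grid(101, 101, 1);            // grid size from the question
    dim3 block(16, 16, 1);             // block size from the question
    size_t shared_bytes = 2048;        // dynamic shared memory from the question
    fill_shared<<<grid, block, shared_bytes>>>(dev);

    cudaError_t err = cudaGetLastError();           // launch errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();              // execution errors
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }
    for (int i = 0; i < n; ++i)
        if (host[i] != (float)i)
            return false;
    return true;
}

int main()
{
    printf("%s\n", run_check() ? "shared values OK" : "shared values WRONG");
    return 0;
}
```

If the host-side check passes while the Watch window still shows only the first 8 values, that would suggest the kernel is correct and the discrepancy lies in how the debugger displays shared memory, though this sketch alone cannot confirm the cause.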

Answer 1

You*_* Nj 1

Have you tried placing __syncthreads() after assigning the values?

template<typename T>
__device__
T integrate()
{
   extern __shared__ T s_test[]; // Dynamically allocated shared memory
   int index = threadIdx.x + threadIdx.y * blockDim.x; // Local index in block. Column major ordering
   if(index < 64 && blockIdx.x==0) { // Only work on a few values. Just testing
      s_test[index] = (T)index;
      /* Some other irrelevant code here */
   }
   __syncthreads();
   /**** Breakpoint (1) here ****/
   return v;
}

Then try looking at the values at that breakpoint.
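
To illustrate why the barrier belongs after the writes, here is a small sketch in which each thread reads a value written by a different thread. The names reverse_through_shared and run_reverse_check are hypothetical, and int replaces the template parameter T. With a 16 x 16 block, indices 0 to 63 span two warps, so thread 0 reads an element written by the other warp, and that read is only well defined once __syncthreads() has been passed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: write s_test[index], then read the mirrored element,
// which was written by another thread (often in another warp).
__global__ void reverse_through_shared(int *out)
{
    extern __shared__ int s_test[];
    int index = threadIdx.x + threadIdx.y * blockDim.x;
    bool active = (index < 64);

    if (active)
        s_test[index] = index;

    __syncthreads();  // all writes above are visible to the block after this

    if (active)
        out[index] = s_test[63 - index];
}

// Returns true when out[i] == 63 - i for every i in [0, 64).
bool run_reverse_check()
{
    const int n = 64;
    int host[n];
    int *dev = 0;
    if (cudaMalloc((void **)&dev, n * sizeof(int)) != cudaSuccess)
        return false;

    dim3 block(16, 16, 1);  // same block shape as in the question
    reverse_through_shared<<<1, block, n * sizeof(int)>>>(dev);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess)
        return false;

    for (int i = 0; i < n; ++i)
        if (host[i] != 63 - i)
            return false;
    return true;
}

int main()
{
    printf("%s\n", run_reverse_check() ? "reverse OK" : "reverse WRONG");
    return 0;
}
```

Placing a breakpoint after the barrier, as suggested above, means every thread of the block has finished its write by the time any thread of that block gets there.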

Archived: 12 years, 11 months ago
Viewed: 1,755 times
Last active: 9 years, 9 months ago