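This research paper runs a series of CUDA microbenchmarks on a GPU to obtain statistics such as global memory latency and instruction throughput. This link points to the set of microbenchmarks that the authors wrote and ran on their GPU.
One of those microbenchmarks, global.cu, contains the code for a pointer-chasing benchmark that measures global memory latency.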
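To make the pointer-chasing idea concrete, here is a minimal host-side sketch in plain C++. It is not the authors' code: `build_chain` and `chase` are hypothetical helpers, the stride is an arbitrary illustration value, and the chain stores indices rather than the raw pointers that the kernel dereferences. The point it illustrates is that every load depends on the result of the previous one, so the loads cannot overlap and the elapsed time divided by the number of steps approximates the latency of a single access.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from global.cu): build a circular chain in which
// element i stores the index of the element `stride` slots ahead of it.
std::vector<std::size_t> build_chain(std::size_t length, std::size_t stride) {
    assert(length > 0);
    std::vector<std::size_t> chain(length);
    for (std::size_t i = 0; i < length; ++i) {
        chain[i] = (i + stride) % length;  // wrap around at the end
    }
    return chain;
}

// Hypothetical helper: follow the chain for `steps` dependent loads,
// starting at element 0, and return the index reached.
// Each iteration needs the value loaded by the previous iteration.
std::size_t chase(const std::vector<std::size_t>& chain, std::size_t steps) {
    std::size_t j = 0;
    for (std::size_t s = 0; s < steps; ++s) {
        j = chain[j];
    }
    return j;
}
```

With `build_chain(8, 3)`, five steps visit 3, 6, 1, 4, 7 in turn. In a timed version, the final index would be written out somewhere so that the compiler cannot discard the loads as dead code.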
This is the code of the kernel that is run:
    __global__ void global_latency (unsigned int ** my_array, int array_length, int iterations, int ignore_iterations, unsigned long long * duration) {
        unsigned int start_time, end_time;
        unsigned int *j = (unsigned int*)my_array;
        volatile unsigned long long sum_time;

        sum_time = 0;
        duration[0] = 0;

        for (int k = -ignore_iterations; k < iterations; k++) {
            if (k==0) {
                sum_time = 0; // ignore some iterations: cold icache misses
            }

            start_time = clock();
            repeat256(j=*(unsigned int **)j;) // unroll …