我的CPU的规格说它应该为内存带来5.336GB/s的带宽.为了测试这个,我写了一个简单的程序,在一个大数组上运行memset(或memcpy)并报告时间.我在memset上显示3.8GB/s,在memcpy上显示1.9GB/s. http://en.wikipedia.org/wiki/Intel_Core_(microarchitecture)说我的Q9400应该达到5.336MB/s.怎么了?
我试过用赋值循环替换memset或memcpy.我已经google了一下,试图了解内存对齐情况.我尝试过不同的编译器标志.我花了很多时间在这上面尴尬.感谢您的任何帮助,您可以提供!
我正在使用Ubuntu 12.04和libc-dev版本2.15-0ubuntu10.5和内核3.8.0-37-generic
代码:
#include <stdio.h>
#include <time.h>
#include <string.h>
#include <stdlib.h>
#define numBytes ((long)(1024*1024*1024))
#define numTransfers ((long)(8))
int main(int argc,char**argv){
if(argc!=3){
printf("Usage: %s BLOCK_SIZE_IN_BYTES NUMBER_OF_BLOCKS_TO_TRANSFER\n",argv[0]);
return -1;
}
char*__restrict__ source=(char*)malloc(numBytes);
char*__restrict__ dest=(char*)malloc(numBytes);
struct timespec start,end;
long totalTimeMs;
int i;
clock_gettime(CLOCK_MONOTONIC_RAW,&start);
for(i=0;i<numTransfers;++i)
memset(source,0,numBytes);
clock_gettime(CLOCK_MONOTONIC_RAW,&end);
totalTimeMs=(end.tv_nsec-start.tv_nsec)*.000001+1000*(end.tv_sec-start.tv_sec);
printf("memset %ld bytes %ld times (%.2fGB total) in %ldms (%.3fGB/s). ",numBytes,numTransfers,numBytes/1024.0/1024/1024*numTransfers,totalTimeMs,numBytes/1024.0/1024/1024*1000*numTransfers/totalTimeMs);
clock_gettime(CLOCK_MONOTONIC_RAW,&start);
for(i=0;i<numTransfers;++i)
memcpy( dest, source, numBytes);
clock_gettime(CLOCK_MONOTONIC_RAW,&end);
totalTimeMs=(end.tv_nsec-start.tv_nsec)*.000001+1000*(end.tv_sec-start.tv_sec);
printf("memcpy %ld bytes %ld times (%.2fGB total) in %ldms (%.3fGB/s).\n",numBytes,numTransfers,numBytes/1024.0/1024/1024*numTransfers,totalTimeMs,numBytes/1024.0/1024/1024*1000*numTransfers/totalTimeMs);
free(source);
free(dest);
return …Run Code Online (Sandbox Code Playgroud)