小编Jef*_*Guy的帖子

为什么memset会变慢?

我的CPU的规格说它应该为内存带来5.336GB/s的带宽.为了测试这个,我写了一个简单的程序,在一个大数组上运行memset(或memcpy)并报告时间.我在memset上显示3.8GB/s,在memcpy上显示1.9GB/s. http://en.wikipedia.org/wiki/Intel_Core_(microarchitecture)说我的Q9400应该达到5.336MB/s.怎么了?

我试过用赋值循环替换memset或memcpy.我已经google了一下,试图了解内存对齐情况.我尝试过不同的编译器标志.我花了很多时间在这上面尴尬.感谢您的任何帮助,您可以提供!

我正在使用Ubuntu 12.04和libc-dev版本2.15-0ubuntu10.5和内核3.8.0-37-generic

代码:

#include <stdio.h>
#include <time.h>
#include <string.h>
#include <stdlib.h>

#define numBytes ((long)(1024*1024*1024))
#define numTransfers ((long)(8))

int main(int argc,char**argv){
    if(argc!=3){
        printf("Usage: %s BLOCK_SIZE_IN_BYTES NUMBER_OF_BLOCKS_TO_TRANSFER\n",argv[0]);
        return -1;
    }
    char*__restrict__ source=(char*)malloc(numBytes);
    char*__restrict__ dest=(char*)malloc(numBytes);
    struct timespec start,end;
    long totalTimeMs;
    int i;

    clock_gettime(CLOCK_MONOTONIC_RAW,&start);
    for(i=0;i<numTransfers;++i)
        memset(source,0,numBytes);
    clock_gettime(CLOCK_MONOTONIC_RAW,&end);
    totalTimeMs=(end.tv_nsec-start.tv_nsec)*.000001+1000*(end.tv_sec-start.tv_sec);
    printf("memset %ld bytes %ld times (%.2fGB total) in %ldms (%.3fGB/s). ",numBytes,numTransfers,numBytes/1024.0/1024/1024*numTransfers,totalTimeMs,numBytes/1024.0/1024/1024*1000*numTransfers/totalTimeMs);

    clock_gettime(CLOCK_MONOTONIC_RAW,&start);
    for(i=0;i<numTransfers;++i)
        memcpy( dest, source, numBytes);
    clock_gettime(CLOCK_MONOTONIC_RAW,&end);
    totalTimeMs=(end.tv_nsec-start.tv_nsec)*.000001+1000*(end.tv_sec-start.tv_sec);
    printf("memcpy %ld bytes %ld times (%.2fGB total) in %ldms (%.3fGB/s).\n",numBytes,numTransfers,numBytes/1024.0/1024/1024*numTransfers,totalTimeMs,numBytes/1024.0/1024/1024*1000*numTransfers/totalTimeMs);

    free(source);
    free(dest);

    return …
Run Code Online (Sandbox Code Playgroud)

optimization memset memcpy memory-bandwidth

7
推荐指数
1
解决办法
3372
查看次数

标签 统计

memcpy ×1

memory-bandwidth ×1

memset ×1

optimization ×1