Effectiveness of GCC optimization on bit operations

Question

Dum*_*Guy 13 c optimization x86 assembly

Here are two ways to set a single bit in C on x86-64:

inline void SetBitC(long *array, int bit) {
   //Pure C version
   *array |= 1<<bit;
}

inline void SetBitASM(long *array, int bit) {
   // Using inline x86 assembly
   asm("bts %1,%0" : "+r" (*array) : "g" (bit));
}

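Using GCC 4.3 with the -O3 -march=core2 options, the C version takes about 90% more time when used with a constant bit. (Both versions compile to exactly the same assembly code, except that the C version uses an or [1<<num],%rax instruction instead of a bts [num],%rax instruction.)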

When used with a variable bit, the C version performs better, but it is still significantly slower than the inline assembly.

Resetting, toggling and checking bits give similar results.
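For reference, the four operations mentioned above can be sketched in pure C as follows. This is only an illustration under stated assumptions: the helper names (set_bit, clear_bit, toggle_bit, test_bit) are hypothetical and do not appear in the question, and the sketch uses 1UL rather than the question's 1<<bit so that the shift is done at the width of long instead of int.

```c
#include <stdbool.h>

/* Hypothetical helper names, not taken from the question. */

/* Set one bit; 1UL keeps the shift at the width of unsigned long. */
static inline void set_bit(unsigned long *word, unsigned bit) {
    *word |= 1UL << bit;
}

/* Reset (clear) one bit. */
static inline void clear_bit(unsigned long *word, unsigned bit) {
    *word &= ~(1UL << bit);
}

/* Toggle (flip) one bit. */
static inline void toggle_bit(unsigned long *word, unsigned bit) {
    *word ^= 1UL << bit;
}

/* Check one bit; returns true if it is set. */
static inline bool test_bit(const unsigned long *word, unsigned bit) {
    return (*word >> bit) & 1UL;
}
```

Whether a compiler turns any of these into bts, btr, btc or bt, or into or/and/xor with an immediate, depends on the compiler version and flags, so inspecting the output of gcc -S is the only reliable way to tell.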

Why does GCC optimize such a common operation so poorly? Am I doing something wrong with the C version?

Edit: Sorry for the long wait, here is the code I used for benchmarking. It is actually a simple programming problem...

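#include <stdio.h>

int main() {
    // Get the sum of all integers from 1 to 2^28 with bit 11 always set
    // SetBit stands for SetBitC or SetBitASM above, depending on the run
    unsigned long i,j,c=0;
    for (i=1; i<(1<<28); i++) {
        j = i;
        SetBit((long *)&j, 10);  // bit index 10 is the 11th bit
        c += j;
    }
    printf("Result: %lu\n", c);
    return 0;
}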

gcc -O3 -march=core2 -pg test.c
./a.out
gprof
with ASM: 101.12      0.08     0.08                             main
with C:   101.12      0.16     0.16                             main

time ./a.out also gives similar results.
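When comparing the two builds, it may help to confirm that both print the same Result before trusting the timings. The sketch below is one possible sanity check, not part of the original benchmark: the function names and the parameters n and bit are assumptions introduced here. It computes the benchmark's sum once with a loop and once with a closed form, assuming bit < n and n small enough that the sum fits in 64 bits.

```c
#include <stdint.h>

/* Loop form: mirrors the benchmark, summing i | (1 << bit) for 1 <= i < 2^n. */
static uint64_t sum_with_bit_loop(unsigned n, unsigned bit) {
    uint64_t c = 0;
    for (uint64_t i = 1; i < (UINT64_C(1) << n); i++)
        c += i | (UINT64_C(1) << bit);
    return c;
}

/* Closed form: the plain sum of 1 .. 2^n - 1, plus 2^bit for every i whose
   bit is clear. Half of the values 0 .. 2^n - 1 have the bit clear, and
   i = 0 is excluded from the loop. Assumes bit < n. */
static uint64_t sum_with_bit_closed(unsigned n, unsigned bit) {
    uint64_t count = UINT64_C(1) << n;           /* 2^n */
    uint64_t plain = (count - 1) * (count / 2);  /* sum of 1 .. 2^n - 1 */
    uint64_t extra = (count / 2 - 1) << bit;     /* added by setting the bit */
    return plain + extra;
}
```

Calling sum_with_bit_closed(28, 10) gives the value that either variant of the benchmark should print on a platform where unsigned long is 64 bits wide.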