全局变量会降低代码速度

Question

全局变量会降低代码速度

Fan*_*Fox 10 c++ performance global-variables

我正在搞乱我写的最糟糕的代码,(基本上是试图破解)我注意到这段代码:

for(int i = 0; i < N; ++i)
    tan(tan(tan(tan(tan(tan(tan(tan(x++))))))));
end
std::cout << x;

Run Code Online (Sandbox Code Playgroud)

其中N是一个全局变量,运行速度明显慢于:

int N = 10000;
for(int i = 0; i < N; ++i)
    tan(tan(tan(tan(tan(tan(tan(tan(x++))))))));
end
std::cout << x;

Run Code Online (Sandbox Code Playgroud)

全局变量会使它运行得慢,会发生什么？

Answer 1

rtp*_*tpg 8

tl; dr:本地版本将N保留在寄存器中,而全局版本则不然.使用const声明常量,无论你如何声明,它都会更快.

这是我使用的示例代码:

#include <iostream>
#include <math.h>
void first(){
  int x=1;
  int N = 10000;
  for(int i = 0; i < N; ++i)
    tan(tan(tan(tan(tan(tan(tan(tan(x++))))))));
  std::cout << x;
}
int N=10000;
void second(){
  int x=1;
  for(int i = 0; i < N; ++i)
    tan(tan(tan(tan(tan(tan(tan(tan(x++))))))));
  std::cout << x;
}
int main(){
  first();
  second();
}

Run Code Online (Sandbox Code Playgroud)

(命名test.cpp).

要查看我运行的汇编程序代码g++ -S test.cpp.

我有一个巨大的文件,但有一些聪明的搜索(我搜索棕褐色),我找到了我想要的东西:

从first功能:

Ltmp2:
    movl    $1, -4(%rbp)
    movl    $10000, -8(%rbp) ; N is here !!!
    movl    $0, -12(%rbp)    ;initial value of i is here
    jmp LBB1_2       ;goto the 'for' code logic
LBB1_1:             ;the loop is this segment
    movl    -4(%rbp), %eax
    cvtsi2sd    %eax, %xmm0
    movl    -4(%rbp), %eax
    addl    $1, %eax
    movl    %eax, -4(%rbp)
    callq   _tan
    callq   _tan
    callq   _tan
    callq   _tan
    callq   _tan        
    callq   _tan
    callq   _tan
    movl    -12(%rbp), %eax
    addl    $1, %eax
    movl    %eax, -12(%rbp) 
LBB1_2:
    movl    -12(%rbp), %eax ;value of n kept in register 
    movl    -8(%rbp), %ecx  
    cmpl    %ecx, %eax  ;comparing N and i here
    jl  LBB1_1      ;if less, then go into loop code
    movl    -4(%rbp), %eax

Run Code Online (Sandbox Code Playgroud)

第二功能:

Ltmp13:
    movl    $1, -4(%rbp)    ;i
    movl    $0, -8(%rbp) 
    jmp LBB5_2
LBB5_1:             ;loop is here
    movl    -4(%rbp), %eax
    cvtsi2sd    %eax, %xmm0
    movl    -4(%rbp), %eax
    addl    $1, %eax
    movl    %eax, -4(%rbp)
    callq   _tan
    callq   _tan
    callq   _tan
    callq   _tan
    callq   _tan
    callq   _tan
    callq   _tan
    movl    -8(%rbp), %eax
    addl    $1, %eax
    movl    %eax, -8(%rbp)
LBB5_2:
    movl    _N(%rip), %eax  ;loading N from globals at every iteration, instead of keeping it in a register
    movl    -8(%rbp), %ecx

Run Code Online (Sandbox Code Playgroud)

因此,从汇编代码中你可以看到(或没有),在本地版本,N是整个计算过程中保持在寄存器中,而在全球版本,N是从全球在每次迭代重读.

我想这种情况发生的主要原因是线程之类的东西,编译器无法确定N是否未被修改.

如果你const在N(const int N=10000)的声明中添加一个,它会比本地版本更快:

    movl    -8(%rbp), %eax
    addl    $1, %eax
    movl    %eax, -8(%rbp)
LBB5_2:
    movl    -8(%rbp), %eax
    cmpl    $9999, %eax ;9999 used instead of 10000 for some reason I do not know
    jle LBB5_1

Run Code Online (Sandbox Code Playgroud)

N由常数代替.

线程不会出现在编译器的决策过程中,因为竞争条件是未定义的行为.每次迭代都会加载`N`,因为编译器不能确定`tan`不会修改`N`. (4认同)

Answer 2

Lor*_*tel 7

无法优化全局版本以将其放入寄存器中.

Answer 3

Arn*_*jar 7

我对@rtpg的问题和答案做了一点实验,

试验这个问题

在main1.h文件中的全局N变量

int N = 10000;

Run Code Online (Sandbox Code Playgroud)

然后在main1.c文件中,1000个计算的情况:

#include <stdio.h>
#include "sys/time.h"
#include "math.h"
#include "main1.h"



extern int N;

int main(){

        int k = 0;
        timeval static_start, static_stop;
        int x = 0;

        int y = 0;
        timeval start, stop;
        int M = 10000;

        while(k <= 1000){

                gettimeofday(&static_start, NULL);
                for (int i=0; i<N; ++i){
                        tan(tan(tan(tan(tan(tan(tan(tan(x++))))))));
                }
                gettimeofday(&static_stop, NULL);

                gettimeofday(&start, NULL);
                for (int j=0; j<M; ++j){
                        tan(tan(tan(tan(tan(tan(tan(tan(y++))))))));
                }
                gettimeofday(&stop, NULL);

                int first_interval = static_stop.tv_usec - static_start.tv_usec;
                int last_interval = stop.tv_usec - start.tv_usec;

                if(first_interval >=0 && last_interval >= 0){
                        printf("%d, %d\n", first_interval, last_interval);
                }

                k++;
        }

        return 0;
}

Run Code Online (Sandbox Code Playgroud)

结果显示在以下直方图中(频率/微秒):

两种方法中比较输出时间的直方图红色框是基于循环(N)结束的非全局变量,而基于循环(非全局)的M结束的透明绿色.