Why does floating-point division speed differ with and without the -O3 optimization flag (C++/C)?

smi*_*dha 2 c c++ floating-point gcc

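I was trying to measure the difference in speed between single-precision and double-precision division in C++.

Here is the simple code I wrote.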

#include <iostream>
#include <time.h>

int main(int argc, char *argv[])
{

  float     f_x = 45672.0;
  float     f_y = 67783.0;
  double    d_x = 45672.0;
  double    d_y = 67783.0;

  float     f_answer;
  double    d_answer;

  clock_t   start,stop;
  int       N = 200000000; //2*10^8


 start = clock();
 for (int i = 0; i < N; ++i)
  {
    f_answer = f_x/f_y;
  }
 stop = clock();
 std::cout<<"Single Precision:"<< (stop-start)/(double)CLOCKS_PER_SEC<<"    "<<f_answer <<std::endl;


start = clock();
for (int i = 0; i < N; ++i)
  {
    d_answer = d_x/d_y;
  }
stop = clock();
std::cout<<"Double precision:" <<(stop-start)/(double)CLOCKS_PER_SEC<<"   "<< d_answer<<std::endl;

return 0;
}

When I compile the code without optimization, as g++ test.cpp, I get the following output:

Desktop: ./a.out
Single precision:8.06    0.673797
Double precision:12.68   0.673797

But if I compile it with g++ -O3 test.cpp, I get:

Desktop: ./a.out
Single precision:0    0.673797
Double precision:0   0.673797

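How did I get such a drastic performance boost? The time shown in the second case is 0 because of the low resolution of the clock() function. Did the compiler somehow detect that each for loop iteration is independent of the previous ones?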

Oli*_*rth 7

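Probably because the compiler optimized the loop down to a single iteration. It may even have performed the division at compile time.

Check the disassembly of the executable to be sure (using e.g. objdump).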

  • @Omni: Yes. (2 upvotes)

Omn*_*ous 5

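Looking at the assembly you get from g++ -O3 -S, it is clear that the loops and all of the floating-point calculations (except those involving the time) have been optimized away: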

        .section        .text.startup,"ax",@progbits
        .p2align 4,,15
        .globl  main
        .type   main, @function
main:
.LFB970:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        pushq   %rbx
        .cfi_def_cfa_offset 24
        .cfi_offset 3, -24
        subq    $24, %rsp
        .cfi_def_cfa_offset 48
        call    clock
        movq    %rax, %rbx
        call    clock
        movq    %rax, %rbp
        movl    $.LC0, %esi
        movl    std::cout, %edi
        subq    %rbx, %rbp
        call    std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)

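See the two calls to clock, one right after the other? Before them there are only a few stack-maintenance instructions. Yes, those loops are completely gone.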

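You only use f_answer and d_answer to print out an answer that can easily be computed at compile time, and the compiler can see that. There is no point in even having them. And if there is no point in having them, there is no point in having f_x, f_y, d_x, or d_y either. All gone.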
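One way to see this effect in isolation is to keep the original division but route its operands and result through volatile objects, which the compiler has to load and store on every iteration. This is only a sketch of that swapped-in technique, not what the answer itself proposes: time_division is a hypothetical helper name, and std::chrono::steady_clock is used instead of clock() as an assumption about what timer is convenient.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not from the original post): divides x by y n times
// and returns {elapsed seconds, last quotient}.
template <typename T>
std::pair<double, T> time_division(T x, T y, int n)
{
    // Accesses to volatile objects are observable side effects, so each
    // iteration must really load both operands and store the quotient.
    // The division can therefore not be folded away or hoisted out.
    volatile T vx = x;
    volatile T vy = y;
    volatile T result = T(0);

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i)
    {
        result = vx / vy;
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double> elapsed = stop - start;
    const T last = result;  // one final volatile read
    return {elapsed.count(), last};
}
```

Calling time_division<float>(45672.0f, 67783.0f, 200000000) and the double equivalent should then report non-zero times even at -O3. The measurement includes the extra loads and stores that volatile forces, so it is best read as a rough comparison rather than a precise cost of the division instruction.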

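To fix this, you need to make each iteration of the loop depend on the result of the previous iteration. Here is my solution to the problem. I use the complex template to do some of the calculations involved in computing the Mandelbrot set: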

#include <iostream>
#include <time.h>
#include <complex>

int main(int argc, char *argv[])
{
   using ::std::complex;
   using ::std::cout;

   const complex<float> f_coord(0.1, 0.1);
   const complex<double> d_coord(0.1, 0.1);

   complex<float> f_answer(0, 0);
   complex<double> d_answer(0, 0);

   clock_t   start, stop;
   const unsigned int N = 200000000; //2*10^8

   start = clock();
   for (unsigned int i = 0; i < N; ++i)
   {
      f_answer = (f_answer * f_answer) + f_coord;
   }
   stop = clock();
   cout << "Single Precision: " << (stop-start)/(double)CLOCKS_PER_SEC
        << "    " << f_answer << '\n';


   start = clock();
   for (unsigned int i = 0; i < N; ++i)
   {
      d_answer = (d_answer * d_answer) + d_coord;
   }
   stop = clock();
   cout << "Double precision: " <<(stop-start)/(double)CLOCKS_PER_SEC
        << "   " << d_answer << '\n';

   return 0;
}
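The program above times complex multiplication and addition rather than division. If division is what you want to compare, the same loop-carried-dependency idea can be sketched with a recurrence that performs one division per iteration. The recurrence x = c / (x + 1) with c = 2, the helper name time_dependent_division, and the use of std::chrono::steady_clock are all illustrative assumptions, not part of the original answer.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not from the original post): runs n iterations of
// x = c / (x + 1) and returns {elapsed seconds, final x}.
template <typename T>
std::pair<double, T> time_dependent_division(int n)
{
    // Read the constant through a volatile so the compiler cannot treat it
    // as known at compile time and pre-compute the whole loop.
    volatile T c_source = T(2);
    const T c = c_source;

    T x = T(0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i)
    {
        // Each division needs the previous x, so the iterations form a chain
        // and cannot be collapsed into a single one.
        x = c / (x + T(1));
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double> elapsed = stop - start;
    return {elapsed.count(), x};
}
```

With c = 2 the sequence settles near 1, so the values stay in the normal floating-point range for the whole run. Because every division waits for the previous one, this measures the latency of a division plus an addition, not division throughput, and it is still worth checking the generated assembly to confirm the loop is where you expect it to be.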