Fast C++ sign function

jpm*_*orr 0 c++ benchmarking timing

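In my code, I check the sign of a double many times inside a loop, and that loop typically runs millions of times during execution.

My sign check is a very basic calculation using fabs(), so I figured there must be a faster way, since "division is slow". I came across a template function and copysign(), and wrote a simple program to compare their speed. I tested three possible solutions with the code below.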

// C++ program to measure the execution time of functions
#include <chrono>
#include <cmath>
#include <iostream>
#include <string>

using namespace std; 
using namespace std::chrono; 

template <typename Clock>
void printResult(const std::string& name, std::chrono::time_point<Clock> start, std::chrono::time_point<Clock> stop, const int iterations)
{
    // Average time per iteration, as floating-point nanoseconds so that
    // sub-nanosecond averages are not truncated to zero.
    const std::chrono::duration<double, std::nano> total = stop - start;
    const double avg_ns = total.count() / iterations;

    cout << "Time taken by " << name << " function: " << avg_ns << " ns avg. for " << iterations << " iterations." << endl << endl;
}


template <typename T> int sgn(T val) 
{
    return (T(0) < val) - (val < T(0));
}


int main() {

    // ***************************************************************** //
    int numiters = 100000000;
    double vel = -0.6574;
    double result = 0;
    
    // Get starting timepoint 
    auto start_1 = high_resolution_clock::now(); 
    for(int x = 0; x < numiters; x++) 
    {

        result = (vel/fabs(vel)) * 12.1;

    }

    // Get ending timepoint 
    auto stop_1 = high_resolution_clock::now(); 
    cout << "Result is: " << result << endl;
    printResult("fabs", start_1, stop_1, numiters);

    // Get starting timepoint 
    result = 0;
    auto start_2 = high_resolution_clock::now(); 
    for(int x = 0; x < numiters; x++) 
    {

        result = sgn(vel) * 12.1;

    }

    // Get ending timepoint 
    auto stop_2 = high_resolution_clock::now(); 
    cout << "Result is: " << result << endl;
    printResult("sgn", start_2, stop_2, numiters);


    // Get starting timepoint 
    result = 0;
    auto start_10 = high_resolution_clock::now(); 
    for(int x = 0; x < numiters; x++) 
    {

        result = copysign(12.1, vel);

    }

    // Get ending timepoint 
    auto stop_10 = high_resolution_clock::now(); 
    cout << "Result is: " << result << endl;
    printResult("copysign", start_10, stop_10, numiters);

    cout << endl;


}

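When I ran the program, I was a little surprised to find that the fabs() solution and the copysign solution had almost identical execution times. Also, when I ran it several times, I found the results could vary a lot.

Is my timing correct? Is there a better way to do what I'm doing than the three examples I tested?

Update

I have implemented the test on quick-bench.com, where the compiler settings can be specified, and all 3 results appear to be almost identical there. I think I may have gotten something wrong: https://quick-bench.com/q/PJiAmoC2NQIJyuvbdz5ZHUALu2M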
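Speed aside, the three candidates only agree for nonzero, non-NaN inputs, which matters when choosing between them. The following is a minimal sketch of the difference, assuming IEEE 754 doubles and default floating-point compiler settings. The wrapper names scaleByDivision, scaleBySgn and scaleByCopysign are made up here for illustration and do not appear in the program above.

```cpp
#include <cmath>

// Candidate 1: divide by the magnitude.
// For v == 0 this computes 0.0 / 0.0, which is NaN.
inline double scaleByDivision(double v, double magnitude) {
    return (v / std::fabs(v)) * magnitude;
}

// Candidate 2: branch-free three-way sign (-1, 0 or +1).
template <typename T>
int sgn(T val) {
    return (T(0) < val) - (val < T(0));
}

// For v == 0 the sign is 0, so the result is 0.
inline double scaleBySgn(double v, double magnitude) {
    return sgn(v) * magnitude;
}

// Candidate 3: copy the sign bit of v onto magnitude.
// Zero still carries a sign bit, so the result is never zero.
inline double scaleByCopysign(double v, double magnitude) {
    return std::copysign(magnitude, v);
}
```

With vel = -0.6574 all three return -12.1. For an input of 0.0 they return NaN, 0 and +12.1 respectively, and copysign returns -12.1 for -0.0, so the choice also depends on what the surrounding code expects at zero.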

Moo*_*uck 5

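Your test is invalid because you are doing blocking I/O inside the timing.

However, we can use quick-bench to analyze this: https://quick-bench.com/q/gt2KzKOFP4iV3ajmqANL_MhnMZk. It shows that the timings are almost exactly the same. What about the assembly the compiler generates?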
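One way to keep stream output clearly outside the measured interval is to wrap the loop in a helper that only returns a duration and leaves printing to the caller. This is a rough sketch and not the harness used in the question or on quick-bench. The name averageNanoseconds is hypothetical, and std::chrono::steady_clock is chosen here because it is monotonic.

```cpp
#include <chrono>

// Hypothetical helper: runs `body` `iterations` times and returns the
// average wall-clock time per iteration in nanoseconds.
// No stream output happens between the two clock reads.
template <typename F>
double averageNanoseconds(F&& body, int iterations) {
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        body();
    }
    const auto stop = clock::now();
    // Convert to a floating-point nanosecond count before averaging.
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

The caller would pass a lambda containing the expression under test and print the returned value afterwards. Note that this only addresses where the I/O happens. It does not stop the optimizer from removing work inside the lambda.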

double result = (vel/fabs(vel)) * 12.1;
   movabs $0xc028333333333333,%rax
   mov    %rax,0x8(%rsp)
   add    $0xffffffffffffffff,%rbx


double result = sgn(vel) * 12.1;
   movabs $0xc028333333333333,%rax
   mov    %rax,0x8(%rsp)
   add    $0xffffffffffffffff,%rbx


double result = copysign(12.1, vel);
   movabs $0xc028333333333333,%rax
   mov    %rax,0x8(%rsp)
   add    $0xffffffffffffffff,%rbx
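The immediate in each movabs is worth decoding. 0xc028333333333333 is the bit pattern of -12.1, so in all three listings the compiler has already folded the expression into a constant, and the loop body only stores that constant and decrements the counter. A small sketch to check the decoding, assuming 64-bit IEEE 754 doubles. The name doubleFromBits is made up for this example.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a 64-bit pattern as a double (assumes IEEE 754 binary64).
inline double doubleFromBits(std::uint64_t bits) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch expects a 64-bit double");
    double value;
    std::memcpy(&value, &bits, sizeof value);  // well-defined type punning
    return value;
}
```

Calling doubleFromBits(0xc028333333333333ULL) yields -12.1, which is the result all three expressions produce for vel = -0.6574.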

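When optimizing code, the answer is always to measure first to find out which parts of the program are actually the slowest, and then rewrite those parts so that they execute no code at all.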
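For a measurement to say anything about the sign operation itself, the input has to be something the compiler cannot see at compile time, and the result has to be used. A minimal sketch of that idea, assuming the inputs arrive in a std::vector filled at run time. The function names sumCopysign and sumDivision are hypothetical and not part of the benchmarks linked above.

```cpp
#include <cmath>
#include <vector>

// Sums copysign(magnitude, v) over a set of inputs. When the vector is
// filled at run time, the loop cannot be collapsed into one constant.
inline double sumCopysign(const std::vector<double>& inputs, double magnitude) {
    double total = 0.0;
    for (double v : inputs) {
        total += std::copysign(magnitude, v);
    }
    return total;  // returning the sum keeps the work observable
}

// Same loop using the division-based form from the question.
// Inputs are assumed to be nonzero, otherwise v / fabs(v) is NaN.
inline double sumDivision(const std::vector<double>& inputs, double magnitude) {
    double total = 0.0;
    for (double v : inputs) {
        total += (v / std::fabs(v)) * magnitude;
    }
    return total;
}
```

Timing these two functions over the same large, run-time-generated vector compares the operations rather than two identical constant stores. Whether any difference shows up still depends on the compiler and target.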