jpm*_*orr 0 c++ benchmarking timing
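In my code I perform a sign check on a double many times inside a loop, and that loop typically runs several million times over the course of execution.
My sign check is a fairly rudimentary calculation using fabs(), so I figured there must be a quicker way of doing it, since "division is slow". I came across a template function and copysign(), and wrote a simple program to compare their speed. I have tested three possible solutions with the code below.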
// C++ program to find out the execution time of functions
#include <chrono>
#include <cmath>
#include <iostream>
#include <string>
using namespace std;
using namespace std::chrono;
template<typename Clock>
void printResult(const std::string& name, std::chrono::time_point<Clock> start, std::chrono::time_point<Clock> stop, const int iterations)
{
    // Get duration.
    auto my_duration = duration_cast<nanoseconds>(stop - start);
    my_duration /= iterations;
    cout << "Time taken by " << name << " function: " << my_duration.count() << " ns avg. for " << iterations << " iterations." << endl << endl;
}
template <typename T> int sgn(T val)
{
    return (T(0) < val) - (val < T(0));
}
int main() {
    // ***************************************************************** //
    int numiters = 100000000;
    double vel = -0.6574;
    double result = 0;
    // Get starting timepoint
    auto start_1 = high_resolution_clock::now();
    for(int x = 0; x < numiters; x++)
    {
        result = (vel/fabs(vel)) * 12.1;
    }
    // Get ending timepoint
    auto stop_1 = high_resolution_clock::now();
    cout << "Result is: " << result << endl;
    printResult("fabs", start_1, stop_1, numiters);
    // Get starting timepoint
    result = 0;
    auto start_2 = high_resolution_clock::now();
    for(int x = 0; x < numiters; x++)
    {
        result = sgn(vel) * 12.1;
    }
    // Get ending timepoint
    auto stop_2 = high_resolution_clock::now();
    cout << "Result is: " << result << endl;
    printResult("sgn", start_2, stop_2, numiters);
    // Get starting timepoint
    result = 0;
    auto start_10 = high_resolution_clock::now();
    for(int x = 0; x < numiters; x++)
    {
        result = copysign(12.1, vel);
    }
    // Get ending timepoint
    auto stop_10 = high_resolution_clock::now();
    cout << "Result is: " << result << endl;
    printResult("copysign", start_10, stop_10, numiters);
    cout << endl;
}
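When I ran the program, I was a bit surprised to find that the fabs() solution and the copysign solution were almost identical in execution time. In addition, when I ran it several times, the results varied quite a lot.
Is my timing correct? Is there a better way to do what I am doing than the three examples I have tested?
UPDATE: I have implemented the tests on quick-bench.com, where the compiler settings can be specified, and all 3 results appear to be nearly identical there. I think I may have done something wrong: https://quick-bench.com/q/PJiAmoC2NQIJyuvbdz5ZHUALu2M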
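Your test is invalid because you are doing blocking I/O inside the timing.
However, we can use quick-bench to analyze it: https://quick-bench.com/q/gt2KzKOFP4iV3ajmqANL_MhnMZk. This shows that the timings are almost exactly the same. What about the assembly code the compiler generates?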
double result = (vel/fabs(vel)) * 12.1;
    movabs $0xc028333333333333,%rax
    mov    %rax,0x8(%rsp)
    add    $0xffffffffffffffff,%rbx

double result = sgn(vel) * 12.1;
    movabs $0xc028333333333333,%rax
    mov    %rax,0x8(%rsp)
    add    $0xffffffffffffffff,%rbx

double result = copysign(12.1, vel);
    movabs $0xc028333333333333,%rax
    mov    %rax,0x8(%rsp)
    add    $0xffffffffffffffff,%rbx
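When optimizing code, the answer is always the same: measure first to find out which part of your program is actually the slowest, then rewrite that part so that it executes no code at all.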