JHB*_*ius 6 c++ algorithm loops
所以“新(旧)大事”是 C++ 中的“无原始循环”。我正在尝试以这种方式编写代码,但似乎效率很低。是的,有些 STL 算法可以做任何事情,但它们似乎效率不高。
例如,我有一种情况,我想要一个指向节点数组中得分最高的节点的指针。确定该分数是一项代价高昂的浮点运算。所以我实现了STL算法版本并将其与原始循环进行了比较:
#include <cfloat>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>
static int counter;
class Node {
public:
auto Score() const -> double {
std::cout << "complex calculation\n";
counter++;
return 1;
}
};
int main()
{
std::array<Node, 10> nodes;
counter = 0;
Node const* nodePtr = std::max_element(std::cbegin(nodes), std::cend(nodes),
[](Node const& node1, Node const& node2) {
return node1.Score() < node2.Score();
});
std::cout << "algorithm count " << counter << std::endl;
counter = 0;
double maxScore = -FLT_MAX;
for (const auto& node : nodes) {
auto score = node.Score();
if (score > maxScore) {
maxScore = score;
nodePtr = &node;
}
}
std::cout << "raw loop count " << counter << std::endl;
}
Run Code Online (Sandbox Code Playgroud)
对此进行评估,对于 STL 版本,昂贵的 Score 函数被评估了 18 次,而原始循环仅使用了 10 次评估......
我做错了吗,还是原始循环并没有那么糟糕?
编辑:在 cout 和静态计数器的建议user58697会阻止编译器优化之后,我更改了代码:
#include <cfloat>
#include <cmath>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>
#include <random>
#include <chrono>
template <typename T>
class Random {
private:
std::default_random_engine generator;
std::uniform_real_distribution<T> distribution;
public:
Random()
: generator()
, distribution(0.0, 1.0)
{}
auto operator()() {
return distribution(generator);
};
};
static Random<double> myRandom;
class Timer {
private:
std::chrono::high_resolution_clock::time_point startTime{};
public:
void Start() noexcept {
startTime = std::chrono::high_resolution_clock::now();
}
[[nodiscard]] auto ElapsedMs() const noexcept {
return std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - startTime).count();
}
};
static Timer timer;
class Node {
private:
double val;
public:
Node() noexcept : val(myRandom()) {}
[[nodiscard]] auto Score() const noexcept {
auto score = std::sqrt(std::log(10.0 / val));
score = std::sin(score) / std::cos(score);
score = std::sqrt(std::sqrt(std::sqrt(std::sqrt(std::sqrt(score)))));
score = std::pow(score, 1000);
return score;
}
};
int main()
{
std::array<Node, 100000> nodes; // yeah, yeah... overloading the stack, I know
for (auto i = 0; i < 2; i++) {
timer.Start();
Node const* nodePtr = &*std::max_element(std::cbegin(nodes), std::cend(nodes),
[](Node const& node1, Node const& node2) {
return node1.Score() < node2.Score();
});
std::cout << "algorithm elapsed time " << timer.ElapsedMs() << std::endl;
timer.Start();
double maxScore = -FLT_MAX;
for (const auto& node : nodes) {
auto score = node.Score();
if (score > maxScore) {
maxScore = score;
nodePtr = &node;
}
}
std::cout << "raw loop count " << timer.ElapsedMs() << std::endl;
}
}
Run Code Online (Sandbox Code Playgroud)
我运行循环两次以消除启动行为...第二次循环的结果(用 g++ 9.1 -O3 编译):
algorithm elapsed time 16
raw loop count 8 (<== I see I forgot to change "count" to "time" :P)
Run Code Online (Sandbox Code Playgroud)
所以不是这样的。
编辑:看到这个问题得到了赞成,人们仍在关注它。自从提出这个问题后,C++20就发布了。C++20 的范围库有一个特殊的功能,可以在这里提供帮助,称为Projection。
即在这种情况下,您可以使用std::ranges::max_element或什至std::ranges::max(旧的标准算法中缺少),例如
#include <cfloat>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>
static int counter;
class Node {
public:
auto Score() const -> double {
std::cout << "complex calculation\n";
counter++;
return 1;
}
};
int main()
{
std::array<Node, 10> nodes;
counter = 0;
Node const* nodePtr = std::max_element(std::cbegin(nodes), std::cend(nodes),
[](Node const& node1, Node const& node2) {
return node1.Score() < node2.Score();
});
std::cout << "algorithm count " << counter << std::endl;
counter = 0;
double maxScore = -FLT_MAX;
for (const auto& node : nodes) {
auto score = node.Score();
if (score > maxScore) {
maxScore = score;
nodePtr = &node;
}
}
std::cout << "raw loop count " << counter << std::endl;
}
Run Code Online (Sandbox Code Playgroud)
然而,由于实现选择不使用缓存,投影并不是这里的解决方案。对于比较器函数的每个参数,Proj都会一次又一次地调用投影函数。Comp
(内部函数调用看起来像
#include <cfloat>
#include <cmath>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>
#include <random>
#include <chrono>
template <typename T>
class Random {
private:
std::default_random_engine generator;
std::uniform_real_distribution<T> distribution;
public:
Random()
: generator()
, distribution(0.0, 1.0)
{}
auto operator()() {
return distribution(generator);
};
};
static Random<double> myRandom;
class Timer {
private:
std::chrono::high_resolution_clock::time_point startTime{};
public:
void Start() noexcept {
startTime = std::chrono::high_resolution_clock::now();
}
[[nodiscard]] auto ElapsedMs() const noexcept {
return std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - startTime).count();
}
};
static Timer timer;
class Node {
private:
double val;
public:
Node() noexcept : val(myRandom()) {}
[[nodiscard]] auto Score() const noexcept {
auto score = std::sqrt(std::log(10.0 / val));
score = std::sin(score) / std::cos(score);
score = std::sqrt(std::sqrt(std::sqrt(std::sqrt(std::sqrt(score)))));
score = std::pow(score, 1000);
return score;
}
};
int main()
{
std::array<Node, 100000> nodes; // yeah, yeah... overloading the stack, I know
for (auto i = 0; i < 2; i++) {
timer.Start();
Node const* nodePtr = &*std::max_element(std::cbegin(nodes), std::cend(nodes),
[](Node const& node1, Node const& node2) {
return node1.Score() < node2.Score();
});
std::cout << "algorithm elapsed time " << timer.ElapsedMs() << std::endl;
timer.Start();
double maxScore = -FLT_MAX;
for (const auto& node : nodes) {
auto score = node.Score();
if (score > maxScore) {
maxScore = score;
nodePtr = &node;
}
}
std::cout << "raw loop count " << timer.ElapsedMs() << std::endl;
}
}
Run Code Online (Sandbox Code Playgroud)
)
用抽象算法替换原始循环是一种很好的风格,因为这样您就可以多次重复使用该算法,但只测试一次。以这种方式包装循环可能看起来像语法糖,但它大大减少了代码中出现错误的可能性,因为您现在可以对抽象算法进行广泛的单元测试,并且您永远不需要担心在需要时错误地实现它。
然而,您在这里比较的是苹果和橙子。您的max_element实现始终会计算Score()其比较,而您的for循环会缓存函数的结果Score()。
更好的实现Node可能是:
class Node {
mutable:
double cached_score = std::numeric_limits<double>::quiet_Nan();
public:
auto Score() const -> double {
if(std::isnan(cached_score)){
std::cout << "complex calculation\n";
counter++;
cached_score = 1;
}
return cached_score;
}
void invalidate_cache() {
cached_score = std::numeric_limits<double>::quiet_Nan();
}
};
Run Code Online (Sandbox Code Playgroud)
这样复杂的计算只执行一次。
或者,编写您自己的抽象:
#include <cfloat>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>
static int counter;
class Node {
public:
auto Score() const -> double {
std::cout << "complex calculation\n";
counter++;
return 1;
}
};
template<class ForwardIt, class Evaluate, class Compare>
ForwardIt max_eval_element(
ForwardIt first,
ForwardIt last,
Evaluate eval,
Compare comp
){
if (first == last) return last;
ForwardIt largest = first;
auto largest_val = eval(*first);
++first;
for (; first != last; ++first) {
const auto this_val = eval(*first);
if (comp(largest_val, this_val)) {
largest = first;
largest_val = this_val;
}
}
return largest;
}
int main()
{
std::array<Node, 10> nodes;
counter = 0;
Node const* nodePtr = max_eval_element(std::cbegin(nodes), std::cend(nodes),
[](Node const& node){ return node.Score(); },
[](double const &a, double const &b) {
return a<b;
});
std::cout << "algorithm count " << counter << std::endl;
counter = 0;
double maxScore = -FLT_MAX;
for (const auto& node : nodes) {
auto score = node.Score();
if (score > maxScore) {
maxScore = score;
nodePtr = &node;
}
}
std::cout << "raw loop count " << counter << std::endl;
}
Run Code Online (Sandbox Code Playgroud)
在这种情况下,两个循环执行相同数量的评估。
我使用过的许多内部代码库都有扩展 STL 的扩展库。它使我工作过的团队更加确信他们的代码已正确编写,并允许您一目了然地解释复杂的操作。这样一来,这些抽象也减少了理解代码的工作量和沟通的工作量。