C++“无原始循环”而不损失性能

Question

C++“无原始循环”而不损失性能

所以“新（旧）大事”是 C++ 中的“无原始循环”。我正在尝试以这种方式编写代码，但似乎效率很低。是的，有些 STL 算法可以做任何事情，但它们似乎效率不高。

例如，我有一种情况，我想要一个指向节点数组中得分最高的节点的指针。确定该分数是一项代价高昂的浮点运算。所以我实现了STL算法版本并将其与原始循环进行了比较：

#include <cfloat>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>

static int counter;

class Node {
public:
    auto Score() const -> double {
        std::cout << "complex calculation\n";
        counter++;
        return 1;
    }
};

int main()
{
    
    std::array<Node, 10> nodes;
    
    counter = 0;
    Node const* nodePtr = std::max_element(std::cbegin(nodes), std::cend(nodes),
        [](Node const& node1, Node const& node2) {
            return node1.Score() < node2.Score();
        });
    std::cout << "algorithm count " << counter << std::endl;
    
    counter = 0;
    double maxScore = -FLT_MAX;
    for (const auto& node : nodes) {
        auto score = node.Score();
        if (score > maxScore) {
            maxScore = score;
            nodePtr = &node;
        }
    }
    std::cout << "raw loop count " << counter << std::endl;
}

Run Code Online (Sandbox Code Playgroud)

对此进行评估，对于 STL 版本，昂贵的 Score 函数被评估了 18 次，而原始循环仅使用了 10 次评估......

我做错了吗，还是原始循环并没有那么糟糕？

编辑：在 cout 和静态计数器的建议user58697会阻止编译器优化之后，我更改了代码：

#include <cfloat>
#include <cmath>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>
#include <random>
#include <chrono>

template <typename T>
class Random {
private:
    std::default_random_engine generator;
    std::uniform_real_distribution<T> distribution;
public:
    Random()
        : generator()
        , distribution(0.0, 1.0)
    {}

    auto operator()() {
        return distribution(generator);
    };
};

static Random<double> myRandom;

class Timer {
private:
    std::chrono::high_resolution_clock::time_point startTime{};
public:
    void Start() noexcept {
        startTime = std::chrono::high_resolution_clock::now();
    }
    [[nodiscard]] auto ElapsedMs() const noexcept {
        return std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - startTime).count();
    }
};

static Timer timer;

class Node {
private:
    double val;
public:
    Node() noexcept : val(myRandom()) {}

    [[nodiscard]] auto Score() const noexcept {
        auto score = std::sqrt(std::log(10.0 / val));
        score = std::sin(score) / std::cos(score);
        score = std::sqrt(std::sqrt(std::sqrt(std::sqrt(std::sqrt(score)))));
        score = std::pow(score, 1000);
        return score;
    }
};

int main()
{
    std::array<Node, 100000> nodes; // yeah, yeah... overloading the stack, I know

    for (auto i = 0; i < 2; i++) {
        timer.Start();
        Node const* nodePtr = &*std::max_element(std::cbegin(nodes), std::cend(nodes),
            [](Node const& node1, Node const& node2) {
                return node1.Score() < node2.Score();
            });
        std::cout << "algorithm elapsed time " << timer.ElapsedMs() << std::endl;

        timer.Start();
        double maxScore = -FLT_MAX;
        for (const auto& node : nodes) {
            auto score = node.Score();
            if (score > maxScore) {
                maxScore = score;
                nodePtr = &node;
            }
        }
        std::cout << "raw loop count " << timer.ElapsedMs() << std::endl;
    }
}

Run Code Online (Sandbox Code Playgroud)

我运行循环两次以消除启动行为...第二次循环的结果（用 g++ 9.1 -O3 编译）：

algorithm elapsed time 16
raw loop count 8 (<== I see I forgot to change "count" to "time" :P)

Run Code Online (Sandbox Code Playgroud)

所以不是这样的。

编辑：看到这个问题得到了赞成，人们仍在关注它。自从提出这个问题后，C++20就发布了。C++20 的范围库有一个特殊的功能，可以在这里提供帮助，称为Projection。

即在这种情况下，您可以使用std::ranges::max_element或什至std::ranges::max（旧的标准算法中缺少），例如

#include <cfloat>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>

static int counter;

class Node {
public:
    auto Score() const -> double {
        std::cout << "complex calculation\n";
        counter++;
        return 1;
    }
};

int main()
{
    
    std::array<Node, 10> nodes;
    
    counter = 0;
    Node const* nodePtr = std::max_element(std::cbegin(nodes), std::cend(nodes),
        [](Node const& node1, Node const& node2) {
            return node1.Score() < node2.Score();
        });
    std::cout << "algorithm count " << counter << std::endl;
    
    counter = 0;
    double maxScore = -FLT_MAX;
    for (const auto& node : nodes) {
        auto score = node.Score();
        if (score > maxScore) {
            maxScore = score;
            nodePtr = &node;
        }
    }
    std::cout << "raw loop count " << counter << std::endl;
}

Run Code Online (Sandbox Code Playgroud)

然而，由于实现选择不使用缓存，投影并不是这里的解决方案。对于比较器函数的每个参数，Proj都会一次又一次地调用投影函数。Comp

（内部函数调用看起来像

#include <cfloat> #include <cmath> #include <iostream> #include <array> #include <algorithm> #include <numeric> #include <random> #include <chrono> template <typename T> class Random { private: std::default_random_engine generator; std::uniform_real_distribution<T> distribution; public: Random() : generator() , distribution(0.0, 1.0) {} auto operator()() { return distribution(generator); }; }; static Random<double> myRandom; class Timer { private: std::chrono::high_resolution_clock::time_point startTime{}; public: void Start() noexcept { startTime = std::chrono::high_resolution_clock::now(); } [[nodiscard]] auto ElapsedMs() const noexcept { return std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - startTime).count(); } }; static Timer timer; class Node { private: double val; public: Node() noexcept : val(myRandom()) {} [[nodiscard]] auto Score() const noexcept { auto score = std::sqrt(std::log(10.0 / val)); score = std::sin(score) / std::cos(score); score = std::sqrt(std::sqrt(std::sqrt(std::sqrt(std::sqrt(score))))); score = std::pow(score, 1000); return score; } }; int main() { std::array<Node, 100000> nodes; // yeah, yeah... overloading the stack, I know for (auto i = 0; i < 2; i++) { timer.Start(); Node const* nodePtr = &*std::max_element(std::cbegin(nodes), std::cend(nodes), [](Node const& node1, Node const& node2) { return node1.Score() < node2.Score(); }); std::cout << "algorithm elapsed time " << timer.ElapsedMs() << std::endl; timer.Start(); double maxScore = -FLT_MAX; for (const auto& node : nodes) { auto score = node.Score(); if (score > maxScore) { maxScore = score; nodePtr = &node; } } std::cout << "raw loop count " << timer.ElapsedMs() << std::endl; } }
Run Code Online (Sandbox Code Playgroud)
）

Answer 1

Ric*_*ard 6

用抽象算法替换原始循环是一种很好的风格，因为这样您就可以多次重复使用该算法，但只测试一次。以这种方式包装循环可能看起来像语法糖，但它大大减少了代码中出现错误的可能性，因为您现在可以对抽象算法进行广泛的单元测试，并且您永远不需要担心在需要时错误地实现它。

然而，您在这里比较的是苹果和橙子。您的max_element实现始终会计算Score()其比较，而您的for循环会缓存函数的结果Score()。

更好的实现Node可能是：

class Node {
mutable:
    double cached_score = std::numeric_limits<double>::quiet_Nan();
public:
    auto Score() const -> double {
        if(std::isnan(cached_score)){
           std::cout << "complex calculation\n";
           counter++;
           cached_score = 1;
        }
        return cached_score;
    }
    void invalidate_cache() {
      cached_score = std::numeric_limits<double>::quiet_Nan();
    }
};

Run Code Online (Sandbox Code Playgroud)

这样复杂的计算只执行一次。

或者，编写您自己的抽象：

#include <cfloat>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>

static int counter;

class Node {
public:
    auto Score() const -> double {
        std::cout << "complex calculation\n";
        counter++;
        return 1;
    }
};

template<class ForwardIt, class Evaluate, class Compare>
ForwardIt max_eval_element(
    ForwardIt first,
    ForwardIt last,
    Evaluate eval,
    Compare comp
){
    if (first == last) return last;

    ForwardIt largest = first;
    auto largest_val = eval(*first);
    ++first;
    for (; first != last; ++first) {
        const auto this_val = eval(*first);
        if (comp(largest_val, this_val)) {
            largest = first;
            largest_val = this_val;
        }
    }
    return largest;
}

int main()
{

    std::array<Node, 10> nodes;

    counter = 0;
    Node const* nodePtr = max_eval_element(std::cbegin(nodes), std::cend(nodes),
                                       [](Node const& node){ return node.Score(); },
                                       [](double const &a, double const &b) {
                                           return a<b;
                                       });
    std::cout << "algorithm count " << counter << std::endl;

    counter = 0;
    double maxScore = -FLT_MAX;
    for (const auto& node : nodes) {
        auto score = node.Score();
        if (score > maxScore) {
            maxScore = score;
            nodePtr = &node;
        }
    }
    std::cout << "raw loop count " << counter << std::endl;
}

Run Code Online (Sandbox Code Playgroud)

在这种情况下，两个循环执行相同数量的评估。

我使用过的许多内部代码库都有扩展 STL 的扩展库。它使我工作过的团队更加确信他们的代码已正确编写，并允许您一目了然地解释复杂的操作。这样一来，这些抽象也减少了理解代码的工作量和沟通的工作量。

@JHBonarius：请参阅我答案的第一句话。您将使用对通用函数的调用来替换复杂的内联操作，您可以为其编写大量的单元测试。如果您在项目中使用两次“max_eval_element”，则与原始循环相比，您需要执行的测试量会减少一半。它还为您提供了可以在项目之间移动的功能。如果您查看 [`std::max_element`](https://en.cppreference.com/w/cpp/algorithm/max_element) 的文档，您会发现它也只是一个抽象。这里没有魔法，只有良好的设计实践。 (2认同)

归档时间：	6 年，4 月前
查看次数：	2270 次
最近记录：	3 年，6 月前