C++“无原始循环”而不损失性能

JHB*_*ius 6 c++ algorithm loops

所以“新(旧)大事”是 C++ 中的“无原始循环”。我正在尝试以这种方式编写代码,但似乎效率很低。是的,有些 STL 算法可以做任何事情,但它们似乎效率不高。

例如,我有一种情况,我想要一个指向节点数组中得分最高的节点的指针。确定该分数是一项代价高昂的浮点运算。所以我实现了STL算法版本并将其与原始循环进行了比较:

#include <cfloat>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>

static int counter;

class Node {
public:
    auto Score() const -> double {
        std::cout << "complex calculation\n";
        counter++;
        return 1;
    }
};

int main()
{
    
    std::array<Node, 10> nodes;
    
    counter = 0;
    Node const* nodePtr = std::max_element(std::cbegin(nodes), std::cend(nodes),
        [](Node const& node1, Node const& node2) {
            return node1.Score() < node2.Score();
        });
    std::cout << "algorithm count " << counter << std::endl;
    
    counter = 0;
    double maxScore = -FLT_MAX;
    for (const auto& node : nodes) {
        auto score = node.Score();
        if (score > maxScore) {
            maxScore = score;
            nodePtr = &node;
        }
    }
    std::cout << "raw loop count " << counter << std::endl;
}
Run Code Online (Sandbox Code Playgroud)

对此进行评估,对于 STL 版本,昂贵的 Score 函数被评估了 18 次,而原始循环仅使用了 10 次评估......

我做错了吗,还是原始循环并没有那么糟糕?

编辑:在 cout 和静态计数器的建议user58697会阻止编译器优化之后,我更改了代码:

#include <cfloat>
#include <cmath>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>
#include <random>
#include <chrono>

template <typename T>
class Random {
private:
    std::default_random_engine generator;
    std::uniform_real_distribution<T> distribution;
public:
    Random()
        : generator()
        , distribution(0.0, 1.0)
    {}

    auto operator()() {
        return distribution(generator);
    };
};

static Random<double> myRandom;

class Timer {
private:
    std::chrono::high_resolution_clock::time_point startTime{};
public:
    void Start() noexcept {
        startTime = std::chrono::high_resolution_clock::now();
    }
    [[nodiscard]] auto ElapsedMs() const noexcept {
        return std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - startTime).count();
    }
};

static Timer timer;

class Node {
private:
    double val;
public:
    Node() noexcept : val(myRandom()) {}

    [[nodiscard]] auto Score() const noexcept {
        auto score = std::sqrt(std::log(10.0 / val));
        score = std::sin(score) / std::cos(score);
        score = std::sqrt(std::sqrt(std::sqrt(std::sqrt(std::sqrt(score)))));
        score = std::pow(score, 1000);
        return score;
    }
};

int main()
{
    std::array<Node, 100000> nodes; // yeah, yeah... overloading the stack, I know

    for (auto i = 0; i < 2; i++) {
        timer.Start();
        Node const* nodePtr = &*std::max_element(std::cbegin(nodes), std::cend(nodes),
            [](Node const& node1, Node const& node2) {
                return node1.Score() < node2.Score();
            });
        std::cout << "algorithm elapsed time " << timer.ElapsedMs() << std::endl;

        timer.Start();
        double maxScore = -FLT_MAX;
        for (const auto& node : nodes) {
            auto score = node.Score();
            if (score > maxScore) {
                maxScore = score;
                nodePtr = &node;
            }
        }
        std::cout << "raw loop count " << timer.ElapsedMs() << std::endl;
    }
}
Run Code Online (Sandbox Code Playgroud)

我运行循环两次以消除启动行为...第二次循环的结果(用 g++ 9.1 -O3 编译):

algorithm elapsed time 16
raw loop count 8 (<== I see I forgot to change "count" to "time" :P)
Run Code Online (Sandbox Code Playgroud)

所以不是这样的。

编辑:看到这个问题得到了赞成,人们仍在关注它。自从提出这个问题后,C++20就发布了。C++20 的范围库有一个特殊的功能,可以在这里提供帮助,称为Projection

即在这种情况下,您可以使用std::ranges::max_element或什至std::ranges::max(旧的标准算法中缺少),例如

#include <cfloat>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>

static int counter;

class Node {
public:
    auto Score() const -> double {
        std::cout << "complex calculation\n";
        counter++;
        return 1;
    }
};

int main()
{
    
    std::array<Node, 10> nodes;
    
    counter = 0;
    Node const* nodePtr = std::max_element(std::cbegin(nodes), std::cend(nodes),
        [](Node const& node1, Node const& node2) {
            return node1.Score() < node2.Score();
        });
    std::cout << "algorithm count " << counter << std::endl;
    
    counter = 0;
    double maxScore = -FLT_MAX;
    for (const auto& node : nodes) {
        auto score = node.Score();
        if (score > maxScore) {
            maxScore = score;
            nodePtr = &node;
        }
    }
    std::cout << "raw loop count " << counter << std::endl;
}
Run Code Online (Sandbox Code Playgroud)

然而,由于实现选择不使用缓存,投影并不是这里的解决方案。对于比较器函数的每个参数,Proj都会一次又一次地调用投影函数。Comp

(内部函数调用看起来像

#include <cfloat>
#include <cmath>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>
#include <random>
#include <chrono>

template <typename T>
class Random {
private:
    std::default_random_engine generator;
    std::uniform_real_distribution<T> distribution;
public:
    Random()
        : generator()
        , distribution(0.0, 1.0)
    {}

    auto operator()() {
        return distribution(generator);
    };
};

static Random<double> myRandom;

class Timer {
private:
    std::chrono::high_resolution_clock::time_point startTime{};
public:
    void Start() noexcept {
        startTime = std::chrono::high_resolution_clock::now();
    }
    [[nodiscard]] auto ElapsedMs() const noexcept {
        return std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - startTime).count();
    }
};

static Timer timer;

class Node {
private:
    double val;
public:
    Node() noexcept : val(myRandom()) {}

    [[nodiscard]] auto Score() const noexcept {
        auto score = std::sqrt(std::log(10.0 / val));
        score = std::sin(score) / std::cos(score);
        score = std::sqrt(std::sqrt(std::sqrt(std::sqrt(std::sqrt(score)))));
        score = std::pow(score, 1000);
        return score;
    }
};

int main()
{
    std::array<Node, 100000> nodes; // yeah, yeah... overloading the stack, I know

    for (auto i = 0; i < 2; i++) {
        timer.Start();
        Node const* nodePtr = &*std::max_element(std::cbegin(nodes), std::cend(nodes),
            [](Node const& node1, Node const& node2) {
                return node1.Score() < node2.Score();
            });
        std::cout << "algorithm elapsed time " << timer.ElapsedMs() << std::endl;

        timer.Start();
        double maxScore = -FLT_MAX;
        for (const auto& node : nodes) {
            auto score = node.Score();
            if (score > maxScore) {
                maxScore = score;
                nodePtr = &node;
            }
        }
        std::cout << "raw loop count " << timer.ElapsedMs() << std::endl;
    }
}
Run Code Online (Sandbox Code Playgroud)

Ric*_*ard 6

用抽象算法替换原始循环是一种很好的风格,因为这样您就可以多次重复使用该算法,但只测试一次。以这种方式包装循环可能看起来像语法糖,但它大大减少了代码中出现错误的可能性,因为您现在可以对抽象算法进行广泛的单元测试,并且您永远不需要担心在需要时错误地实现它。

然而,您在这里比较的是苹果和橙子。您的max_element实现始终会计算Score()其比较,而您的for循环会缓存函数的结果Score()

更好的实现Node可能是:

class Node {
mutable:
    double cached_score = std::numeric_limits<double>::quiet_Nan();
public:
    auto Score() const -> double {
        if(std::isnan(cached_score)){
           std::cout << "complex calculation\n";
           counter++;
           cached_score = 1;
        }
        return cached_score;
    }
    void invalidate_cache() {
      cached_score = std::numeric_limits<double>::quiet_Nan();
    }
};
Run Code Online (Sandbox Code Playgroud)

这样复杂的计算只执行一次。

或者,编写您自己的抽象:

#include <cfloat>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>

static int counter;

class Node {
public:
    auto Score() const -> double {
        std::cout << "complex calculation\n";
        counter++;
        return 1;
    }
};

template<class ForwardIt, class Evaluate, class Compare>
ForwardIt max_eval_element(
    ForwardIt first,
    ForwardIt last,
    Evaluate eval,
    Compare comp
){
    if (first == last) return last;

    ForwardIt largest = first;
    auto largest_val = eval(*first);
    ++first;
    for (; first != last; ++first) {
        const auto this_val = eval(*first);
        if (comp(largest_val, this_val)) {
            largest = first;
            largest_val = this_val;
        }
    }
    return largest;
}

int main()
{

    std::array<Node, 10> nodes;

    counter = 0;
    Node const* nodePtr = max_eval_element(std::cbegin(nodes), std::cend(nodes),
                                       [](Node const& node){ return node.Score(); },
                                       [](double const &a, double const &b) {
                                           return a<b;
                                       });
    std::cout << "algorithm count " << counter << std::endl;

    counter = 0;
    double maxScore = -FLT_MAX;
    for (const auto& node : nodes) {
        auto score = node.Score();
        if (score > maxScore) {
            maxScore = score;
            nodePtr = &node;
        }
    }
    std::cout << "raw loop count " << counter << std::endl;
}
Run Code Online (Sandbox Code Playgroud)

在这种情况下,两个循环执行相同数量的评估。

我使用过的许多内部代码库都有扩展 STL 的扩展库。它使我工作过的团队更加确信他们的代码已正确编写,并允许您一目了然地解释复杂的操作。这样一来,这些抽象也减少了理解代码的工作量和沟通的工作量。

  • @JHBonarius:请参阅我答案的第一句话。您将使用对通用函数的调用来替换复杂的内联操作,您可以为其编写大量的单元测试。如果您在项目中使用两次“max_eval_element”,则与原始循环相比,您需要执行的测试量会减少一半。它还为您提供了可以在项目之间移动的功能。如果您查看 [`std::max_element`](https://en.cppreference.com/w/cpp/algorithm/max_element) 的文档,您会发现它也只是一个抽象。这里没有魔法,只有良好的设计实践。 (2认同)