What do compilers do with compile-time branching?

Question

She*_*ohn 33 c++ templates if-statement type-traits c++11

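EDIT: I took the "if/else" case as an example of something that can sometimes be resolved at compile time (e.g. when static values are involved, cf. <type_traits>). Adapting the answers below to other kinds of static branching (e.g. multiple branches or multi-criteria branches) should be straightforward. Note that compile-time branching using template metaprogramming is not the topic here.

In typical code like this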

#include <type_traits>

template <class T>
T numeric_procedure( const T& x )
{
    if ( std::is_integral<T>::value )
    {
        // Integral types
    }
    else
    {
        // Floating point numeric types
    }
}

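will the compiler optimize the if/else statement away when I instantiate the template with a specific type later in my code?

A simple alternative would be to write something like this: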

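#include <type_traits>

// Forward declarations: the helpers must be visible before numeric_procedure
// is defined, because ADL alone will not find them for built-in types.
template <class T>
T numeric_procedure_impl( const T& x, std::true_type const );
template <class T>
T numeric_procedure_impl( const T& x, std::false_type const );
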
template <class T>
inline T numeric_procedure( const T& x )
{
    return numeric_procedure_impl( x, std::is_integral<T>() );
}

// ------------------------------------------------------------------------

template <class T>
T numeric_procedure_impl( const T& x, std::true_type const )
{
    // Integral types
}

template <class T>
T numeric_procedure_impl( const T& x, std::false_type const )
{
    // Floating point numeric types
}

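How does the performance of these solutions differ? Is there any non-subjective reason to say that one is better than the other? Are there other (possibly better) solutions for dealing with compile-time branching?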

Answer 1

Tem*_*Rex 50

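TL;DR

There are several ways to get different run-time behavior depending on a template parameter. Performance should not be your primary concern here, but flexibility and maintainability should. In all cases, the various thin wrappers and constant conditional expressions will be optimized away by any decent compiler in release builds. Below is a short summary of the various tradeoffs (inspired by @AndyProwl's answer).

Run-time if

Your first solution is the simple run-time if: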

template<class T>
T numeric_procedure(const T& x)
{
    if (std::is_integral<T>::value) {
        // valid code for integral types
    } else {
        // valid code for non-integral types,
        // must ALSO compile for integral types
    }
}

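It is simple and effective: any decent compiler will optimize away the dead branch.

There are several disadvantages:

On some platforms (MSVC), a constant conditional expression yields a spurious compiler warning which you then need to ignore or silence.
But worse, on all conforming platforms, both branches of the if/else statement need to actually compile for all types T, even if one of the branches is known not to be taken. If T contains different member types depending on its nature, you will get a compiler error as soon as you try to access them.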
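
To make the second drawback concrete, here is a minimal sketch. The function even_or_whole_runtime_if and its checker are hypothetical names, not from the question. It reports whether an integer is even or whether a floating-point value has no fractional part, and both branches have to be written so that they compile for every T.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Hypothetical example: both branches are instantiated for every T,
// so each must be valid for integral AND floating-point arguments.
template <class T>
bool even_or_whole_runtime_if(const T& x)
{
    if (std::is_integral<T>::value) {
        // Writing `x % 2` directly would break the double instantiation:
        // operator% is not defined for double, even though this branch is
        // dead for that type. The cast keeps the branch compilable.
        return static_cast<long long>(x) % 2 == 0;
    } else {
        // std::floor also accepts integral arguments, so this branch
        // compiles when T is int as well.
        return std::floor(x) == x;
    }
}

// Usage sketch.
void check_even_or_whole_runtime_if()
{
    assert(even_or_whole_runtime_if(4));     // even integer
    assert(!even_or_whole_runtime_if(3));    // odd integer
    assert(even_or_whole_runtime_if(2.0));   // no fractional part
    assert(!even_or_whole_runtime_if(2.5));  // has a fractional part
}
```

The cast is only there to satisfy the compiler for instantiations in which that branch can never run, which is exactly the kind of workaround the remaining techniques avoid.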

Tag dispatching

Your second approach is known as tag dispatching:

template<class T>
T numeric_procedure_impl(const T& x, std::false_type)
{
    // valid code for non-integral types,
    // CAN contain code that is invalid for integral types
}    

template<class T>
T numeric_procedure_impl(const T& x, std::true_type)
{
    // valid code for integral types
}

template<class T>
T numeric_procedure(const T& x)
{
    return numeric_procedure_impl(x, std::is_integral<T>());
}

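It works fine, without run-time overhead: the temporary std::is_integral<T>() and the call to the one-line helper function will both be optimized away on any decent platform.

The main (minor, IMO) disadvantage is that you have some boilerplate, with 3 functions instead of 1.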
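
As a concrete illustration, here is a minimal runnable sketch of tag dispatching. The names even_or_whole_tag and even_or_whole_tag_impl are hypothetical. The integral helper uses operator% directly, which would not compile for double, yet the whole thing builds because each helper is only instantiated for the types that select it.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Selected for integral T: free to use operator%, which double lacks.
template <class T>
bool even_or_whole_tag_impl(const T& x, std::true_type)
{
    return x % 2 == 0;
}

// Selected for every other T: never instantiated for int.
template <class T>
bool even_or_whole_tag_impl(const T& x, std::false_type)
{
    return std::floor(x) == x;
}

// Public entry point: the tag object picks the overload at compile time.
template <class T>
bool even_or_whole_tag(const T& x)
{
    return even_or_whole_tag_impl(x, std::is_integral<T>());
}

// Usage sketch.
void check_even_or_whole_tag()
{
    assert(even_or_whole_tag(4));
    assert(!even_or_whole_tag(3));
    assert(even_or_whole_tag(2.0));
    assert(!even_or_whole_tag(2.5));
}
```

The helpers are declared before the entry point so that ordinary lookup finds them. For built-in argument types, argument-dependent lookup would not find a helper declared later in the file.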

SFINAE

Closely related to tag dispatching is SFINAE (Substitution Failure Is Not An Error)

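// The condition lives in the type of a defaulted non-type template parameter.
// Two overloads that differ only in a default template *argument* would be
// redeclarations of the same template and would not compile.
template<class T, typename std::enable_if<!std::is_integral<T>::value, int>::type = 0>
T numeric_procedure(const T& x)
{
    // valid code for non-integral types,
    // CAN contain code that is invalid for integral types
}

template<class T, typename std::enable_if<std::is_integral<T>::value, int>::type = 0>
T numeric_procedure(const T& x)
{
    // valid code for integral types
}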

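This has the same effect as tag dispatching but works slightly differently. Instead of using argument deduction to select the proper helper overload, it directly manipulates the overload set of your main function.

The disadvantage is that it can be a fragile and tricky approach if you don't know exactly what the entire overload set is (e.g. with template-heavy code, ADL could pull in more overloads from associated namespaces you didn't think of). And compared to tag dispatching, selection based on anything other than a binary decision is a lot more involved.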
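
The following is a minimal runnable sketch of SFINAE-based selection; even_or_whole_sfinae is a hypothetical name. It places enable_if in the type of a defaulted non-type template parameter. That detail matters: two overloads that differ only in a default template argument would be treated as redeclarations of the same template.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Participates in overload resolution only when T is integral.
template <class T,
          typename std::enable_if<std::is_integral<T>::value, int>::type = 0>
bool even_or_whole_sfinae(const T& x)
{
    return x % 2 == 0;  // would be ill-formed for double, but is never tried
}

// Participates in overload resolution only when T is NOT integral.
template <class T,
          typename std::enable_if<!std::is_integral<T>::value, int>::type = 0>
bool even_or_whole_sfinae(const T& x)
{
    return std::floor(x) == x;
}

// Usage sketch.
void check_even_or_whole_sfinae()
{
    assert(even_or_whole_sfinae(4));
    assert(!even_or_whole_sfinae(3));
    assert(even_or_whole_sfinae(2.0));
    assert(!even_or_whole_sfinae(2.5));
}
```

Because the two conditions are exact complements, exactly one overload is viable for any T. With more than two cases, every condition has to exclude all the others by hand, which is the extra involvement mentioned above.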

Partial specialization

Another approach is to use a class template helper with a function call operator and partially specialize it

template<class T, bool> 
struct numeric_functor;

template<class T>
struct numeric_functor<T, false>
{
    T operator()(T const& x) const
    {
        // valid code for non-integral types,
        // CAN contain code that is invalid for integral types
    }
};

template<class T>
struct numeric_functor<T, true>
{
    T operator()(T const& x) const
    {
        // valid code for integral types
    }
};

template<class T>
T numeric_procedure(T const& x)
{
    return numeric_functor<T, std::is_integral<T>::value>()(x);
}

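This is probably the most flexible approach if you want fine-grained control and minimal code duplication (e.g. if you also want to specialize on size and/or alignment, but only for floating-point types). The pattern matching provided by partial template specialization is ideally suited to such advanced problems. As with tag dispatching, the helper functors are optimized away by any decent compiler.

The main disadvantage is the slightly larger amount of boilerplate if you only want to specialize on a single binary condition.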
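
To show what the extra flexibility buys, here is a minimal sketch that adds a size-based case for non-integral types only. The names describe_functor and describe, and the returned strings, are hypothetical and chosen purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Primary template: non-integral types of any size.
template <class T, bool IsIntegral, std::size_t Size>
struct describe_functor
{
    const char* operator()(const T&) const { return "non-integral"; }
};

// Integral types, regardless of size.
template <class T, std::size_t Size>
struct describe_functor<T, true, Size>
{
    const char* operator()(const T&) const { return "integral"; }
};

// Non-integral types that are exactly as wide as float.
template <class T>
struct describe_functor<T, false, sizeof(float)>
{
    const char* operator()(const T&) const { return "non-integral, float-sized"; }
};

template <class T>
const char* describe(const T& x)
{
    return describe_functor<T, std::is_integral<T>::value, sizeof(T)>()(x);
}

// Usage sketch; the last check assumes double is wider than float,
// which holds on mainstream platforms.
void check_describe()
{
    assert(std::strcmp(describe(1), "integral") == 0);
    assert(std::strcmp(describe(1.0f), "non-integral, float-sized") == 0);
    assert(std::strcmp(describe(1.0), "non-integral") == 0);
}
```

Adding the size case took one more specialization and left the other two untouched. Expressing the same three-way choice with enable_if would require rewriting each existing condition.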

if constexpr (C++1z proposal)

This is a reboot of failed earlier proposals for static if (which is used in the D programming language)

template<class T>
T numeric_procedure(const T& x)
{
    if constexpr (std::is_integral<T>::value) {
        // valid code for integral types
    } else {
        // valid code for non-integral types,
        // CAN contain code that is invalid for integral types
    }
}

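As with your run-time if, everything is in one place, but the main advantage here is that the else branch will be dropped entirely by the compiler when it is known not to be taken. A great advantage is that you keep all code local and do not have to use small helper functions as in tag dispatching or partial template specialization.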
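
The feature was later adopted into C++17, so the idea can be tried today. The sketch below assumes a compiler in C++17 mode or later; even_or_whole_if_constexpr is a hypothetical name. Unlike the run-time if, the integral branch may use operator% directly, because the discarded branch is never instantiated for double.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

template <class T>
bool even_or_whole_if_constexpr(const T& x)
{
    if constexpr (std::is_integral<T>::value) {
        // Discarded, and therefore never instantiated, when T is double.
        return x % 2 == 0;
    } else {
        // Discarded when T is an integral type.
        return std::floor(x) == x;
    }
}

// Usage sketch.
void check_even_or_whole_if_constexpr()
{
    assert(even_or_whole_if_constexpr(4));
    assert(!even_or_whole_if_constexpr(3));
    assert(even_or_whole_if_constexpr(2.0));
    assert(!even_or_whole_if_constexpr(2.5));
}
```

Branches are only discarded like this inside a template, and only when the condition depends on a template parameter. In ordinary non-template code both branches are still fully checked.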

Concepts-Lite (C++1z proposal)

Concepts-Lite is an upcoming Technical Specification that is scheduled to be part of the next major C++ release (C++1z, with z==7 as the best guess).

template<Non_integral T>
T numeric_procedure(const T& x)
{
    // valid code for non-integral types,
    // CAN contain code that is invalid for integral types
}    

template<Integral T>
T numeric_procedure(const T& x)
{
    // valid code for integral types
}

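This approach replaces the class or typename keyword inside the template< > brackets with a concept name describing the family of types that the code is supposed to work for. It can be seen as a generalization of the tag-dispatching and SFINAE techniques. Some compilers (gcc, Clang) have experimental support for this feature. The "Lite" adjective refers to the failed C++11 Concepts proposal.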

Answer 2

Use*_*ess 12

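Note that although the optimizer may well be able to prune statically known tests and unreachable branches from the generated code, the compiler still needs to be able to compile each branch.

That is: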

int foo() {
  #if 0
    return std::cout << "this isn't going to work\n";
  #else
    return 1;
  #endif
}

will work fine, because the preprocessor strips out the dead branch before the compiler sees it, but:

int foo() {
  if (std::is_integral<double>::value) {
    return std::cout << "this isn't going to work\n";
  } else {
    return 1;
  }
}

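won't. Even though the optimizer can discard the first branch, it will still fail to compile. This is where enable_if and SFINAE help, because you can select the valid (compilable) code, and the failure of the invalid (uncompilable) code to compile is not an error.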

Answer 3

Pet*_*des 5

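To answer the title question about how compilers handle if(false):

They optimize away constant branch conditions (and the dead code)

The language standard does not, of course, require compilers to be non-terrible, but the C++ implementations that people actually use are non-terrible in this respect. (So are most C implementations, except perhaps for very simplistic non-optimizing ones like tinycc.)

One of the major reasons C++ is designed around if(something) instead of the C preprocessor's #ifdef SOMETHING is that they are equally efficient. Many C++ features (like constexpr) were only added after compilers had already implemented the necessary optimizations (inlining + constant propagation). (The reason we put up with all the undefined-behaviour pitfalls and gotchas of C and C++ is performance, especially with modern compilers that optimize aggressively on the assumption of no UB. The language design typically doesn't impose unnecessary performance costs.)

But if you care about debug-mode performance, the choice can matter depending on your compiler. (E.g. for a game or another program with real-time requirements, where a debug build has to be fast enough to be testable at all.)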

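For example, clang++ -O0 ("debug mode") still evaluates an if(constexpr_function()) at compile time and treats it like if(false) or if(true). Some other compilers only evaluate at compile time if they are forced to (by template matching).

There is no performance cost for if(false) with optimization enabled. (Barring missed-optimization bugs, which might depend on how early in the compilation process the condition is resolved to false, and on whether dead-code elimination removes the code before the compiler "thinks about" reserving stack space for its variables, or about the function possibly being non-leaf, or whatever.)

Any non-terrible compiler can optimize away dead code behind a compile-time-constant condition (Wikipedia: Dead Code Elimination). This is part of the baseline expectation people have for a C++ implementation to be usable in the real world; it is one of the most basic optimizations, and all compilers in real use do it for simple cases like a constexpr.

Often, constant propagation (especially after inlining) will make conditions compile-time constants even if they weren't obviously so in the source. One of the more obvious cases is optimizing away the compare on the first iteration of a for (int i=0 ; i<n ; i++), so that it can turn into a normal asm loop with a conditional branch at the bottom (like a do{}while loop in C++) if n is constant or provably > 0. (Yes, real compilers do value-range optimizations, not just constant propagation.)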
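
The loop transformation described here can be written out by hand in C++. The sketch below only illustrates the shape of the generated code and is not the output of any particular compiler; both function names are hypothetical.

```cpp
#include <cassert>

// As written in the source: the condition is tested before every
// iteration, including the first one.
int sum_below_for(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += i;
    }
    return total;
}

// Roughly the shape an optimizer aims for: one guard up front, then a
// loop with a single conditional branch at the bottom.
int sum_below_rotated(int n)
{
    int total = 0;
    if (n > 0) {             // guard: removable when n is constant or provably > 0
        int i = 0;
        do {
            total += i;
            i++;
        } while (i < n);     // the only branch executed per iteration
    }
    return total;
}

// Usage sketch: both forms agree, including for empty ranges.
void check_loop_rotation()
{
    assert(sum_below_for(5) == 10);
    assert(sum_below_rotated(5) == 10);
    assert(sum_below_for(0) == sum_below_rotated(0));
    assert(sum_below_for(-3) == sum_below_rotated(-3));
}
```

When the caller passes a constant such as 5 and the function is inlined, the guard becomes a compile-time-constant condition of exactly the kind discussed above.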

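Some compilers, such as gcc and clang, remove dead code inside an if(false) even in "debug" mode, at the minimum level of optimization that is required for them to transform the program logic through their internal architecture-neutral representations and eventually emit asm. (But debug mode disables any kind of constant propagation for variables that are not declared const or constexpr in the source.)

Some compilers only do it when optimization is enabled; for example, MSVC really likes to be literal in its translation of C++ to asm in debug mode, and will actually create a zero in a register and branch on whether it is zero for if(false).

In gcc debug mode (-O0), constexpr functions are not inlined if they don't have to be. (In some places the language requires a constant, such as an array size inside a struct. GNU C++ supports C99 VLAs, but chooses to inline a constexpr function there instead of actually making a VLA in debug mode.)

But constexpr variables, like constexpr int x = whatever, do get evaluated at compile time rather than stored in memory and tested.

But just to reiterate: with any level of optimization enabled, constexpr functions are fully inlined and optimized away, and the same then goes for the resulting if(true) or if(false).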
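
The difference between a constexpr function call and a constexpr variable can be sketched as follows. always_false is the function used in the examples of this answer; pick_direct and pick_via_constant are hypothetical names added for illustration.

```cpp
#include <cassert>

static constexpr bool always_false() { return sizeof(char) == 2 * sizeof(int); }

int pick_direct()
{
    // Ordinary run-time context: nothing forces compile-time evaluation,
    // so an unoptimized build may emit a real call and a real branch.
    if (always_false()) return 1;
    return 2;
}

int pick_via_constant()
{
    // Initializing a constexpr variable requires a constant expression,
    // so the call must be evaluated at compile time at every -O level.
    constexpr bool flag = always_false();
    if (flag) return 1;
    return 2;
}

// static_assert is another context that requires compile-time evaluation.
static_assert(!always_false(), "sizeof(char) is 1, so this cannot hold");

// Usage sketch: both functions behave identically at run time.
void check_pick()
{
    assert(pick_direct() == 2);
    assert(pick_via_constant() == 2);
}
```

Whether the branch on flag itself is still emitted at -O0 depends on the compiler, as described above. What the constexpr variable guarantees is that the function call is gone.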

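C++17 if constexpr(func()) requires its condition to be a constant expression, so func() is evaluated at compile time even in debug builds. C++23 if consteval is a different construct: it takes no condition and selects a branch depending on whether the surrounding code is itself being evaluated at compile time. (https://en.cppreference.com/w/cpp/language/if)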

Examples (from the Godbolt compiler explorer)

#include <type_traits>
void f1();  // declared only; defined in another translation unit
void f2();
void baz() {
    if (std::is_integral<float>::value) f1();  // optimizes for gcc
    else f2();
}

All compilers with -O2 optimization enabled (for x86-64):

baz():
        jmp     f2()    # optimized tailcall

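Debug-mode code quality, normally not relevant

GCC with optimization disabled still evaluates the expression and eliminates the dead code: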

baz():
        push    rbp
        mov     rbp, rsp          # -fno-omit-frame-pointer is the default at -O0
        call    f2()              # still an unconditional call, no runtime branching
        nop
        pop     rbp
        ret

To see gcc not inline something with optimization disabled:

static constexpr bool always_false() { return sizeof(char)==2*sizeof(int); }
void baz() {
    if (always_false()) f1();
    else f2();
}

;; gcc9.1 with no optimization chooses not to inline the constexpr function
baz():
        push    rbp
        mov     rbp, rsp
        call    always_false()
        test    al, al              # the bool return value
        je      .L9
        call    f1()
        jmp     .L11
.L9:
        call    f2()
.L11:
        nop
        pop     rbp
        ret

MSVC's braindead literal code generation with optimization disabled:

void foo() {
    if (false) f1();
    else f2();
}

;; MSVC 19.20 x86-64  no optimization
void foo(void) PROC                                        ; foo
        sub     rsp, 40                             ; 00000028H
        xor     eax, eax                     ; EAX=0
        test    eax, eax                     ; set flags from EAX (which were already set by xor)
        je      SHORT $LN2@foo               ; jump if ZF is set, i.e. if EAX==0
        call    void f1(void)                          ; f1
        jmp     SHORT $LN3@foo
$LN2@foo:
        call    void f2(void)                          ; f2
$LN3@foo:
        add     rsp, 40                             ; 00000028H
        ret     0

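Benchmarking with optimization disabled is not useful

You should always enable optimization for your real code; the only time debug-mode performance matters is when it is a precondition for debuggability. It is not a useful way to keep your benchmark from being optimized away, because different code speeds up by different amounts when optimization is enabled, depending on how it is written.

Unless debug-mode performance is a really big deal for your project, and you just can't get enough information about local variables and the like with minimal optimization such as g++ -Og, the headline of this answer is the full answer. Ignore debug mode and only bother thinking about the quality of the asm in optimized builds. (Preferably with LTO enabled, if your project can enable it to allow cross-file inlining.)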

Archived: 11 years, 8 months ago
Viewed: 4051 times
Last activity: 6 years, 8 months ago