是否有一些有意义的统计数据来证明保持未定义的有符号整数算术溢出是合理的？

chq*_*lie 35 c c++ signed integer-overflow language-lawyer

C标准明确规定有符号整数溢出具有未定义的行为。然而，大多数CPU都使用定义的语义为溢出实现签名算法（可能除法除法溢出为：x / 0和INT_MIN / -1）。

编译器的作家一直在走的优势undefinedness这种溢出的补充，往往会打破传统的代码非常微妙的方式更积极的优化。例如，此代码可能在较旧的编译器上有效，但在gccand的当前版本上不再有效clang：

/* Tncrement a by a value in 0..255, clamp a to positive integers.
   The code relies on 32-bit wrap-around, but the C Standard makes
   signed integer overflow undefined behavior, so sum_max can now 
   return values less than a. There are Standard compliant ways to
   implement this, but legacy code is what it is... */
int sum_max(int a, unsigned char b) {
    int res = a + b;
    return (res >= a) ? res : INT_MAX;
}

Run Code Online (Sandbox Code Playgroud)

有确凿的证据表明这些优化值得吗？是否有比较研究记录了实际示例甚至经典基准的实际改进？

我在观看时提出了这个问题：C ++ Now 2018：John Regehr“结束主题演讲：未定义的行为和编译器优化”

我在给c和c ++加上标签，因为两种语言的问题都相似，但是答案可能不同。

我不了解研究和统计信息，但是是的，肯定有一些优化考虑了编译器实际所做的事情。是的，它们非常重要（例如，tldr循环矢量化）。

除了编译器优化之外，还有其他方面需要考虑。使用UB，您可以获得C / C ++有符号整数，其数学行为与数学期望一样。例如，x + 10 > x现在适用（当然适用于有效代码），但不会出现环绕行为。

我从Krister Walfridsson的博客中找到了一篇出色的文章，即未定义的签名溢出如何在GCC中实现优化，列出了一些将签名溢出UB考虑在内的优化。以下示例来自于此。我正在向它们添加c ++和汇编示例。

如果优化看起来过于简单，无趣或没有影响，请记住，这些优化只是更大得多的优化链中的步骤。蝴蝶效应确实发生了，因为在较早的步骤看似不重要的优化会在较晚的步骤触发更具影响力的优化。

如果这些示例看起来很荒谬（谁会写x * 10 > 0），请记住，您可以很容易地在C和C ++中使用常量，宏，模板来获得此类示例。此外，编译器在其IR中进行转换和优化时可以得到这类示例。

有符号整数表达式的简化

与0相比，消除乘法
```
(x * c) cmp 0   ->   x cmp 0 
```
Run Code Online (Sandbox Code Playgroud)
```
bool foo(int x) { return x * 10 > 0 }
```
Run Code Online (Sandbox Code Playgroud)
```
foo(int):
        test    edi, edi
        setg    al
        ret
```
Run Code Online (Sandbox Code Playgroud)
乘法运算后消除除法

（x * c1）/ c2-> x *（c1 / c2）如果c1可被c2整除
```
int foo(int x) { return (x * 20) / 10; }
```
Run Code Online (Sandbox Code Playgroud)
```
foo(int):
        lea     eax, [rdi+rdi]
        ret
```
Run Code Online (Sandbox Code Playgroud)

消除否定

（-x）/（-y）-> x / y

(x * c) cmp 0   ->   x cmp 0

Run Code Online (Sandbox Code Playgroud)

foo(int, int):
        mov     eax, edi
        cdq
        idiv    esi
        ret

Run Code Online (Sandbox Code Playgroud)

Simplify comparisons that are always true or false

x + c < x       ->   false
x + c <= x      ->   false
x + c > x       ->   true
x + c >= x      ->   true

Run Code Online (Sandbox Code Playgroud)

bool foo(int x) { return x * 10 > 0 }

Run Code Online (Sandbox Code Playgroud)

foo(int):
        mov     eax, 1
        ret

Run Code Online (Sandbox Code Playgroud)

Eliminate negation in comparisons

(-x) cmp (-y)   ->   y cmp x

Run Code Online (Sandbox Code Playgroud)

foo(int):
        test    edi, edi
        setg    al
        ret

Run Code Online (Sandbox Code Playgroud)

foo(int, int):
        cmp     edi, esi
        setg    al
        ret

Run Code Online (Sandbox Code Playgroud)

Reduce magnitude of constants

x + c > y       ->   x + (c - 1) >= y
x + c <= y      ->   x + (c - 1) < y
Run Code Online (Sandbox Code Playgroud)

bool foo(int x, int y) { return x + 10 <= y; }

Run Code Online (Sandbox Code Playgroud)

foo(int, int):
        add     edi, 9
        cmp     edi, esi
        setl    al
        ret

Run Code Online (Sandbox Code Playgroud)

Eliminate constants in comparisons
```
(x + c1) cmp c2         ->   x cmp (c2 - c1)
(x + c1) cmp (y + c2)   ->   x cmp (y + (c2 - c1)) if c1 <= c2
```
Run Code Online (Sandbox Code Playgroud)
The second transformation is only valid if c1 <= c2, as it would otherwise introduce an overflow when y has the value INT_MIN.
```
int foo(int x) { return (x * 20) / 10; }
```
Run Code Online (Sandbox Code Playgroud)
```
foo(int):
        cmp     edi, -30
        setl    al
        ret
```
Run Code Online (Sandbox Code Playgroud)

Pointer arithmetic and type promotion

If an operation does not overflow, then we will get the same result if we do the operation in a wider type. This is often useful when doing things like array indexing on 64-bit architectures — the index calculations are typically done using 32-bit int, but the pointers are 64-bit, and the compiler may generate more efficient code when signed overflow is undefined by promoting the 32-bit integers to 64-bit operations instead of generating type extensions.

One other aspect of this is that undefined overflow ensures that a[i] and a[i+1] are adjacent. This improves analysis of memory accesses for vectorization etc.

This is a very important optimization as loop vectorization one of the most efficient and effective optimization algorithms.

It is trickier to demonstrate. But I remember actually encountering a situation when changing an index from unsigned to signed drastically improved the generated assembly. Unfortunately I cannot remember or replicate it now. Will come back later if I figure it out.

Value range calculations

The compiler keeps track of the variables' range of possible values at each point in the program, i.e. for code such as
int x = foo();
if (x > 0) {
  int y = x + 5;
  int z = y / 4;
Run Code Online (Sandbox Code Playgroud)
it determines that x has the range [1, INT_MAX] after the if-statement, and can thus determine that y has the range [6, INT_MAX] as overflow is not allowed. And the next line can be optimized to int z = y >> 2; as the compiler knows that y is non-negative.

auto foo(int x)
{
    if (x <= 0)
        __builtin_unreachable();

    return (x + 5) / 4;
}

Run Code Online (Sandbox Code Playgroud)

foo(int):
        lea     eax, [rdi+5]
        sar     eax, 2
        ret

Run Code Online (Sandbox Code Playgroud)

The undefined overflow helps optimizations that need to compare two values (as the wrapping case would give possible values of the form [INT_MIN, (INT_MIN+4)] or [6, INT_MAX] that prevents all useful comparisons with < or >), such as

Changing comparisons x<y to true or false if the ranges for x and y does not overlap

Changing min(x,y) or max(x,y) to x or y if the ranges do not overlap

Changing abs(x) to x or -x if the range does not cross 0

更改x/c为x>>log2(c)if x>0和常数c是2

更改x%c为x&(c-1)if x>0和常数c是2

循环分析与优化

为什么未定义的有符号溢出有助于循环优化的典型示例是像
for (int i = 0; i <= m; i++)
Run Code Online (Sandbox Code Playgroud)
确保因未定义的溢出而终止。这有助于具有特定循环指令的体系结构，因为它们通常不会处理无限循环。

但是未定义的有符号溢出有助于更多的循环优化。所有分析（例如确定迭代次数，转换归纳变量和跟踪内存访问）都在使用上一节中的所有内容来进行工作。特别是，当允许有符号溢出时，可以向量化的循环集会大大减少。

@immibis我的意思是所有有效代码都适用。UB不是有效的代码。 (4认同)
@chqrlie 我在 cppcon 标准委员会的知名成员中听到，事后看来，无符号大小是一个错误。关于无符号的环绕行为也是如此（对于普通类型来说，在溢出时使用 UB 并在明确需要时使用具有环绕行为的单独类型会更好） (2认同)
@immibis，您看到编译器如何将x + 10> = x;转换为mov eax，1吗？正是因为它可以假设`x + 10> = x`始终为真（我最后一次说：**对于有效代码**） (2认同)
@immibis，您目前是故意这样做的。您正在使讨论远离原始问题。我的说法是“对于有效的代码，编译器将假设`x + 10> = x`始终为真-这是标准规定的-因此编译器可以将表达式优化为'true`”。那始终是我的发言。这就是我要讨论的内容（以及您最初所争论的内容）。您现在想谈谈程序员如何确保他不写具有UB的代码，因此是无效的，这是另一种讨论。 (2认同)

-ftrapvGCC / clang的命令行开关不是一个优化的例子，但未定义行为的一个有用结果就是。它插入的代码使您的程序在整数溢出时崩溃。

根据无符号溢出是有意的想法，它不适用于无符号整数。

该标准对有符号整数溢出的措辞可确保人们不会故意写溢出代码，因此ftrapv是发现意外溢出的有用工具。

这是一个实际的小基准，冒泡排序。我比较了没有/有的时间-fwrapv（这意味着溢出是UB/不是UB）。结果如下（秒）：

                   -O3     -O3 -fwrapv    -O1     -O1 -fwrapv
Machine1, clang    5.2     6.3            6.8     7.7
Machine2, clang-8  4.2     7.8            6.4     6.7
Machine2, gcc-8    6.6     7.4            6.5     6.5

Run Code Online (Sandbox Code Playgroud)

如您所见，not-UB ( -fwrapv) 版本几乎总是较慢，最大的差异非常大，为 1.85 倍。

这是代码。请注意，我特意选择了一个实现，它应该会为此测试产生更大的差异。

#include <stdio.h>
#include <stdlib.h>

void bubbleSort(int *a, long n) {
        bool swapped;
        for (int i = 0; i < n-1; i++) {
                swapped = false;
                for (int j = 0; j < n-i-1; j++) {
                        if (a[j] > a[j+1]) {
                                int t = a[j];
                                a[j] = a[j+1];
                                a[j+1] = t;
                                swapped = true;
                        }
                }

                if (!swapped) break;
        }
}

int main() {
        int a[8192];

        for (int j=0; j<100; j++) {
                for (int i=0; i<8192; i++) {
                        a[i] = rand();
                }

                bubbleSort(a, 8192);
        }
}

Run Code Online (Sandbox Code Playgroud)

归档时间：	6 年，5 月前
查看次数：	812 次
最近记录：	6 年，5 月前

在支持循环和函数的语言中使用'goto'是否有利？如果是这样,为什么？ 194

什么是__stdcall？ 137

什么是qualified-id/name和unqualified-id/name？ 55

pow()似乎是一个人在这里 42

切换案例或std :: map的效率更高 27

在CMake中强制使用C99(使用'for'循环初始声明) 24

字符数组的大小和字符指针的大小 17

什么是"pch.h",为什么需要将它作为第一个头文件包含在内？ 16

究竟什么是"尾随参数包" 15

如何正确地将unsigned char数组转换为uint32_t 12

最终C++书籍指南和列表 4246

如何确定最初克隆本地Git存储库的URL？ 3782

检查字典中是否已存在给定键 2683

比较Git中的两个分支？ 2149

使用其名称(字符串)调用模块的函数 1580

C#的隐藏功能？ 1475

如何在Python中追加文件？ 1446

如何在Git中有选择地合并或选择来自另一个分支的更改？ 1374

如何删除重复的行？ 1254

为什么我们需要C++中的虚函数？ 1223