chq*_*lie 35 c c++ signed integer-overflow language-lawyer
C标准明确规定有符号整数溢出具有未定义的行为。然而,大多数CPU都使用定义的语义为溢出实现签名算法(可能除法除法溢出为:x / 0
和INT_MIN / -1
)。
编译器的作家一直在走的优势undefinedness这种溢出的补充,往往会打破传统的代码非常微妙的方式更积极的优化。例如,此代码可能在较旧的编译器上有效,但在gcc
and的当前版本上不再有效clang
:
/* Tncrement a by a value in 0..255, clamp a to positive integers.
The code relies on 32-bit wrap-around, but the C Standard makes
signed integer overflow undefined behavior, so sum_max can now
return values less than a. There are Standard compliant ways to
implement this, but legacy code is what it is... */
int sum_max(int a, unsigned char b) {
int res = a + b;
return (res >= a) ? res : INT_MAX;
}
Run Code Online (Sandbox Code Playgroud)
有确凿的证据表明这些优化值得吗?是否有比较研究记录了实际示例甚至经典基准的实际改进?
我在观看时提出了这个问题:C ++ Now 2018:John Regehr“结束主题演讲:未定义的行为和编译器优化”
我在给c和c ++加上标签,因为两种语言的问题都相似,但是答案可能不同。
bol*_*lov 21
我不了解研究和统计信息,但是是的,肯定有一些优化考虑了编译器实际所做的事情。是的,它们非常重要(例如,tldr循环矢量化)。
除了编译器优化之外,还有其他方面需要考虑。使用UB,您可以获得C / C ++有符号整数,其数学行为与数学期望一样。例如,x + 10 > x
现在适用(当然适用于有效代码),但不会出现环绕行为。
我从Krister Walfridsson的博客中找到了一篇出色的文章,即未定义的签名溢出如何在GCC中实现优化,列出了一些将签名溢出UB考虑在内的优化。以下示例来自于此。我正在向它们添加c ++和汇编示例。
如果优化看起来过于简单,无趣或没有影响,请记住,这些优化只是更大得多的优化链中的步骤。蝴蝶效应确实发生了,因为在较早的步骤看似不重要的优化会在较晚的步骤触发更具影响力的优化。
如果这些示例看起来很荒谬(谁会写x * 10 > 0
),请记住,您可以很容易地在C和C ++中使用常量,宏,模板来获得此类示例。此外,编译器在其IR中进行转换和优化时可以得到这类示例。
与0相比,消除乘法
Run Code Online (Sandbox Code Playgroud)(x * c) cmp 0 -> x cmp 0
bool foo(int x) { return x * 10 > 0 }
Run Code Online (Sandbox Code Playgroud)
foo(int):
test edi, edi
setg al
ret
Run Code Online (Sandbox Code Playgroud)乘法运算后消除除法
(x * c1)/ c2-> x *(c1 / c2)如果c1可被c2整除
int foo(int x) { return (x * 20) / 10; }
Run Code Online (Sandbox Code Playgroud)
foo(int):
lea eax, [rdi+rdi]
ret
Run Code Online (Sandbox Code Playgroud)消除否定
(-x)/(-y)-> x / y
(x * c) cmp 0 -> x cmp 0
Run Code Online (Sandbox Code Playgroud)
foo(int, int):
mov eax, edi
cdq
idiv esi
ret
Run Code Online (Sandbox Code Playgroud)Simplify comparisons that are always true or false
Run Code Online (Sandbox Code Playgroud)x + c < x -> false x + c <= x -> false x + c > x -> true x + c >= x -> true
bool foo(int x) { return x * 10 > 0 }
Run Code Online (Sandbox Code Playgroud)
foo(int):
mov eax, 1
ret
Run Code Online (Sandbox Code Playgroud)Eliminate negation in comparisons
(-x) cmp (-y) -> y cmp x
Run Code Online (Sandbox Code Playgroud)
foo(int):
test edi, edi
setg al
ret
Run Code Online (Sandbox Code Playgroud)
foo(int, int):
cmp edi, esi
setg al
ret
Run Code Online (Sandbox Code Playgroud)Reduce magnitude of constants
Run Code Online (Sandbox Code Playgroud)x + c > y -> x + (c - 1) >= y x + c <= y -> x + (c - 1) < y
bool foo(int x, int y) { return x + 10 <= y; }
Run Code Online (Sandbox Code Playgroud)
foo(int, int):
add edi, 9
cmp edi, esi
setl al
ret
Run Code Online (Sandbox Code Playgroud)Eliminate constants in comparisons
Run Code Online (Sandbox Code Playgroud)(x + c1) cmp c2 -> x cmp (c2 - c1) (x + c1) cmp (y + c2) -> x cmp (y + (c2 - c1)) if c1 <= c2
The second transformation is only valid if c1 <= c2, as it would otherwise introduce an overflow when y has the value INT_MIN.
int foo(int x) { return (x * 20) / 10; }
Run Code Online (Sandbox Code Playgroud)
foo(int):
cmp edi, -30
setl al
ret
Run Code Online (Sandbox Code Playgroud)If an operation does not overflow, then we will get the same result if we do the operation in a wider type. This is often useful when doing things like array indexing on 64-bit architectures — the index calculations are typically done using 32-bit int, but the pointers are 64-bit, and the compiler may generate more efficient code when signed overflow is undefined by promoting the 32-bit integers to 64-bit operations instead of generating type extensions.
One other aspect of this is that undefined overflow ensures that a[i] and a[i+1] are adjacent. This improves analysis of memory accesses for vectorization etc.
This is a very important optimization as loop vectorization one of the most efficient and effective optimization algorithms.
It is trickier to demonstrate. But I remember actually encountering a situation when changing an index from unsigned
to signed
drastically improved the generated assembly. Unfortunately I cannot remember or replicate it now. Will come back later if I figure it out.
The compiler keeps track of the variables' range of possible values at each point in the program, i.e. for code such as
Run Code Online (Sandbox Code Playgroud)int x = foo(); if (x > 0) { int y = x + 5; int z = y / 4;
it determines that x has the range
[1, INT_MAX]
after the if-statement, and can thus determine that y has the range[6, INT_MAX]
as overflow is not allowed. And the next line can be optimized toint z = y >> 2;
as the compiler knows that y is non-negative.
auto foo(int x)
{
if (x <= 0)
__builtin_unreachable();
return (x + 5) / 4;
}
Run Code Online (Sandbox Code Playgroud)
foo(int):
lea eax, [rdi+5]
sar eax, 2
ret
Run Code Online (Sandbox Code Playgroud)
The undefined overflow helps optimizations that need to compare two values (as the wrapping case would give possible values of the form
[INT_MIN, (INT_MIN+4)]
or[6, INT_MAX]
that prevents all useful comparisons with<
or>
), such as
- Changing comparisons
x<y
to true or false if the ranges forx
andy
does not overlap- Changing
min(x,y)
ormax(x,y)
tox
ory
if the ranges do not overlap- Changing
abs(x)
tox
or-x
if the range does not cross0
- 更改
x/c
为x>>log2(c)
ifx>0
和常数c
是2
- 更改
x%c
为x&(c-1)
ifx>0
和常数c
是2
为什么未定义的有符号溢出有助于循环优化的典型示例是像
Run Code Online (Sandbox Code Playgroud)for (int i = 0; i <= m; i++)
确保因未定义的溢出而终止。这有助于具有特定循环指令的体系结构,因为它们通常不会处理无限循环。
但是未定义的有符号溢出有助于更多的循环优化。所有分析(例如确定迭代次数,转换归纳变量和跟踪内存访问)都在使用上一节中的所有内容来进行工作。特别是,当允许有符号溢出时,可以向量化的循环集会大大减少。
-ftrapv
GCC / clang的命令行开关不是一个优化的例子,但未定义行为的一个有用结果就是。它插入的代码使您的程序在整数溢出时崩溃。
根据无符号溢出是有意的想法,它不适用于无符号整数。
该标准对有符号整数溢出的措辞可确保人们不会故意写溢出代码,因此ftrapv
是发现意外溢出的有用工具。
这是一个实际的小基准,冒泡排序。我比较了没有/有的时间-fwrapv
(这意味着溢出是UB/不是UB)。结果如下(秒):
-O3 -O3 -fwrapv -O1 -O1 -fwrapv
Machine1, clang 5.2 6.3 6.8 7.7
Machine2, clang-8 4.2 7.8 6.4 6.7
Machine2, gcc-8 6.6 7.4 6.5 6.5
Run Code Online (Sandbox Code Playgroud)
如您所见,not-UB ( -fwrapv
) 版本几乎总是较慢,最大的差异非常大,为 1.85 倍。
这是代码。请注意,我特意选择了一个实现,它应该会为此测试产生更大的差异。
#include <stdio.h>
#include <stdlib.h>
void bubbleSort(int *a, long n) {
bool swapped;
for (int i = 0; i < n-1; i++) {
swapped = false;
for (int j = 0; j < n-i-1; j++) {
if (a[j] > a[j+1]) {
int t = a[j];
a[j] = a[j+1];
a[j+1] = t;
swapped = true;
}
}
if (!swapped) break;
}
}
int main() {
int a[8192];
for (int j=0; j<100; j++) {
for (int i=0; i<8192; i++) {
a[i] = rand();
}
bubbleSort(a, 8192);
}
}
Run Code Online (Sandbox Code Playgroud)