lis*_*rus 57 c++ floating-point assembly gcc micro-optimization
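In some cases you know that a certain floating-point expression will always be non-negative. For example, when computing the length of a vector, one does sqrt(a[0]*a[0] + ... + a[N-1]*a[N-1]) (NB: I am aware of std::hypot, it is not relevant to the question), and the expression under the square root is clearly non-negative. However, GCC outputs the following assembly for sqrt(x*x):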
mulss xmm0, xmm0
pxor xmm1, xmm1
ucomiss xmm1, xmm0
ja .L10
sqrtss xmm0, xmm0
ret
.L10:
jmp sqrtf
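That is, it compares the result of x*x with zero; if the result is non-negative it executes the sqrtss instruction, otherwise it calls sqrtf.
So my question is: how can I force GCC to assume that x*x is always non-negative, so that it skips the comparison and the sqrtf call, without writing inline assembly?
I want to emphasize that I am interested in a local solution, not in things like -ffast-math, -fno-math-errno, or -ffinite-math-only (although these do solve the issue, thanks to ks1322, harold, and Eric Postpischil in the comments).
Furthermore, "force GCC to assume x*x is non-negative" should be interpreted as assert(x*x >= 0.f), so this also excludes the case where x*x is NaN.
I am OK with compiler-specific, platform-specific, CPU-specific, etc. solutions.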
Pet*_*des 46
You can write assert(x*x >= 0.f) as a compile-time promise instead of a runtime check as follows in GNU C:
#include <cmath>
float test1 (float x)
{
float tmp = x*x;
if (!(tmp >= 0.0f))
__builtin_unreachable();
return std::sqrt(tmp);
}
(related: What optimizations does __builtin_unreachable facilitate? You could also wrap if(!x)__builtin_unreachable() in a macro and call it promise() or something.)
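The macro wrapper mentioned above could look roughly like the following. This is only a sketch: the names PROMISE and test1_promise are made up for illustration, and __builtin_unreachable() is a GNU extension (GCC and Clang), so this is not portable ISO C++.

```cpp
#include <cmath>

// Hypothetical helper macro (the name is an assumption, not a standard facility).
// If cond is false, control reaches __builtin_unreachable(), which is undefined
// behaviour, so the compiler is allowed to assume cond is always true.
#define PROMISE(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)

float test1_promise(float x)
{
    float tmp = x * x;
    PROMISE(tmp >= 0.0f);   // also rules out NaN, because NaN >= 0.0f is false
    return std::sqrt(tmp);
}
```

Unlike assert(), nothing is checked at run time, so breaking the promise is undefined behaviour rather than a diagnosable failure.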
But gcc doesn't know how to take advantage of that promise that tmp is non-NaN and non-negative. We still get (Godbolt) the same canned asm sequence that checks for x>=0 and otherwise calls sqrtf to set errno. Presumably that expansion into a compare-and-branch happens after other optimization passes, so it doesn't help for the compiler to know more.
This is a missed-optimization in the logic that speculatively inlines sqrt when -fmath-errno is enabled (on by default unfortunately).
-fno-math-errno, which is safe globally
This is 100% safe if you don't rely on math functions ever setting errno. Nobody wants that, that's what NaN propagation and/or sticky flags that record masked FP exceptions are for. e.g. C99/C++11 fenv access via #pragma STDC FENV_ACCESS ON and then functions like fetestexcept(). See the example in feclearexcept which shows using it to detect division by zero.
The FP environment is part of thread context, while errno is a separate software variable (historically a single global; C11 and C++11 specify it as thread-local).
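As a rough illustration of checking the sticky flags instead of errno, the sketch below tests FE_INVALID after a square root. The function name sqrt_raises_invalid is hypothetical. As far as I know GCC does not implement #pragma STDC FENV_ACCESS, so this sketch uses volatile variables to discourage constant folding and reordering; treat that as a workaround, not a guarantee.

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical helper: returns true if computing sqrt(v) raised the
// IEEE "invalid operation" exception (e.g. for a negative input).
bool sqrt_raises_invalid(float v)
{
    volatile float in = v;               // volatile: keep the sqrt from being folded away
    std::feclearexcept(FE_ALL_EXCEPT);   // clear the sticky flags first
    volatile float out = std::sqrt(in);  // may set FE_INVALID in the FP status register
    (void)out;
    return std::fetestexcept(FE_INVALID) != 0;
}
```

The flag is raised by the hardware or the library function itself, so the check does not depend on -fmath-errno being enabled.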
Support for this obsolete misfeature is not free; you should just turn it off unless you have old code that was written to use it. Don't use it in new code: use fenv. Ideally support for -fmath-errno would be as cheap as possible, but the rarity of anyone actually using __builtin_unreachable() or other things to rule out a NaN input presumably made it not worth developers' time to implement the optimization. Still, you could report a missed-optimization bug if you wanted.
Real-world FPU hardware does in fact have these sticky flags that stay set until cleared, e.g. x86's mxcsr status/control register for SSE/AVX math, or hardware FPUs in other ISAs. On hardware where the FPU can detect exceptions, a quality C++ implementation will support stuff like fetestexcept(). And if not, then math-errno probably doesn't work either.
errno for math was an old obsolete design that C / C++ is still stuck with by default, and is now widely considered a bad idea. It makes it harder for compilers to inline math functions efficiently. Or maybe we're not as stuck with it as I thought: Why errno is not set to EDOM even sqrt takes out of domain arguement? explains that setting errno in math functions is optional in ISO C11, and an implementation can indicate whether they do it or not. Presumably in C++ as well.
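The standard way for an implementation to say which mechanism it uses is the math_errhandling macro from <cmath> (C99 / C++11), whose value is a combination of MATH_ERRNO and MATH_ERREXCEPT. A minimal sketch for inspecting it, with a made-up function name:

```cpp
#include <cmath>

// Hypothetical helper: describes which error-reporting mechanisms the
// implementation advertises through the standard math_errhandling macro.
const char* math_error_mode()
{
    const int mode = math_errhandling;
    const bool uses_errno  = (mode & MATH_ERRNO) != 0;      // math functions set errno
    const bool uses_except = (mode & MATH_ERREXCEPT) != 0;  // math functions raise FP exceptions
    if (uses_errno && uses_except) return "errno and FP exceptions";
    if (uses_errno)                return "errno only";
    if (uses_except)               return "FP exceptions only";
    return "neither";
}
```

The reported value depends on the C library and on compiler flags, so it is worth printing it with the same options you build your real code with.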
It's a big mistake to lump -fno-math-errno in with value-changing optimizations like -ffast-math or -ffinite-math-only. You should strongly consider enabling it globally, or at least for the whole file containing this function.
float test2 (float x)
{
return std::sqrt(x*x);
}
# g++ -fno-math-errno -std=gnu++17 -O3
test2(float): # and test1 is the same
mulss xmm0, xmm0
sqrtss xmm0, xmm0
ret
You might as well use -fno-trapping-math too, if you aren't ever going to unmask any FP exceptions with feenableexcept(). (That option isn't required for this optimization, though; it's only the errno-setting crap that's a problem here.)
-fno-trapping-math doesn't assume no-NaN or anything, it only assumes that FP exceptions like Invalid or Inexact won't ever actually invoke a signal handler instead of producing NaN or a rounded result. -ftrapping-math is the default but it's broken and "never worked" according to GCC dev Marc Glisse. (Even with it on, GCC does some optimizations which can change the number of exceptions that would be raised from zero to non-zero or vice versa. And it blocks some safe optimizations). But unfortunately, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54192 (make it off by default) is still open.
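If you actually did ever unmask exceptions, it might be better to have -ftrapping-math, but again it's very rare that you'd ever want that instead of just checking flags after some math operations, or checking for NaN. And it doesn't actually preserve exact exception semantics anyway.
See SIMD for float threshold operation for a case where -ftrapping-math incorrectly blocks a safe optimization. (Even after hoisting a potentially-trapping operation so the C does it unconditionally, gcc makes non-vectorized asm that does it conditionally! So not only does it block vectorization, it also changes the exception semantics vs. the C abstract machine.)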
Pass the option -fno-math-errno to gcc. This fixes the problem without making your code unportable or leaving the realm of ISO/IEC 9899:2011 (C11).
What this option does is tell the compiler not to attempt to set errno when a math library function fails:
-fno-math-errno
Do not set "errno" after calling math functions that are executed
with a single instruction, e.g., "sqrt". A program that relies on
IEEE exceptions for math error handling may want to use this flag
for speed while maintaining IEEE arithmetic compatibility.
This option is not turned on by any -O option since it can result
in incorrect output for programs that depend on an exact
implementation of IEEE or ISO rules/specifications for math
functions. It may, however, yield faster code for programs that do
not require the guarantees of these specifications.
The default is -fmath-errno.
On Darwin systems, the math library never sets "errno". There is
therefore no reason for the compiler to consider the possibility
that it might, and -fno-math-errno is the default.
Given that you don't seem to be particularly interested in math routines setting errno, this seems like a good solution.
Without any global options, here is a (low-overhead, but not free) way to get a square root with no branch:
#include <immintrin.h>
float test(float x)
{
return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set1_ps(x * x)));
}
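(on godbolt)
As usual, Clang is smart about its shuffles. GCC and MSVC lag behind in that area and don't manage to avoid the broadcast. MSVC is also doing some mysterious moves.
There are other ways to turn a float into an __m128, for example _mm_set_ss. For Clang that makes no difference; for GCC it makes the code a little bigger and worse (including a movss reg, reg, which counts as a shuffle on Intel, so it doesn't even save on shuffles).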
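For reference, the _mm_set_ss alternative mentioned above might be written as follows. This is a sketch assuming an x86 target with SSE; the function name test_set_ss is made up for illustration.

```cpp
#include <immintrin.h>

// Hypothetical variant of test(): _mm_set_ss puts x*x in the lowest lane and
// zeroes the upper three lanes, instead of broadcasting as _mm_set1_ps does.
float test_set_ss(float x)
{
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x * x)));
}
```

Both variants return the same value; they differ only in the instructions the compiler emits to build the vector, so compare the generated asm for your compiler before picking one.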