Telling a C or C++ compiler that a loop length is a multiple of 8

Question

I want to write the following function in C++ (compiled with gcc 11.1 using -O3 -mavx -std=c++17):

#include <cstdint>  // int64_t

void f(float * __restrict__ a, float * __restrict__ b, float * __restrict__ c, int64_t n) {
    for (int64_t i = 0; i != n; ++i) {
        a[i] = b[i] + c[i];
    }
}

This generates about 60 lines of assembly, much of which handles the case where n is not a multiple of 8. https://godbolt.org/z/61MYPG7an

I know that n is always a multiple of 8. One way I can change this code is to replace for (int64_t i = 0; i != n; ++i) with for (int64_t i = 0; i != (n / 8 * 8); ++i). This generates only about 20 assembly instructions. https://godbolt.org/z/vhvdKMfE9
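For reference, the rewritten loop can be sketched as a complete function. The name f_rounded is hypothetical (it does not appear in the question); the body is the question's loop with only the bound changed. For any n that is a multiple of 8 it computes the same result as f.

```cpp
#include <cstdint>

// Hypothetical name, for illustration only. Same loop as f, but the bound is
// rounded down to a multiple of 8, so the trip count is visibly a multiple of 8.
void f_rounded(float* __restrict__ a, float* __restrict__ b,
               float* __restrict__ c, std::int64_t n) {
    const std::int64_t bound = n / 8 * 8;  // equals n whenever n % 8 == 0
    for (std::int64_t i = 0; i != bound; ++i) {
        a[i] = b[i] + c[i];
    }
}
```

Unlike an assumption, this version stays well defined if n is not a multiple of 8: the trailing n % 8 elements are simply left unprocessed.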

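However, on line 5 of the second Godbolt link there is an instruction that zeroes the lowest three bits of n. If there were a way to tell the compiler that n is always a multiple of 8, that instruction could be omitted without changing behavior. Does anyone know a way to do this on any C or C++ compiler (especially gcc or clang)? In my case it doesn't really matter, but I'm curious and not sure where to look.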
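To make "zeroing the lowest three bits" concrete: for non-negative n, n / 8 * 8 yields the same value as masking off the low three bits. The helper names below are hypothetical, and the equivalence is only claimed for n >= 0, since signed division truncates toward zero and negative values would differ.

```cpp
#include <cstdint>

// Hypothetical helpers, for illustration only.
constexpr std::int64_t round_down_div(std::int64_t n) { return n / 8 * 8; }
constexpr std::int64_t round_down_mask(std::int64_t n) { return n & ~std::int64_t{7}; }

// Already a multiple of 8: both forms leave the value unchanged.
static_assert(round_down_div(24) == 24 && round_down_mask(24) == 24);
// Not a multiple of 8: both forms round down to the previous multiple.
static_assert(round_down_div(29) == 24 && round_down_mask(29) == 24);
```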

Answer 1

HTN*_*TNW 10

State the assumption with __builtin_unreachable

void f(float *__restrict__ a, float *__restrict__ b, float *__restrict__ c, int64_t n) {
    if(n % 8 != 0) __builtin_unreachable(); // control flow cannot reach this branch so the condition is not necessary and is optimized out
    for (int64_t i = 0; i != n; ++i) { // if control flow reaches this point n is a multiple of 8
        a[i] = b[i] + c[i];
    }
}

This produces shorter code.
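The same idea can be wrapped in a small macro so the assumption reads as a single statement. This is a minimal sketch, not part of the answer: ASSUME and f_assume are hypothetical names. It uses __builtin_unreachable on GCC and Clang, falls back to MSVC's __assume, and expands to nothing on other compilers. If the condition is ever false at run time the behavior is undefined, so this is only safe when the caller really guarantees that n is a multiple of 8.

```cpp
#include <cstdint>

// Hypothetical ASSUME macro: tells the optimizer that cond always holds.
#if defined(__GNUC__) || defined(__clang__)
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#elif defined(_MSC_VER)
#define ASSUME(cond) __assume(cond)
#else
#define ASSUME(cond) ((void)0)  // unknown compiler: no hint, still correct
#endif

// Hypothetical name; the loop is the one from the question.
// __restrict is used here because GCC, Clang and MSVC all accept that spelling.
void f_assume(float* __restrict a, float* __restrict b,
              float* __restrict c, std::int64_t n) {
    ASSUME(n % 8 == 0);  // undefined behavior if this is ever false
    for (std::int64_t i = 0; i != n; ++i) {
        a[i] = b[i] + c[i];
    }
}
```

On compilers that implement C++23, the standard attribute statement [[assume(n % 8 == 0)]]; expresses the same assumption without a macro. Whether a given compiler version actually drops the masking instruction is best checked in the generated assembly.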
