Opting out of the "char" exemption to the strict aliasing rule

Mik*_*ine 5 c++ types simd strict-aliasing compiler-optimization

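If I have a simple piece of code, the compiler can optimize it better when it uses uint32_t than when it uses uint8_t. As I understand it, this is because char is exempt from the strict aliasing rule. Consider: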

#include <cstdint>

using T = uint32_t;

T *a;
T *b;
T *c;

void mult(int num)
{
    for (int count = 0; count < num; count++)
    {
        a[count] = b[count] * c[count];
    }
}

https://godbolt.org/z/sW1xnTrhc

At -O1, this produces the following inner loop:

.LBB0_2:                                # =>This Inner Loop Header: Depth=1
        mov     r8d, dword ptr [rcx + 4*rdi]
        imul    r8d, dword ptr [rax + 4*rdi]
        mov     dword ptr [rdx + 4*rdi], r8d
        inc     rdi
        cmp     rsi, rdi
        jne     .LBB0_2

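Note that in this case it just loads a value, does the multiplication, stores the result, and loops. That is fine. However, if I use uint8_t (https://godbolt.org/z/doM4o6ena), I get this inner loop from clang: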

.LBB0_2:                                # =>This Inner Loop Header: Depth=1
        mov     rsi, qword ptr [rip + b] # see here
        mov     rax, qword ptr [rip + c] # see here
        movzx   eax, byte ptr [rax + rdx]
        mul     byte ptr [rsi + rdx]
        mov     rsi, qword ptr [rip + a] # see here
        mov     byte ptr [rsi + rdx], al
        inc     rdx
        cmp     rcx, rdx
        jne     .LBB0_2

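Note that this inner loop reloads the values of a, b, and c on every iteration. As I understand it, because the storage of the pointers a, b, and c could alias what they point to, the compiler has to run each iteration separately and reload the pointers after every store. It gets worse at higher optimization levels. With uint16_t and/or uint32_t at -O3, the compiler does all kinds of SIMD/XMM magic, but the uint8_t/char loop stays very simple and unoptimized.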
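To make the aliasing concern concrete, here is a minimal sketch of how a uint8_t store can change one of the global pointers. It assumes uint8_t is an alias for unsigned char, which holds on GCC, Clang, and MSVC but is not guaranteed by the standard. The names target and store_can_change_pointer are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

using T = std::uint8_t;  // assumption: an alias for unsigned char

T *a;         // global pointer, as in the question
T target[4];  // hypothetical buffer used only for this demonstration

// Returns true if byte-sized stores through a T lvalue changed the pointer `a`.
bool store_can_change_pointer()
{
    T *new_value = target;
    T bytes[sizeof(T *)];
    // Capture the object representation of the pointer value we want to write.
    std::memcpy(bytes, &new_value, sizeof bytes);

    a = nullptr;
    // Permitted: an unsigned char lvalue may access the bytes of any object.
    T *alias = reinterpret_cast<T *>(&a);
    for (std::size_t i = 0; i < sizeof(T *); ++i)
    {
        alias[i] = bytes[i];  // a plain uint8_t store, yet it modifies `a`
    }
    return a == target;
}
```

Because a store like alias[i] = ... is indistinguishable from a[count] = ... as far as types go, the compiler cannot prove that the store in mult leaves a, b, and c unchanged, so it reloads them. With T = uint32_t, the same store could not legally modify an object of type T *, which is why the loads can be hoisted out of the loop.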

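Note that I am not asking about using restrict or avoiding global variables to work around this. I am also not looking for ways to optimize this particular example.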

What I am asking is whether there is a simple 8-bit arithmetic type I can use that does not fall into this trap.
