How is atomic_flag implemented?

Question

Yuk*_*uki 3 c++ arm x86-64 atomic stdatomic

How is atomic_flag implemented? It seems to me that on x86-64 it is equivalent to atomic_bool anyway, but that is just a guess. Might an x86-64 implementation differ from arm or x86?

Answer 1

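Yes, on normal CPUs where atomic<bool> and atomic<int> are also lock-free, atomic_flag is pretty much like atomic<bool> and uses the same instructions. (x86 and x86-64 have the same set of atomic operations available.)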
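The lock-free premise can be checked directly. The following is a minimal sketch, assuming a mainstream target such as x86-64, ARMv7 or AArch64; the helper name `bool_and_int_are_lock_free` is hypothetical and only there for illustration.

```cpp
#include <atomic>
#include <cassert>

// Compile-time check: the macro is 2 when the type is always lock-free.
// Assumption: we are building for a mainstream CPU where this holds.
static_assert(ATOMIC_BOOL_LOCK_FREE == 2,
              "atomic<bool> is expected to be always lock-free here");
static_assert(ATOMIC_INT_LOCK_FREE == 2,
              "atomic<int> is expected to be always lock-free here");

// atomic_flag is the one type the standard guarantees to be lock-free,
// so it has no is_lock_free() member to query.

// Hypothetical helper: the same question asked at run time, per object.
bool bool_and_int_are_lock_free() {
    std::atomic<bool> b{false};
    std::atomic<int> i{0};
    return b.is_lock_free() && i.is_lock_free();
}

// Small self-check that uses the helper.
void demo_lock_free() {
    assert(bool_and_int_are_lock_free());
}
```

If either static_assert fires, the target probably falls outside the "normal CPU" case this answer describes.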

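You might think it would always use x86 lock bts or lock btr to set / reset (clear) a single bit, but doing something else can be more efficient (especially for a function that returns a bool instead of branching on it). The object is a whole byte, so you can simply store or exchange the whole byte. (And if the ABI guarantees that the value is always 0 or 1, you don't have to booleanize it before returning the result as a bool.)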

GCC and clang compile test_and_set to a byte exchange, and clear to a byte store of 0. We get (nearly) identical asm for atomic_flag test_and_set as for f.exchange(true):

#include <atomic>

bool TAS(std::atomic_flag &f) {
    return f.test_and_set();
}

bool TAS_bool(std::atomic<bool> &f) {
    return f.exchange(true);
}


void clear(std::atomic_flag &f) {
    //f = 0; // deleted
    f.clear();
}

void clear_relaxed(std::atomic_flag &f) {
    f.clear(std::memory_order_relaxed);
}

void bool_clear(std::atomic<bool> &f) {
    f = false; // allowed for atomic<bool> (unlike atomic_flag): a seq_cst store
}


On Godbolt, for x86-64 with gcc and clang, and for ARMv7 and AArch64:

## GCC9.2 -O3 for x86-64
TAS(std::atomic_flag&):
        mov     eax, 1
        xchg    al, BYTE PTR [rdi]
        ret
TAS_bool(std::atomic<bool>&):
        mov     eax, 1
        xchg    al, BYTE PTR [rdi]
        test    al, al
        setne   al                      # missed optimization, doesn't need to booleanize to 0/1
        ret
clear(std::atomic_flag&):
        mov     BYTE PTR [rdi], 0
        mfence                          # memory fence to drain store buffer before future loads
        ret
clear_relaxed(std::atomic_flag&):
        mov     BYTE PTR [rdi], 0      # x86 stores are already mo_release, no barrier
        ret
bool_clear(std::atomic<bool>&):
        mov     BYTE PTR [rdi], 0
        mfence
        ret


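Note that xchg is also an efficient way to do a seq_cst store on x86-64, usually more efficient than the mov + mfence that gcc uses. Clang uses xchg for all of these (except the relaxed store).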
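To compare the two code-generation strategies yourself, here is a sketch of three ways to write "store false". The function names are hypothetical, and the instructions named in the comments are what this answer reports for GCC 9.2 and clang on x86-64; other compilers or versions may choose differently.

```cpp
#include <atomic>
#include <cassert>

// Plain seq_cst store. GCC 9.2 emitted mov + mfence for this kind of store
// (see bool_clear in the listing above); clang reportedly uses xchg.
void store_false_seq_cst(std::atomic<bool> &f) {
    f.store(false, std::memory_order_seq_cst);
}

// Exchange with the old value discarded: same visible effect as a seq_cst
// store. On x86-64 an exchange is an xchg, which is implicitly locked.
void store_false_via_exchange(std::atomic<bool> &f) {
    (void)f.exchange(false, std::memory_order_seq_cst);
}

// Relaxed store: a plain mov with no barrier on x86-64.
void store_false_relaxed(std::atomic<bool> &f) {
    f.store(false, std::memory_order_relaxed);
}

// Small self-check: all three leave the flag false.
void demo_stores() {
    std::atomic<bool> f{true};
    store_false_seq_cst(f);
    assert(!f.load());
    f = true;
    store_false_via_exchange(f);
    assert(!f.load());
    f = true;
    store_false_relaxed(f);
    assert(!f.load());
}
```

Pasting these into a compiler explorer with -O3 should make the difference between the seq_cst and relaxed versions visible in the asm.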

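Amusingly, clang re-booleanizes to 0/1 after the xchg in atomic_flag.test_and_set(), whereas GCC does that for atomic<bool> instead. clang does a weird and al,1 in TAS_bool, which would treat a value like 2 as false. That seems pointless; the ABI guarantees that a bool in memory is always stored as a 0 or 1 byte.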

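For ARM, we get ldrexb / strexb exchange retry loops, or just strb + dmb ish for the pure store. AArch64 can instead use stlrb wzr, [x0] for clear or assign-false, doing a sequential-release store (of the zero register) without needing a separate barrier.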
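For context on where these instructions end up, the classic use of atomic_flag is a spinlock. The following is a minimal sketch, not a production lock: the class name `SpinLock` and its members are hypothetical, and there is no backoff or yielding. The memory-order arguments are what let the compiler drop or weaken barriers; on AArch64 the release clear would typically be the stlrb store mentioned above, and on x86-64 a plain mov.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical minimal spinlock built only on atomic_flag operations.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;  // starts clear (unlocked)

public:
    // Returns true if we took the lock: test_and_set returns the old value,
    // so "old value was false" means the flag was free.
    bool try_lock() {
        return !flag_.test_and_set(std::memory_order_acquire);
    }

    // Spin until the exchange observes the flag clear.
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy-wait; a real lock would pause or yield here
        }
    }

    // Release store of 0.
    void unlock() {
        flag_.clear(std::memory_order_release);
    }
};

// Small single-threaded self-check of the lock/unlock state machine.
void demo_spinlock() {
    SpinLock l;
    assert(l.try_lock());    // free, so we get it
    assert(!l.try_lock());   // already held
    l.unlock();
    l.lock();                // free again, returns immediately
    l.unlock();
}
```

Acquire on the exchange and release on the clear are enough for mutual exclusion; the default seq_cst ordering used in the functions earlier in this answer is stronger than a lock needs.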
