CMPXCHG16B correct?

Asked by JH. (score 8) · Tags: c++, x86, code-access-security

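This doesn't seem to work, though I'm not sure why. Suggestions would be great, as documentation on CMPXCHG16B is pretty scarce (I don't have any Intel manuals...).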

template<>
inline bool cas(volatile types::uint128_t *src, types::uint128_t cmp, types::uint128_t with)
{
    /*
    Description:
     The CMPXCHG16B instruction compares the 128-bit value in the RDX:RAX registers
     with a 128-bit memory location. If the values are equal, the zero flag (ZF) is set, 
     and the RCX:RBX value is copied to the memory location. 
     Otherwise, the ZF flag is cleared, and the memory value is copied to RDX:RAX.
     */
    uint64_t * cmpP = (uint64_t*)&cmp;
    uint64_t * withP = (uint64_t*)&with;
    unsigned char result = 0;
    __asm__ __volatile__ (
    "LOCK; CMPXCHG16B %1\n\t"
    "SETZ %b0\n\t"
    : "=q"(result)  /* output */ 
    : "m"(*src), /* input */
      //what to compare against
      "rax"( ((uint64_t) (cmpP[1])) ), //lower bits
      "rdx"( ((uint64_t) (cmpP[0])) ),//upper bits
      //what to replace it with if it was equal
      "rbx"( ((uint64_t) (withP[1])) ), //lower bits
      "rcx"( ((uint64_t) (withP[0]) ) )//upper bits
    : "memory", "cc", "rax", "rdx", "rbx","rcx" /* clobbered items */
    );
    return result;
}
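For reference, the instruction's behaviour can be written as a plain, non-atomic C++ model. This is only an illustration: `U128` and `cmpxchg16b_model` are hypothetical names, and the real instruction performs the whole compare-and-store atomically under the LOCK prefix and requires a 16-byte-aligned memory operand.

```cpp
#include <cstdint>
#include <cassert>

// Hypothetical 128-bit value split into halves, mirroring RDX:RAX and RCX:RBX.
struct U128 { uint64_t lo; uint64_t hi; };

// Non-atomic, single-threaded model of LOCK CMPXCHG16B m128.
// `expected` plays the role of RDX:RAX (hi:lo), `desired` plays RCX:RBX (hi:lo).
// The return value is what ZF would be after the instruction.
inline bool cmpxchg16b_model(U128 &mem, U128 &expected, const U128 &desired)
{
    if (mem.lo == expected.lo && mem.hi == expected.hi) {
        mem = desired;      // equal: RCX:RBX is stored to memory, ZF = 1
        return true;
    }
    expected = mem;         // not equal: memory is loaded into RDX:RAX, ZF = 0
    return false;
}
```

Note that only RDX:RAX takes part in the comparison; RCX:RBX is just the value written on success.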

When running an example, I get 0 when it should be 1. Any ideas?

Answer (anonymous user, score 13)

I noticed a few issues:

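(1) The main problem is the constraints. "rax" does not do what it looks like: GCC reads the constraint letter by letter, and the leading "r" lets it use any general-purpose register (the "a" and "x" only add RAX and the SSE registers as further options). To pin operands to specific registers, use "a", "d", "b", and "c".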
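A minimal sketch of what a single-register constraint guarantees, assuming x86-64 and GCC or Clang extended inline asm (AT&T syntax). The helper `copy_through_rax` is hypothetical; it reads RAX by name inside the template, which is only correct because the `"a"` constraint forces the input into RAX.

```cpp
#include <cstdint>
#include <cassert>

// "a" pins the input to RAX ("d", "b", "c" do the same for RDX, RBX, RCX).
// A constraint string like "rax" would instead allow "r" (any general register),
// "a" (RAX) or "x" (an SSE register), so the template could not rely on RAX.
inline uint64_t copy_through_rax(uint64_t in)
{
    uint64_t out;
    __asm__("mov %%rax, %0"   // reads RAX explicitly
            : "=r"(out)
            : "a"(in));
    return out;
}
```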

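(2) I'm not sure how your types::uint128_t is stored, but assuming the standard little-endian layout of x86, the high and low 64-bit halves are also swapped: cmpP[0] and withP[0] hold the low halves, and cmpP[1] and withP[1] the high halves.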
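To see the layout directly, a small check like the following copies a 128-bit value into two `uint64_t` slots, the same view the question's `cmpP`/`withP` pointers take. It assumes GCC/Clang's `unsigned __int128` extension; `split_halves` is a hypothetical helper.

```cpp
#include <cstdint>
#include <cstring>
#include <cassert>

// On little-endian x86-64 the low 64 bits of a 128-bit integer are stored first,
// so halves[0] is the low half (belongs in RAX/RBX) and halves[1] is the high
// half (belongs in RDX/RCX). memcpy keeps the inspection itself aliasing-safe.
inline void split_halves(unsigned __int128 v, uint64_t halves[2])
{
    std::memcpy(halves, &v, sizeof v);
}
```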

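(3) Taking the address of something and casting it to a pointer to a different type can break the strict-aliasing rules. Whether this is a problem depends on how your types::uint128_t is defined (if it is a struct of two uint64_t, it's fine). GCC at -O2 optimizes on the assumption that the aliasing rules are not violated.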
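One aliasing-safe way to move between a native 128-bit integer and a two-field struct is `memcpy`. This is a sketch: `Pair128`, `to_pair`, and `from_pair` are hypothetical names standing in for the question's `types::uint128_t`, and `unsigned __int128` is a GCC/Clang extension.

```cpp
#include <cstdint>
#include <cstring>
#include <cassert>

// Hypothetical stand-in for types::uint128_t: low half first, as on x86.
struct Pair128 { uint64_t lo; uint64_t hi; };
static_assert(sizeof(Pair128) == sizeof(unsigned __int128), "sizes must match");

// memcpy copies the object representation instead of reading one type through
// a pointer to another, so strict-aliasing assumptions at -O2 are not violated.
inline Pair128 to_pair(unsigned __int128 v)
{
    Pair128 p;
    std::memcpy(&p, &v, sizeof p);
    return p;
}

inline unsigned __int128 from_pair(const Pair128 &p)
{
    unsigned __int128 v;
    std::memcpy(&v, &p, sizeof v);
    return v;
}
```

Compilers typically reduce fixed-size `memcpy` calls like these to ordinary register moves.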

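(4) *src should really be marked as an output rather than relying on a "memory" clobber, although this is more a performance issue than a correctness one. Likewise, rbx and rcx do not need to be listed as clobbered, since the instruction only reads them.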
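As a small illustration of a read-modify-write memory operand (assuming x86-64 and GCC/Clang inline asm; `add_in_place` is a hypothetical example and is not atomic), the `"+m"` constraint by itself tells the compiler which object the asm reads and writes:

```cpp
#include <cstdint>
#include <cassert>

// "+m"(*p) declares *p as both read and written by the asm, so no blanket
// "memory" clobber is needed for the compiler to see the update.
inline void add_in_place(uint64_t *p, uint64_t delta)
{
    __asm__("addq %1, %0"
            : "+m"(*p)       // read-modify-write memory operand
            : "r"(delta)     // only read, so it is an input, not a clobber
            : "cc");         // addq changes the flags
}
```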

Here is a working version:

#include <stdint.h>

namespace types
{
    // alternative: union with  unsigned __int128
    struct uint128_t
    {
        uint64_t lo;
        uint64_t hi;
    }
    __attribute__ (( __aligned__( 16 ) ));
}

template< class T > inline bool cas( volatile T * src, T cmp, T with );

template<> inline bool cas( volatile types::uint128_t * src, types::uint128_t cmp, types::uint128_t with )
{
    // cmp can be by reference so the caller's value is updated on failure.

    // suggestion: use __sync_bool_compare_and_swap and compile with -mcx16 instead of inline asm
    bool result;
    __asm__ __volatile__
    (
        "lock cmpxchg16b %1\n\t"
        "setz %0"       // on gcc6 and later, use a flag output constraint instead
        : "=q" ( result )
        , "+m" ( *src )
        , "+d" ( cmp.hi )
        , "+a" ( cmp.lo )
        : "c" ( with.hi )
        , "b" ( with.lo )
        : "cc", "memory" // compile-time memory barrier.  Omit if you want memory_order_relaxed compile-time ordering.
    );
    return result;
}

int main()
{
    using namespace types;
    uint128_t test = { 0xdecafbad, 0xfeedbeef };
    uint128_t cmp = test;
    uint128_t with = { 0x55555555, 0xaaaaaaaa };
    return ! cas( & test, cmp, with );
}
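Following the comments in the code above, a variant might take `cmp` by reference and read ZF through a flag output constraint (`"=@ccz"`, available in GCC 6 and later) instead of `setz`. This is a sketch assuming x86-64 hardware with CMPXCHG16B and a 16-byte-aligned target; `U128` and `cas128` are hypothetical names.

```cpp
#include <cstdint>
#include <cassert>

// Hypothetical 128-bit type; CMPXCHG16B faults on a misaligned operand.
struct alignas(16) U128 { uint64_t lo; uint64_t hi; };

// Returns true (ZF = 1) if *src matched cmp and was replaced by with.
// On failure, cmp receives the value currently in *src.
inline bool cas128(volatile U128 *src, U128 &cmp, U128 with)
{
    bool ok;
    __asm__ __volatile__(
        "lock cmpxchg16b %1"
        : "=@ccz"(ok), "+m"(*src), "+d"(cmp.hi), "+a"(cmp.lo)
        : "c"(with.hi), "b"(with.lo)
        : "memory");   // compile-time barrier, as in the version above
    return ok;
}
```

On failure `cmp` already holds the value found in memory, so a retry loop can use it directly without reloading.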

  • I copied and pasted your code, and when compiling with "g++-4.7 -g -DDEBUG=1 -std=c++0x -pthread dwcas.c -o dwcas.o -ldl -lpthread" I got dwcas.c:29: Error: junk `ptr' after expression. Any idea why? (2 upvotes)