Optimizing bitwise logic

Question

ron*_*nag 12 c++ optimization 64-bit bit-manipulation

In my code, the following lines are currently the hot spot:

int table1[256] = /*...*/;
int table2[512] = /*...*/;
int table3[512] = /*...*/;

int* result = /*...*/;
for(int r = 0; r < r_end; ++r)
{
    std::uint64_t bits = bit_reader.value(); // 64 bits, no assumption regarding bits.

    // The get_ functions are table lookups from the highest word of the bits variable.

    struct entry
    {
        int sign_offset : 5;
        int r_offset    : 4;        
        int x           : 7;        
    };

    // NOTE: We are only interested in the highest word in the bits variable.

    entry e;
    if(is_in_table1(bits)) // branch prediction should work well here since table1 will be hit more often than 2 or 3, and 2 more often than 3.
        e = reinterpret_cast<const entry&>(table1[get_table1_index(bits)]);
    else if(is_in_table2(bits))
        e = reinterpret_cast<const entry&>(table2[get_table2_index(bits)]);
    else
        e = reinterpret_cast<const entry&>(table3[get_table3_index(bits)]);

    r                 += e.r_offset; // r is 18 bits, top 14 bits are always 0.
    int x              = e.x; // x is 14 bits, top 18 bits are always 0.        
    int sign_offset    = e.sign_offset;

    assert(sign_offset <= 16 && sign_offset > 0);

    // The following is the hotspot.

    int sign    = 1 - (bits >> (63 - sign_offset) & 0x2);
    (*result++) = ((x << 18) * sign) | r; // 32 bits

    // End of hotspot

    bit_reader.skip(sign_offset); // sign_offset is the last bit used.
}

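I haven't figured out how to optimize this any further. Could intrinsics for bit-granularity operations, such as __shiftleft128 or _rot, be useful here?

Note that I also process the resulting data on the GPU, so what matters is writing something into result that the GPU can then use to compute the correct data.

Suggestions?

Edit:

Added the table lookups.

Edit: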

            int sign = 1 - (bits >> (63 - e.sign_offset) & 0x2);
000000013FD6B893  and         ecx,1Fh  
000000013FD6B896  mov         eax,3Fh  
000000013FD6B89B  sub         eax,ecx  
000000013FD6B89D  movzx       ecx,al  
000000013FD6B8A0  shr         r8,cl  
000000013FD6B8A3  and         r8d,2  
000000013FD6B8A7  mov         r14d,1  
000000013FD6B8AD  sub         r14d,r8d
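
For reference, the expression compiled above can be read as follows: the sign bit sits at bit 64 - sign_offset, the shift moves it to bit 1, and masking with 0x2 yields 0 or 2, so sign ends up as +1 or -1. The sketch below is not the asker's code; it checks that reading and compares the multiply with a multiply-free conditional negation based on the identity (v ^ -s) + s. The helper names (sign_bit, sign_by_multiply, negate_if, pack_multiply, pack_negate) are hypothetical, and unsigned arithmetic is assumed so that wraparound is well defined.

```cpp
#include <cstdint>

// Hypothetical helper: 1 if the bit at position (64 - sign_offset) is set, else 0.
// Assumes 1 <= sign_offset <= 16, as asserted in the question.
constexpr std::uint32_t sign_bit(std::uint64_t bits, int sign_offset)
{
    return static_cast<std::uint32_t>(bits >> (64 - sign_offset)) & 1u;
}

// The question's formulation: +1 when the sign bit is clear, -1 when it is set.
constexpr std::int32_t sign_by_multiply(std::uint64_t bits, int sign_offset)
{
    return 1 - static_cast<std::int32_t>((bits >> (63 - sign_offset)) & 0x2);
}

// Conditional negation without a multiply:
// s == 0 leaves v unchanged, s == 1 gives ~v + 1, i.e. -v modulo 2^32.
constexpr std::uint32_t negate_if(std::uint32_t v, std::uint32_t s)
{
    return (v ^ (0u - s)) + s;
}

// Packing as in the question, done in unsigned arithmetic.
constexpr std::uint32_t pack_multiply(std::uint32_t x, std::uint32_t r,
                                      std::uint64_t bits, int sign_offset)
{
    return ((x << 18) * static_cast<std::uint32_t>(sign_by_multiply(bits, sign_offset))) | r;
}

// Same result, with the multiply replaced by negate_if.
constexpr std::uint32_t pack_negate(std::uint32_t x, std::uint32_t r,
                                    std::uint64_t bits, int sign_offset)
{
    return negate_if(x << 18, sign_bit(bits, sign_offset)) | r;
}
```

Whether the second form is actually faster depends on the compiler and target, so it would need to be measured in the real loop.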

Answer 1

ron*_*nag 0

I think this is the fastest solution:

*result++ = (_rotl64(bits, sign_offset) << 31) | (x << 18) | (r << 0); // 32 bits

Then correct x on the GPU depending on whether the sign bit is set.
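
To illustrate why the rotate works: rotating left by sign_offset moves the bit at position 64 - sign_offset down to bit 0, and the shift by 31 followed by truncation to 32 bits keeps only that bit, now at bit 31. The following is a minimal sketch under stated assumptions, not the exact code from the answer: rotl64 is a hand-written portable stand-in for the MSVC intrinsic _rotl64, the names pack_with_sign and decode_x are hypothetical, and decode_x shows one possible form of the GPU-side correction, assuming x fits in 13 bits so that bit 31 is free for the sign.

```cpp
#include <cstdint>

// Portable stand-in for _rotl64 (assumption: same semantics for n in 0..63).
constexpr std::uint64_t rotl64(std::uint64_t v, unsigned n)
{
    return (v << (n & 63u)) | (v >> ((64u - n) & 63u));
}

// Pack as in the answer: sign bit in bit 31, x starting at bit 18, r in the low 18 bits.
// The cast to 32 bits discards every rotated bit except the sign bit.
constexpr std::uint32_t pack_with_sign(std::uint64_t bits, unsigned sign_offset,
                                       std::uint32_t x, std::uint32_t r)
{
    return static_cast<std::uint32_t>(rotl64(bits, sign_offset) << 31) | (x << 18) | r;
}

// Hypothetical consumer-side decode: negate x when the sign bit is set.
// Assumes x occupies bits 18..30 only (13 bits).
constexpr std::int32_t decode_x(std::uint32_t word)
{
    return (word >> 31) != 0u
        ? -static_cast<std::int32_t>((word >> 18) & 0x1FFFu)
        : static_cast<std::int32_t>((word >> 18) & 0x1FFFu);
}

// r is stored unchanged in the low 18 bits.
constexpr std::uint32_t decode_r(std::uint32_t word)
{
    return word & 0x3FFFFu;
}
```

Note that the question describes x as 14 bits wide; in that case the top bit of x << 18 would share bit 31 with the sign, so the layout would need adjusting before this decode could be used.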

Archived:	13 years, 6 months ago
Views:	562
Last recorded:	13 years, 6 months ago