Ksh*_*jee 1 algorithm bit-manipulation
这是一个艰难的(至少我很难:P):
在不使用任何循环的情况下查找32位数的最高位集的索引.
非常有趣的问题,我将为您提供基准答案
uint8_t highestBitIndex( uint32_t n )
{
uint8_t r = 0;
while ( n >>= 1 )
r++;
return r;
}
Run Code Online (Sandbox Code Playgroud)
这有助于更好地理解问题,但效率很低。
这种方法也可以用log方法来概括
uint8_t highestSetBitIndex2(uint32_t n) {
return (uint8_t)(log(n) / log(2));
}
Run Code Online (Sandbox Code Playgroud)
然而它的效率也很低(甚至比上面的效率还要低,请参阅基准测试)
uint8_t highestBitIndex3( uint32_t n )
{
return 31 - __builtin_clz(n);
}
Run Code Online (Sandbox Code Playgroud)
这个解决方案虽然非常高效,但它只适用于特定的编译器(gcc 和 clang 都可以)和特定的平台。
注意:如果我们想要索引,则为 31 而不是 32
#include <x86intrin.h>
uint8_t highestSetBitIndex5(uint32_t n)
{
return _bit_scan_reverse(n); // undefined behavior if n == 0
}
Run Code Online (Sandbox Code Playgroud)
这将在汇编级别调用 bsr 指令
LZCNT和BSR可以用汇编语言概括为以下函数:
uint8_t highestSetBitIndex4(uint32_t n) // undefined behavior if n == 0
{
__asm__ __volatile__ (R"(
.intel_syntax noprefix
bsr eax, edi
.att_syntax noprefix
)"
);
}
uint8_t highestSetBitIndex7(uint32_t n) // undefined behavior if n == 0
{
__asm__ __volatile__ (R"(.intel_syntax noprefix
lzcnt ecx, edi
mov eax, 31
sub eax, ecx
.att_syntax noprefix
)");
}
Run Code Online (Sandbox Code Playgroud)
注意:除非您知道自己在做什么,否则请勿使用
首先,使用以下函数清除除最高位之外的所有位:
uint32_t keepHighestBit( uint32_t n )
{
n |= (n >> 1);
n |= (n >> 2);
n |= (n >> 4);
n |= (n >> 8);
n |= (n >> 16);
return n - (n >> 1);
}
Run Code Online (Sandbox Code Playgroud)
图片来源:这个想法来自 Henry S. Warren, Jr. 在他的书《黑客的喜悦》中
然后我们使用基于DeBruijn 序列的算法来执行一种二分搜索:
uint8_t highestBitIndex8( uint32_t b )
{
static const uint32_t deBruijnMagic = 0x06EB14F9; // equivalent to 0b111(0xff ^ 3)
static const uint8_t deBruijnTable[64] = {
0, 0, 0, 1, 0, 16, 2, 0, 29, 0, 17, 0, 0, 3, 0, 22,
30, 0, 0, 20, 18, 0, 11, 0, 13, 0, 0, 4, 0, 7, 0, 23,
31, 0, 15, 0, 28, 0, 0, 21, 0, 19, 0, 10, 12, 0, 6, 0,
0, 14, 27, 0, 0, 9, 0, 5, 0, 26, 8, 0, 25, 0, 24, 0,
};
return deBruijnTable[(keepHighestBit(b) * deBruijnMagic) >> 26];
}
Run Code Online (Sandbox Code Playgroud)
另一个版本:
void propagateBits(uint32_t *n) {
*n |= *n >> 1;
*n |= *n >> 2;
*n |= *n >> 4;
*n |= *n >> 8;
*n |= *n >> 16;
}
uint8_t highestSetBitIndex8(uint32_t b)
{
static const uint32_t Magic = (uint32_t) 0x07C4ACDD;
static const int BitTable[32] = {
0, 9, 1, 10, 13, 21, 2, 29,
11, 14, 16, 18, 22, 25, 3, 30,
8, 12, 20, 28, 15, 17, 24, 7,
19, 27, 23, 6, 26, 5, 4, 31,
};
propagateBits(&b);
return BitTable[(b * Magic) >> 27];
}
Run Code Online (Sandbox Code Playgroud)
编译用g++ -std=c++17 highestSetBit.cpp -O3 && ./a.out
highestBitIndex1 136.8 ms (loop)
highestBitIndex2 183.8 ms (log(n) / log(2))
highestBitIndex3 10.6 ms (de Bruijn lookup Table with power of two, 64 entries)
highestBitIndex4 4.5 ms (inline assembly bsr)
highestBitIndex5 6.7 ms (intrinsic bsr)
highestBitIndex6 4.7 ms (gcc lzcnt)
highestBitIndex7 7.1 ms (inline assembly lzcnt)
highestBitIndex8 10.2 ms (de Bruijn lookup Table, 32 entries)
Run Code Online (Sandbox Code Playgroud)
如果可移植性是您的重点,我个人会选择最高位索引8,否则内置的gcc就很好。
随着递归:
int firstset(int bits) {
return (bits & 0x80000000) ? 31 : firstset((bits << 1) | 1) - 1;
}
Run Code Online (Sandbox Code Playgroud)
[31,..,0]索引| 1通过限制转移次数直到1达到a 来防止堆栈溢出(32)