什么是word_at_a_time.h中的has_zero和find_zero用于

dsp*_*pjm 8 c hash linux-kernel

在linux内核中,inlucde/linux/word_at_a_time.h有两个函数:

static inline long find_zero(unsigned long mask)
{
        long byte = 0;
#ifdef CONFIG_64BIT
        if (mask >> 32)
                mask >>= 32;
        else
                byte = 4;
#endif
        if (mask >> 16)
                mask >>= 16;    
        else
                byte += 2;
        return (mask >> 8) ? byte : byte + 1;
}

static inline bool has_zero(unsigned long val, 
                            unsigned long *data, 
                            const struct word_at_a_time *c)
{
        unsigned long rhs = val | c->low_bits;
        *data = rhs;
        return (val + c->high_bits) & ~rhs;
}
Run Code Online (Sandbox Code Playgroud)

它用在哈希函数中,在git log中它说:

 - has_zero(): take a word, and determine if it has a zero byte in it.
   It gets the word, the pointer to the constant pool, and a pointer to
   an intermediate "data" field it can set.
Run Code Online (Sandbox Code Playgroud)

但我仍然没有得到

(1)这个功能做什么?,
(2)他们是怎么做的?

Gri*_*han 5

假设位从最低有效位(LSB)(编号0)到最高有效位(MSB)编号.

它能做什么?

函数首先使用类似于除法和征服的技术从左侧find_zero()搜索值为零的<= 7个字节. N

怎么样?

假设CONFIG_64BIT定义了64位系统,然后执行以下代码:

#ifdef CONFIG_64BIT
        if (mask >> 32)  //Any bit=1 in left 32 bits 
                mask >>= 32;
        else
                byte = 4;  //<--No, fist 4 bytes are zero
Run Code Online (Sandbox Code Playgroud)

首先要注意的maskunsigned,>>未签名的右筛选也是如此.(比如Java的>>>)

它首先检查mask值是否大于2 32(或者我们可以说在unsigned long mask编号32为的位之间的任何位63是否为1).

mask >> 32将产生一个纯值,它mask的右移零位0扩展32,并使32高位包含零.

例如,如果maks位如下:

63     56 55     48 47     40 39     32 31     24 23     16 15        7       0
?       ? ?       ? ?       ? ?       ? ?       ? ?       ? ?         ?       ?
0000 1000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0000 0000 0000 0001 0101
?                                                                             ?
MSB                                                                         LSM
Run Code Online (Sandbox Code Playgroud)

在这个例子中,比特数3459是一个,所以(mask >> 32)== true(或说非零!0).因此,对于这个例子if (mask >> 32)== if(!0).
在GENRAL,如果从位号的任何一点3263是一个然后mask将更新为mask >>= 32;== mask = mask >> 32;(好像真,如果语句执行).这导致32高阶32位由于右移而变为低阶位(并且位32至63变为0).

在上面的例子mask变成:

           shifted OLD bit number ----> 63                 45                32
63                  47               32 31                 15                 0
?                   ?                 ? ?                   ?                 ?
0000 0000 0000 0000 0000 0000 0000 0000 0000 1000 0000 0000 0000 0000 0000 0100
?                                                                             ?
MSB                                                                         LSM
|-------------------------------------|
| All are zero                        |
Run Code Online (Sandbox Code Playgroud)

注意:从32到63位是0与从比特031从位复制3263从上方.

到此为止:

案例1:
如果从任意一点3263一个mask然后if执行真实面具更新.(正如我上面解释的那样).变数long byte 仍然存在0.于是,在下一个if(mask >> 16),功能find_zero()将继续比特范围内具有零值,以搜索一个字节4863(在if(mask >> 16),比特编号4863将被检查,如果任何位为1).

情形2:
如果从所有位3263处于原始零mask,则if条件为假(即if(mask >> 32)== if(0))和mask不更新,以及可变long byte成为4,和这表明高的四个字节是0mask.在此之后if(mask >> 16),功能find_zero() 将搜索与零位范围内的字节1631.

同样,在第二个if():

if (mask >> 16)
  mask >>= 16;    
else
  byte += 2; 
Run Code Online (Sandbox Code Playgroud)

它将检查编号的位之间是否有任何位16 to 31.如果所有位都为零,那么byte2在其他部分递增,在if部分掩码将由无符号右移16位更新.
(:如果mask被更新mask则其实这个if(mask >> 16)检查之间的任何位是否48 to 63是一个,否则位1631原面罩)

随后,最后有条件的经营者以类似的方式工作,以检查麻木位之间的任何一点815是一个与否.

                  long byte = 0;
                  64 bit `mask`
                       |
                       |
                       ?
            if(any bit from 32 to 62 is one)---+
            |                                  |
            |False: if(0)                      |True: if(!0)
      all bits between 32 to 64              |A byte=Zero NOT found
      are zero                         Some bits are one from bit 32-63
            |                                  |
            |                                  |
        byte = 4                          byte= 0, and
        64 bit `mask`  <--Orignal       `mask` updated as `mask >>= 32;`
           |                                        |
           |                                        |
           ?                                        ?
if(Any bit from 16 to 31 is one)       if(Any bit from 48 to 63 is one)
   |Because high order bits are Zero    |Note due to >> all high order
   |It check for bits numbered 16-31    |bits becomes zero.
   |                                    |2nd if checks for bit from 16-31
   |                                    |that were originally bit
   |                                    | numbered 48 to 63
   |                                    |
   ?                                    ?
Case False: if(0)                        Case False: if(0)
     All bits Zero from 16-31               All bits Zero from 49-63
     mask will NOT updated and              mask will NOT updated and
     byte = 6                                  byte = 2

Case True: if(!0)                         Case False: if(!0)
    Some bits are one from 16-31         Some bits are one from 49-63
    mask updated                            mask Updated
     byte = 4                                  byte = 0
   |                                        |
   |                                        |
   ?                                        ?
more four cases                          more four cases

Total 8 different answer outputs from `0` to `7`
Run Code Online (Sandbox Code Playgroud)

因此,函数从左侧find_zero()搜索值为零的第一个 N字节.输出字节值可以是来自0,7也不考虑最右边的字节("8").
(*注意:长64位长= 8*8位= 8字节.*)

byte NUMBER ("):      
   "1"       "2"        "3"       "4"       "5"       "6"       "7"       "8"
63     56 55     48 47     40 39     32 31     24 23     16 15        7       0
?       ? ?       ? ?       ? ?       ? ?       ? ?       ? ?         ?       ?
0000 1000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0000 0000 0000 0001 0101
?                                                                             ?
MSB                                                                         LSM
Run Code Online (Sandbox Code Playgroud)

byte = 0- >表示第一个字节不是零
字节= 1- >高位字节为零,位编号5663
byte = 2- >两个高位字节为零,位编号4863
byte = 3- >三个高位字节是零,其位编号4063
:
:
类似地,字节= 7- >七个左字节是0,(或所有0).

流程图

流程图

下面我写了一个小代码,调用find_zero()具有不同值的函数,将帮助您理解函数:

int main(){ 
 printf("\nmask=0x0, all bytes are zero, result =%ld", find_zero(0x0L)); // not prints 8
 printf("\nmask=0xff, left seven bytes are zero, result =%ld", find_zero(0xffL));
 printf("\nmask=0xffff, left six bytes are zero, result =%ld", find_zero(0xffffL));
 printf("\nmask=0xffffff, left five bytes are zero, result =%ld", find_zero(0xffffffL));
 printf("\nmask=0xffffffff, left four bytes are zero, result =%ld", find_zero(0xffffffffL));
 printf("\nmask=0xffffffffff, left three bytes are zero, result =%ld", find_zero(0xffffffffffL));
 printf("\nmask=0xffffffffffff, left two bytes are zero, result =%ld", find_zero(0xffffffffffffL));
 printf("\nmask=0xffffffffffffff, left most one byte is zero, result =%ld", find_zero(0xffffffffffffffL));
 printf("\nmask=0xffffffffffffffff, No byte is zero, result =%ld", find_zero(0xffffffffffffffffL));
 printf("\nmask=0x8000000000000000L, first byte is NOT zero (so no search), result =%ld", find_zero(0x8000000000000000LL));  
    printf("\n");
    return 1;
}
Run Code Online (Sandbox Code Playgroud)

请观察输出以了解功能:

mask=0x0, all bytes are zero, result =7
mask=0xff, left seven bytes are zero, result =7
mask=0xffff, left six bytes are zero, result =6
mask=0xffffff, left five bytes are zero, result =5
mask=0xffffffff, left four bytes are zero, result =4
mask=0xffffffffff, left three bytes are zero, result =3
mask=0xffffffffffff, left two bytes are zero, result =2
mask=0xffffffffffffff, left most one byte is zero, result =1
mask=0xffffffffffffffff, No byte is zero, result =0
mask=0x8000000000000000L, first byte is NOT zero (so no search), result =0
Run Code Online (Sandbox Code Playgroud)

注意:如果所有字节都为零,7则不打印8.