Reason for the number 5381 in the DJB hash function?

vij*_*iji 42 algorithm hash primes

Can anyone tell me why the number 5381 is used in the DJB hash function?

The DJB hash function is

h(0) = 5381

h(i) = 33 * h(i-1) ^ str[i]

A C program:

unsigned int DJBHash(char* str, unsigned int len)
{
   unsigned int hash = 5381;
   unsigned int i    = 0;

   for(i = 0; i < len; str++, i++)
   {   
      hash = ((hash << 5) + hash) + (*str);
   }   

   return hash;
}
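Note that the recurrence above is written with `^` (XOR), while the C function adds each byte; both forms circulate under the DJB name. As a minimal sketch for comparing them side by side (the names `djb_hash_add` and `djb_hash_xor` are hypothetical, and the cast to `unsigned char` is an assumption that differs from the posted code for bytes above 127 on platforms where `char` is signed):

```c
#include <stddef.h>

/* Hypothetical helper names, not taken from the original post. */

/* Additive form: h(i) = 33 * h(i-1) + str[i], as in the C function above. */
unsigned int djb_hash_add(const char *str, size_t len)
{
    unsigned int hash = 5381;
    for (size_t i = 0; i < len; i++) {
        /* (hash << 5) + hash equals hash * 33 in unsigned arithmetic */
        hash = ((hash << 5) + hash) + (unsigned char)str[i];
    }
    return hash;
}

/* XOR form: h(i) = (33 * h(i-1)) ^ str[i], as in the written recurrence. */
unsigned int djb_hash_xor(const char *str, size_t len)
{
    unsigned int hash = 5381;
    for (size_t i = 0; i < len; i++) {
        hash = ((hash << 5) + hash) ^ (unsigned char)str[i];
    }
    return hash;
}
```

For an empty input both functions return the starting value 5381; for most non-empty inputs they give different results, so it matters which variant a given table or file format expects.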

Mar*_*son 60

I stumbled across a comment that sheds light on what DJB was aiming for:

/*
* DJBX33A (Daniel J. Bernstein, Times 33 with Addition)
*
* This is Daniel J. Bernstein's popular `times 33' hash function as
* posted by him years ago on comp.lang.c. It basically uses a function
* like ``hash(i) = hash(i-1) * 33 + str[i]''. This is one of the best
* known hash functions for strings. Because it is both computed very
* fast and distributes very well.
*
* The magic of number 33, i.e. why it works better than many other
* constants, prime or not, has never been adequately explained by
* anyone. So I try an explanation: if one experimentally tests all
* multipliers between 1 and 256 (as RSE did now) one detects that even
* numbers are not useable at all. The remaining 128 odd numbers
* (except for the number 1) work more or less all equally well. They
* all distribute in an acceptable way and this way fill a hash table
* with an average percent of approx. 86%.
*
* If one compares the Chi^2 values of the variants, the number 33 not
* even has the best value. But the number 33 and a few other equally
* good numbers like 17, 31, 63, 127 and 129 have nevertheless a great
* advantage to the remaining numbers in the large set of possible
* multipliers: their multiply operation can be replaced by a faster
* operation based on just one shift plus either a single addition
* or subtraction operation. And because a hash function has to both
* distribute good _and_ has to be very fast to compute, those few
* numbers should be preferred and seems to be the reason why Daniel J.
* Bernstein also preferred it.
*
*
* -- Ralf S. Engelschall <rse@engelschall.com>
*/
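The "one shift plus a single addition or subtraction" remark in the comment can be sketched as follows. The function names are hypothetical and only illustrate the arithmetic for the multipliers the comment lists:

```c
/* Hypothetical helpers: each multiplier is 2^k + 1 or 2^k - 1,
   so the product is one shift plus one add or subtract.
   Unsigned arithmetic wraps modulo 2^N, so the identities hold for every h. */
unsigned int times17(unsigned int h)  { return (h << 4) + h; }  /* 16h + h  */
unsigned int times31(unsigned int h)  { return (h << 5) - h; }  /* 32h - h  */
unsigned int times33(unsigned int h)  { return (h << 5) + h; }  /* 32h + h  */
unsigned int times63(unsigned int h)  { return (h << 6) - h; }  /* 64h - h  */
unsigned int times127(unsigned int h) { return (h << 7) - h; }  /* 128h - h */
unsigned int times129(unsigned int h) { return (h << 7) + h; }  /* 128h + h */
```

Optimizing compilers typically perform this rewrite on their own, so writing `hash * 33` is usually equivalent in practice; the shift form mainly documents the intent.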

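This is a slightly different hash function from the one you are looking at, though it does use the 5381 magic number. The code below the comment at the link target has been unrolled.

Then I found this: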

Magic Constant 5381:

  1. odd number

  2. prime number

  3. deficient number

  4. 001/010/100/000/101 b
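The four listed properties are easy to check mechanically. A minimal sketch, assuming simple brute-force helpers (all names here are hypothetical):

```c
/* Hypothetical helpers for illustration only. */

/* Trial division; adequate for small n such as 5381. */
int is_prime(unsigned int n)
{
    if (n < 2) return 0;
    for (unsigned int d = 2; d * d <= n; d++)
        if (n % d == 0) return 0;
    return 1;
}

/* Sum of the proper divisors of n (divisors smaller than n). */
unsigned int proper_divisor_sum(unsigned int n)
{
    unsigned int sum = 0;
    for (unsigned int d = 1; d <= n / 2; d++)
        if (n % d == 0) sum += d;
    return sum;
}

/* A number is deficient when its proper divisors sum to less than itself. */
int is_deficient(unsigned int n)
{
    return proper_divisor_sum(n) < n;
}

/* Writes the low `bits` bits of n into buf as '0'/'1' characters,
   most significant first; buf must hold bits + 1 bytes. */
void to_binary(unsigned int n, int bits, char *buf)
{
    for (int i = 0; i < bits; i++)
        buf[i] = ((n >> (bits - 1 - i)) & 1u) ? '1' : '0';
    buf[bits] = '\0';
}
```

Calling `to_binary(5381, 15, buf)` reproduces the 15-bit pattern from item 4 without the slashes. Since every prime greater than 2 is odd and every prime is deficient, items 1 and 3 follow from item 2 and do not single out 5381 on their own.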

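There is also this answer to "Can anybody explain the logic behind djb2 hash function?", which cites a post by DJB himself to a mailing list that mentions 5381 (excerpted here from that answer):

[...] practically any good multiplier works. I think you're worrying about the fact that 31c + d doesn't cover any reasonable range of hash values if c and d are between 0 and 255. That's why, when I discovered the 33 hash function and started using it in my compressors, I started with a hash value of 5381. I think you'll find that this does just as well as a 261 multiplier.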


Mah*_*dsi 28

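5381 is just a number that, in testing, resulted in fewer collisions and better avalanching. You'll find "magic constants" in just about every hash algorithm.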

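  • Those swapped URLs made me laugh out loud. (2 upvotes)
  • The question is how it reduces collisions. I laughed out loud too. And the asker accepted the answer without any evidence!!! (2 upvotes)
  • djb2 (like fnv1a) actually has [bad avalanche/distribution](https://i.stack.imgur.com/vm0Id.png). They don't even meet the non-strict avalanche criterion, which requires less computational power to calculate. But they _do_ have decent collision rates. :) Usually the collision rate is related to avalanche behavior, which means djb2 is not as good as other options. The closer all bits are to being pseudorandom, the less likely any two values are to match. (2 upvotes)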

小智 23

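I found a very interesting property of this number that may be a reason for the choice.

5381 is the 709th prime.
709 is the 127th prime.
127 is the 31st prime.
31 is the 11th prime.
11 is the 5th prime.
5 is the 3rd prime.
3 is the 2nd prime.
2 is the 1st prime.

5381 is the first number for which this happens 8 times. The 5381st prime may exceed the limit of a signed int, so it is a good point to stop the chain.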
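The chain above can be checked with a short sieve. This is only a sketch; `nth_prime`, `prime_index_chain`, and the `limit` parameter are hypothetical names introduced for illustration:

```c
#include <stdlib.h>

/* Hypothetical helper: returns the n-th prime (1-based) using a sieve of
   Eratosthenes over [2, limit). Returns 0 if the n-th prime is not below
   `limit` or if allocation fails. */
unsigned int nth_prime(unsigned int n, unsigned int limit)
{
    if (n == 0 || limit < 3) return 0;
    unsigned char *composite = calloc(limit, 1);
    if (!composite) return 0;

    unsigned int count = 0, result = 0;
    for (unsigned int p = 2; p < limit; p++) {
        if (composite[p]) continue;              /* p is not prime */
        if (++count == n) { result = p; break; } /* found the n-th prime */
        for (unsigned long long m = (unsigned long long)p * p; m < limit; m += p)
            composite[m] = 1;                    /* mark multiples of p */
    }
    free(composite);
    return result;
}

/* Starting from k = 1, repeatedly replaces k with the k-th prime and stores
   each result in out[0..steps-1]. Returns 1 on success, 0 on failure. */
int prime_index_chain(unsigned int steps, unsigned int limit, unsigned int *out)
{
    unsigned int k = 1;
    for (unsigned int i = 0; i < steps; i++) {
        k = nth_prime(k, limit);
        if (k == 0) return 0;
        out[i] = k;
    }
    return 1;
}
```

With `steps = 8` and a `limit` of a few thousand above 5381, the last stored element is 5381. Asking for a ninth step with a larger `limit` gives the 5381st prime, which can then be compared directly against `INT_MAX`.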

  • https://oeis.org/search?q=5381 The 5381st prime is nowhere near the limit of a signed int. (4 upvotes)
  • How did you find this? (2 upvotes)