Python and BigQuery FarmHash results sometimes differ

Mac*_*ing 4 hash python-3.x google-bigquery

When I run this in BigQuery:

select farm_fingerprint('6823339101') as f

the result is

-889610237538610470

In Python:

#pip install pyfarmhash
import farmhash
print(farmhash.hash64('6823339101'))

the result is

17557133836170941146

BigQuery and Python agree on most inputs, but for certain inputs, such as the one above, the two results do not match:

'6823339101'

How can I make BigQuery and Python agree 100% of the time?

Links to the BigQuery and Python hash documentation:

https://pypi.org/project/pyfarmhash/

https://cloud.google.com/bigquery/docs/reference/standard-sql/hash_functions

小智 7

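As mentioned in the comments, the Python function returns an unsigned 64-bit integer, whereas BigQuery's farm_fingerprint returns a signed INT64.
So the Python value needs to be reinterpreted as a signed 64-bit integer, as follows: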

import numpy as np
np.uint64(farmhash.fingerprint64(x)).astype('int64')

Related issue: https://github.com/lovell/farmhash/issues/26#issuecomment-524581600

Result:

>>> import farmhash
>>> import numpy as np
>>> np.uint64(farmhash.fingerprint64('6823339101')).astype('int64')
-889610237538610470
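If you would rather not depend on NumPy, the same reinterpretation can be sketched in plain Python. This is only a minimal illustration, not part of pyfarmhash: the helper name to_signed64 is hypothetical, and the input is the unsigned value reported in the question rather than a live call to farmhash. Values below 2**63 are unchanged by the conversion, which would explain why only some inputs mismatch.

```python
def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a signed (two's complement) one.

    Hypothetical helper for illustration; not part of pyfarmhash or BigQuery.
    """
    value &= 0xFFFFFFFFFFFFFFFF  # keep only the low 64 bits
    # Values with the top bit set represent negative numbers in two's complement.
    return value - (1 << 64) if value >= (1 << 63) else value


# Unsigned result reported in the question for '6823339101'.
unsigned_hash = 17557133836170941146
print(to_signed64(unsigned_hash))  # -889610237538610470

# Values below 2**63 are already identical in both representations.
print(to_signed64(12345))  # 12345
```

In practice you would wrap the farmhash call, for example to_signed64(farmhash.fingerprint64(s)), before comparing against the BigQuery column.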