Why is hashlib so much faster than other SHA-256 code, and how can I get my code close to hashlib's performance?

Pou*_*uJa 4 python hash sha256 hashlib python-3.x

The code below compares the hash-rate performance of hashlib.sha256() with that of my sha256_test() function, which is written in pure Python.

    from time import time_ns as time
    import hashlib

    def pad512(bytes_):
        # SHA-256 padding: a single 1 bit, K zero bits, then the 64-bit message
        # length, with K chosen so the padded length is a multiple of 512 bits.
        # (The earlier K = 512 - ((L + 1) % 512) overlapped the 1 bit with the
        # length field when len(bytes_) % 64 >= 56.)
        L       = len(bytes_)*8
        K       = (447 - L) % 512
        padding = (1 << (K + 64)) | L
        return bytes_ + padding.to_bytes((K + 65)//8, 'big')

    def mpars (M):
        # Split the padded message into 64-byte (512-bit) chunks
        chunks = []
        while M:
            chunks.append(M[:64])
            M = M[64:]
        return chunks

    def sha256_transform(H, Kt, W):
        a, b, c, d, e, f, g, h = H
        # Step 1: Looping
        for t in range(0, 64):
            T1 = h + g1(e) + Ch(e, f, g) + Kt[t] + W[t]
            T2 = (g0(a) + Maj(a, b, c))
            h = g
            g = f
            f = e
            e = (d + T1) & 0xffffffff
            d = c
            c = b
            b = a
            a = (T1 + T2) & 0xffffffff
        # Step 2: Updating Hashes
        H[0] = (a + H[0]) & 0xffffffff
        H[1] = (b + H[1]) & 0xffffffff
        H[2] = (c + H[2]) & 0xffffffff
        H[3] = (d + H[3]) & 0xffffffff
        H[4] = (e + H[4]) & 0xffffffff
        H[5] = (f + H[5]) & 0xffffffff
        H[6] = (g + H[6]) & 0xffffffff
        H[7] = (h + H[7]) & 0xffffffff
        return H

    # The x input chooses if the output is from y or z.
    # Ch(x, y, z) = (x ∧ y) ⊕ (¬x ∧ z)
    Ch   = lambda x, y, z: (z ^ (x & (y ^ z)))
    # The result is set according to the majority of the 3 inputs.
    # Maj(x, y, z) = (x ∧ y) ⊕ (x ∧ z) ⊕ (y ∧ z)
    Maj  = lambda x, y, z: (((x | y) & z) | (x & y))

    ROTR = lambda x, y: (((x & 0xffffffff) >> (y & 31)) | (x << (32 - (y & 31)))) & 0xffffffff
    SHR  = lambda x, n: (x & 0xffffffff) >> n

    s0   = lambda x: (ROTR(x, 7) ^ ROTR(x, 18) ^ SHR(x, 3))
    s1   = lambda x: (ROTR(x, 17) ^ ROTR(x, 19) ^ SHR(x, 10))

    g0   = lambda x: (ROTR(x, 2) ^ ROTR(x, 13) ^ ROTR(x, 22))
    g1   = lambda x: (ROTR(x, 6) ^ ROTR(x, 11) ^ ROTR(x, 25))

    def sha256_test (bytes_):
        # Parameters
        initHash = [
                    0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
                    0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19,
                    ]
        Kt = [
            0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
            0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
            0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
            0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
            0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
            0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
            0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
            0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
            ]

        padM   = pad512(bytes_)
        chunks = mpars(padM)
        # Preparing Initial Hashes
        H = initHash
        # Starting the Main Loop
        for chunk in chunks:
            W = []
            # Step 1: Preparing Wt
            for t in range(0, 16):
                W.append((((((chunk[4*t] << 8) | chunk[4*t+1]) << 8) | chunk[4*t+2]) << 8) | chunk[4*t+3])
            for t in range(16, 64):
                W.append((s1(W[t-2]) + W[t-7] + s0(W[t-15]) + W[t-16]) & 0xffffffff)
            # Step 2: transform the hash
            H = sha256_transform(H, Kt, W)
        # Step 3: Give Out the digest (once, after the last chunk)
        Hash = b''
        for j in H:
            Hash += (j.to_bytes(4, byteorder='big'))

        return Hash

    if __name__ == "__main__":

        k = 10000
        M = bytes.fromhex('00000000000000000001d2c45d09a2b4596323f926dcb240838fa3b47717bff6') #block #548867
        start = time()
        for i in range(0, k):
            o1 = sha256_test(sha256_test(M))
        end    = time()
        endtns1 = (end-start)/k
        endts1  = endtns1 * 1e-9
        print('@sha256_TESTs() Each iteration takes:  {} (ns) and {} (sec).'.format(endtns1, endts1))
        print('@sha256_TESTs() Calculated Hash power: {} (h/s)'.format(int(2/endts1)))

        start = time()
        for i in range(0, k):
            o2 = hashlib.sha256(hashlib.sha256(M).digest()).digest()
        end    = time()
        endtns2 = (end-start)/k
        endts2  = endtns2 * 1e-9
        print('@hashlib.sha256() Each iteration takes:  {} (ns) and {} (sec).'.format(endtns2, endts2))
        print('@hashlib.sha256() Calculated Hash power: {} (Kh/s)'.format(int(2/endts2/1024)))

        print('Outputs Match       : ', o1 == o2)
        print('hashlib is ~{} times faster'.format(int(endtns1/endtns2)))

When calculating hash power, does 1 kilohash mean 1000 hashes or 1024 hashes?

If my way of calculating the hash rate is correct, my conclusion is that my PC reaches about 900 h/s with my own sha256_test() function, while hashlib.sha256() reaches about 300 Kh/s.
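
For reference, the arithmetic behind those numbers can be sketched as below. This is only an illustration: the helper names hash_rate and double_sha256 are hypothetical, and it assumes the SI convention that "kilo" means 1000, which is how hash rates are normally quoted (1024 would be the binary prefix "kibi"). Each iteration performs two hashes, hence hashes_per_call=2.

```python
import hashlib
from time import perf_counter_ns


def hash_rate(func, data, iterations=10_000, hashes_per_call=1):
    """Return hashes per second for func(data), averaged over `iterations` calls."""
    start = perf_counter_ns()
    for _ in range(iterations):
        func(data)
    elapsed_s = (perf_counter_ns() - start) * 1e-9  # nanoseconds -> seconds
    return iterations * hashes_per_call / elapsed_s


def double_sha256(data):
    """SHA-256 applied twice, as in the benchmark loop of the question."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


if __name__ == "__main__":
    rate = hash_rate(double_sha256, bytes(32), hashes_per_call=2)
    # Assumption: SI prefix, so 1 kH/s = 1000 H/s
    print(f"{rate:.0f} H/s = {rate / 1000:.1f} kH/s")
```

Swapping double_sha256 for a wrapper around sha256_test() gives the pure-Python figure measured the same way.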

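First, I would like to understand the mechanism behind hashlib's outstanding performance. When I read hashlib.py, there is not much code in it, and I cannot work out how the hashes are actually computed. Is it possible to see the code behind hashlib.sha256()?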

Second, is it possible to improve my code so that it gets close to 300 Kh/s? I have read about Cython, but I am not sure how much it can improve this kind of algorithm.

Third, is it technically possible to be faster than hashlib in Python?

Ale*_*kov 6

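To be honest, looking at hashlib.py will not help you much, though it might give you a hint. What you have written is pure Python code, whereas hashlib relies on a C implementation, which can easily run circles around pure Python. That being said, you would need to look at this. So if you want to get close to those numbers, you need to look into Cython, C, C++, or Rust.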
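
A quick way to see that the work happens outside hashlib.py is to ask the constructor where it comes from. The sketch below uses a hypothetical helper, describe_backend. On CPython the constructor is normally a built-in function exported by a compiled extension module (typically the OpenSSL-backed one, with a bundled C fallback), but the exact module name depends on the interpreter and build, so no expected output is shown.

```python
import hashlib
import inspect


def describe_backend(constructor):
    """Report where a hash constructor is defined and whether it is compiled code."""
    return {
        # Name of the module that defines the constructor (build-dependent)
        "module": getattr(constructor, "__module__", None),
        # True when the callable is a built-in implemented in C rather than Python
        "is_builtin": inspect.isbuiltin(constructor),
    }


if __name__ == "__main__":
    info = describe_backend(hashlib.sha256)
    print("defined in module:", info["module"])
    print("compiled built-in:", info["is_builtin"])
```

If is_builtin is true, there is no Python source to read for the hashing itself; the implementation lives in the interpreter's C sources or in the linked crypto library.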