Hamming distance between two binary strings not working

Hyp*_*ion 6 python binary bit hamming-distance

I found an interesting algorithm for calculating the Hamming distance on this site:

def hamming2(x,y):
    """Calculate the Hamming distance between two bit strings"""
    assert len(x) == len(y)
    count,z = 0,x^y
    while z:
        count += 1
        z &= z-1 # magic!
    return count
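The line marked `# magic!` is a well-known bit trick, often attributed to Brian Kernighan. Subtracting 1 from `z` flips its lowest 1 bit and every 0 bit below it, so `z & (z - 1)` clears exactly that lowest 1 bit. The loop therefore runs once per set bit of `x ^ y`, which is the number of positions where `x` and `y` differ. The sketch below traces the steps; `trace_clear_lowest_bit` is a hypothetical helper written only for illustration.

```python
from typing import List


def trace_clear_lowest_bit(z: int) -> List[str]:
    """Record z in binary after each `z &= z - 1` step (hypothetical helper)."""
    steps = [bin(z)]
    while z:
        # z - 1 flips the lowest 1 bit and the 0 bits below it;
        # AND-ing with z therefore clears just that lowest 1 bit
        z &= z - 1
        steps.append(bin(z))
    return steps


print(trace_clear_lowest_bit(0b101010))
# ['0b101010', '0b101000', '0b100000', '0b0']
```

Three steps reach zero, matching the three 1 bits in `0b101010`.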

The problem is that this algorithm only works with bit strings, but the two binary strings I want to compare are in string format, like:

'100010'
'101000'
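Passing those two strings to `hamming2` as written fails, and a quick check shows why. Strings have a length, so the `assert` passes, but `str` has no `^` operator. Plain integers have `^` but no `len()`, so the `assert` line itself raises (unless asserts are disabled with `python -O`). A minimal reproduction, with the function copied unchanged from above:

```python
def hamming2(x, y):
    """The function from the question, copied unchanged."""
    assert len(x) == len(y)
    count, z = 0, x ^ y
    while z:
        count += 1
        z &= z - 1
    return count


# Strings pass the length check, but `^` is not defined for str.
try:
    hamming2('100010', '101000')
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for ^: 'str' and 'str'

# Integers support `^`, but the assert calls len(), which int does not have.
try:
    hamming2(0b100010, 0b101000)
except TypeError as exc:
    print(exc)  # object of type 'int' has no len()
```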

How can I make them work with this algorithm?

dla*_*ask 26

Implement it like this:

def hamming2(s1, s2):
    """Calculate the Hamming distance between two bit strings"""
    assert len(s1) == len(s2)
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))
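This works because each comparison `c1 != c2` produces a `bool`, and `bool` is a subclass of `int` (`True == 1`), so `sum()` counts the mismatched positions. One design caveat: `assert` statements are skipped when Python runs with `-O`. If the length check must always hold, an explicit `ValueError` is a common alternative. A minimal sketch; the name `hamming_checked` is hypothetical:

```python
def hamming_checked(s1: str, s2: str) -> int:
    """Hypothetical variant of the function above with an explicit length check."""
    if len(s1) != len(s2):
        # unlike `assert`, this check is not removed when Python runs with -O
        raise ValueError("strings must have the same length")
    # each c1 != c2 is a bool; True counts as 1, so sum() counts the mismatches
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


print(hamming_checked("100010", "101000"))  # 2
```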

And test it:

assert hamming2("1010", "1111") == 2
assert hamming2("1111", "0000") == 4
assert hamming2("1111", "1111") == 0
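Beyond a few hand-picked cases, the string version can also be cross-checked exhaustively against the integer idea from the question: XOR the parsed values and count the 1 bits. The sketch below compares the two on every pair of 4-character binary strings. `popcount_xor` is a hypothetical reference helper, and the function is repeated so the block runs on its own.

```python
from itertools import product


def hamming2(s1, s2):
    """The string-comparison version above, repeated so this block is self-contained."""
    assert len(s1) == len(s2)
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


def popcount_xor(s1, s2):
    """Hypothetical reference: XOR the parsed integers and count the 1 bits."""
    return bin(int(s1, 2) ^ int(s2, 2)).count("1")


# every 4-character binary string: '0000', '0001', ..., '1111' (16 strings)
words = ["".join(bits) for bits in product("01", repeat=4)]

# compare both functions on all 16 x 16 = 256 pairs
mismatches = [(a, b) for a in words for b in words if hamming2(a, b) != popcount_xor(a, b)]
print(len(mismatches))  # 0
```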


Ada*_*mes 5

If we want to stick with the original algorithm, we need to convert the strings to integers so that we can use the bitwise operators.

def hamming2(x_str, y_str):
    """Calculate the Hamming distance between two bit strings"""
    assert len(x_str) == len(y_str)
    x, y = int(x_str, 2), int(y_str, 2)  # '2' specifies we are reading a binary number
    count, z = 0, x ^ y
    while z:
        count += 1
        z &= z - 1  # magic!
    return count
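Once the strings are integers, the loop only counts the 1 bits of `z`. A shorter variant counts them directly: `bin(z).count("1")` works on any Python 3, and `int.bit_count()` returns the same number on Python 3.10 and later. A minimal sketch; `hamming_popcount` is a hypothetical name:

```python
def hamming_popcount(x_str: str, y_str: str) -> int:
    """Hypothetical variant of the integer approach without the explicit loop."""
    assert len(x_str) == len(y_str)
    z = int(x_str, 2) ^ int(y_str, 2)  # 1 bits mark the positions that differ
    # bin(z) looks like '0b1010'; counting the '1' characters gives the set-bit count.
    # On Python 3.10+, z.bit_count() returns the same value.
    return bin(z).count("1")


print(hamming_popcount('100010', '101000'))  # 2
```

Leading zeros are dropped by `int(..., 2)`, but because both strings have the same length, dropping them never changes which positions differ.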

Then we can call it like this:

print(hamming2('100010', '101000'))
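Tracing that call by hand helps show what the function computes. The two strings parse to 34 and 40, and their XOR has a 1 bit exactly where the strings differ:

```python
x, y = int('100010', 2), int('101000', 2)
z = x ^ y
print(x, y, format(z, '06b'))  # 34 40 001010
```

`001010` has two 1 bits, so the `while` loop runs twice and the call prints `2`.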

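While this algorithm is neat, having to convert the strings to integers first may cancel out any speed advantage it has. The answer posted by @dlask is more concise.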
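Whether the conversion really cancels out the speed advantage likely depends on the Python version, the string length and how many bits differ, so measuring is more reliable than guessing. A rough `timeit` sketch for checking on your own machine follows. The function names, string length and repeat count are arbitrary assumptions, and no timings are claimed here.

```python
import random
import timeit
from functools import partial


def hamming_str(s1, s2):
    """String comparison, as in dlask's answer."""
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


def hamming_int(s1, s2):
    """Convert to integers, then clear one set bit per iteration."""
    z = int(s1, 2) ^ int(s2, 2)
    count = 0
    while z:
        count += 1
        z &= z - 1
    return count


random.seed(0)  # fixed seed so the test strings are reproducible
n = 1000        # assumed string length; try other values
a = "".join(random.choices("01", k=n))
b = "".join(random.choices("01", k=n))

assert hamming_str(a, b) == hamming_int(a, b)  # both must agree before timing

for fn in (hamming_str, hamming_int):
    seconds = timeit.timeit(partial(fn, a, b), number=500)
    print(f"{fn.__name__}: {seconds:.4f} s for 500 calls")
```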


Pan*_*al. 5

This is what I use to calculate the Hamming distance.
It counts the number of positions at which two strings of equal length differ.

def hamdist(str1, str2):
    diffs = 0
    for ch1, ch2 in zip(str1, str2):
        if ch1 != ch2:
            diffs += 1
    return diffs