Lui*_*uis 5 python algorithm edit-distance levenshtein-distance
我正在计算包含最多6个值的序列之间的Levenshtein距离。这些值的顺序不应影响距离。
如何将其实现为迭代或递归算法?
例:
# Currently
>>> LDistance('dog', 'god')
2
# Sorted
>>> LDistance('dgo', 'dgo')
0
# Proposed
>>> newLDistance('dog', 'god')
0
Run Code Online (Sandbox Code Playgroud)
'dog'和'god'具有完全相同的字母,在手之前对字符串进行排序将返回所需的结果。但是,这并非始终有效:
# Currently
>>> LDistance('doge', 'gold')
3
# Sorted
>>> LDistance('dego', 'dglo')
2
# Proposed
>>> newLDistance('doge', 'gold')
1
Run Code Online (Sandbox Code Playgroud)
“ doge”和“ gold”具有3/4个匹配字母,因此应返回距离1。这是我当前的递归代码:
def mLD(s, t):
memo = {}
def ld(s, t):
if not s: return len(t)
if not t: return len(s)
if s[0] == t[0]: return ld(s[1:], t[1:])
if (s, t) not in memo:
l1 = ld(s, t[1:])
l2 = ld(s[1:], t)
l3 = ld(s[1:], t[1:])
memo[(s,t)] = 1 + min(l1, l2, l3)
return memo[(s,t)]
return ld(s, t)
Run Code Online (Sandbox Code Playgroud)
编辑:后续问题:向Levenshtein-Distance-like算法添加异常
为此,您不需要 Levenshtein 机器。
import collections
def distance(s1, s2):
cnt = collections.Counter()
for c in s1:
cnt[c] += 1
for c in s2:
cnt[c] -= 1
return sum(abs(diff) for diff in cnt.values()) // 2 + \
(abs(sum(cnt.values())) + 1) // 2 # can be omitted if len(s1) == len(s2)
Run Code Online (Sandbox Code Playgroud)