Edit distance that takes accents into account

vig*_*gte 1 python edit-distance

Is there an edit distance in Python that takes accents into account? For example, one for which the following property holds:

d('ab', 'ac') > d('àb', 'ab') > 0

roo*_*oot 5

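Using the Levenshtein module, you can average the distance between the raw strings with the distance between their accent-stripped forms: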

In [1]: import unicodedata, string

In [2]: from Levenshtein import distance

In [3]: def remove_accents(data):
   ...:     return ''.join(x for x in unicodedata.normalize('NFKD', data)
   ...:                             if x in string.ascii_letters).lower()

In [4]: def norm_dist(s1, s2):
   ...:     norm1, norm2 = remove_accents(s1), remove_accents(s2)
   ...:     d1, d2 = distance(s1, s2), distance(norm1, norm2)
   ...:     return (d1+d2)/2.

In [5]: norm_dist(u'ab', u'ac')
Out[5]: 1.0

In [6]: norm_dist(u'àb', u'ab')
Out[6]: 0.5
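For readers without the Levenshtein package installed, the same idea can be sketched with the standard library only. This is a minimal sketch under some assumptions, not the answer's exact code: `levenshtein`, `strip_accents`, and `accent_aware_distance` are illustrative names, the pure-Python `levenshtein` stands in for `Levenshtein.distance`, and accents are removed by dropping Unicode combining marks rather than by keeping only ASCII letters. Unlike `remove_accents` above, this variant keeps digits, spaces, and case.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming Levenshtein distance (stand-in for Levenshtein.distance)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def strip_accents(text: str) -> str:
    """Decompose to NFD and drop combining marks, e.g. 'à' becomes 'a'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_aware_distance(s1: str, s2: str) -> float:
    """Average of the raw distance and the accent-stripped distance."""
    # Normalize to NFC first so precomposed and decomposed inputs agree.
    s1 = unicodedata.normalize("NFC", s1)
    s2 = unicodedata.normalize("NFC", s2)
    raw = levenshtein(s1, s2)
    folded = levenshtein(strip_accents(s1), strip_accents(s2))
    return (raw + folded) / 2


print(accent_aware_distance("ab", "ac"))  # 1.0
print(accent_aware_distance("àb", "ab"))  # 0.5
```

An accent-only difference counts in the raw distance but not in the folded one, so it costs half as much as a change of base letter. That gives the ordering asked for in the question, d('ab', 'ac') > d('àb', 'ab') > 0.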