vig*_*gte · tags: python, edit-distance
Is there an edit distance in Python that takes accents into account? For example, one for which the following property holds:
d('ab', 'ac') > d('àb', 'ab') > 0
In [1]: import unicodedata, string
In [2]: from Levenshtein import distance
In [3]: def remove_accents(data):
   ...:     # NFKD splits 'à' into 'a' + combining accent; keep ASCII letters only, lowercased
   ...:     return ''.join(x for x in unicodedata.normalize('NFKD', data)
   ...:                    if x in string.ascii_letters).lower()
In [4]: def norm_dist(s1, s2):
   ...:     # Average the raw distance with the distance between accent-stripped strings
   ...:     norm1, norm2 = remove_accents(s1), remove_accents(s2)
   ...:     d1, d2 = distance(s1, s2), distance(norm1, norm2)
   ...:     return (d1 + d2) / 2.
In [5]: norm_dist(u'ab', u'ac')
Out[5]: 1.0
In [6]: norm_dist(u'àb', u'ab')
Out[6]: 0.5
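The session above depends on the third-party python-Levenshtein package (`from Levenshtein import distance`). A minimal sketch of the same averaging idea without that dependency, assuming a plain dynamic-programming Levenshtein distance with unit cost for insertion, deletion and substitution; the helper name `levenshtein` is hypothetical and stands in for the package's `distance`:

```python
import string
import unicodedata


def levenshtein(s1: str, s2: str) -> int:
    """Classic Levenshtein distance: unit cost for insert, delete, substitute."""
    # prev[j] holds the distance between the previous prefix of s1 and s2[:j]
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + 1,               # delete c1
                curr[j - 1] + 1,           # insert c2
                prev[j - 1] + (c1 != c2),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def remove_accents(data: str) -> str:
    # NFKD splits 'à' into 'a' + combining grave accent; keep only ASCII letters
    return ''.join(x for x in unicodedata.normalize('NFKD', data)
                   if x in string.ascii_letters).lower()


def norm_dist(s1: str, s2: str) -> float:
    # Average the raw distance and the distance after stripping accents
    d1 = levenshtein(s1, s2)
    d2 = levenshtein(remove_accents(s1), remove_accents(s2))
    return (d1 + d2) / 2


print(norm_dist('ab', 'ac'))        # 1.0
print(norm_dist('\u00e0b', 'ab'))   # 0.5 ('\u00e0' is precomposed 'à')
```

As in the original answer, `remove_accents` also discards digits, spaces and punctuation and lowercases its input, so the second term treats strings such as `'a b'` and `'AB'` as identical.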
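Averaging two distances is one way to get the requested property. A different technique, sketched here under the assumption that a difference in accents alone should cost 0.5 instead of 1, is a weighted Levenshtein distance whose substitution cost depends on whether the two characters share a base letter after NFKD decomposition. The names `base_char`, `sub_cost`, `accent_distance` and the `accent_cost` parameter are hypothetical, not part of any library:

```python
import unicodedata


def base_char(c: str) -> str:
    # Hypothetical helper: drop combining marks after NFKD, so 'à' -> 'a'
    decomposed = unicodedata.normalize('NFKD', c)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


def sub_cost(c1: str, c2: str, accent_cost: float) -> float:
    if c1 == c2:
        return 0.0
    if base_char(c1) == base_char(c2):
        return accent_cost  # same base letter, differs only in diacritics
    return 1.0


def accent_distance(s1: str, s2: str, accent_cost: float = 0.5) -> float:
    # Compose first so 'a' + U+0300 counts as one character, like 'à'
    s1 = unicodedata.normalize('NFC', s1)
    s2 = unicodedata.normalize('NFC', s2)
    prev = [float(j) for j in range(len(s2) + 1)]
    for i, c1 in enumerate(s1, start=1):
        curr = [float(i)]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + 1.0,                                  # delete c1
                curr[j - 1] + 1.0,                              # insert c2
                prev[j - 1] + sub_cost(c1, c2, accent_cost),    # substitute
            ))
        prev = curr
    return prev[-1]


print(accent_distance('ab', 'ac'))        # 1.0
print(accent_distance('\u00e0b', 'ab'))   # 0.5
```

Unlike `norm_dist`, this keeps digits, spaces and punctuation and stays case-sensitive; changing `accent_cost` controls how much an accent-only mismatch counts relative to an ordinary substitution.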