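Probably the most commonly used metric is the Levenshtein Distance, sometimes called the "edit distance". Simply put, it measures the number of edits (insertions, deletions, substitutions, or, in generalized variants such as Damerau-Levenshtein, also transpositions) needed to make the two strings being compared identical.
The algorithm has simple, efficient, and well-known implementations. The pseudocode here comes directly from the Wikipedia article linked above: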
    int LevenshteinDistance(char s[1..m], char t[1..n])
    {
        // for all i and j, d[i,j] will hold the Levenshtein distance between
        // the first i characters of s and the first j characters of t;
        // note that d has (m+1)x(n+1) values
        declare int d[0..m, 0..n]

        for i from 0 to m
            d[i, 0] := i // the distance of any first string to an empty second string
        for j from 0 to n
            d[0, j] := j // the distance of any second string to an empty first string

        for j from 1 to n
        {
            for i from 1 to m
            {
                if s[i] = t[j] then
                    d[i, j] := d[i-1, j-1]       // no operation required
                else
                    d[i, j] := minimum
                               (
                                   d[i-1, j] + 1,   // a deletion
                                   d[i, j-1] + 1,   // an insertion
                                   d[i-1, j-1] + 1  // a substitution
                               )
            }
        }

        return d[m,n]
    }
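
For reference, the pseudocode translates fairly directly into Python. The following is only a minimal sketch, not a tuned implementation: the function name `levenshtein_distance` is my own choice, and it keeps the full (m+1) x (n+1) table for readability rather than saving memory.

```python
def levenshtein_distance(s: str, t: str) -> int:
    """Return the Levenshtein distance between strings s and t."""
    m, n = len(s), len(t)

    # d[i][j] holds the distance between the first i characters of s
    # and the first j characters of t, so d has (m+1) x (n+1) entries.
    d = [[0] * (n + 1) for _ in range(m + 1)]

    # Turning a prefix of s into the empty string takes i deletions.
    for i in range(m + 1):
        d[i][0] = i
    # Turning the empty string into a prefix of t takes j insertions.
    for j in range(n + 1):
        d[0][j] = j

    for j in range(1, n + 1):
        for i in range(1, m + 1):
            # Python strings are 0-indexed, hence the i - 1 and j - 1.
            if s[i - 1] == t[j - 1]:
                d[i][j] = d[i - 1][j - 1]  # no operation required
            else:
                d[i][j] = 1 + min(
                    d[i - 1][j],      # a deletion
                    d[i][j - 1],      # an insertion
                    d[i - 1][j - 1],  # a substitution
                )

    return d[m][n]


if __name__ == "__main__":
    print(levenshtein_distance("kitten", "sitting"))  # 3
```

Running time and memory are both proportional to m * n. If memory matters, only the previous and current rows of the table are actually needed at any point.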
See also this related SO question: Good Python modules for fuzzy string comparison?