El_*_*rón 9 python levenshtein-distance
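First of all, I want to say that I'm new to Python. I'm trying to calculate the Levenshtein distance for many lists of words. So far I have managed to write the code for a single pair of words, but I'm having trouble with lists. I simply have two lists, with the words one below the other, like this: carlos stiv peter
I want to use the Levenshtein distance as a similarity measure. Can somebody tell me how to load the lists and then use a function to calculate the distance?
I would appreciate it!
Here is my code, which handles only two strings: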
#!/usr/bin/env python
# -*- coding=utf-8 -*-

def lev_dist(source, target):
    if source == target:
        return 0

    #words = open(test_file.txt,'r').read().split();

    # Prepare matrix
    slen, tlen = len(source), len(target)
    dist = [[0 for i in range(tlen+1)] for x in range(slen+1)]
    for i in range(slen+1):
        dist[i][0] = i
    for j in range(tlen+1):
        dist[0][j] = j

    # Counting distance
    for i in range(slen):
        for j in range(tlen):
            cost = 0 if source[i] == target[j] else 1
            dist[i+1][j+1] = min(
                dist[i][j+1] + 1,   # deletion
                dist[i+1][j] + 1,   # insertion
                dist[i][j] + cost   # substitution
            )
    return dist[-1][-1]

if __name__ == '__main__':
    import sys
    if len(sys.argv) != 3:
        print('Usage: You have to enter a source_word and a target_word')
        sys.exit(-1)
    source, target = sys.argv[1], sys.argv[2]
    print(lev_dist(source, target))
El_*_*rón 13
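I finally got the code working with some help from a friend :) You can calculate the Levenshtein distance against every word from the second list by changing the last line of the script: calling lev_dist(list1[0], list2[i]) there compares the first word in list1 with each word in list2.
Thanks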
#!/usr/bin/env python
# -*- coding=utf-8 -*-
import codecs

def lev_dist(source, target):
    if source == target:
        return 0

    # Prepare a matrix
    slen, tlen = len(source), len(target)
    dist = [[0 for i in range(tlen+1)] for x in range(slen+1)]
    for i in range(slen+1):
        dist[i][0] = i
    for j in range(tlen+1):
        dist[0][j] = j

    # Counting distance, here is my function
    for i in range(slen):
        for j in range(tlen):
            cost = 0 if source[i] == target[j] else 1
            dist[i+1][j+1] = min(
                dist[i][j+1] + 1,   # deletion
                dist[i+1][j] + 1,   # insertion
                dist[i][j] + cost   # substitution
            )
    return dist[-1][-1]

# load words from a file into a list
def loadWords(path):
    words = []  # create an empty list to hold the file contents
    with codecs.open(path, "r", "utf-8") as file_contents:  # open the file
        for line in file_contents:  # loop over the lines in the file
            line = line.strip()  # strip the line breaks and any extra spaces
            words.append(line)  # append the word to the list
    return words

if __name__ == '__main__':
    import sys
    if len(sys.argv) != 3:
        print('Usage: You have to enter a source file and a target file')
        sys.exit(-1)
    source, target = sys.argv[1], sys.argv[2]

    # create two lists, one for each file, by calling the loadWords() function on the file
    list1 = loadWords(source)
    list2 = loadWords(target)

    # now you have two lists, with one word per line of each file
    # now call your lev_dist function on the first word of list1 and each word of list2
    for i in range(len(list2)):  # loop over the indices of list2, not over lines
        print(lev_dist(list1[0], list2[i]))
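If the goal is to compare the two lists in other ways, the same idea can be extended. The following is only a minimal Python 3 sketch, assuming each list holds one word per line as in the question. The helper names `pairwise_distances` and `closest_match` are hypothetical, and `lev_dist` is rewritten here with two rows instead of a full matrix so that the block stands on its own.

```python
def lev_dist(source, target):
    """Levenshtein distance, keeping only two rows of the matrix."""
    if source == target:
        return 0
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def pairwise_distances(words_a, words_b):
    """Compare the lists line by line (hypothetical helper).

    zip() stops at the end of the shorter list.
    """
    return [(a, b, lev_dist(a, b)) for a, b in zip(words_a, words_b)]


def closest_match(word, candidates):
    """Return the candidate with the smallest distance to word (hypothetical helper)."""
    return min(candidates, key=lambda candidate: lev_dist(word, candidate))


if __name__ == '__main__':
    # Sample words; the second list is made up for illustration.
    list1 = ['carlos', 'stiv', 'peter']
    list2 = ['carlo', 'steve', 'pedro']
    for a, b, d in pairwise_distances(list1, list2):
        print(a, b, d)
    print(closest_match('stiv', list2))
```

`pairwise_distances` suits files whose lines correspond to each other, while `closest_match` suits looking up the most similar word in the other list.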
小智 5
Don't reinvent the wheel:
http://pypi.python.org/pypi/python-Levenshtein/
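As a rough illustration of how the package could be used for the list case, here is a minimal Python 3 sketch. It assumes the package is installed and exposes `Levenshtein.distance(s1, s2)`. The pure-Python fallback and the helper name `distances_to_first` are assumptions added for this example and are not part of the library.

```python
try:
    # Third-party C extension: pip install python-Levenshtein
    from Levenshtein import distance
except ImportError:
    # Fallback so the sketch still runs without the package
    # (an assumption for this example, not the library's code).
    def distance(source, target):
        previous = list(range(len(target) + 1))
        for i, s_char in enumerate(source, start=1):
            current = [i]
            for j, t_char in enumerate(target, start=1):
                cost = 0 if s_char == t_char else 1
                current.append(min(previous[j] + 1,
                                   current[j - 1] + 1,
                                   previous[j - 1] + cost))
            previous = current
        return previous[-1]


def distances_to_first(words_a, words_b):
    """Distance from the first word of words_a to every word of words_b."""
    return [(word, distance(words_a[0], word)) for word in words_b]


if __name__ == '__main__':
    # Sample words; the second list is made up for illustration.
    list1 = ['carlos', 'stiv', 'peter']
    list2 = ['carlo', 'steve', 'pedro']
    for word, d in distances_to_first(list1, list2):
        print(list1[0], word, d)
```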