Auf*_*ind 7 python string mapping list
Given two Python lists of strings (people's names):
list_1 = ['J. Payne', 'George Bush', 'Billy Idol', 'M Stuart', 'Luc van den Bergen']
list_2 = ['John Payne', 'George W. Bush', 'Billy Idol', 'M. Stuart', 'Luc Bergen']
I want a mapping from each name in the first list to the most similar name in the second:
'J. Payne' -> 'John Payne'
'George Bush' -> 'George W. Bush'
'Billy Idol' -> 'Billy Idol'
'M Stuart' -> 'M. Stuart'
'Luc van den Bergen' -> 'Luc Bergen'
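Is there a neat way to do this in Python? The lists contain 5 or 6 names on average. Sometimes there are more, but that is rare. Sometimes there is just one name in each list, possibly spelled slightly differently.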
Joh*_*ooy 11
Using the function defined here: http://hetland.org/coding/python/levenshtein.py
>>> for i in list_1:
...     print(i, '==>', min(list_2, key=lambda j: levenshtein(i, j)))
...
J. Payne ==> John Payne
George Bush ==> George W. Bush
Billy Idol ==> Billy Idol
M Stuart ==> M. Stuart
Luc van den Bergen ==> Luc Bergen
You can use functools.partial instead of the lambda:
>>> from functools import partial
>>> for i in list_1:
...     print(i, '==>', min(list_2, key=partial(levenshtein, i)))
...
J. Payne ==> John Payne
George Bush ==> George W. Bush
Billy Idol ==> Billy Idol
M Stuart ==> M. Stuart
Luc van den Bergen ==> Luc Bergen
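The linked levenshtein function is not reproduced in this answer. As a rough stand-in, here is a minimal sketch of a standard dynamic-programming edit distance. It is an assumption about what the linked function computes, not a copy of it, and the name `mapping` is only for illustration.

```python
def levenshtein(a, b):
    """Return the edit distance (insertions, deletions, substitutions) between a and b."""
    # Keep the shorter string in b so each row of the table stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]


list_1 = ['J. Payne', 'George Bush', 'Billy Idol', 'M Stuart', 'Luc van den Bergen']
list_2 = ['John Payne', 'George W. Bush', 'Billy Idol', 'M. Stuart', 'Luc Bergen']

# For each name, pick the candidate with the smallest edit distance.
mapping = {name: min(list_2, key=lambda other: levenshtein(name, other))
           for name in list_1}
```

Note that min() returns the first candidate when two distances tie, so the order of list_2 can matter for ambiguous names.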
Joh*_*rra 10
You can try difflib:
import difflib
list_1 = ['J. Payne', 'George Bush', 'Billy Idol', 'M Stuart', 'Luc van den Bergen']
list_2 = ['John Payne', 'George W. Bush', 'Billy Idol', 'M. Stuart', 'Luc Bergen']
mymap = {}
for elem in list_1:
    # get_close_matches returns candidates sorted by similarity, best first
    closest = difflib.get_close_matches(elem, list_2)
    # the list is empty when no candidate is similar enough
    if closest:
        mymap[elem] = closest[0]
print(mymap)
Output:
{'George Bush': 'George W. Bush',
'Luc van den Bergen': 'Luc Bergen',
'Billy Idol': 'Billy Idol',
'J. Payne': 'John Payne',
'M Stuart': 'M. Stuart'}
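One thing to keep in mind with the loop above: difflib.get_close_matches has n and cutoff parameters (defaulting to 3 and 0.6), so a name with no sufficiently similar candidate gets no entry in the map at all. A minimal sketch of making that choice explicit follows; the helper name best_match and the test name 'Zed' are made up for illustration.

```python
import difflib

list_1 = ['J. Payne', 'George Bush', 'Billy Idol', 'M Stuart', 'Luc van den Bergen']
list_2 = ['John Payne', 'George W. Bush', 'Billy Idol', 'M. Stuart', 'Luc Bergen']


def best_match(name, candidates, cutoff=0.0):
    """Return the single closest candidate, or None if none reaches cutoff.

    best_match is a hypothetical helper; with cutoff=0.0 every candidate
    qualifies, so a non-empty candidate list always yields a result.
    """
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Always map every name to its closest candidate.
mymap = {name: best_match(name, list_2) for name in list_1}

# With the stricter default-style cutoff, a dissimilar name maps to None.
unmatched = best_match('Zed', list_2, cutoff=0.6)
```

Whether to force a match or allow None depends on whether every name in the first list is known to have a counterpart in the second.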