我试图比较2个字符串列表的相似性,并将它们呈现在熊猫数据框中以供检查; 所以我使用1个列表作为索引,另一个作为列列表.然后我想计算它们上的"Levenshtein相似性"(比较两个单词之间的相似性的函数).
我试图通过使用应用映射来实现这一点,它将进入每个单元格,并将单元格索引与单元格列进行比较.但我怎么能这样做?或者可能会有一些更简单的方法?
things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']
matrix = pd.DataFrame(index = things, columns = action)
def lev(x):
x = Levenshtein.distance(x.index, x.column)
matrix.applymap(lev)
Run Code Online (Sandbox Code Playgroud)
到目前为止,我使用以下(下面),但我发现它笨拙和缓慢
matrix = pd.DataFrame(data = [action for i in things], index = things, columns = action)
for i, values in matrix.iterrows():
for j, value in enumerate(values):
matrix.ix[i,j] = Levenshtein.distance(i, value)
Run Code Online (Sandbox Code Playgroud)
我认为你可以使用apply和for列值使用.name:
def lev(x):
#replace your function
return x.index + x.name
a = matrix.apply(lev)
print (a)
walking caring biking eating
car carwalking carcaring carbiking careating
bike bikewalking bikecaring bikebiking bikeeating
sidewalk sidewalkwalking sidewalkcaring sidewalkbiking sidewalkeating
eatery eaterywalking eaterycaring eaterybiking eateryeating
Run Code Online (Sandbox Code Playgroud)
编辑:
如果需要一些arithemtic操作使用广播:
a = pd.DataFrame(matrix.index.values + matrix.columns.values[:,None],
index=matrix.index,
columns=matrix.columns)
print (a)
walking caring biking eating
car carwalking bikewalking sidewalkwalking eaterywalking
bike carcaring bikecaring sidewalkcaring eaterycaring
sidewalk carbiking bikebiking sidewalkbiking eaterybiking
eatery careating bikeeating sidewalkeating eateryeating
Run Code Online (Sandbox Code Playgroud)
要么:
a = pd.DataFrame(matrix.index.values + matrix.columns.values[:, np.newaxis],
index=matrix.index,
columns=matrix.columns)
print (a)
walking caring biking eating
car carwalking bikewalking sidewalkwalking eaterywalking
bike carcaring bikecaring sidewalkcaring eaterycaring
sidewalk carbiking bikebiking sidewalkbiking eaterybiking
eatery careating bikeeating sidewalkeating eateryeating
Run Code Online (Sandbox Code Playgroud)
小智 5
您可以通过“嵌套apply” 来做到这一点,如下所示:
things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']
matrix = pd.DataFrame(index=things, columns=action)
matrix.apply(lambda x: pd.DataFrame(x).apply(lambda y: LD(x.name, y.name), axis=1))
Run Code Online (Sandbox Code Playgroud)
输出:
walking caring biking eating
car 6 3 6 5
bike 6 5 3 5
sidewalk 7 8 7 8
eatery 6 5 6 3
Run Code Online (Sandbox Code Playgroud)
pd.DataFrame(x)这里的调用是因为x是Series对象,并且Series.apply类似于applymap,不携带index任何columns信息。