Pandas - Retrieving the row and column name of each element during applymap

jim*_*iat 7 python pandas

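I am trying to compare two lists of strings for similarity and present the result in a pandas DataFrame for inspection, so I use one list as the index and the other as the columns. I then want to compute the "Levenshtein similarity" (a function that measures how similar two words are) for each pair.

I tried to do this with applymap, which visits every cell, so that each cell's index label could be compared with its column label. But how can I do that? Or is there a simpler way?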

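import pandas as pd
import Levenshtein

things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']
matrix = pd.DataFrame(index=things, columns=action)

def lev(x):
    # Intended behaviour only (this does not work): applymap passes just the
    # cell value, so x has no .index or .column to compare
    x = Levenshtein.distance(x.index, x.column)
matrix.applymap(lev)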

So far I have been using the following, but I find it clumsy and slow:

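matrix = pd.DataFrame(data=[action for i in things], index=things, columns=action)
for i, values in matrix.iterrows():
    for j, value in enumerate(values):
        # .ix has been removed from pandas; use .at with the row label
        # and the column label found at position j
        matrix.at[i, matrix.columns[j]] = Levenshtein.distance(i, value)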

jez*_*ael 8

I think you can use apply, and use .name to get the column label:

def lev(x):
    # replace this with your own function
    return x.index + x.name
a = matrix.apply(lev)
print (a)
                  walking          caring          biking          eating
car            carwalking       carcaring       carbiking       careating
bike          bikewalking      bikecaring      bikebiking      bikeeating
sidewalk  sidewalkwalking  sidewalkcaring  sidewalkbiking  sidewalkeating
eatery      eaterywalking    eaterycaring    eaterybiking    eateryeating
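
As a rough sketch of how the same pattern might be used for the actual distance computation: the `levenshtein` helper below is a plain-Python stand-in for `Levenshtein.distance` (an assumption, so that the example runs without the extra package), and `lev_column` is a hypothetical name chosen for illustration.

```python
import pandas as pd

def levenshtein(a, b):
    """Plain dynamic-programming edit distance (stand-in for Levenshtein.distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']
matrix = pd.DataFrame(index=things, columns=action)

def lev_column(col):
    # col.name is the column label; col.index holds the row labels
    return pd.Series([levenshtein(row, col.name) for row in col.index],
                     index=col.index)

distances = matrix.apply(lev_column)
print(distances)
```

Because the function returns a Series aligned on the original index, apply should assemble the results back into a DataFrame with the same row and column labels.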

Edit:

If you need an arithmetic operation, use broadcasting:

# the row labels must be the column vector so that cell (i, j) pairs row i with column j
a = pd.DataFrame(matrix.index.values[:, None] + matrix.columns.values, 
                 index=matrix.index, 
                 columns=matrix.columns)
print (a)
                  walking          caring          biking          eating
car            carwalking       carcaring       carbiking       careating
bike          bikewalking      bikecaring      bikebiking      bikeeating
sidewalk  sidewalkwalking  sidewalkcaring  sidewalkbiking  sidewalkeating
eatery      eaterywalking    eaterycaring    eaterybiking    eateryeating

Or:

import numpy as np

# the row labels must be the column vector so that cell (i, j) pairs row i with column j
a = pd.DataFrame(matrix.index.values[:, np.newaxis] + matrix.columns.values, 
                 index=matrix.index, 
                 columns=matrix.columns)
print (a)
                  walking          caring          biking          eating
car            carwalking       carcaring       carbiking       careating
bike          bikewalking      bikecaring      bikebiking      bikeeating
sidewalk  sidewalkwalking  sidewalkcaring  sidewalkbiking  sidewalkeating
eatery      eaterywalking    eaterycaring    eaterybiking    eateryeating
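
Broadcasting with `+` only covers operations that NumPy can apply element-wise. For an arbitrary two-argument Python function, one possible sketch is to wrap it with `np.frompyfunc` so it broadcasts over the same label arrays. The `shared_chars` function here is a hypothetical placeholder for illustration only and is not part of the original answer; a distance function would be wrapped the same way.

```python
import numpy as np
import pandas as pd

things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']
matrix = pd.DataFrame(index=things, columns=action)

rows = matrix.index.values[:, None]  # shape (4, 1): one row label per row
cols = matrix.columns.values         # shape (4,): one column label per column

# Hypothetical pairwise function, used only for illustration:
# the number of distinct characters that two words share.
def shared_chars(a, b):
    return len(set(a) & set(b))

# np.frompyfunc wraps a Python function so that it broadcasts like a ufunc;
# the result is an object array, hence the cast to int.
pairwise = np.frompyfunc(shared_chars, 2, 1)
shared = pd.DataFrame(pairwise(rows, cols).astype(int),
                      index=matrix.index, columns=matrix.columns)
print(shared)
```

This still calls the Python function once per cell, so it is mainly a convenience for keeping the labels aligned rather than a speed-up.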


小智 5

You can do this with a "nested apply", as follows:

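import pandas as pd
# assumption: LD is not defined in the original answer; Levenshtein.distance is presumed
from Levenshtein import distance as LD

things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']
matrix = pd.DataFrame(index=things, columns=action)
matrix.apply(lambda x: pd.DataFrame(x).apply(lambda y: LD(x.name, y.name), axis=1))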

Output:

          walking  caring  biking  eating
car             6       3       6       5
bike            6       5       3       5
sidewalk        7       8       7       8
eatery          6       5       6       3

The call to pd.DataFrame(x) is needed here because x is a Series object, and Series.apply, like applymap, does not carry any index or columns information.
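
A small sketch of that difference, using a cut-down two-by-two frame (the variable names `has_name` and `row_labels` are made up for this illustration):

```python
import pandas as pd

things = ['car', 'bike']
action = ['walking', 'caring']
matrix = pd.DataFrame(index=things, columns=action)

col = matrix['walking']  # a Series named 'walking'

# Series.apply hands the function each cell value only (NaN here),
# so there is no .name attribute to read inside the function.
has_name = col.apply(lambda cell: hasattr(cell, 'name'))

# Wrapping the Series in a one-column DataFrame and applying with axis=1
# hands the function a row Series whose .name is the row label.
row_labels = pd.DataFrame(col).apply(lambda row: row.name, axis=1)

print(has_name.tolist())
print(row_labels.tolist())
```

The column label remains available as `col.name` in the outer scope, which is why the nested version can pair it with each row label.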