Cosine similarity between every pair of rows of a DataFrame in Python

Jay*_*rni 8 python dataframe pandas scikit-learn

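I have a DataFrame containing several vectors, each with 3 entries; each row is one vector. I need to compute the cosine similarity between every pair of vectors. Is it better to convert the data to a matrix, or is there a cleaner way to do this on the DataFrame itself?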

Here is the code I tried:

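import pandas as pd
from scipy import spatial

# Hypothetical example columns: each row of df is then one 3-entry vector
X, Y, Z = [1, 0, 2], [0, 1, 1], [3, 1, 0]
df = pd.DataFrame([X, Y, Z]).T
vectors = df.values.tolist()

# Compare every row with every row and keep the results
similarities = []
for x in vectors:
    row = []
    for y in vectors:
        row.append(1 - spatial.distance.cosine(x, y))
    similarities.append(row)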
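For comparison, scipy can produce the whole similarity matrix without the Python double loop: scipy.spatial.distance.cdist with metric="cosine" returns the cosine distance between every pair of rows, and 1 minus that is the cosine similarity. A minimal sketch with made-up data (the column names X, Y, Z are only illustrative):

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

# Made-up example data: three columns X, Y, Z, so each row is one 3-entry vector
df = pd.DataFrame({"X": [1, 0, 2], "Y": [0, 1, 1], "Z": [3, 1, 0]})

# cdist(..., metric="cosine") returns the cosine *distance* for every pair of rows,
# so subtracting it from 1 gives the full similarity matrix without a Python loop.
sim = 1 - cdist(df.values, df.values, metric="cosine")
print(np.round(sim, 4))
```

The result is a symmetric NumPy array whose entry [i, j] is the similarity between rows i and j, the same values the nested loop collects.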

mir*_*ulo 17

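You can use sklearn.metrics.pairwise.cosine_similarity directly: it accepts the DataFrame as input and returns the matrix of cosine similarities between every pair of rows.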

Demo:

import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Random 0/1 matrix with 3 rows and 5 columns; your values (and the output below) will differ
df = pd.DataFrame(np.random.randint(0, 2, (3, 5)))

df
##     0  1  2  3  4
##  0  1  1  1  0  0
##  1  0  0  1  1  1
##  2  0  1  0  1  0

cosine_similarity(df)
##  array([[ 1.        ,  0.33333333,  0.40824829],
##         [ 0.33333333,  1.        ,  0.40824829],
##         [ 0.40824829,  0.40824829,  1.        ]])
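If you want to avoid the scikit-learn dependency, or just see what cosine_similarity computes, you can normalize each row to unit length and multiply the result by its own transpose; wrapping the product in a DataFrame keeps the original row labels. A minimal NumPy sketch, assuming no row is all zeros (a zero row would cause a division by zero here):

```python
import numpy as np
import pandas as pd

# Same 0/1 rows as in the df printed above
df = pd.DataFrame([[1, 1, 1, 0, 0],
                   [0, 0, 1, 1, 1],
                   [0, 1, 0, 1, 0]])

# Cosine similarity = dot product of L2-normalized rows.
# Assumes no all-zero rows (their norm would be 0).
values = df.to_numpy(dtype=float)
unit = values / np.linalg.norm(values, axis=1, keepdims=True)
sim = pd.DataFrame(unit @ unit.T, index=df.index, columns=df.index)
print(sim.round(4))
```

Up to floating-point rounding, the entries match the cosine_similarity(df) output shown above: 1/3 ≈ 0.3333 for rows 0 and 1, and 1/√6 ≈ 0.4082 for the other two pairs.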