Why does scikit-learn's nearest neighbors not seem to return proper cosine similarity distances?

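I am trying to use scikit-learn's nearest neighbor implementation to find, in a matrix of random values, the column vectors closest to a given column vector.

The code is supposed to find the nearest neighbors of column 21 and then check the actual cosine similarity between those neighbors and column 21.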

from sklearn.neighbors import NearestNeighbors
import sklearn.metrics.pairwise as smp
import numpy as np

# 50x50 matrix of random integers in [0, 5)
test = np.random.randint(0, 5, (50, 50))
nbrs = NearestNeighbors(n_neighbors=5, algorithm='auto', metric=smp.cosine_similarity).fit(test)
distances, indices = nbrs.kneighbors(test)

x = 21

# Compare column x with each neighbor reported for index x
for idx, d in enumerate(indices[x]):
    sim2 = smp.cosine_similarity(test[:, x], test[:, d])

    print "sklearns cosine similarity would be ", sim2
    print 'sklearns reported distance is', distances[x][idx]
    print 'sklearns if that distance was cosine, the similarity would be: ', 1 - distances[x][idx]

The output looks like:

sklearns cosine similarity would be  [[ 0.66190748]]
sklearns reported distance is 0.616586738214
sklearns if that distance was cosine, the …
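For comparison, here is a minimal sketch of one way to make the reported distances and the manually computed similarities line up. It is not taken from the original post and rests on a few assumptions: each column is treated as a sample by transposing the matrix (NearestNeighbors compares rows), the built-in `metric="cosine"` distance is used with `algorithm="brute"` in place of the similarity function passed above, and the random values start at 1 so that no column is all zeros. The seed and the names `cols` and `sims` are illustrative only.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.RandomState(0)  # fixed seed, chosen only for reproducibility
# Assumption: values in [1, 5) so that no column is an all-zero vector.
test = rng.randint(1, 5, (50, 50)).astype(float)

# NearestNeighbors treats each row as a sample, so transpose to compare columns.
cols = test.T

# "cosine" here is a distance (1 - cosine similarity), not a similarity.
nbrs = NearestNeighbors(n_neighbors=5, algorithm="brute", metric="cosine").fit(cols)
distances, indices = nbrs.kneighbors(cols)

x = 21
sims = []
for dist, j in zip(distances[x], indices[x]):
    # cosine_similarity expects 2D inputs of shape (n_samples, n_features).
    sim = cosine_similarity(cols[x].reshape(1, -1), cols[j].reshape(1, -1))[0, 0]
    sims.append(sim)
    print("column", j, "similarity", sim, "1 - distance", 1 - dist)
```

Under these assumptions, `1 - distance` and the directly computed similarity should agree up to floating-point error, and the first neighbor of column 21 is expected to be the column itself, since the query points are the same as the fitted points.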

nearest-neighbor python-2.7 cosine-similarity scikit-learn

Votes: 5 | Answers: 1 | Views: 4723