k-nearest neighbors classifier using NumPy

Question

I am trying to implement my own kNN classifier. I have managed to get something working, but it is very slow:

from collections import Counter

import numpy as np

# X_train, Y_train and k are expected to exist as module-level variables.

def euclidean_distance(X_train, X_test):
    """
    Create list of all euclidean distances between the given
    feature vector and all other feature vectors in the training set
    """
    return [np.linalg.norm(X - X_test) for X in X_train]

def k_nearest(X, Y, k):
    """
    Get the indices of the nearest feature vectors and return a
    list of their classes
    """
    # argpartition puts the k smallest distances first (in no particular order)
    idx = np.argpartition(X, k)
    return np.take(Y, idx[:k])

def predict(X_test):
    """
    For each feature vector get its predicted class
    """
    distance_list = [euclidean_distance(X_train, X) for X in X_test]
    # Majority vote among the k nearest labels for each test vector
    return np.array([Counter(k_nearest(distances, Y_train, k)).most_common(1)[0][0] for distances in distance_list])


where, for example:

X = [[  1.96701284   6.05526865]
     [  1.43021202   9.17058291]]

Y = [ 1.  0.]


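Obviously it would be much faster without any for loops, but I don't know how to make it work without them. Is there a way to do this without for loops or list comprehensions?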

Answer 1

Div*_*kar 8

Here's a vectorized approach:

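import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import mode

# X holds the test feature vectors; dists has shape (n_train, n_test)
dists = cdist(X_train, X)
# Row indices of the k smallest distances in each column (unordered)
idx = np.argpartition(dists, k, axis=0)[:k]
# Class labels of those k nearest training vectors, shape (k, n_test)
nearest_labels = np.take(Y_train, idx)
# Majority vote per test vector
out = mode(nearest_labels, axis=0)[0]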

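Since the question asks about NumPy specifically, the same idea can also be sketched without SciPy. The function name `knn_predict` and its signature are assumptions made for illustration and are not part of the original answer. The sketch computes squared Euclidean distances using the expansion ||a - b||^2 = ||a||^2 - 2 a·b + ||b||^2, which preserves the neighbor ordering, and takes the majority vote with a per-class count table. It assumes 1 <= k <= number of training vectors.

```python
import numpy as np

def knn_predict(X_train, Y_train, X_test, k):
    """Hypothetical helper: majority-vote kNN prediction for each row of X_test."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    Y_train = np.asarray(Y_train).ravel()

    # Squared distances, shape (n_test, n_train). Tiny negative values from
    # rounding are harmless here because only the ordering is used.
    d2 = (
        (X_test ** 2).sum(axis=1)[:, None]
        - 2.0 * (X_test @ X_train.T)
        + (X_train ** 2).sum(axis=1)[None, :]
    )

    # Column indices of the k smallest distances in each row (unordered).
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]

    # Map arbitrary labels to integer codes 0..n_classes-1 so they can be counted.
    classes, codes = np.unique(Y_train, return_inverse=True)
    neighbor_codes = codes[idx]  # shape (n_test, k)

    # votes[i, c] = number of the k neighbors of test row i that have class c.
    n_test = X_test.shape[0]
    votes = np.zeros((n_test, classes.size), dtype=int)
    rows = np.arange(n_test)[:, None]
    np.add.at(votes, (rows, neighbor_codes), 1)

    # On a tie, argmax returns the first maximum, i.e. the smallest label.
    return classes[votes.argmax(axis=1)]


X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
Y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.2, 0.1], [5.5, 5.5]])
pred = knn_predict(X_train, Y_train, X_test, k=3)
```

Note that this builds the full (n_test, n_train) distance matrix in memory, so for large inputs it may be worth processing the test set in chunks.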
