给定坐标，如何获得 K 个最远的点？

Question

给定坐标，如何获得 K 个最远的点？

Duc*_*een 11 python metrics cluster-analysis points

我们有 10000 行ages (float), titles (enum/int), scores (float), ....

我们有 N 列，每列都有一个表中的 int/float 值。
你可以把它想象成 ND 空间中的点
我们想选择 K 个点，它们之间的距离最大化。

因此，如果我们在一个紧密排列的集群中有 100 个点，而在远处有 1 个点，我们将得到如下三个点的结果：或这个

对于 4 点，它会变得更有趣并在中间选择一些点。

那么如何从 N（具有任何复杂性）中选择 K 个最远的行（点）？它看起来像一个具有给定分辨率的 ND 点云“三角测量”，但不适用于 3d 点。

我为 K=200 和 N=100000 和 ND=6（可能是基于 KDTree、SOM 或三角剖分的多重网格或人工神经网络......）寻找一种相当快速的方法（近似 - 不需要精确的解决方案）。

Answer 1

Ash*_*'Sa 2

假设如果您将具有 N (10000) 行和 D 维度（或特征）的 csv 文件读取到N*Dmartix X 中。您可以计算每个点之间的距离并将其存储在距离矩阵中，如下所示：

import numpy as np
X = np.asarray(X) ### convert to numpy array
distance_matrix = np.zeros((X.shape[0],X.shape[0]))
for i in range(X.shape[0]):
    for j in range(i+1,X.shape[0]): 
    ## We compute triangle matrix and copy the rest. Distance from point A to point B and distance from point B to point A are the same. 
        distance_matrix[i][j]= np.linalg.norm(X[i]-X[j]) ## Here I am calculating Eucledian distance. Other distance measures can also be used.

        #distance_matrix = distance_matrix + distance_matrix.T - np.diag(np.diag(distance_matrix)) ## This syntax can be used to get the lower triangle of distance matrix, which is not really required in your case.
        K = 5 ## Number of points that you want to pick

        indexes = np.unravel_index(np.argsort(distance_matrix.ravel())[-1*K:], distance_matrix.shape)

        print(indexes)

Run Code Online (Sandbox Code Playgroud)

归档时间：	5 年，4 月前
查看次数：	1540 次
最近记录：	5 年，3 月前