fid*_*eli 39 python numpy euclidean-distance
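I have two arrays of x-y coordinates, and I would like to find the minimum Euclidean distance between each point in one array and all the points in the other array. The arrays are not necessarily the same size. For example: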
xy1=numpy.array(
[[ 243, 3173],
[ 525, 2997]])
xy2=numpy.array(
[[ 682, 2644],
[ 277, 2651],
[ 396, 2640]])
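My current method loops through each coordinate xy in xy1 and calculates the distances between that coordinate and all the coordinates in xy2.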
mindist=numpy.zeros(len(xy1))
minid=numpy.zeros(len(xy1))
for i,xy in enumerate(xy1):
    dists=numpy.sqrt(numpy.sum((xy-xy2)**2,axis=1))
    mindist[i],minid[i]=dists.min(),dists.argmin()
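Is there a way to eliminate the for loop and somehow do element-by-element calculations between the two arrays? I envision generating a distance matrix in which I could find the minimum element of each row or column.
Another way to look at the problem: say I concatenate xy1 (length m) and xy2 (length p) into xy (length n), and store the lengths of the original arrays. Theoretically, I should then be able to generate an n x n distance matrix from those coordinates, from which I can grab an m x p submatrix. Is there a way to efficiently generate this submatrix?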
den*_*nis 42
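(Some months later)
scipy.spatial.distance.cdist( X, Y )
gives all pairs of distances, for X and Y in 2 dimensions, 3 dimensions, and so on.
It also supports 22 different norms, detailed here.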
# cdist example: (nx,dim) (ny,dim) -> (nx,ny)
from __future__ import division
import sys
import numpy as np
from scipy.spatial.distance import cdist
#...............................................................................
dim = 10
nx = 1000
ny = 100
metric = "euclidean"
seed = 1
# change these params in sh or ipython: run this.py dim=3 ...
for arg in sys.argv[1:]:
    exec( arg )  # runs command-line text as code: only use with trusted input
np.random.seed(seed)
np.set_printoptions( precision=2, threshold=100, edgeitems=10, suppress=True )
title = "%s dim %d nx %d ny %d metric %s" % (
    __file__, dim, nx, ny, metric )
print( "\n" + title )
#...............................................................................
X = np.random.uniform( 0, 1, size=(nx,dim) )
Y = np.random.uniform( 0, 1, size=(ny,dim) )
dist = cdist( X, Y, metric=metric )  # -> (nx, ny) distances
#...............................................................................
print( "scipy.spatial.distance.cdist: X %s Y %s -> %s" % (
    X.shape, Y.shape, dist.shape ))
print( "dist average %.3g +- %.2g" % (dist.mean(), dist.std()) )
# cdist on two single points returns a 1 x 1 array, so take element [0,0]
print( "check: dist[0,3] %.3g == cdist( [X[0]], [Y[3]] ) %.3g" % (
    dist[0,3], cdist( [X[0]], [Y[3]] )[0,0] ))
# (trivia: how do pairwise distances between uniform-random points in the unit cube
# depend on the metric ? With the right scaling, not much at all:
#    L1 / dim      ~ .33 +- .2/sqrt dim
#    L2 / sqrt dim ~ .4 +- .2/sqrt dim
#    Lmax / 2      ~ .4 +- .2/sqrt dim )
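Applied to the arrays from the question, cdist can be combined with argmin to recover both the nearest index and the nearest distance. This is a minimal sketch rather than part of the original answer; the variable names mindist and minid simply follow the question.

```python
import numpy as np
from scipy.spatial.distance import cdist

xy1 = np.array([[243, 3173],
                [525, 2997]])
xy2 = np.array([[682, 2644],
                [277, 2651],
                [396, 2640]])

dist = cdist(xy1, xy2)                        # shape (len(xy1), len(xy2))
minid = dist.argmin(axis=1)                   # index into xy2 of the nearest point
mindist = dist[np.arange(len(xy1)), minid]    # the matching minimum distances

print(minid)
print(mindist)
```

Taking argmin once and indexing with it avoids scanning the matrix a second time with min.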
Ale*_*lli 25
To compute the m by p matrix of distances, this should work:
>>> def distances(xy1, xy2):
...     d0 = numpy.subtract.outer(xy1[:,0], xy2[:,0])
...     d1 = numpy.subtract.outer(xy1[:,1], xy2[:,1])
...     return numpy.hypot(d0, d1)
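The .outer calls make two such matrices (of scalar differences along the two axes), and the .hypot call turns those into a same-shape matrix (of scalar Euclidean distances).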
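To make the two steps concrete, here is a small illustration of what subtract.outer and hypot produce, followed by the function applied to the question's data. The toy arrays a and b are made up for the example.

```python
import numpy as np

a = np.array([1, 2])
b = np.array([10, 20, 30])

# Outer difference: diff[i, j] = a[i] - b[j], shape (len(a), len(b))
diff = np.subtract.outer(a, b)
print(diff)

# hypot(x, y) = sqrt(x**2 + y**2), applied element by element
print(np.hypot(3, 4))

def distances(xy1, xy2):
    d0 = np.subtract.outer(xy1[:, 0], xy2[:, 0])   # x differences, (m, p)
    d1 = np.subtract.outer(xy1[:, 1], xy2[:, 1])   # y differences, (m, p)
    return np.hypot(d0, d1)

xy1 = np.array([[243, 3173], [525, 2997]])
xy2 = np.array([[682, 2644], [277, 2651], [396, 2640]])
d = distances(xy1, xy2)
print(d.min(axis=1), d.argmin(axis=1))
```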
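The accepted answer does not fully address the question, which asks for the minimum distance between the two sets of points, not the distance between every pair of points in the two sets.
Although a straightforward solution to the original question does consist of computing the distance between every pair and then finding the smallest one, this is not necessary if one is only interested in the minimum distances. A much faster solution exists for the latter problem.
All the proposed solutions have a running time that scales as m*p = len(xy1)*len(xy2). This is fine for small datasets, but an optimal solution can be written that scales as m*log(p), which yields huge savings for large xy2 datasets.
This optimal execution time scaling can be achieved using scipy.spatial.cKDTree as follows: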
import numpy as np
from scipy import spatial
xy1 = np.array(
[[243, 3173],
[525, 2997]])
xy2 = np.array(
[[682, 2644],
[277, 2651],
[396, 2640]])
# This solution is optimal when xy2 is very large
tree = spatial.cKDTree(xy2)
mindist, minid = tree.query(xy1)
print(mindist)
# This solution by @denis is OK for small xy2
mindist = np.min(spatial.distance.cdist(xy1, xy2), axis=1)
print(mindist)
where mindist is the minimum distance between each point in xy1 and the set of points in xy2.
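As a sanity check, the tree query can be compared against the brute-force distance matrix on a larger input. This sketch uses randomly generated points (the sizes and seed are arbitrary assumptions) and is not a benchmark.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
xy1 = rng.uniform(0, 1000, size=(200, 2))    # query points
xy2 = rng.uniform(0, 1000, size=(5000, 2))   # points to search

# Tree approach: build once on xy2, then query every point of xy1
tree = cKDTree(xy2)
tree_dist, tree_id = tree.query(xy1)

# Brute force: full (200, 5000) distance matrix, then row-wise minimum
full = cdist(xy1, xy2)
brute_dist = full.min(axis=1)
brute_id = full.argmin(axis=1)

print(np.allclose(tree_dist, brute_dist))
print(np.array_equal(tree_id, brute_id))
```

The brute-force version needs memory for all m*p distances, while the tree only stores xy2 and returns m results.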
For what you're trying to do:
dists = numpy.sqrt((xy1[:, 0, numpy.newaxis] - xy2[:, 0])**2 + (xy1[:, 1, numpy.newaxis] - xy2[:, 1])**2)
mindist = numpy.min(dists, axis=1)
minid = numpy.argmin(dists, axis=1)
Edit: instead of calling sqrt, squaring, and so on, you can use numpy.hypot:
dists = numpy.hypot(xy1[:, 0, numpy.newaxis]-xy2[:, 0], xy1[:, 1, numpy.newaxis]-xy2[:, 1])
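The same broadcasting idea extends to points of any dimension by subtracting whole coordinate rows at once. The helper below is a hypothetical sketch (pairwise_min is not a NumPy function); note that the intermediate array has shape (m, p, d), so memory grows with all three sizes.

```python
import numpy as np

def pairwise_min(a, b):
    """Return (mindist, minid) from each row of a to the rows of b."""
    # a[:, newaxis, :] has shape (m, 1, d); b[newaxis, :, :] has shape (1, p, d)
    diff = a[:, np.newaxis, :] - b[np.newaxis, :, :]   # broadcasts to (m, p, d)
    dists = np.sqrt((diff ** 2).sum(axis=-1))          # (m, p) distance matrix
    return dists.min(axis=1), dists.argmin(axis=1)

xy1 = np.array([[243, 3173], [525, 2997]])
xy2 = np.array([[682, 2644], [277, 2651], [396, 2640]])
mindist, minid = pairwise_min(xy1, xy2)
print(mindist, minid)
```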
小智 5
import numpy as np
P = np.add.outer(np.sum(xy1**2, axis=1), np.sum(xy2**2, axis=1))
N = np.dot(xy1, xy2.T)
dists = np.sqrt(np.maximum(P - 2*N, 0))  # clip tiny negative round-off before the sqrt
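This approach relies on the identity |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, which lets one matrix product replace the explicit differences. The sketch below wraps it in a hypothetical helper (dists_via_dot is an illustrative name) and checks it against the direct computation on the question's data.

```python
import numpy as np

def dists_via_dot(a, b):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared norms of every row, combined into an (m, p) matrix of |a|^2 + |b|^2
    sq_norms = np.add.outer((a ** 2).sum(axis=1), (b ** 2).sum(axis=1))
    sq = sq_norms - 2 * (a @ b.T)
    # Floating-point round-off can leave tiny negative values; clip before sqrt
    return np.sqrt(np.maximum(sq, 0.0))

xy1 = np.array([[243, 3173], [525, 2997]])
xy2 = np.array([[682, 2644], [277, 2651], [396, 2640]])

via_dot = dists_via_dot(xy1, xy2)
direct = np.sqrt(((xy1[:, None, :] - xy2[None, :, :]) ** 2).sum(axis=-1))
print(np.allclose(via_dot, direct))
```

One caveat: when points lie very close together relative to their coordinate magnitudes, the subtraction cancels large numbers and loses precision, so the direct difference is the safer choice there.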