sau*_*sau 5 python numpy vectorization
我是python的新手.我正在使用dbscan代码进行聚类以进行一些更改.现在代码运行正常,但速度非常慢.所以我发现我必须从我的代码中删除'for loop'.这是代码的一部分:
class Point:
def __init__(self, x = 0, y = 0, visited = False, isnoise = False):
self.x = x
self.y = y
self.visited = False
self.isnoise = False
def show(self):
return self.x, self.y
def dist(self, p1, p2):
#Calculate the great circle distance between two points on the earth (specified in decimal degrees)return distance between two point
# convert decimal degrees to radians
dlat = radians(p2.x-p1.x)
dlon = radians(p2.y-p1.y)
a = sin(dlat/2) * sin(dlat/2) + cos(radians(p1.x))* cos(radians(p2.x)) * sin(dlon/2) * sin(dlon/2)
c = 2 * atan2(sqrt(a), sqrt(1-a))
d = 6371 * c
return d
def distanceQuery(self,neighbor_pts):
dista=[]
for i in range(len(neighbor_pts)):
for j in range(i+1,len(neighbor_pts)):
z=self.dist(neighbor_pts[i],neighbor_pts[j])
dista.append(z)
return max(dista)
Run Code Online (Sandbox Code Playgroud)
distanceQuery函数使用double for循环.有什么办法可以删除吗?我可以将这个双向量化为循环吗?由于这是聚类代码,因此需要附加一些步骤.在附加时,我已经读过numpy数组的工作方式与python列表不同.附加numpy数组是低效的.
所以这可以是矢量化.但是这里是代码的其他部分,在我检查某些条件之后发生追加.
def expandCluster(self, P, neighbor_points):
self.cluster[self.cluster_inx].append(P)
iterator = iter(neighbor_points)
while True:
try:
npoint_tmp = iterator.next()
except StopIteration:
# StopIteration exception is raised after last element
break
if (not npoint_tmp.visited):
#for each point P' in NeighborPts
npoint_tmp.visited = True
NeighborPts_ = self.regionQuery(npoint_tmp)
if (len(NeighborPts_) >= self.MinPts):
for j in range(len(NeighborPts_)):
neighbor_points.append(NeighborPts_[j])
if self.distanceQuery(neighbor_points)>0.10:
break
Run Code Online (Sandbox Code Playgroud)
现在,如果我也矢量化neighbor_points.我将不得不解决附加问题?所以每个点都会附加到neighbour_points然后它会成为一个distanceQuery.而这个过程也是迭代的一部分.所以这里有两个循环.我只想确保在numpy数组中追加不会效率低下
import numpy as np
def dist(p1, p2):
# Initially, p1.shape() == (n, 2) and p2.shape() == (m, 2)
# Now, p1.shape() == (1, n, 2) and p2.shape() == (m, 1, 2)
p1 = p1[np.newaxis, :, :]
p2 = p2[:, np.newaxis, :]
# get all the vectory things
from numpy import sin, cos, radians, sqrt, arctan2 as atan2
# do the same math as before, but use `p[..., 0]` instead of `p.x` etc
dlat = radians(p2[..., 0] - p1[..., 0])
dlon = radians(p2[..., 1] - p1[..., 1])
a = sin(dlat/2) * sin(dlat/2) + cos(p1[..., 0])*cos(p2[..., 0]) * sin(dlon/2) * sin(dlon/2)
c = 2 * atan2(sqrt(a), sqrt(1-a))
d = 6371 * c
return d
def distanceQuery(neighbor_pts):
return np.max(dist(neighbor_pts, neighbor_pts))
Run Code Online (Sandbox Code Playgroud)
例如:
>>> points = np.array([[0, 0], [45, 0], [45, 45], [90, 0]], dtype=float)
>>> dist(points, points)
array([[ 0. , 5003.77169901, 6272.52596983, 10007.54339801],
[ 5003.77169901, 0. , 2579.12525679, 5003.77169901],
[ 6272.52596983, 2579.12525679, 0. , 4347.69702221],
[ 10007.54339801, 5003.77169901, 4347.69702221, 0. ]])
>>> np.max(_)
10007.543398010286
Run Code Online (Sandbox Code Playgroud)
def dist_slow(p1, p2):
"""your function, adjusted to take an array instead of a `Point`"""
from math import radians, cos, sqrt, atan2
# compute the distance for all possible pairs
dlat = radians(p2[0]-p1[0])
dlon = radians(p2[1]-p1[1])
a = sin(dlat/2) * sin(dlat/2) + cos(radians(p1[0]))*cos(radians(p2[0])) * sin(dlon/2) * sin(dlon/2)
c = 2 * atan2(sqrt(a), sqrt(1-a))
d = 6371 * c
return d
def query_iter(p):
return max(dist_slow(p1, p2) for p1, p2 in itertools.combinations(p, 2))
def query_orig(p):
dista=[]
for i in range(len(p)):
for j in range(i + 1, len(p)):
z = dist_slow(p[i], p[j])
dista.append(z)
return max(dista)
def query_mine(p):
return dist(p, p).max()
Run Code Online (Sandbox Code Playgroud)
然后:
>>> points = np.random.rand(1000, 2)
>>> timeit query_orig(points)
1 loops, best of 3: 7.94 s per loop
>>> timeit query_iter(points)
1 loops, best of 3: 7.35 s per loop
>>> timeit query_mine(points)
10 loops, best of 3: 150 ms per loop
Run Code Online (Sandbox Code Playgroud)