Tags: python, memory, numpy
I have a question about how to compute distances with numpy as fast as possible:
import numpy
import time

def getR1(VVm, VVs, HHm, HHs):
    t0 = time.time()
    R = VVs.flatten()[numpy.newaxis, :] - VVm.flatten()[:, numpy.newaxis]
    R *= R
    R1 = HHs.flatten()[numpy.newaxis, :] - HHm.flatten()[:, numpy.newaxis]
    R1 *= R1
    R += R1
    del R1
    print "R1\t", time.time() - t0, R.shape,  # 11.7576191425 (108225, 10500)
    print numpy.max(R)  # 4176.26290975
    # uses 17.5 GB RAM
    return R
def getR2(VVm, VVs, HHm, HHs):
    t0 = time.time()
    precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
    measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
    deltas = precomputed_flat[None, :, :] - measured_flat[:, None, :]
    #print time.time() - t0, deltas.shape  # 5.861109972 (108225, 10500, 2)
    R = numpy.einsum('ijk,ijk->ij', deltas, deltas)
    print "R2\t", time.time() - t0, R.shape,  # 14.5291359425 (108225, 10500)
    print numpy.max(R)  # 4176.26290975
    # uses 26 GB RAM
    return R
def getR3(VVm, VVs, HHm, HHs):
    from numpy.core.umath_tests import inner1d
    t0 = time.time()
    precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
    measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
    deltas = precomputed_flat[None, :, :] - measured_flat[:, None, :]
    #print time.time() - t0, deltas.shape  # 5.861109972 (108225, 10500, 2)
    R = inner1d(deltas, deltas)
    print "R3\t", time.time() - t0, R.shape,  # 12.6972110271 (108225, 10500)
    print numpy.max(R)  # 4176.26290975
    # uses 26 GB RAM
    return R
def getR4(VVm, VVs, HHm, HHs):
    from scipy.spatial.distance import cdist
    t0 = time.time()
    precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
    measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
    R = cdist(precomputed_flat, measured_flat, 'sqeuclidean')  #.T
    print "R4\t", time.time() - t0, R.shape,  # 17.7022118568 (10500, 108225)
    print numpy.max(R)  # 4176.26290975
    # uses 9 GB RAM
    return R
def getR5(VVm, VVs, HHm, HHs):
    from scipy.spatial.distance import cdist
    t0 = time.time()
    precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
    measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
    R = cdist(precomputed_flat, measured_flat, 'euclidean')  #.T
    print "R5\t", time.time() - t0, R.shape,  # 15.6070930958 (10500, 108225)
    print numpy.max(R)  # 64.6240118667
    # uses only 9 GB RAM
    return R
def getR6(VVm, VVs, HHm, HHs):
    from scipy.weave import blitz
    t0 = time.time()
    R = VVs.flatten()[numpy.newaxis, :] - VVm.flatten()[:, numpy.newaxis]
    blitz("R=R*R")  # R *= R
    R1 = HHs.flatten()[numpy.newaxis, :] - HHm.flatten()[:, numpy.newaxis]
    blitz("R1=R1*R1")  # R1 *= R1
    blitz("R=R+R1")  # R += R1
    del R1
    print "R6\t", time.time() - t0, R.shape,  # 11.670968771 (108225, 10500)
    print numpy.max(R)  # 4176.26290975
    return R
which results in the following timings:
R1 11.7737319469 (108225, 10500) 4909.66881791
R2 15.1279799938 (108225, 10500) 4909.66881791
R3 12.7408981323 (108225, 10500) 4909.66881791
R4 17.3336868286 (10500, 108225) 4909.66881791
R5 15.7530870438 (10500, 108225) 70.0690289494
R6 11.670968771 (108225, 10500) 4909.66881791
Although the euclidean one (R5) gives sqrt((VVm-VVs)^2 + (HHm-HHs)^2) while the others give (VVm-VVs)^2 + (HHm-HHs)^2, that doesn't matter much: further on in my code I take the minimum of R[i,:] for each i, and sqrt doesn't affect which entry is the minimum. If I'm interested in the actual distance I just take sqrt(value) on the result instead of applying sqrt over the whole array, so there's no real time difference there.
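That argument can be checked directly: argmin is invariant under any monotonic transform such as sqrt, so only the winning entries ever need it. A small sanity check (toy shapes, my own example, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
sq = rng.random((5, 100)) * 4000.0  # stand-in for an array of squared distances R

# argmin is unchanged by the monotonic sqrt, so the cheaper squared
# distances can be used for the nearest-neighbour search
idx_sq = np.argmin(sq, axis=1)
idx_eu = np.argmin(np.sqrt(sq), axis=1)
assert (idx_sq == idx_eu).all()

# only the winning values need the sqrt afterwards
nearest = np.sqrt(sq[np.arange(5), idx_sq])
```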
The question remains: why is the first solution the best? (The reason the second and third are slower is that deltas = ... alone takes 5.8 s, which is also why those two need 26 GB.) And why is sqeuclidean slower than euclidean?
sqeuclidean should just do (VVm-VVs)^2 + (HHm-HHs)^2, but I think it's doing something different. Does anyone know how to find the source code (C or whatever is at the bottom) of that method? I suspect it does sqrt((VVm-VVs)^2 + (HHm-HHs)^2)^2, which is the only reason I can think of why it would be slower than (VVm-VVs)^2 + (HHm-HHs)^2. I know that's a silly reason; does anyone have a more logical one?
Since I know nothing about C, how would I inline this with scipy.weave? Does that code compile normally the way Python does, or do I need to install something special?
Edit: ok, I tried scipy.weave.blitz (the R6 method), which is slightly faster, but I suspect someone who knows more C than I do can still speed it up. I just took the lines of the form a += b or a *= b, looked up what they would be in C, and put them in blitz statements, but I guess that if I also moved the lines with flatten and newaxis into C it should get faster still; I just don't know how to do that (someone who knows C could maybe explain?). Right now the difference between blitz and my first method seems too small to really be caused by C vs numpy, I guess?
I'd guess the other methods, like deltas = ..., could also get faster if I put them in C?
Whenever you have multiplications and sums, try to use one of the dot product functions or np.einsum. Since you are preallocating your arrays anyway, instead of keeping separate arrays for the horizontal and vertical coordinates, stack them together:
precomputed_flat = np.column_stack((svf.flatten(), shf.flatten()))
measured_flat = np.column_stack((VVmeasured.flatten(), HHmeasured.flatten()))
deltas = precomputed_flat - measured_flat[:, None, :]
From here on, the simplest is:
dist = np.einsum('ijk,ijk->ij', deltas, deltas)
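For readers unfamiliar with that subscript string: 'ijk,ijk->ij' multiplies the two operands elementwise and sums over the last axis k, which for a deltas array of shape (measured, precomputed, 2) is exactly the squared euclidean distance per pair. A toy-sized check (shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
deltas = rng.random((4, 3, 2))  # (measured, precomputed, coords), toy sizes

# elementwise product, then sum over k: squared distance for each (i, j) pair
r_einsum = np.einsum('ijk,ijk->ij', deltas, deltas)
r_naive = (deltas ** 2).sum(axis=-1)
assert np.allclose(r_einsum, r_naive)
```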
You could also try:
from numpy.core.umath_tests import inner1d
dist = inner1d(deltas, deltas)
And of course there is cdist from SciPy's spatial module:
from scipy.spatial.distance import cdist
dist = cdist(precomputed_flat, measured_flat, 'euclidean')
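On the sqeuclidean-vs-euclidean point from the question: the two metrics are related by a plain square, which is easy to verify on small inputs (my own toy shapes; requires SciPy):

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(2)
a = rng.random((50, 2))   # stand-in for precomputed_flat
b = rng.random((30, 2))   # stand-in for measured_flat

d_sq = cdist(a, b, 'sqeuclidean')  # (VVm-VVs)^2 + (HHm-HHs)^2
d_eu = cdist(a, b, 'euclidean')    # sqrt of the above
assert np.allclose(d_sq, d_eu ** 2)
```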
EDIT I cannot run tests on such a large dataset, but these timings are quite enlightening:
len_a, len_b = 10000, 1000
a = np.random.rand(2, len_a)
b = np.random.rand(2, len_b)
c = np.random.rand(len_a, 2)
d = np.random.rand(len_b, 2)
In [3]: %timeit a[:, None, :] - b[..., None]
10 loops, best of 3: 76.7 ms per loop
In [4]: %timeit c[:, None, :] - d
1 loops, best of 3: 221 ms per loop
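The two layouts compute the same deltas, just transposed: with coordinates on the first axis the broadcast runs over short, contiguous inner dimensions, which is where the speed difference comes from. A small shape check of the equivalence (toy sizes, my own example):

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.random((2, 100))            # coords-first layout, as timed above
b = rng.random((2, 40))
c = np.ascontiguousarray(a.T)       # points-first layout, shape (100, 2)
d = np.ascontiguousarray(b.T)       # shape (40, 2)

deltas_cf = a[:, None, :] - b[..., None]  # (2, 40, 100)
deltas_pf = c[:, None, :] - d             # (100, 40, 2)
assert np.allclose(deltas_cf.T, deltas_pf)
```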
For the smaller dataset above, I can get a slight speed-up over your method, and match inner1d with scipy.spatial.distance.cdist, by arranging the data differently in memory:
precomputed_flat = np.vstack((svf.flatten(), shf.flatten()))
measured_flat = np.vstack((VVmeasured.flatten(), HHmeasured.flatten()))
deltas = precomputed_flat[:, None, :] - measured_flat[:, :, None]
import scipy.spatial.distance as spdist
from numpy.core.umath_tests import inner1d
In [13]: %timeit r0 = a[0, None, :] - b[0, :, None]; r1 = a[1, None, :] - b[1, :, None]; r0 *= r0; r1 *= r1; r0 += r1
10 loops, best of 3: 146 ms per loop
In [14]: %timeit deltas = (a[:, None, :] - b[..., None]).T; inner1d(deltas, deltas)
10 loops, best of 3: 145 ms per loop
In [15]: %timeit spdist.cdist(a.T, b.T)
10 loops, best of 3: 124 ms per loop
In [16]: %timeit deltas = a[:, None, :] - b[..., None]; np.einsum('ijk,ijk->jk', deltas, deltas)
10 loops, best of 3: 163 ms per loop
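One more thought, since the question only ever takes the minimum of R[i,:]: the full (108225, 10500) matrix (9-26 GB in the timings above) never needs to exist at once. A chunked sketch along those lines (the function nearest_sq and the chunk size are mine, not part of the question; requires SciPy):

```python
import numpy as np
from scipy.spatial.distance import cdist

def nearest_sq(measured, precomputed, chunk=1024):
    """For each measured point, return the index of the nearest
    precomputed point and the squared distance to it, computed block
    by block so the full distance matrix is never held in memory."""
    idx = np.empty(len(measured), dtype=np.intp)
    best = np.empty(len(measured))
    for start in range(0, len(measured), chunk):
        block = cdist(measured[start:start + chunk], precomputed, 'sqeuclidean')
        idx[start:start + chunk] = block.argmin(axis=1)
        best[start:start + chunk] = block.min(axis=1)
    return idx, best

rng = np.random.default_rng(4)
measured = rng.random((500, 2))
precomputed = rng.random((200, 2))
idx, best = nearest_sq(measured, precomputed, chunk=64)
```

Peak memory is then roughly chunk * len(precomputed) doubles per block instead of the full product, at the cost of a Python-level loop over the chunks.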