use*_*640 4 python loops numpy
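I want to optimize some Python code that consists of two nested loops. I am not that familiar with numpy, but as far as I understand it should help me make this task more efficient. Below is test code I wrote that reflects what happens in the real code. At the moment, using numpy ranges and iterators is slower than plain Python. What am I doing wrong? What is the best solution to this problem?
Thanks for your help!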
import numpy
import time
# set up a problem analogous to that in the real code
npoints_per_plane = 1000
nplanes = 64
naxis = 1000
npoints3d = naxis + npoints_per_plane * nplanes
npoints = naxis + npoints_per_plane
specres = 1000
# this is where the data is being mapped to
sol = dict()
sol["ems"] = numpy.zeros(npoints3d)
sol["abs"] = numpy.zeros(npoints3d)
# this would normally be non-random input data
data = dict()
data["ems"] = numpy.zeros((npoints,specres))
data["abs"] = numpy.zeros((npoints,specres))
for ip in range(npoints):
    data["ems"][ip,:] = numpy.random.random(specres)[:]
    data["abs"][ip,:] = numpy.random.random(specres)[:]
ems_mod = numpy.random.random(1)[0]
abs_mod = numpy.random.random(1)[0]
ispec = numpy.random.randint(specres)
# this is the code I want to optimize
t0 = time.time()
# usual python range and iterator
for ip in range(npoints_per_plane):
    jp = naxis + ip
    for ipl in range(nplanes):
        ip3d = jp + npoints_per_plane * ipl
        sol["ems"][ip3d] = data["ems"][jp,ispec] * ems_mod
        sol["abs"][ip3d] = data["abs"][jp,ispec] * abs_mod
t1 = time.time()
# numpy ranges and iterator
ip_vals = numpy.arange(npoints_per_plane)
ipl_vals = numpy.arange(nplanes)
for ip in numpy.nditer(ip_vals):
    jp = naxis + ip
    for ipl in numpy.nditer(ipl_vals):
        ip3d = jp + npoints_per_plane * ipl
        sol["ems"][ip3d] = data["ems"][jp,ispec] * ems_mod
        sol["abs"][ip3d] = data["abs"][jp,ispec] * abs_mod
t2 = time.time()
print("plain python: %0.3f seconds" % (t1 - t0))
print("numpy: %0.3f seconds" % (t2 - t1))
Edit: moved "jp = naxis + ip" into the first for loop.
Additional note:
I figured out how to make the inner loop fast, but not the outer loop:
# numpy vectorization
for ip in range(npoints_per_plane):
    jp = naxis + ip
    sol["ems"][jp:jp+npoints_per_plane*nplanes:npoints_per_plane] = data["ems"][jp,ispec] * ems_mod
    sol["abs"][jp:jp+npoints_per_plane*nplanes:npoints_per_plane] = data["abs"][jp,ispec] * abs_mod
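Joe's solution shows how to do both together, thanks!
The best way to write loops in numpy is not to write loops at all, but to use vectorized operations. For example: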
c = 0
for i in range(len(a)):
    c += a[i] + b[i]
becomes
c = np.sum(a + b, axis=0)
With a and b of shape (100000, 100), the first variant takes 0.344 seconds and the second takes 0.062 seconds.
For the case given in your question, the following does what you want:
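To make the equivalence of the two variants concrete, here is a minimal sketch that runs both and compares the results. The array shape is smaller than the one timed above and the random input is an assumption chosen only for illustration; `c_loop` and `c_vec` are names introduced here.

```python
import numpy as np

# Illustrative input (assumed): smaller than the (100000, 100) arrays timed above.
np.random.seed(0)
a = np.random.random((1000, 100))
b = np.random.random((1000, 100))

# Explicit Python loop: accumulate one row pair at a time.
c_loop = np.zeros(a.shape[1])
for i in range(len(a)):
    c_loop += a[i] + b[i]

# Vectorized form: one elementwise add, then a sum over the first axis.
c_vec = np.sum(a + b, axis=0)

# The results agree up to floating-point rounding (summation order differs).
print(np.allclose(c_loop, c_vec))
```

Both variants produce an array of shape (100,); only the place where the loop runs changes, from the Python interpreter to compiled numpy code.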
sol['ems'][naxis:] = numpy.ravel(
    numpy.repeat(
        data['ems'][naxis:,ispec,numpy.newaxis] * ems_mod,
        nplanes,
        axis=1
    ),
    order='F'
)
This could be optimized further with a few tricks, but that would reduce clarity and would probably be premature optimization, because:
plain python: 0.064 seconds
numpy: 0.002 seconds
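The solution works as follows:
Your original version contains jp = naxis + ip, which simply skips the first naxis elements; [naxis:] selects everything except those first naxis elements. Your inner loop repeats the value data[jp,ispec] nplanes times and writes it to the locations ip3d = jp + npoints_per_plane * ipl, which is equivalent to a flattened 2D array offset by naxis. So a second dimension is added with numpy.newaxis to the (previously 1D) data['ems'][naxis:, ispec], and the values are repeated nplanes times along that new dimension with numpy.repeat. The resulting 2D array is then flattened again with numpy.ravel (in Fortran order, i.e. the first axis has the smallest stride) and written to the appropriate subarray of sol['ems']. If the target array were actually 2D, the repeat could be skipped by relying on automatic array broadcasting.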
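The broadcasting remark can be sketched roughly as follows. Because sol['ems'][naxis:] is contiguous, it can be viewed as a 2D array of shape (nplanes, npoints_per_plane) and filled in one assignment. This is only a sketch under assumptions: the problem sizes are copied from the question, while the names `data_ems`, `sol_ems`, `planes`, `expected` and the fixed values of `ems_mod` and `ispec` are made up for the example.

```python
import numpy as np

# Problem sizes copied from the question; the data is random stand-in input.
npoints_per_plane, nplanes, naxis, specres = 1000, 64, 1000, 1000
npoints3d = naxis + npoints_per_plane * nplanes
npoints = naxis + npoints_per_plane

np.random.seed(0)
data_ems = np.random.random((npoints, specres))
ems_mod = 0.5   # assumed value, for illustration only
ispec = 7       # assumed value, for illustration only

# Reference result: the nested loops from the question.
expected = np.zeros(npoints3d)
for ip in range(npoints_per_plane):
    jp = naxis + ip
    for ipl in range(nplanes):
        expected[jp + npoints_per_plane * ipl] = data_ems[jp, ispec] * ems_mod

# Broadcasting variant: view the target as (nplanes, npoints_per_plane).
sol_ems = np.zeros(npoints3d)
planes = sol_ems[naxis:].reshape(nplanes, npoints_per_plane)  # a view, not a copy
planes[:] = data_ems[naxis:, ispec] * ems_mod  # the 1D row is broadcast to every plane

print(np.array_equal(sol_ems, expected))
```

Row ipl, column ip of the view corresponds to flat index naxis + ip + npoints_per_plane * ipl, which is exactly ip3d in the loop, so no explicit repeat or ravel is needed.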
If you run into a situation where you cannot avoid using loops, you could use Cython (which supports efficient buffer views on numpy arrays).