mag*_*gu_ 10 performance numpy python-3.x
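NumPy arrays are great for both performance and ease of use (slicing and indexing are easier than with lists).
I am trying to build a data container out of a NumPy structured array instead of a dict of NumPy arrays. The problem is that performance is much worse: about 2.5 times slower with homogeneous data and about 32 times slower with heterogeneous data (I am talking about NumPy data types).
Is there a way to speed up structured arrays? I tried changing the memory order from 'c' to 'f', but this had no effect.
Here is my profiling code: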
import time
import numpy as np
NP_SIZE = 100000
N_REP = 100
np_homo = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.double)], order='c')
np_hetro = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.int32)], order='c')
dict_homo = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE)}
dict_hetro = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE, np.int32)}
t0 = time.time()
for i in range(N_REP):
    np_homo['a'] += i
t1 = time.time()
for i in range(N_REP):
    np_hetro['a'] += i
t2 = time.time()
for i in range(N_REP):
    dict_homo['a'] += i
t3 = time.time()
for i in range(N_REP):
    dict_hetro['a'] += i
t4 = time.time()
print('Homogeneous Numpy struct array took {:.4f}s'.format(t1 - t0))
print('Hetoregeneous Numpy struct array took {:.4f}s'.format(t2 - t1))
print('Homogeneous Dict of numpy arrays took {:.4f}s'.format(t3 - t2))
print('Hetoregeneous Dict of numpy arrays took {:.4f}s'.format(t4 - t3))
Edit: I forgot to include my timing numbers:
Homogenious Numpy struct array took 0.0101s
Hetoregenious Numpy struct array took 0.1367s
Homogenious Dict of numpy arrays took 0.0042s
Hetoregenious Dict of numpy arrays took 0.0042s
Edit2: I added some extra test cases using the timeit module:
import numpy as np
import timeit
NP_SIZE = 1000000
def time(data, txt, n_rep=1000):
    def intern():
        data['a'] += 1
    time = timeit.timeit(intern, number=n_rep)
    print('{} {:.4f}'.format(txt, time))
np_homo = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.double)], order='c')
np_hetro = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.int32)], order='c')
dict_homo = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE)}
dict_hetro = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE, np.int32)}
time(np_homo, 'Homogeneous Numpy struct array')
time(np_hetro, 'Hetoregeneous Numpy struct array')
time(dict_homo, 'Homogeneous Dict of numpy arrays')
time(dict_hetro, 'Hetoregeneous Dict of numpy arrays')
The results are:
Homogeneous Numpy struct array 0.7989
Hetoregeneous Numpy struct array 13.5253
Homogeneous Dict of numpy arrays 0.3750
Hetoregeneous Dict of numpy arrays 0.3744
The ratios between runs seem fairly stable, with both methods and with different array sizes.
In case it matters: Python 3.4, NumPy 1.9.2
In my quick timing tests, the difference is not that large:
In [717]: dict_homo = {'a': np.zeros(10000), 'b': np.zeros(10000)}
In [718]: timeit dict_homo['a']+=1
10000 loops, best of 3: 25.9 µs per loop
In [719]: np_homo = np.zeros(10000, dtype=[('a', np.double), ('b', np.double)])
In [720]: timeit np_homo['a'] += 1
10000 loops, best of 3: 29.3 µs per loop

In the dict_homo case, the fact that the arrays are embedded in a dictionary is a minor matter. A simple dictionary access like this is fast, basically the same as accessing the array by variable name.
So the first case is basically a test of += on a 1-d array.
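As a small illustration of that point, the sketch below checks that the dictionary lookup hands back the very same array object, so the in-place add runs on an ordinary contiguous 1-d array. The name arr is introduced here only for the demonstration.

```python
import numpy as np

arr = np.zeros(10000)                       # ordinary contiguous 1-d array
dict_homo = {'a': arr, 'b': np.zeros(10000)}

# The lookup returns the same object; += then updates it in place
# and stores the (same) array back under the key.
dict_homo['a'] += 1

print(dict_homo['a'] is arr)                # same object, no copy involved
print(arr.flags['C_CONTIGUOUS'])            # layout of the array being updated
```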
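In the structured case, the a and b values alternate in the data buffer, so np_homo['a'] is a view that "pulls out" every other number. So it is not surprising that it is somewhat slower.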
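The interleaving can be seen from the strides of the field view. This is a minimal sketch; field_a and plain are names introduced here for illustration, and it only shows the memory layout, not the timing.

```python
import numpy as np

# Each record holds one 'a' and one 'b' double, stored back to back.
np_homo = np.zeros(10000, dtype=[('a', np.double), ('b', np.double)])
field_a = np_homo['a']      # a view into the record buffer, not a copy
plain = np.zeros(10000)     # ordinary 1-d array for comparison

print(np_homo.dtype.itemsize)               # bytes per record
print(field_a.strides)                      # byte step between successive 'a' values
print(plain.strides)                        # byte step in the plain array
print(field_a.flags['C_CONTIGUOUS'])        # whether the field view is contiguous
print(np.shares_memory(field_a, np_homo))   # whether the view shares the buffer
```

A field view whose stride is larger than its element size skips over the other fields on every step, which is the access pattern described above.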
In [721]: np_homo
Out[721]: 
array([(41111.0, 0.0), (41111.0, 0.0), (41111.0, 0.0), ..., (41111.0, 0.0),
       (41111.0, 0.0), (41111.0, 0.0)], 
      dtype=[('a', '<f8'), ('b', '<f8')])

A 2-d array also interleaves the column values.
In [722]: np_twod=np.zeros((10000,2), np.double)
In [723]: timeit np_twod[:,0]+=1
10000 loops, best of 3: 36.8 µs per loop

Surprisingly, it is actually a bit slower than the structured case. Using order='F' or a (2,10000) shape speeds it up a bit, but it is still not quite as good as the structured case.
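To make the layouts being compared concrete, here is a sketch of the three 2-d variants and the strides of the slice that gets updated. The names np_twod_c, np_twod_f, np_twod_t and the slice variables are made up for this example. It shows only where the values sit in memory and does not by itself explain the measured timings.

```python
import numpy as np

np_twod_c = np.zeros((10000, 2), np.double)             # default C (row-major) order
np_twod_f = np.zeros((10000, 2), np.double, order='F')  # column-major order
np_twod_t = np.zeros((2, 10000), np.double)             # (2,10000) shape, C order

col_c = np_twod_c[:, 0]   # column of a C-ordered array: values interleaved with column 1
col_f = np_twod_f[:, 0]   # column of an F-ordered array: values adjacent in memory
row_t = np_twod_t[0, :]   # row of the (2,10000) array: values adjacent in memory

print(col_c.strides, col_f.strides, row_t.strides)
```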
These are small test times, so I won't make grand claims. But the structured array doesn't look bad.
Another timing test, initializing the array or dictionary at each step:
In [730]: %%timeit np.twod=np.zeros((10000,2), np.double)
np.twod[:,0] += 1
   .....: 
10000 loops, best of 3: 36.7 µs per loop
In [731]: %%timeit np_homo = np.zeros(10000, dtype=[('a', np.double), ('b', np.double)])
np_homo['a'] += 1
   .....: 
10000 loops, best of 3: 38.3 µs per loop
In [732]: %%timeit dict_homo = {'a': np.zeros(10000), 'b': np.zeros(10000)}
dict_homo['a'] += 1
   .....: 
10000 loops, best of 3: 25.4 µs per loop

The 2-d and structured cases are closer, with somewhat better performance for the dictionary (1-d) case. I also tried this with np.ones, since np.zeros can have delayed allocation, but there was no difference in behavior.
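The timings above cover only the homogeneous dtype. For the heterogeneous dtype from the question, one thing that may be worth checking is record alignment: by default NumPy packs the fields, so a double followed by an int32 gives 12-byte records, while align=True asks for C-struct-style padding. Whether this accounts for the slowdown reported in the question is an assumption and is not measured here. The names packed, aligned and np_hetro_aligned are introduced for this sketch.

```python
import numpy as np

# Default (packed) layout: 8-byte double + 4-byte int32 per record.
packed = np.dtype([('a', np.double), ('b', np.int32)])
# align=True pads the record the way a C compiler would.
aligned = np.dtype([('a', np.double), ('b', np.int32)], align=True)

print(packed.itemsize, packed.isalignedstruct)
print(aligned.itemsize, aligned.isalignedstruct)

# Same field access as in the question, on the padded layout.
np_hetro_aligned = np.zeros(100000, dtype=aligned)
np_hetro_aligned['a'] += 1
```

Timing np_hetro_aligned['a'] += 1 against the packed version with timeit would show whether the padding matters on a given machine and NumPy version.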