Mid*_*ter 4 python numpy shuffle
I am using numpy.random.shuffle to compute statistics on randomly shuffled columns of a 2D array. The Python code is as follows:
import numpy as np

def timeline_sample(series, num):
    random = series.copy()
    for i in range(num):
        np.random.shuffle(random.T)
        yield random
The speed I get looks like this:
import numpy as np
arr = np.random.sample((50, 5000))
%%timeit
for series in timeline_sample(arr, 100):
    np.sum(series)
1 loops, best of 3: 391 ms per loop
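I tried to Cythonize this function, but I am not sure how to replace the call to np.random.shuffle, and the Cython version was 3 times slower. Does anyone know how to speed this up or replace it? It is currently the bottleneck in my program.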
Cython code: (not preserved in this copy of the post)
This will most likely give a decent speed boost:
from timeit import Timer
import numpy as np

arr = np.random.sample((50, 5000))

def timeline_sample(series, num):
    # Original version: shuffles a transposed view of the copy in place.
    random = series.copy()
    for i in range(num):
        np.random.shuffle(random.T)
        yield random

def timeline_sample_fast(series, num):
    # Copy the transpose once so the shuffle works on contiguous rows.
    random = series.T.copy()
    for i in range(num):
        np.random.shuffle(random)
        yield random.T

def timeline_sample_faster(series, num):
    # No in-place shuffle: index the columns with a random permutation.
    length = series.shape[1]
    for i in range(num):
        yield series[:, np.random.permutation(length)]

def consume(iterable):
    for s in iterable:
        np.sum(s)

min(Timer(lambda: consume(timeline_sample(arr, 1))).repeat(10, 10))
min(Timer(lambda: consume(timeline_sample_fast(arr, 1))).repeat(10, 10))
min(Timer(lambda: consume(timeline_sample_faster(arr, 1))).repeat(10, 10))
#>>> 0.2585161680035526
#>>> 0.2416607110062614
#>>> 0.04835709399776533
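To make it clearer what the permutation-indexing version is doing, here is a minimal sketch on a tiny array. The helper name shuffle_columns and the 3x4 test array are made up for illustration; only np.random.permutation and the series[:, perm] indexing come from the code above.

```python
import numpy as np

def shuffle_columns(series):
    """Hypothetical helper: return a column-shuffled copy plus the permutation used."""
    perm = np.random.permutation(series.shape[1])
    # Fancy indexing builds a new array; `series` itself is not modified.
    return series[:, perm], perm

small = np.arange(12).reshape(3, 4)
shuffled, perm = shuffle_columns(small)

# Column j of the result is column perm[j] of the input, kept intact.
for j, src in enumerate(perm):
    assert np.array_equal(shuffled[:, j], small[:, src])

# The input array is left untouched.
assert np.array_equal(small, np.arange(12).reshape(3, 4))
```

Because each call returns a fresh array, there is no need for the initial series.copy() that the in-place shuffle requires.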
Forcing the result to be contiguous does increase the time, but not by a ton:
def consume(iterable):
    for s in iterable:
        np.sum(np.ascontiguousarray(s))

min(Timer(lambda: consume(timeline_sample(arr, 1))).repeat(10, 10))
min(Timer(lambda: consume(timeline_sample_fast(arr, 1))).repeat(10, 10))
min(Timer(lambda: consume(timeline_sample_faster(arr, 1))).repeat(10, 10))
#>>> 0.2632228760048747
#>>> 0.25778737501241267
#>>> 0.07451769898761995
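As a possible alternative that was not part of the timings above, newer NumPy versions (an assumption here: 1.18 or later) offer np.random.default_rng, whose Generator.permutation accepts an axis argument. The sketch below uses it to return a column-shuffled copy; the function name timeline_sample_generator and the seed parameter are made up for illustration, and I have not measured whether it beats the fancy-indexing version.

```python
import numpy as np

def timeline_sample_generator(series, num, seed=None):
    """Hypothetical variant: yield `num` column-shuffled copies of `series`."""
    rng = np.random.default_rng(seed)
    for _ in range(num):
        # axis=1 permutes whole columns; a new array is returned each time.
        yield rng.permutation(series, axis=1)

arr = np.random.default_rng(0).random((5, 8))
samples = list(timeline_sample_generator(arr, 3, seed=1))

for s in samples:
    # Each row keeps the same values, only in a different column order.
    assert np.array_equal(np.sort(s, axis=1), np.sort(arr, axis=1))
```

It would be worth timing this against timeline_sample_faster on the real data before switching.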