mic*_*den 5 python arrays algorithm optimization numpy
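I have a numpy array a of length n, which holds the numbers 0 through n-1 shuffled in some way. I also have a numpy array mask of length <= n, containing some subset of the elements of a, in a different order.
The query I want to compute is "give me the elements of a that are also in mask, in the order in which they appear in a".
I asked a similar question here, but the difference was that mask there was a boolean mask rather than a mask over the individual elements.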
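To make the query concrete, here is a minimal sketch on made-up data (the values of a and mask below are illustrative assumptions, not taken from the benchmark). It uses a plain Python set purely to show the expected result, not as a recommended method.

```python
import numpy as np

# Hypothetical small inputs: a is a permutation of 0..4,
# mask is a subset of a's elements in a different order.
a = np.array([3, 0, 4, 1, 2])
mask = np.array([2, 3, 1])

# Keep the elements of a that occur in mask, preserving a's order.
wanted = set(mask.tolist())
result = np.array([x for x in a if x in wanted])

print(result)  # [3 1 2]
```

The order of mask itself plays no role in the output; only membership matters.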
I've outlined and tested the following 4 methods:
import timeit
import numpy as np
import matplotlib.pyplot as plt

n_test = 100      # timeit repetitions per measurement
n_coverages = 10  # number of mask sizes to try
np.random.seed(0)

# Each method reads the globals a, mask (and n_samples) set in the loop below.

def method1():
    # membership test against the array itself
    return np.array([x for x in a if x in mask])

def method2():
    # membership test against a Python set
    s = set(mask)
    return np.array([x for x in a if x in s])

def method3():
    # note: newer NumPy versions recommend np.isin over np.in1d
    return a[np.in1d(a, mask, assume_unique=True)]

def method4():
    # boolean lookup table indexed by value
    bmask = np.full((n_samples,), False)
    bmask[mask] = True
    return a[bmask[a]]

methods = [
    ('naive membership', method1),
    ('python set', method2),
    ('in1d', method3),
    ('binary mask', method4)
]

p_space = np.linspace(0, 1, n_coverages)
for n_samples in [1000]:
    a = np.arange(n_samples)
    np.random.shuffle(a)
    for label, method in methods:
        # the naive method is too slow to time at the larger size
        if method == method1 and n_samples == 10000:
            continue
        times = []
        for coverage in p_space:
            mask = np.random.choice(a, size=int(n_samples * coverage), replace=False)
            time = timeit.timeit(method, number=n_test)
            times.append(time * 1e3)
        plt.plot(p_space, times, label=label)
    plt.xlabel(r'Coverage ($\frac{|\mathrm{mask}|}{|\mathrm{a}|}$)')
    plt.ylabel('Time (ms)')
    plt.title('Comparison of 1-D Intersection Methods for $n = {}$ samples'.format(n_samples))
    plt.legend()
    plt.show()
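Which produced the following results:
So, unsurprisingly, the binary mask is the fastest method for any mask size.
My question is, is there a faster way?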
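For readers unfamiliar with the binary-mask idea, the following is a small trace of its three steps on made-up data (the arrays below are assumptions for illustration). It relies on the premise stated in the question, namely that the values of a are exactly 0 through n-1 and can therefore be used as indices.

```python
import numpy as np

# Hypothetical inputs: a is a permutation of 0..4, mask is a subset of it.
a = np.array([2, 4, 0, 3, 1])
mask = np.array([1, 2])

# Step 1: a lookup table with one flag per possible value.
bmask = np.full(a.size, False)
# Step 2: flag the values that occur in mask.
bmask[mask] = True
# Step 3: look up each element of a in the table, then filter a.
keep = bmask[a]
result = a[keep]

print(bmask)   # [False  True  True False False]
print(keep)    # [ True False False False  True]
print(result)  # [2 1]
```

Each step is a single pass over an array of length at most n, so no sorting or searching is involved.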
Assume a is the larger of the two.
def with_searchsorted(a, b):
    sb = b.argsort()
    bs = b[sb]
    sa = a.argsort()
    ia = np.arange(len(a))
    ra = np.empty_like(sa)
    ra[sa] = ia
    ac = bs.searchsorted(ia) % b.size
    return a[(bs[ac] == ia)[ra]]
Demo
a = np.arange(10)
np.random.shuffle(a)
b = np.random.choice(a, 5, False)
print(a)
print(b)
[7 2 9 3 0 4 8 5 6 1]
[0 8 5 4 6]
print(with_searchsorted(a, b))
[0 4 8 5 6]
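As a sanity check beyond a single demo, the sketch below compares the searchsorted approach against a binary-mask reference on a few random inputs. The helper names with_binary_mask and agree, the sizes tried, and the seed are assumptions made for illustration; the searchsorted function is a condensed copy of the steps above. An empty b is deliberately not tested, since the function as written takes a modulo by b.size and does not appear to handle that case.

```python
import numpy as np

def with_searchsorted(a, b):
    # Condensed copy of the searchsorted approach.
    bs = np.sort(b)
    ia = np.arange(len(a))
    ra = np.empty_like(a)
    ra[a.argsort()] = ia
    ac = bs.searchsorted(ia) % b.size
    return a[(bs[ac] == ia)[ra]]

def with_binary_mask(a, b):
    # Reference implementation: boolean lookup table indexed by value.
    bmask = np.zeros(len(a), dtype=bool)
    bmask[b] = True
    return a[bmask[a]]

def agree(n, k, seed=0):
    # Build a random permutation of 0..n-1 and a random k-subset of it,
    # then check that both approaches return the same array.
    rng = np.random.default_rng(seed)
    a = rng.permutation(n)
    b = rng.choice(a, size=k, replace=False)
    return np.array_equal(with_searchsorted(a, b), with_binary_mask(a, b))

all_agree = all(
    agree(n, k)
    for n in (1, 10, 1000)
    for k in (1, max(1, n // 2), n)
)
print(all_agree)
```

Agreement on random inputs is evidence of correctness, not a proof, and it says nothing about relative speed.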
How it works
def with_searchsorted(a, b):
    # sort b for faster searchsorting
    sb = b.argsort()
    bs = b[sb]
    # sort a for faster searchsorting
    sa = a.argsort()
    # this is the sorted a... we just cheat because we know what it will be
    ia = np.arange(len(a))
    # construct the reverse sort look up
    ra = np.empty_like(sa)
    ra[sa] = ia
    # perform searchsort
    ac = bs.searchsorted(ia) % b.size
    return a[(bs[ac] == ia)[ra]]
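The commented steps can be traced on the demo inputs to see what each intermediate array holds. The helper searchsorted_steps below is a hypothetical name introduced only for this walk-through; it performs the same operations and returns the intermediates alongside the result.

```python
import numpy as np

def searchsorted_steps(a, b):
    # Illustrative helper: same operations as the function above,
    # but the intermediate arrays are returned for inspection.
    bs = np.sort(b)                     # sorted b
    ia = np.arange(len(a))              # sorted a (a is a permutation of 0..n-1)
    ra = np.empty_like(a)
    ra[a.argsort()] = ia                # rank of each element of a
    ac = bs.searchsorted(ia) % b.size   # candidate position in bs for each value
    found = bs[ac] == ia                # found[v] is True when value v is in b
    return bs, ra, ac, found, a[found[ra]]

a = np.array([7, 2, 9, 3, 0, 4, 8, 5, 6, 1])
b = np.array([0, 8, 5, 4, 6])
bs, ra, ac, found, result = searchsorted_steps(a, b)

print(bs)      # [0 4 5 6 8]
print(ac)      # [0 1 1 1 1 2 3 4 4 0]
print(result)  # [0 4 8 5 6]
```

The modulo wraps the out-of-range position that searchsorted returns for values larger than everything in b (here the value 9) back to index 0, so the following comparison stays in bounds. Because a is a permutation of 0 through n-1, the rank array ra works out equal to a itself on this input.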