the*_*att 51 python performance numpy
I have a very large NumPy array:
1 40 3
4 50 4
5 60 7
5 49 6
6 70 8
8 80 9
8 72 1
9 90 7
....
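I want to check whether a value exists in the first column of the array. I have a bunch of homegrown approaches (for example, iterating over each row and checking), but given the size of the array I'd like to find the most efficient method.
Thanks!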
agf*_*agf 62
How about:
if value in my_array[:, col_num]:
    do_whatever()  # placeholder for your own handling
Edit: I believe __contains__ is implemented the same way as @detly's version.
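To illustrate the edit's claim, here is a minimal sketch assuming a small integer array built from the rows shown in the question; `in_column` and `in_column_any` are hypothetical helper names. For an ndarray, `value in arr` behaves like `(arr == value).any()`, so both helpers should agree for every value tested.

```python
import numpy as np

# Sample data modeled on the rows shown in the question
my_array = np.array([
    [1, 40, 3],
    [4, 50, 4],
    [5, 60, 7],
    [5, 49, 6],
    [6, 70, 8],
    [8, 80, 9],
    [8, 72, 1],
    [9, 90, 7],
])

def in_column(arr, col_num, value):
    """Return True if `value` appears in column `col_num` of `arr` (uses `in`)."""
    return value in arr[:, col_num]

def in_column_any(arr, col_num, value):
    """Explicit form: elementwise comparison followed by an any() reduction."""
    return bool(np.any(arr[:, col_num] == value))

for v in (5, 7, 9):
    print(v, in_column(my_array, 0, v), in_column_any(my_array, 0, v))
```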
det*_*tly 37
The most obvious to me is:
np.any(my_array[:, 0] == value)
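If the rows containing the value are also needed, the same boolean mask can be reused instead of comparing twice. A sketch on the question's sample rows, with `value = 5` chosen for illustration; `np.flatnonzero` returns the indices where the mask is True:

```python
import numpy as np

my_array = np.array([[1, 40, 3], [4, 50, 4], [5, 60, 7], [5, 49, 6],
                     [6, 70, 8], [8, 80, 9], [8, 72, 1], [9, 90, 7]])

value = 5
mask = my_array[:, 0] == value      # one boolean per row
found = bool(np.any(mask))          # existence check, as in the answer above
rows = np.flatnonzero(mask)         # indices of rows whose first column equals value

print(found, rows)                  # True [2 3]
```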
HYR*_*YRY 33
To check multiple values, you can use numpy.in1d(), which is an element-wise version of Python's `in` keyword. If your data is sorted, you can use numpy.searchsorted():
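import numpy as np
data = np.array([1, 4, 5, 5, 6, 8, 8, 9])
values = [2, 3, 4, 6, 7]
print(np.in1d(values, data))
# searchsorted requires data to be sorted; clamp indices so values above data.max() stay in bounds
index = np.searchsorted(data, values)
index = np.minimum(index, len(data) - 1)
print(data[index] == values)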
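A hedged sketch of both approaches for several values at once. Recent NumPy releases recommend `np.isin` over `np.in1d` (which NumPy 2.0 deprecates). `sorted_contains` is a hypothetical helper; it assumes a non-empty, ascending 1-D array and clamps the `searchsorted` result, because a value larger than the maximum gets index `len(data)` and would otherwise raise an `IndexError`. Here `10` is added to `values` to exercise that case:

```python
import numpy as np

def sorted_contains(sorted_data, values):
    """Vectorized membership test of `values` against an ascending 1-D array.

    Hypothetical helper: assumes sorted_data is non-empty and sorted ascending.
    """
    values = np.asarray(values)
    idx = np.searchsorted(sorted_data, values)
    # Values above the maximum get index len(sorted_data); clamp to stay in bounds
    idx = np.minimum(idx, len(sorted_data) - 1)
    return sorted_data[idx] == values

data = np.array([1, 4, 5, 5, 6, 8, 8, 9])
values = [2, 3, 4, 6, 7, 10]

print(np.isin(values, data))          # elementwise membership, one bool per value
print(sorted_contains(data, values))  # same booleans via binary search
```

For the question's array, the first column would need to be sorted once (for example `np.sort(my_array[:, 0])`) and reused; the sort costs O(n log n), so this only pays off when many lookups follow, each of which is then O(log n).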
小智 13
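Fascinating. I needed to speed up a series of loops that must determine matching indices in this same way, so I decided to time all the solutions here, along with a few riffs of my own.
Here are my speed tests on Python 2.7.10 (timeit's default of 1,000,000 runs per statement; times in seconds):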
import timeit
timeit.timeit('N.any(N.in1d(sids, val))', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')
18.86137104034424
timeit.timeit('val in sids', setup = 'import numpy as N; val = 20010401020091; sids = [20010401010101+x for x in range(1000)]')
15.061666011810303
timeit.timeit('N.in1d(sids, val)', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')
11.613027095794678
timeit.timeit('N.any(val == sids)', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')
7.670552015304565
timeit.timeit('val in sids', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')
5.610057830810547
timeit.timeit('val == sids', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')
1.6632978916168213
timeit.timeit('val in sids', setup = 'import numpy as N; val = 20010401020091; sids = set([20010401010101+x for x in range(1000)])')
0.0548710823059082
timeit.timeit('val in sids', setup = 'import numpy as N; val = 20010401020091; sids = dict(zip([20010401010101+x for x in range(1000)],[True,]*1000))')
0.054754018783569336
Very surprising! Differences of orders of magnitude!
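Two details affect how to read the benchmark above. First, `val` (20010401020091) is 9990 above the first id, so it is never in `sids` and every linear scan runs to the end. Second, the set and dict are built in `setup`, so their construction cost is excluded from the timing. A set lookup hashes the value (average O(1)), whereas every array or list variant touches all 1000 elements; note also that the `val == sids` line only builds the boolean mask and does not by itself answer the question. A rough Python 3 re-creation follows, using fewer repetitions; the lambdas add a small constant overhead, and results vary by machine and NumPy version, so none are shown.

```python
import timeit

import numpy as np

# Same data as the benchmark above: 1000 consecutive ids and a value that is NOT among them
base = 20010401010101
val = 20010401020091
ids = [base + x for x in range(1000)]
arr = np.array(ids)
id_set = set(ids)  # built once, outside the timed code, as in the original setup

candidates = {
    "np.any(val == arr)": lambda: np.any(val == arr),
    "val in arr": lambda: val in arr,
    "np.isin(val, arr).any()": lambda: np.isin(val, arr).any(),
    "val in list": lambda: val in ids,
    "val in set": lambda: val in id_set,
}

def run(number=10_000):
    """Return {label: total seconds for `number` calls} for each candidate."""
    return {label: timeit.timeit(fn, number=number) for label, fn in candidates.items()}

if __name__ == "__main__":
    # Print fastest first
    for label, seconds in sorted(run().items(), key=lambda kv: kv[1]):
        print(f"{label:28s} {seconds:.4f} s")
```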
To sum up, if you just want to know whether something is in a list:
If you want to know where in the list something is (order matters):
Views: 115680