Eri*_* So 2 python arrays numpy
I have a 1-D NumPy array of 1s and 0s. For example:
a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])
I want to count the consecutive 0s and 1s in the array and output something like this:
[1,3,7,1,1,2,3,2,2]
What I am doing at the moment is:
np.diff(np.where(np.abs(np.diff(a)) == 1)[0])
It outputs:
array([3, 7, 1, 1, 2, 3, 2])
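As you can see, it is missing the first count, 1 (the final count, 2, is missing as well).
I have tried np.split and then taking the size of each segment, but that does not look very promising.
Is there a more elegant, "pythonic" solution?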
Here's one vectorized approach -
np.diff(np.r_[0,np.flatnonzero(np.diff(a))+1,a.size])
Sample run -
In [208]: a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])
In [209]: np.diff(np.r_[0,np.flatnonzero(np.diff(a))+1,a.size])
Out[209]: array([1, 3, 7, 1, 1, 2, 3, 2, 2])
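To make the one-liner above easier to follow, here is a minimal sketch that splits it into its intermediate steps. The names starts, boundaries and counts are only illustrative and do not appear in the original answer.

```python
import numpy as np

a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])

# np.diff(a) is nonzero at index i whenever a[i] != a[i+1];
# adding 1 turns each such index into the start index of the next run.
starts = np.flatnonzero(np.diff(a)) + 1

# Add the two outer boundaries: 0 (start of the first run)
# and a.size (one past the end of the last run).
boundaries = np.r_[0, starts, a.size]
# boundaries: 0, 1, 4, 11, 12, 13, 15, 18, 20, 22

# The distance between consecutive boundaries is the length of each run.
counts = np.diff(boundaries)
print(counts.tolist())  # [1, 3, 7, 1, 1, 2, 3, 2, 2]
```

The attempt in the question only takes differences between the interior change points, which is why the first and last runs are dropped; padding with 0 and a.size restores them.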
It is faster with boolean concatenation -
np.diff(np.flatnonzero(np.concatenate(([True], a[1:]!= a[:-1], [True] ))))
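If the value of each run is needed along with its length, the boolean-concatenation idea can be wrapped in a small helper. This is a sketch only: the function name run_lengths and the explicit empty-array guard are assumptions added for illustration, not part of the original answer.

```python
import numpy as np

def run_lengths(a):
    """Return (values, lengths) of the consecutive runs in a 1-D array.

    Hypothetical helper built on the boolean-concatenation approach.
    """
    a = np.asarray(a)
    if a.size == 0:
        # Assumed behaviour: no runs at all for an empty input.
        return a[:0], np.array([], dtype=np.intp)
    # True at the first element of every run, plus a sentinel True
    # one position past the end of the array.
    change = np.concatenate(([True], a[1:] != a[:-1], [True]))
    boundaries = np.flatnonzero(change)
    lengths = np.diff(boundaries)
    # The value of each run is the element at its starting index.
    values = a[boundaries[:-1]]
    return values, lengths

values, lengths = run_lengths([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])
print(values.tolist())   # [0, 1, 0, 1, 0, 1, 0, 1, 0]
print(lengths.tolist())  # [1, 3, 7, 1, 1, 2, 3, 2, 2]
```

Because it compares neighbours with != rather than subtracting them, this variant should also work for non-numeric dtypes such as strings.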
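Runtime test
For the setup, let's create a bigger dataset with islands of 0s and 1s. For fair benchmarking against the given sample, let the island lengths vary between 1 and 6 (the upper bound 7 passed to np.random.randint is exclusive) -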
In [257]: n = 100000 # creates 100000 islands in total, alternating 0s and 1s
In [258]: a = np.repeat(np.arange(n)%2, np.random.randint(1,7,(n)))
# Approach #1 proposed in this post
In [259]: %timeit np.diff(np.r_[0,np.flatnonzero(np.diff(a))+1,a.size])
100 loops, best of 3: 2.13 ms per loop
# Approach #2 proposed in this post
In [260]: %timeit np.diff(np.flatnonzero(np.concatenate(([True], a[1:]!= a[:-1], [True] ))))
1000 loops, best of 3: 1.21 ms per loop
# @Vineet Jain's soln
In [261]: %timeit [ sum(1 for i in g) for k,g in groupby(a)]
10 loops, best of 3: 61.3 ms per loop
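Outside IPython, the comparison can be reproduced with a plain script along these lines. It is a rough sketch: the function names approach_1, approach_2 and approach_groupby, the seed, and the smaller n are assumptions chosen for illustration, and absolute timings will differ from machine to machine.

```python
from itertools import groupby
import timeit

import numpy as np

def approach_1(a):
    return np.diff(np.r_[0, np.flatnonzero(np.diff(a)) + 1, a.size])

def approach_2(a):
    return np.diff(np.flatnonzero(np.concatenate(([True], a[1:] != a[:-1], [True]))))

def approach_groupby(a):
    return [sum(1 for _ in g) for _, g in groupby(a)]

# Seeded generator so the data set is reproducible (seed chosen arbitrarily).
rng = np.random.default_rng(0)
n = 1000
island_lengths = rng.integers(1, 7, n)          # lengths 1..6, upper bound exclusive
a = np.repeat(np.arange(n) % 2, island_lengths)

# Islands alternate between 0 and 1, so every approach must
# recover exactly the lengths used to build the array.
assert approach_1(a).tolist() == island_lengths.tolist()
assert approach_2(a).tolist() == island_lengths.tolist()
assert approach_groupby(a) == island_lengths.tolist()

for func in (approach_1, approach_2, approach_groupby):
    seconds = timeit.timeit(lambda: func(a), number=10)
    print(f"{func.__name__}: {seconds / 10 * 1e3:.3f} ms per call")
```

Checking that all three approaches agree before timing them guards against benchmarking a variant that is fast but wrong.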
Use groupby from itertools:
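from itertools import groupby
import numpy as np
a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])
# groupby yields one (key, group) pair per run of equal consecutive values; count each group
grouped_a = [ sum(1 for i in g) for k,g in groupby(a)]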
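Since groupby also hands back the key of each group, the same loop can report which value every run holds. The sketch below is a small variation on the answer above, not part of it; converting with a.tolist() first is an optional choice so that the loop iterates over plain Python ints instead of NumPy scalars.

```python
from itertools import groupby

import numpy as np

a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])

# One (value, run_length) pair per run of equal consecutive elements.
runs = [(value, len(list(group))) for value, group in groupby(a.tolist())]
print(runs)
# [(0, 1), (1, 3), (0, 7), (1, 1), (0, 1), (1, 2), (0, 3), (1, 2), (0, 2)]
```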