Find consecutive 1s in a numpy array

Question


How can I find the number of consecutive 1s (or any other value) in each row of the following numpy array? I need a pure numpy solution.

array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
       [0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])


My question has two parts. First: what is the maximum number of consecutive 1s in each row? It should be

array([2,3,2])


for the example case.

Second: what is the start index of the first group of multiple consecutive 1s in each row? For the example case, this would be

array([3,9,9])


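In this example I looked for runs of 2 consecutive 1s. It is important that this threshold can be changed, for example to 5 consecutive 1s.

A similar question was answered using np.unique, but that approach only works for a single row and not for a multi-row array, because the per-row results would have different lengths.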

Answer 1

Div*_*kar 6

Here is a vectorized approach based on differentiation -

import numpy as np
import pandas as pd

# Append zeros columns at either sides of counts
append1 = np.zeros((counts.shape[0],1),dtype=int)
counts_ext = np.column_stack((append1,counts,append1))

# Get start and stop indices with 1s as triggers
diffs = np.diff((counts_ext==1).astype(int),axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)

# Get intervals using differences between start and stop indices
start_stop = np.column_stack((starts[:,0], stops[:,1] - starts[:,1]))

# Get indices corresponding to max. interval lens and thus lens themselves
SS_df = pd.DataFrame(start_stop)
out = start_stop[SS_df.groupby([0],sort=False)[1].idxmax(),1]


Sample input, output -

Original sample case:

In [574]: counts
Out[574]: 
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
       [0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])

In [575]: out
Out[575]: array([2, 3, 2], dtype=int64)


Modified case:

In [577]: counts
Out[577]: 
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 2, 0, 1, 1, 1, 1],
       [0, 0, 0, 4, 1, 1, 1, 1, 1, 0, 1, 0]])

In [578]: out
Out[578]: array([2, 4, 5], dtype=int64)

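To see why the zero padding and np.diff steps recover the runs, it may help to look at a single row. The following is only an illustrative sketch, not part of the answer's code; the array `row` and the variable names are made up for the example.

```python
import numpy as np

# Hypothetical single row, chosen so that one run touches the right edge.
row = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 1])

# Pad with a zero on each side so that runs touching either edge still
# produce both a rising (+1) and a falling (-1) edge in the difference.
padded = np.concatenate(([0], (row == 1).astype(int), [0]))
edges = np.diff(padded)

starts = np.flatnonzero(edges == 1)    # index of the first 1 in each run
stops = np.flatnonzero(edges == -1)    # index just past the last 1 in each run
lengths = stops - starts

print(starts)   # [1 3 7]
print(lengths)  # [1 2 3]
```

Because every run contributes exactly one +1 and one -1, `starts` and `stops` always have the same length and pair up in order.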

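Here is a pure NumPy version, which is identical to the previous one up to the point where starts and stops are computed. Here's the full implementation -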

# Append zeros columns at either sides of counts
append1 = np.zeros((counts.shape[0],1),dtype=int)
counts_ext = np.column_stack((append1,counts,append1))

# Get start and stop indices with 1s as triggers
diffs = np.diff((counts_ext==1).astype(int),axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)

# Get intervals using differences between start and stop indices
intvs = stops[:,1] - starts[:,1]

# Store intervals as a 2D array for further vectorized ops to make.
c = np.bincount(starts[:,0])
mask = np.arange(c.max()) < c[:,None]
intvs2D = mask.astype(float)
intvs2D[mask] = intvs

# Get max along each row as final output
out = intvs2D.max(1)

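The code above covers the first part of the question (the longest run per row). As a rough sketch of how the second part might be handled with the same starts and stops, the helper below also returns the start index of the first run that has at least `min_len` elements. The function name `run_stats`, its parameters, and the choice of -1 for "no such run" are assumptions made for this example. It also swaps the mask-based step for `np.maximum.at` and `np.minimum.at`, which is a different technique from the one used in the answer.

```python
import numpy as np

def run_stats(counts, value=1, min_len=2):
    """Hypothetical helper: per-row longest run of `value`, and the start
    index of the first run with at least `min_len` elements (-1 if none)."""
    counts = np.asarray(counts)
    n_rows, n_cols = counts.shape

    # Pad each row with a zero on both sides so runs at the edges are closed.
    hits = np.zeros((n_rows, n_cols + 2), dtype=np.int8)
    hits[:, 1:-1] = counts == value
    diffs = np.diff(hits, axis=1)

    # argwhere returns indices in row-major order, so the k-th start and
    # the k-th stop belong to the same run.
    starts = np.argwhere(diffs == 1)
    stops = np.argwhere(diffs == -1)
    rows = starts[:, 0]
    lengths = stops[:, 1] - starts[:, 1]

    # Longest run per row; rows without any run keep the initial 0.
    max_len = np.zeros(n_rows, dtype=np.intp)
    np.maximum.at(max_len, rows, lengths)

    # Smallest start per row among the runs that are long enough.
    first_start = np.full(n_rows, n_cols, dtype=np.intp)
    long_enough = lengths >= min_len
    np.minimum.at(first_start, rows[long_enough], starts[long_enough, 1])
    first_start[first_start == n_cols] = -1  # sentinel: no qualifying run
    return max_len, first_start

counts = np.array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
                   [0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
                   [0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])

max_len, first_start = run_stats(counts, value=1, min_len=2)
print(max_len)      # [2 3 2]
print(first_start)  # [3 9 9]
```

Unlike the bincount-based step, this sketch returns integer results and yields 0 (and -1 for the start index) for rows that contain no matching run at all.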

Asked:	9 years, 8 months ago
Viewed:	4090 times
Last active:	4 years, 3 months ago