1如何找到以下 numpy 数组的每一行中连续 s (或任何其他值)的数量?我需要一个纯 numpy 解决方案。
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
[0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
Run Code Online (Sandbox Code Playgroud)
我的问题有两个部分,第一部分:1连续的 s 的最大数量是多少?应该
array([2,3,2])
Run Code Online (Sandbox Code Playgroud)
在示例情况中。
1其次,连续的第一组多个连续 s 的开始索引是多少?对于示例情况,这将是
array([3,9,9])
Run Code Online (Sandbox Code Playgroud)
在这个例子中,我将 2 个连续的1s 放在一行中。但应该可以将其更改为1连续 5 个连续的 s,这很重要。
使用 回答了类似的问题,np.unique但它仅适用于一行,不适用于多行数组,因为结果将具有不同的长度。
这是基于的矢量化方法differentiation-
import numpy as np
import pandas as pd
# Append zeros columns at either sides of counts
append1 = np.zeros((counts.shape[0],1),dtype=int)
counts_ext = np.column_stack((append1,counts,append1))
# Get start and stop indices with 1s as triggers
diffs = np.diff((counts_ext==1).astype(int),axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)
# Get intervals using differences between start and stop indices
start_stop = np.column_stack((starts[:,0], stops[:,1] - starts[:,1]))
# Get indices corresponding to max. interval lens and thus lens themselves
SS_df = pd.DataFrame(start_stop)
out = start_stop[SS_df.groupby([0],sort=False)[1].idxmax(),1]
Run Code Online (Sandbox Code Playgroud)
样本输入、输出 -
原始样例:
In [574]: counts
Out[574]:
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
[0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
In [575]: out
Out[575]: array([2, 3, 2], dtype=int64)
Run Code Online (Sandbox Code Playgroud)
修改案例:
In [577]: counts
Out[577]:
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 2, 0, 1, 1, 1, 1],
[0, 0, 0, 4, 1, 1, 1, 1, 1, 0, 1, 0]])
In [578]: out
Out[578]: array([2, 4, 5], dtype=int64)
Run Code Online (Sandbox Code Playgroud)
这是一个纯 NumPy 版本,在开始、停止之前与之前的版本相同。这是完整的实现 -
# Append zeros columns at either sides of counts
append1 = np.zeros((counts.shape[0],1),dtype=int)
counts_ext = np.column_stack((append1,counts,append1))
# Get start and stop indices with 1s as triggers
diffs = np.diff((counts_ext==1).astype(int),axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)
# Get intervals using differences between start and stop indices
intvs = stops[:,1] - starts[:,1]
# Store intervals as a 2D array for further vectorized ops to make.
c = np.bincount(starts[:,0])
mask = np.arange(c.max()) < c[:,None]
intvs2D = mask.astype(float)
intvs2D[mask] = intvs
# Get max along each row as final output
out = intvs2D.max(1)
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
4090 次 |
| 最近记录: |