累积逻辑或箱内

piR*_*red 5 python numpy

问题

我想确定何时遇到了真实值,并为数组的其余部分维护该值...对于特定的bin。从NumPy的角度来看,它会像的组合numpy.logical_or.accumulatenumpy.logical_or.at

考虑中的真值a,中的箱数b和中的预期输出c
我已经使用0False1用于True再转换为bool以对准所述阵列的值。

a = np.array([0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]).astype(bool)
b = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 2, 3, 3, 0, 1, 2, 3])
# zeros       ?  ?  ?              ?  ?  ?           ?
# ones                 ?  ?  ?  ?                       ?
# twos                                      ?              ?
# threes                                       ?  ?           ?
c = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1]).astype(bool)
#             ???????     ?           ?           ?        ?
#  zero bin no True yet   ?           ?           ?        two never had a True
#                one bin first True   ?     three bin first True
#                           zero bin first True
Run Code Online (Sandbox Code Playgroud)

我尝试过的

我可以遍历每个值,并跟踪关联的bin是否已经看到一个True值。

tracker = np.zeros(4, bool)
result = np.zeros(len(b), bool)

for i, (truth, bin_) in enumerate(zip(a, b)):
    tracker[bin_] |= truth
    result[i] = tracker[bin_]

result * 1

array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1])
Run Code Online (Sandbox Code Playgroud)

但是我希望有一个O(n)时间的Numpy解决方案。我可以选择使用像Numba这样的JIT包装器,但我宁愿保持它为Numpy。

use*_*203 2

O(n)


def cumulative_linear_seen(seen, bins):
    """
    Tracks whether or not a value has been observed as
    True in a 1D array, and marks all future values as
    True for these each individual value.

    Parameters
    ----------
    seen: ndarray
      One-hot array marking an occurence of a value
    bins: ndarray
      Array of bins to which occurences belong

    Returns
    -------
    One-hot array indicating if the corresponding bin has
    been observed at a point in time
    """

    # zero indexing won't work with logical and, need to 1-index
    one_up = bins + 1

    # Next step is finding where each unique value is seen
    occ = np.flatnonzero(a)
    v_obs = one_up[a]

    # We can fill another mapping array with these occurences.
    # then map by corresponding index
    i_obs = np.full(one_up.max() + 1, seen.shape[0] + 1)
    i_obs[v_obs] = occ

    # Finally, we create the map and compare to an array of
    # indices from the original seen array
    seen_idx = i_obs[one_up]

    return (seen_idx <= np.arange(seen_idx.shape[0])).astype(int)
Run Code Online (Sandbox Code Playgroud)

array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1])
Run Code Online (Sandbox Code Playgroud)

PiR的贡献

基于上述见解

r = np.arange(len(b))
one_hot = np.eye(b.max() + 1, dtype=bool)[b]
np.logical_or.accumulate(one_hot & a[:, None], axis=0)[r, b] * 1

array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1])
Run Code Online (Sandbox Code Playgroud)

较早的尝试

首先,这里有一个解决方案,虽然是矢量化的,但不是O (n)。我相信存在与此类似的 O(n) 解决方案,我将研究复杂性:-)


尝试1

q = b + 1

u = sparse.csr_matrix(
    (a, q, np.arange(a.shape[0] + 1)), (a.shape[0], q.max()+1)
)

m = np.maximum.accumulate(u.A) * np.arange(u.shape[1])
r = np.where(m[:, 1:] == 0, np.nan, m[:, 1:])

(r == q[:, None]).any(1).view(np.int8)
Run Code Online (Sandbox Code Playgroud)

array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1], dtype=int8)
Run Code Online (Sandbox Code Playgroud)

尝试2

q = b + 1
m = np.logical_and(a, q)
r = np.flatnonzero(u)
t = q[m]
f = np.zeros((a.shape[0], q.max()))
f[r, t-1] = 1
v = np.maximum.accumulate(f) * np.arange(1, q.max()+1)
(v == q[:, None]).any(1).view(np.int8)
Run Code Online (Sandbox Code Playgroud)

array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1], dtype=int8)
Run Code Online (Sandbox Code Playgroud)